
Steps to reproduce prior knowledge based genomic prediction analysis for growth related traits in Arabidopsis thaliana

Here we describe the general steps for reproducing the whole analysis. The 'setup.R' file in the repository's root directory is a wrapper script that calls the other scripts for all analyses in the steps below.

1- Package Installations

There are two ways to install the R packages required for this analysis:

  • Import the conda environment, in which dependency issues are already resolved. We have tested it on an Ubuntu machine; other Linux or Windows systems may run into issues during dependency resolution.

    • conda env create -f environment.yml
  • Install the packages individually on R v3.6.1:

    if (!require("lme4")) { install.packages("lme4", dependencies = TRUE); library(lme4) }
    if (!require("lmerTest")) { install.packages("lmerTest", dependencies = TRUE); library(lmerTest) }
    if (!require("pbkrtest")) { install.packages("pbkrtest", dependencies = TRUE); library(pbkrtest) }
    if (!require("ggrepel")) { install.packages("ggrepel", dependencies = TRUE); library(ggrepel) }
    if (!require("RColorBrewer")) { install.packages("RColorBrewer", dependencies = TRUE); library(RColorBrewer) }
    if (!require("gplots")) { install.packages("gplots", dependencies = TRUE); library(gplots) }
    if (!require("ggExtra")) { install.packages("ggExtra", dependencies = TRUE); library(ggExtra) }
    if (!require("pheatmap")) { install.packages("pheatmap", dependencies = TRUE); library(pheatmap) }
    if (!require("ggpubr")) { install.packages("ggpubr", dependencies = TRUE); library(ggpubr) }
    if (!require("stringr")) { install.packages("stringr", dependencies = TRUE); library(stringr) }
    if (!require("dplyr")) { install.packages("dplyr", dependencies = TRUE); library(dplyr) }
    if (!require("plyr")) { install.packages("plyr", dependencies = TRUE); library(plyr) }
    if (!require("backports")) { install.packages("backports", dependencies = TRUE); library(backports) }
    if (!require("devtools")) { install.packages("devtools", dependencies = TRUE); library(devtools) }

    # 'splines' and 'graphics' ship with base R, so they only need to be loaded
    library(splines)
    library(graphics)

    # qgg is installed from GitHub
    if (!require("qgg")) { options(devtools.install.args = " --no-multiargs"); devtools::install_github("psoerensen/qgg"); library(qgg) }

    # GO.db and org.At.tair.db are Bioconductor packages, so they are installed via BiocManager
    if (!require("BiocManager")) { install.packages("BiocManager", dependencies = TRUE); library(BiocManager) }
    if (!require("GO.db")) { BiocManager::install("GO.db"); library(GO.db) }
    if (!require("org.At.tair.db")) { BiocManager::install("org.At.tair.db"); library(org.At.tair.db) }

2- Phenotype Dataset Preparation

For Projected Leaf Area (PLA), we used the 'BLUES.R' script to calculate Best Linear Unbiased Estimates (BLUEs) of the genotypic means, which serve as phenotypes in the BLUP models. The BLUEs for Phi-PSII are already available from Van Rooijen 2017.
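
As a rough illustration of what a BLUE of a genotypic mean is, the sketch below fits genotype as a fixed effect with base R's `lm`. The toy data, the column names (`genotype`, `block`, `pla`) and the choice of a fixed block term are assumptions made for this example only; the model in 'BLUES.R' may differ, for instance by fitting design terms as random effects with lme4.

```r
# Toy data (made-up values): 3 genotypes x 2 blocks x 2 plants, balanced.
d <- expand.grid(genotype = c("g1", "g2", "g3"),
                 block = c("b1", "b2"),
                 plant = 1:2,
                 stringsAsFactors = TRUE)
d$pla <- c(10, 12, 15, 11, 13, 17, 9, 12, 16, 12, 14, 16)

# Genotype enters as a fixed effect without an intercept, so each genotype
# gets its own coefficient. Sum-to-zero contrasts on block make each genotype
# coefficient an average over blocks.
fit <- lm(pla ~ 0 + genotype + block, data = d,
          contrasts = list(block = "contr.sum"))

# One estimate per genotype; these would be the phenotypes passed on to BLUP.
blues <- coef(fit)[paste0("genotype", levels(d$genotype))]
print(blues)
```

In a balanced design such as this toy example, the estimates coincide with the raw genotype means; they diverge once the data are unbalanced.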

3- Genotype Dataset Preparation

Genotype data for the 360 natural accessions in the core set of the Arabidopsis thaliana HapMap population, which represents its global diversity, were obtained using the Affymetrix 250k SNP array (Zhang and Borevitz, 2009; Baxter et al., 2010). The raw genotypes were in plink binary format. After MAF and LD filtering, they were recoded into compound genotypes and subsequently converted into an allele count matrix (M) using 'make_M_matrix.sh'.
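
To make the structure of M concrete, here is a minimal base R sketch of an allele count matrix and a MAF filter. It is not the pipeline in 'make_M_matrix.sh'; the toy genotypes and the threshold `maf_threshold` are assumptions for illustration, and LD filtering is not shown.

```r
# Toy allele count matrix: rows = accessions, columns = SNPs, entries = 0/1/2.
M <- matrix(c(0, 2, 0, 0,
              2, 2, 0, 2,
              0, 2, 0, 0,
              2, 2, 2, 0,
              0, 2, 0, 2),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("acc", 1:5), paste0("snp", 1:4)))

p   <- colMeans(M) / 2   # frequency of the counted allele per SNP
maf <- pmin(p, 1 - p)    # minor allele frequency per SNP

maf_threshold <- 0.05    # assumed cut-off, not taken from the original analysis
M_filtered <- M[, maf >= maf_threshold, drop = FALSE]
print(colnames(M_filtered))
```

Here `snp2` is monomorphic (MAF of 0) and is therefore dropped.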

4- Prepare Genomic Relationship Matrix (GRM)

  • A GRM (G) was constructed based on all markers and for each GO/COEX group.

  • The M matrix was centered and scaled to calculate the GRM using the allele-frequency-based method proposed by VanRaden (2008). Sets of GO terms and clusters of coexpressed genes were loaded, and two GRMs were constructed for each GO/COEX group: one based only on the markers within that set of genes, and another based on all remaining markers.
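
The construction above can be sketched in base R as follows. This assumes the centered-and-scaled form G = WW'/m, where each column of W is the allele count minus 2p, divided by sqrt(2p(1-p)); whether the analysis scripts use exactly this variant is an assumption. The helper `make_grm`, the toy matrix and the marker set `in_set` are hypothetical names introduced for this example.

```r
# Toy allele count matrix: 5 accessions x 4 SNPs (made-up values).
M <- matrix(c(0, 2, 1, 0,
              2, 0, 1, 2,
              1, 1, 0, 2,
              0, 2, 2, 0,
              2, 1, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("acc", 1:5), paste0("snp", 1:4)))

# Hypothetical helper: centered-and-scaled GRM, G = W W' / m.
make_grm <- function(M) {
  p <- colMeans(M) / 2                         # allele frequency per SNP
  W <- sweep(M, 2, 2 * p, "-")                 # center each SNP
  W <- sweep(W, 2, sqrt(2 * p * (1 - p)), "/") # scale each SNP (must be polymorphic)
  tcrossprod(W) / ncol(M)
}

# Assumed marker set for one GO/COEX group: markers inside the gene set.
in_set <- colnames(M) %in% c("snp1", "snp2")

G   <- make_grm(M)                             # all markers
G_f <- make_grm(M[, in_set, drop = FALSE])     # markers within the gene set
G_r <- make_grm(M[, !in_set, drop = FALSE])    # all remaining markers
```

With this scaling, the all-marker GRM is the marker-count-weighted average of the two partial GRMs, which is a convenient sanity check when splitting markers into sets.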

6- GBLUP vs GFBLUP models

  • To run analysis based on GO terms: use the wrapper script 'run_go_models.R'
  • To run analysis based on coexpression clusters: use the wrapper script 'run_coex_models.R'
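
For orientation, the two model types being compared are commonly written as below. This is the standard textbook formulation, given here as an assumption about what the wrapper scripts fit; the symbols G_f and G_r (the GRMs built from markers inside the gene set and from all remaining markers) are notation introduced for this sketch.

```latex
% GBLUP: a single genomic effect based on all markers
\mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{e}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_g), \qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)

% GFBLUP: separate genomic effects for the feature set (f) and the rest (r)
\mathbf{y} = \mathbf{1}\mu + \mathbf{f} + \mathbf{r} + \mathbf{e}, \qquad
\mathbf{f} \sim N(\mathbf{0}, \mathbf{G}_f\sigma^2_f), \qquad
\mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma^2_r), \qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)
```

Under this formulation, a GO/COEX group is informative when giving its markers their own variance component improves on the single-component GBLUP model.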

7- Comparative analysis

  • To summarize and process the model outcomes: use the script 'aggregate_analysis.R'