Farooq, Muhammad
pub1




Steps to reproduce the prior-knowledge-based genomic prediction analysis for growth-related traits in Arabidopsis thaliana
Here we describe the general steps for reproducing the whole analysis. The 'setup.R' file in the repository's home directory is the wrapper script that calls the other scripts for each of the steps below.

1- Package Installations
There are two ways to install the R packages required for this analysis:


Import the conda environment, in which dependency issues are already resolved. We have tested it on an Ubuntu machine; other Linux distributions or Windows may run into issues during dependency resolution.

conda env create -f environment.yml


Install the packages individually on R v3.6.1:
# GO.db and org.At.tair.db are Bioconductor annotation packages, so they are
# installed with BiocManager instead of install.packages(). They are loaded
# first, as in the original load order, so that dplyr (attached later) masks
# AnnotationDbi::select().
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
for (pkg in c("GO.db", "org.At.tair.db")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    BiocManager::install(pkg)
  }
  library(pkg, character.only = TRUE)
}

# CRAN packages used by the analysis scripts
# ('splines' and 'graphics' ship with base R, so they are only loaded here)
cran_pkgs <- c("lme4", "lmerTest", "pbkrtest", "ggrepel", "RColorBrewer",
               "gplots", "ggExtra", "pheatmap", "splines", "ggpubr",
               "graphics", "stringr", "dplyr", "plyr", "backports", "devtools")
for (pkg in cran_pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  library(pkg, character.only = TRUE)
}

# qgg is installed from GitHub; "--no-multiarch" skips building the
# 32-bit sub-architecture on Windows
if (!requireNamespace("qgg", quietly = TRUE)) {
  devtools::install_github("psoerensen/qgg", INSTALL_opts = "--no-multiarch")
}
library(qgg)


2- Phenotype Dataset Preparation
For Projected Leaf Area (PLA), we calculated Best Linear Unbiased Estimates (BLUEs) of the genotypic means with the 'BLUES.R' script; these BLUEs are used as phenotypes in the BLUP models. The BLUEs for Phi-PSII are already available from Van Rooijen et al. (2017).
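As a minimal sketch of the idea (not the model in 'BLUES.R'), BLUEs of genotypic means can be obtained from a linear model with genotype as a fixed effect. The replicate factor `rep`, the simulated PLA values, and the use of base R lm() instead of a mixed model are assumptions for illustration only.

```r
# Simulated, balanced toy data: 5 accessions x 3 replicates (hypothetical)
set.seed(1)
pheno <- expand.grid(genotype = paste0("acc", 1:5), rep = paste0("r", 1:3))
pheno$PLA <- rnorm(nrow(pheno), mean = 10) + as.integer(pheno$genotype)

# Genotype as a fixed effect without intercept, replicate with sum-to-zero
# contrasts, so each genotype coefficient is its mean averaged over replicates
fit <- lm(PLA ~ 0 + genotype + rep, data = pheno,
          contrasts = list(rep = "contr.sum"))

# Extract one BLUE per accession
blues <- coef(fit)[paste0("genotype", levels(pheno$genotype))]
names(blues) <- levels(pheno$genotype)
print(round(blues, 2))
```

In a balanced design like this toy example, the BLUEs equal the raw genotype means; with unbalanced field or phenotyping data they differ, which is why a model-based estimate is used.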

3- Genotype Dataset Preparation
Genotype data of the 360 natural accessions in the core set of the Arabidopsis thaliana HapMap population, which represents its global diversity, were obtained with the Affymetrix 250k SNP array (Zhang and Borevitz, 2009; Baxter et al., 2010). The raw genotypes were in PLINK binary format; after MAF and LD filtering, they were recoded into compound genotypes and subsequently converted into an allele count matrix (M) using 'make_M_matrix.sh'.
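The conversion from compound genotypes to an allele count matrix can be sketched in R as follows. This is not the content of 'make_M_matrix.sh': the toy genotype calls, counting the minor allele, and the 0.05 MAF threshold are assumptions, missing calls are assumed absent, and LD filtering is not shown.

```r
# Toy compound genotype calls: 4 accessions x 3 SNPs (no missing calls)
geno <- matrix(c("AA", "AA", "GG", "AA",
                 "CC", "TT", "TT", "TT",
                 "GG", "GG", "GG", "GG"),
               nrow = 4,
               dimnames = list(paste0("acc", 1:4), paste0("snp", 1:3)))

# Count copies of the minor allele in each two-letter call of one SNP
minor_allele_counts <- function(calls) {
  pairs <- strsplit(calls, "")                  # "AG" -> c("A", "G")
  allele_freq <- table(unlist(pairs))
  minor <- names(allele_freq)[which.min(allele_freq)]
  vapply(pairs, function(a) sum(a == minor), numeric(1))
}

M <- apply(geno, 2, minor_allele_counts)       # accessions x SNPs, values 0/1/2
rownames(M) <- rownames(geno)

# Minor allele frequency per SNP and MAF filter (threshold assumed)
p <- colMeans(M) / 2
maf <- pmin(p, 1 - p)
M_filt <- M[, maf >= 0.05, drop = FALSE]
print(M_filt)
```

The monomorphic snp3 gets MAF 0 and is dropped. Because the accessions are largely homozygous, most counts are 0 or 2.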

4- Prepare Genomic Relationship Matrix (GRM)


A GRM (G) was constructed based on all markers and for each GO/COEX group.
The M matrix was centered and scaled to calculate the GRM using the allele-frequency-based method proposed by VanRaden (2008).
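For reference, one common way to write the centered and scaled GRM of VanRaden (2008) is given below, with M the n x m allele count matrix, p_j the frequency of the counted allele at SNP j, and m the number of markers. Whether the scripts use the column-wise scaling or VanRaden's single overall scaling factor is an assumption here, so both forms are shown.

```latex
% column-wise centering and scaling (assumed here)
W_{ij} = \frac{M_{ij} - 2p_j}{\sqrt{2p_j(1 - p_j)}}, \qquad
\mathbf{G} = \frac{\mathbf{W}\mathbf{W}^{\top}}{m}

% VanRaden's first method: centering only, one overall scaling factor
Z_{ij} = M_{ij} - 2p_j, \qquad
\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}^{\top}}{2\sum_{j=1}^{m} p_j(1 - p_j)}
```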
Sets of GO terms and clusters of coexpressed genes (COEX) were loaded, and two GRMs were constructed for each GO/COEX group: one based only on the markers within that group's genes, and another based on all remaining markers.
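The per-group construction can be sketched as below, using the column-wise scaled form of the VanRaden GRM. The helper `make_grm()`, the simulated M, and the marker indices in `set_snps` (standing in for the markers mapped to one GO term or COEX cluster) are hypothetical; the actual scripts may build the GRMs differently, for example with qgg.

```r
# Hypothetical helper: centered and scaled GRM, G = W W' / m
make_grm <- function(M) {
  p <- colMeans(M) / 2                 # frequency of the counted allele per SNP
  keep <- p > 0 & p < 1                # monomorphic SNPs cannot be scaled
  W <- scale(M[, keep, drop = FALSE], center = 2 * p[keep],
             scale = sqrt(2 * p[keep] * (1 - p[keep])))
  tcrossprod(W) / ncol(W)
}

# Simulated allele counts for 20 homozygous lines and 200 SNPs (0 or 2)
set.seed(42)
n <- 20; m <- 200
M <- matrix(2 * rbinom(n * m, 1, 0.3), n, m)

set_snps <- 1:25                                   # hypothetical GO/COEX marker set
G_all  <- make_grm(M)                              # all markers
G_set  <- make_grm(M[, set_snps, drop = FALSE])    # markers within the set
G_rest <- make_grm(M[, -set_snps, drop = FALSE])   # all remaining markers
```

Because every column of W is centered, the entries of each GRM sum to zero, which is a quick sanity check on the construction.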


6- GBLUP vs GFBLUP models

To run the analysis based on GO terms, use the wrapper script 'run_go_models.R'.
To run the analysis based on coexpression (COEX) clusters, use the wrapper script 'run_coex_models.R'.
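The difference between the two models can be sketched as follows: GBLUP uses one GRM from all markers, while GFBLUP splits the genetic effect into a part from the group's markers and a part from the remaining markers, each with its own GRM. In this sketch the variance components are treated as known. In the actual analysis they are estimated from the data (for example by REML). The helper `blup_predict()`, the simulated data, the marker set, and the variance values are all hypothetical.

```r
# Hypothetical helper: BLUP of genetic values given training phenotypes
blup_predict <- function(y, K_list, sigma2, sigma2_e, train) {
  # genetic covariance: sum over kernels of variance component x GRM
  K <- Reduce(`+`, Map(function(Ki, s2) s2 * Ki, K_list, sigma2))
  V <- K[train, train] + diag(sigma2_e, length(train))
  Vinv <- solve(V)
  mu <- sum(Vinv %*% y[train]) / sum(Vinv)   # GLS estimate of the mean
  drop(K[, train] %*% Vinv %*% (y[train] - mu))
}

# Simulated homozygous lines (allele counts 0 or 2) and centered, scaled W
set.seed(7)
n <- 100; m <- 500
M <- matrix(2 * rbinom(n * m, 1, 0.4), n, m)
p <- colMeans(M) / 2
keep <- p > 0 & p < 1
W <- scale(M[, keep], center = 2 * p[keep],
           scale = sqrt(2 * p[keep] * (1 - p[keep])))

set_idx <- 1:50                                         # hypothetical marker set
G  <- tcrossprod(W) / ncol(W)                           # GRM, all markers
Gf <- tcrossprod(W[, set_idx]) / length(set_idx)        # GRM, set markers
Gr <- tcrossprod(W[, -set_idx]) / (ncol(W) - length(set_idx))  # GRM, rest

# Simulated phenotype with larger marker effects inside the set
b <- rnorm(ncol(W), sd = 0.02)
b[set_idx] <- rnorm(length(set_idx), sd = 0.2)
y <- 10 + drop(W %*% b) + rnorm(n)

train <- 1:80; test <- 81:100
g_gblup  <- blup_predict(y, list(G), 1, 1, train)
g_gfblup <- blup_predict(y, list(Gf, Gr), c(0.8, 0.2), 1, train)

# Predictive ability: correlation of predictions with held-out phenotypes
print(c(GBLUP = cor(g_gblup[test], y[test]),
        GFBLUP = cor(g_gfblup[test], y[test])))
```

GFBLUP can only outperform GBLUP when the group's markers carry a disproportionate share of the genetic variance, which is what the simulated effects above are set up to mimic.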


7- Comparative analysis

To summarize and process the model outcomes, use the script 'aggregate_analysis.R'.
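A minimal sketch of one such summary follows: it averages predictive ability over cross-validation folds per group and model and ranks groups by the gain of GFBLUP over GBLUP. The column layout (`set_id`, `model`, `fold`, `cor`) and the numbers are made up for illustration. They are not results, and 'aggregate_analysis.R' may organise its outputs differently.

```r
# Made-up cross-validation output: predictive ability per set, model and fold
res <- data.frame(
  set_id = rep(c("set_A", "set_B"), each = 4),
  model  = rep(c("GBLUP", "GFBLUP"), times = 4),
  fold   = rep(rep(1:2, each = 2), times = 2),
  cor    = c(0.40, 0.46, 0.38, 0.45, 0.41, 0.39, 0.37, 0.36),
  stringsAsFactors = FALSE
)

# Mean predictive ability per set and model, one column per model
pa <- aggregate(cor ~ set_id + model, data = res, FUN = mean)
wide <- reshape(pa, idvar = "set_id", timevar = "model", direction = "wide")

# Gain of GFBLUP over GBLUP; sets whose markers improve prediction come first
wide$gain <- wide$cor.GFBLUP - wide$cor.GBLUP
summary_tab <- wide[order(-wide$gain), ]
print(summary_tab)
```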