Skip to content
Snippets Groups Projects
Commit 08d30287 authored by altuna akalin's avatar altuna akalin
Browse files

small change to BSseq segmentation

parent 92057d31
No related branches found
No related tags found
No related merge requests found
...@@ -324,7 +324,7 @@ my.diffMeth3=calculateDiffMeth(sim.methylBase, ...@@ -324,7 +324,7 @@ my.diffMeth3=calculateDiffMeth(sim.methylBase,
The analysis of methylation dynamics is not exclusively restricted to differentially methylated regions across samples. Apart from this there is also an interest in examining the methylation profiles within the same sample. Usually, depressions in methylation profiles pinpoint regulatory regions like gene promoters that co-localize with CG-dense CpG islands. On the other hand, many gene-body regions are extensively methylated and CpG-poor [@Bock2012-oh]. These observations would describe a bimodal model of either hyper- or hypomethylated regions depending on the local density of CpGs [@Lovkvist2016-ky]. However, given the detection of CpG-poor regions with locally reduced levels of methylation (on average 30%) in pluripotent embryonic stem cells and in neuronal progenitors in both mouse and human, a different model also seems reasonable [@Stadler2011-iu]. These low-methylated regions (LMRs) are located distal to promoters, have little overlap with CpG islands, and are associated with enhancer marks such as p300 binding sites and H3K27ac enrichment. The analysis of methylation dynamics is not exclusively restricted to differentially methylated regions across samples. Apart from this there is also an interest in examining the methylation profiles within the same sample. Usually, depressions in methylation profiles pinpoint regulatory regions like gene promoters that co-localize with CG-dense CpG islands. On the other hand, many gene-body regions are extensively methylated and CpG-poor [@Bock2012-oh]. These observations would describe a bimodal model of either hyper- or hypomethylated regions depending on the local density of CpGs [@Lovkvist2016-ky]. However, given the detection of CpG-poor regions with locally reduced levels of methylation (on average 30%) in pluripotent embryonic stem cells and in neuronal progenitors in both mouse and human, a different model also seems reasonable [@Stadler2011-iu]. These low-methylated regions (LMRs) are located distal to promoters, have little overlap with CpG islands, and are associated with enhancer marks such as p300 binding sites and H3K27ac enrichment.
Now we are going to try to segment a portion for the H1 human embryonic stem cell line. MethylKit \index{R Packages!\texttt{methylKit}}uses change-point analysis to segment the methylome. In change-point analysis, the change-points of a genome-wide methylation signal are recorded and the genome is partitioned into regions between consecutive change points. CpGs in each segment are similar to each other more than the following segment. Now we are going to try to segment a portion for the H1 human embryonic stem cell line. MethylKit \index{R Packages!\texttt{methylKit}}uses change-point analysis to segment the methylome. In change-point analysis, the change-points of a genome-wide methylation signal are recorded and the genome is partitioned into regions between consecutive change points. CpGs in each segment are similar to each other more than the following segment.
After segmentation, methylKit function `methSeg()` identifies segments that are further clustered into segment classes using a mixture modeling approach. This clustering is based on only the average methylation level of the segments and allows the detection of distinct methylome features comparable to unmethylated regions (UMRs), lowly methylated regions (LMRs), and fully methylated regions (FMRs) mentioned in Stadler et al. [@Stadler2011-yv]. The code snippet below reads the methylation data from the H1 cell line as a `GRanges` object, and runs the segmentation with potentially up to classes of segments. Mixture modeling determines the optimal number of segments using a statistic called Bayesian information criterion (BIC). The BIC is a statistic based on model likelihood and helps us select the model that fits the data better. We have set the number of segment classes to try using the `G=1:4` argument. The `minSeg` arguments are related to the minimum number of CpGs in the segments. The function `methSeg()` outputs a diagnostic plot for segmentation. This plot is shown in Figure \@ref(fig:segDiag). It shows methylation values and lengths of segments in each segment class, as well as the BIC for different numbers of segments. After segmentation, methylKit function `methSeg()` identifies segments that are further clustered into segment classes using a mixture modeling approach. This clustering is based on only the average methylation level of the segments and allows the detection of distinct methylome features comparable to unmethylated regions (UMRs), lowly methylated regions (LMRs), and fully methylated regions (FMRs) mentioned in Stadler et al. [@Stadler2011-yv]. The code snippet below reads the methylation data from the H1 cell line as a `GRanges` object, and runs the segmentation with potentially up to 4 classes of segments. Mixture modeling determines the optimal number of segments using a statistic called Bayesian information criterion (BIC). The BIC is a statistic based on model likelihood and helps us select the model that fits the data better. We have set the number of segment classes to try using the `G=1:4` argument. The `minSeg` arguments are related to the minimum number of CpGs in the segments. The function `methSeg()` outputs a diagnostic plot for segmentation. This plot is shown in Figure \@ref(fig:segDiag). It shows methylation values and lengths of segments in each segment class, as well as the BIC for different numbers of segments.
```{r segDiag, fig.width=14,fig.height=8,fig.cap="Segmentation characteristics shown in different plots. Top left: Mean methylation values per segment in each segment class. Top middle: Length of each segment as boxplots for each segment class. Top right: Number of segments in each segment class. Bottom left: Distribution of segment methylation values. Bottom right: BIC for different number of segment classes",warning=FALSE,out.width="90%"} ```{r segDiag, fig.width=14,fig.height=8,fig.cap="Segmentation characteristics shown in different plots. Top left: Mean methylation values per segment in each segment class. Top middle: Length of each segment as boxplots for each segment class. Top right: Number of segments in each segment class. Bottom left: Distribution of segment methylation values. Bottom right: BIC for different number of segment classes",warning=FALSE,out.width="90%"}
# read methylation data # read methylation data
......
images/CompGen2019_A3_v2_final.png

1.26 MiB

...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
%\cleardoublepage\newpage %\cleardoublepage\newpage
\thispagestyle{empty} \thispagestyle{empty}
\begin{center} \begin{center}
\includegraphics{images/dedication.pdf} \includegraphics{images/dedicationOld.pdf}
\end{center} \end{center}
\setlength{\abovedisplayskip}{-5pt} \setlength{\abovedisplayskip}{-5pt}
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment