diff --git a/03-StatsForGenomics.Rmd b/03-StatsForGenomics.Rmd
index 734cafc8949c8b06d300ee33c30efb3fdc83220f..542a4b937beda24197ecd82a3d4d346f5a34755d 100644
--- a/03-StatsForGenomics.Rmd
+++ b/03-StatsForGenomics.Rmd
@@ -302,7 +302,7 @@ with your critical value, quantiles or number of random numbers depending
-on which function you are using in the family.We will list some of those functions below.
+on which function you are using in the family. We will list some of those functions below.
 
-- `dbinom` is for binomial distribution \index{binomial disdistribution}. This distribution is usually used
+- `dbinom` is for binomial distribution \index{binomial distribution}. This distribution is usually used
-to model fractional data and binary data. Examples from genomics includes
+to model fractional data and binary data. Examples from genomics include
 methylation data.
 
 - `dpois` is used for Poisson distribution and `dnbinom` is used for
@@ -733,7 +733,7 @@ legend("bottomright",legend=c("q-value","FDR (BH)","Bonferroni"),
 ```
 
-### moderated t-tests: using information from multiple comparisons
+### Moderated t-tests: using information from multiple comparisons
-In genomics, we usually do not do one test but many, as described above. That means we
+In genomics, we usually do not do one test but many, as described above. That means we\index{moderated t-test}
 may be able to use the information from the parameters obtained from all 
 comparisons to influence the individual parameters. For example, if you have many variances
 calculated for thousands of genes across samples, you can force individual 
@@ -743,12 +743,13 @@ estimates and therefore better performance in significance testing which
-depends on variance estimates. How much the values be shrunk towards a common
+depends on variance estimates. How much the values are shrunk towards a common
 value comes in many flavors. These tests in general are called moderated
 t-tests or shrinkage t-tests. One approach popularized by Limma software is
-to use so-called "Empirical Bayesian methods". The main formulation in these
+to use so-called "Empirical Bayes methods"\index{empirical Bayes methods}. The main formulation in these
 methods is $\hat{V_g} = aV_0 + bV_g$, where $V_0$ is the background variability  
 and $V_g$ is the individual variability. Then, these methods estimate $a$ and $b$
-in various ways to come up with shrunk version of variability, $\hat{V_g}$. In a Bayesian viewpoint,
-the prior knowledge is used to calculate the variability of an individual gene. In this
-case, $V_0$ would be the prior knowledge we have on variability of 
+in various ways to come up with a shrunk version of variability, $\hat{V_g}$. Bayesian inference makes use of prior knowledge when estimating properties of the data. In a Bayesian viewpoint,
+the prior knowledge, in this case the variability of other genes, can be used to calculate the variability of an individual gene. In our
+case, $V_0$ would be the prior knowledge we have on the variability of 
 the genes and we
 use that knowledge to influence our estimate for the individual genes.
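+
+As a toy illustration of this formula, the sketch below shrinks simulated gene-wise variances towards their median. The weights $a$ and $b$ are set by hand here, whereas empirical Bayes methods such as the one popularized by Limma estimate them from the data.
+```{r varianceShrinkSketch,eval=FALSE}
+# simulate a matrix of 1000 genes measured in 3 samples
+set.seed(100)
+mat <- matrix(rnorm(1000*3), ncol=3)
+Vg <- apply(mat, 1, var)   # individual gene-wise variances
+V0 <- median(Vg)           # background variability, our prior knowledge
+a <- 0.5; b <- 0.5         # hypothetical weights; real methods estimate these
+Vg.shrunk <- a*V0 + b*Vg   # shrunk variances, pulled towards V0
+summary(Vg)
+summary(Vg.shrunk)         # shrunk values are less spread out than raw ones
+```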
 
@@ -827,6 +828,7 @@ __Want to know more ?__
 ```
 
 
+
 ## Relationship between variables: linear models and correlation
 In genomics, we would often need to measure or model the relationship between 
 variables. We might want to know about expression of a particular gene in liver 
@@ -899,7 +901,7 @@ In this case, we will be fitting a plane rather than a line. However, the fittin
-process which we will describe in the later sections will not change. For our 
-gene expression problem. We can introduce one more histone modification, H3K27me3. We will then have a linear model with 2 explanatory variables and the 
+process, which we will describe in the later sections, will not change. For our 
+gene expression problem, we can introduce one more histone modification, H3K27me3. We will then have a linear model with 2 explanatory variables and the 
 fitted plane will look like the one below in Figure \@ref(fig:histoneLm2chp3). The gene expression values are shown
-as dots below and above the fitted plane. Linear regression or linear models and their extensions which makes use of other distributions are central in computational genomics for statistical tests. We will see more of how regression is used in statistical hypothesis testing for computational genomics in Chapters \@ref(rnaseqanalysis) and  \@ref(bsseq).
+as dots below and above the fitted plane. Linear regression or linear models, and their extensions which make use of other distributions, called generalized linear models,\index{generalized linear model} are central in computational genomics for statistical tests. We will see more of how regression is used in statistical hypothesis testing for computational genomics in Chapters \@ref(rnaseqanalysis) and \@ref(bsseq).
 
 ```{r histoneLm2chp3,echo=FALSE,out.width='65%',warning=FALSE,message=FALSE,fig.cap="Association of Gene expression with H3K4me3 and H27Kme3 histone modifications."}
 set.seed(32)
@@ -1101,7 +1103,7 @@ linear regression is \index{linear regression} multiplication of $P(y_{i})$ for
 
 $$L=P(y_1)P(y_2)P(y_3)..P(y_n)=\prod\limits_{i=1}^n{P_i}$$
 
-This can be simplified to this by some algebra and taking logs (since it is 
+This can be simplified to the following equation by some algebra, the assumption of a normal distribution, and taking logs (since it is 
 easier to add than multiply)
 
 $$ln(L) = -nln(s\sqrt{2\pi}) - \frac{1}{2s^2} \sum\limits_{i=1}^n{(y_i-(\beta_0 + \beta_1x_i))^2} $$
@@ -1116,6 +1118,9 @@ problem
 within the domain of statistics. This particular function has still to be optimized. This can be done with some calculus without the need for an 
 iterative approach. 
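+
+As a quick numerical check of this formula, we can fit a line to simulated data and compare the log-likelihood computed by hand with the value returned by R's `logLik()` function. This is just a sketch on made-up data; with the maximum likelihood estimate of $s$, the two values agree.
+```{r mlLogLikCheck,eval=FALSE}
+# simulate points around a line with normally distributed noise
+set.seed(101)
+x <- runif(50)
+y <- 2 + 3*x + rnorm(50, sd=0.5)
+fit <- lm(y ~ x)                 # least-squares fit
+n <- length(y)
+rss <- sum(residuals(fit)^2)     # sum of squared residuals
+s <- sqrt(rss/n)                 # maximum likelihood estimate of s
+# plug into ln(L) = -n*ln(s*sqrt(2*pi)) - (1/(2*s^2)) * sum of squared residuals
+-n*log(s*sqrt(2*pi)) - rss/(2*s^2)
+logLik(fit)                      # same value: least squares = maximum likelihood
+```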
 
+The maximum likelihood approach also opens up other possibilities for regression. In the case above, we assumed that the points around the mean follow a normal distribution. However, there are other cases where this assumption may not hold. For example, for count data the mean-variance relationship is not constant: the higher the mean count, the higher the variance. In such cases, the regression framework with maximum likelihood estimation can still be used. We simply change the underlying assumption about the distribution, calculate the likelihood with the new distribution in mind,
+and find the parameters that maximize that likelihood. This gives rise to the "generalized linear model"\index{generalized linear model} approach, where the errors for the response variable can have distributions other than the normal distribution. We will see examples of these generalized linear models in Chapters \@ref(rnaseqanalysis) and \@ref(bsseq).
+
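+As a small illustration of this idea, the sketch below simulates counts whose variance grows with their mean and fits them with `glm()`, using a Poisson likelihood instead of the normal-errors model behind `lm()`. This is only a toy example; real count-data examples appear in the chapters referenced above.
+```{r glmPoissonSketch,eval=FALSE}
+# simulate count data where the variance grows with the mean
+set.seed(102)
+x <- runif(100)
+counts <- rpois(100, lambda=exp(1 + 2*x))
+# same regression framework, different distributional assumption
+glm.fit <- glm(counts ~ x, family=poisson)
+summary(glm.fit)$coefficients    # estimates close to the simulated 1 and 2
+```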
 
 
 
diff --git a/08-rna-seq-analysis.Rmd b/08-rna-seq-analysis.Rmd
index d9660e383ef5ce089cb2174be2275f996103649c..24210219cdba479f23130f01efaf34bcd29acad4 100644
--- a/08-rna-seq-analysis.Rmd
+++ b/08-rna-seq-analysis.Rmd
@@ -283,8 +283,8 @@ pheatmap(correlationMatrix,
 
 ### Differential expression analysis
 
-Differential expression analysis allows to test tens of thousands of hypotheses (one test for each gene) against the null hypothesis that the activity of the gene stays the same in two different conditions. There are multiple limiting factors that influence the power of detecting genes that have real changes between two biological conditions. Among these are the limited number of biological replicates, non-normality of the distribution of the read counts, and higher uncertainty of measurements for lowly expressed genes than highly expressed genes [@love_moderated_2014]. Tools such as `edgeR` and `DESeq2` address these limitations using sophisticated statistical models in order to maximize the amount of knowledge that can be extracted from such noisy datasets. In essence, these models assume that for each gene the read counts are generated by a negative binomial distribution. This is a popular distribution that is used for modeling count data. This distribution can be specified with a mean parameter, $m$, and a dispersion parameter, $\alpha$.The dispersion parameter $\alpha$ is directly related to the variance as the variance of this distribution is formulated as: $m+\alpha m^{2}$. Therefore, estimating these parameters are crucial for differential expression tests. The methods used in `edgeR` and `DESeq2` uses dispersion estimates from other genes with similar counts to precisely estimate the per-gene dispersion values. With accurate dispersion parameter estimate, one can estimate the variance more precisely which in turn
-improve the result of the differential expression test. Although statistical models are different, the process here is similar to moderated t-test we introduced in Chapter \@ref(stats). There, we calculated gene-wise variability and shrunk each gene-wise variability towards the median variability of all genes.
+Differential expression analysis allows us to test tens of thousands of hypotheses (one test for each gene) against the null hypothesis that the activity of the gene stays the same in two different conditions. There are multiple limiting factors that influence the power of detecting genes that have real changes between two biological conditions. Among these are the limited number of biological replicates, non-normality of the distribution of the read counts, and higher uncertainty of measurements for lowly expressed genes than highly expressed genes [@love_moderated_2014]. Tools such as `edgeR` and `DESeq2` address these limitations using sophisticated statistical models in order to maximize the amount of knowledge that can be extracted from such noisy datasets. In essence, these models assume that for each gene the read counts are generated by a negative binomial distribution\index{negative binomial distribution}. This is a popular distribution that is used for modeling count data. This distribution can be specified with a mean parameter, $m$, and a dispersion parameter, $\alpha$. The dispersion parameter $\alpha$ is directly related to the variance, as the variance of this distribution is formulated as: $m+\alpha m^{2}$. Therefore, estimating these parameters is crucial for differential expression tests. The methods used in `edgeR` and `DESeq2` use dispersion estimates from other genes with similar counts to precisely estimate the per-gene dispersion values. With accurate dispersion parameter estimates, one can estimate the variance more precisely, which in turn
+improves the result of the differential expression test. Although the statistical models are different, the process here is similar to the moderated t-test\index{moderated t-test} we introduced in Chapter \@ref(stats), and qualifies as an empirical Bayes method\index{empirical Bayes methods}. There, we calculated gene-wise variability and shrunk each gene-wise variability towards the median variability of all genes. In the case of RNA-seq, the dispersion parameter $\alpha$ is shrunk towards the value of dispersion from other genes with similar read counts.
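+
+As a quick sanity check of this mean-variance relationship, we can simulate negative binomial counts with R's `rnbinom()` function, whose `size` argument corresponds to $1/\alpha$; the parameter values below are made up for illustration.
+```{r nbMeanVarSketch,eval=FALSE}
+m <- 100       # mean parameter
+alpha <- 0.2   # dispersion parameter
+counts <- rnbinom(10000, mu=m, size=1/alpha)
+var(counts)    # close to the theoretical variance below
+m + alpha*m^2
+```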
 
 Now let us take a closer look at `DESeq2` \index{R Packages!\texttt{DESeq2}}workflow and how it calculates differential expression: 
 
@@ -292,7 +292,7 @@ Now let us take a closer look at `DESeq2` \index{R Packages!\texttt{DESeq2}}work
 2. For each gene, a dispersion estimate is calculated. The dispersion value computed by `DESeq2` is equal to the squared coefficient of variation (variation divided by the mean).  
 3. A line is fit across the dispersion estimates of all genes computed in 2) versus the mean normalized counts of the genes. 
-4. Dispersion values of each gene is shrunken towards the fitted line in 3). 
+4. The dispersion value of each gene is shrunk towards the fitted line in 3). 
-5. A Generalized Linear Model is fitted which considers additional confounding variables related to the experimental design such as sequencing batches, treatment, temperature, patient's age, sequencing technology etc. and uses negative binomial distribution for fitting count data.
+5. A Generalized Linear Model\index{generalized linear model} is fitted, which considers additional confounding variables related to the experimental design, such as sequencing batches, treatment, temperature, patient's age, sequencing technology, etc., and uses the negative binomial distribution for fitting count data.
 6. For a given contrast (e.g. treatment type: drug-A versus untreated), a test for differential expression is carried out against the null hypothesis that the log fold change of the normalized counts of the gene in the given pair of groups is exactly zero. 
-7. Adjusts p-values for multiple-testing. 
+7. P-values are adjusted for multiple testing. 
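+
+The whole workflow above boils down to a handful of function calls. The following is a minimal sketch of those calls, assuming a hypothetical raw count matrix `countData` (genes in rows, samples in columns) and a sample annotation data frame `colData` with a `group` column.
+```{r deseqWorkflowSketch,eval=FALSE}
+library(DESeq2)
+# countData: raw counts; colData: sample table with a "group" column (assumed objects)
+dds <- DESeqDataSetFromMatrix(countData = countData,
+                              colData   = colData,
+                              design    = ~ group)
+dds <- DESeq(dds)    # normalization, dispersion estimation, GLM fit and testing
+res <- results(dds)  # log2 fold changes and adjusted p-values per gene
+```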
 
diff --git a/11-multiomics-analysis.Rmd b/11-multiomics-analysis.Rmd
index 9afed7e5bb50ef6f28633963058ea4858f937a6c..9834af3a39156d640fdc14a0af86a63a6a93aabb 100644
--- a/11-multiomics-analysis.Rmd
+++ b/11-multiomics-analysis.Rmd
@@ -1,8 +1,4 @@
----
-output:
-  pdf_document: default
-  html_document: default
----
+
 # Multi-omics Analysis {#multiomics}
 
 ```{r setup_mo_seq, include=FALSE}
@@ -14,16 +10,17 @@ knitr::opts_chunk$set(echo      = TRUE,
                       fig.width = 5,
                       fig.align = 'center')
 ```
-Chapter Author: Jonathan Ronen
+Chapter Author: **Jonathan Ronen**
 
 
 
-\index{multi-omics}Living cells are a symphony of complex processes. Modern sequencing technology has lead to many comprehensive assays being routinely available to experimenters, giving us different ways to peek at the internal doings of the cells, each experiment revealing a different part of some underlying processes. As an example, most cells have the same DNA, but sequencing the genome of a cell allows us to find mutations and structural alterations that drive tumerogenesis in cancer. If we treat the DNA with bisulfite prior to sequencing, cytosine residues are converted to uracil, but 5-methylcytosine residues are unaffected. This allows us to probe the methylation patterns of the genome, or its methylome. By sequencing the mRNA molecules in a cell, we can calculate the abundance, in different samples, of different mRNA transcripts, or uncover its transcriptome. Performing different experiments on the same samples, for instance RNA-seq, DNA-seq, and BS-seq, results in multi-dimensional omics datasets, which enable the study of relationships between different biological processes, e.g. DNA methylation, mutations, and gene expression, and the leveraging of multiple data types to draw inferences about biological systems. This chapter provides an overview of some of the available methods for such analyses, focusing on matrix factorization approaches. In the examples in this chapter we will demonstrate how these methods are applicable to cancer subtyping, i.e. finding tumors which are driven by the same oncogenic processes.
+\index{multi-omics}Living cells are a symphony of complex processes. Modern sequencing technology has led to many comprehensive assays being routinely available to experimenters, giving us different ways to peek at the internal doings of the cells, each experiment revealing a different part of some underlying processes. As an example, most cells have the same DNA, but sequencing the genome of a cell allows us to find mutations and structural alterations that drive tumorigenesis in cancer. If we treat the DNA with bisulfite prior to sequencing, cytosine residues are converted to uracil, but 5-methylcytosine residues are unaffected. This allows us to probe the methylation patterns of the genome, or its methylome. By sequencing the mRNA molecules in a cell, we can calculate the abundance, in different samples, of different mRNA transcripts, or uncover its transcriptome. Performing different experiments on the same samples, for instance RNA-seq, DNA-seq, and BS-seq, results in multi-dimensional omics datasets, which enable the study of relationships between different biological processes, e.g. DNA methylation, mutations, and gene expression, and the leveraging of multiple data types to draw inferences about biological systems. This chapter provides an overview of some of the available methods for such analyses, focusing on matrix factorization approaches. In the examples in this chapter, we will demonstrate how these methods are applicable to cancer molecular subtyping, i.e. finding tumors which are driven by the same molecular processes.
 
 ### Use case: Multi-omics data from colorectal cancer
 
 \index{multi-omics}\index{colorectal cancer}The examples in this chapter will use the following data: a set of 121 tumors from the TCGA [@tcga_pan_cancer] cohorts of Colon and Rectum adenocarcinoma. The tumors have been profiled for gene expression using RNA-seq, mutations using Exome-seq, and copy number variations using genotyping arrays. Projects such as TCGA have turbocharged efforts to sub-divide cancer into subtypes. Although two tumors arise in the colon, they may have distinct molecular profiles, which is important for treatment decisions. The subset of tumors used in this chapter belong to two distinct molecular subtypes defined by the Colorectal Cancer Subtyping Consortium [@cmscc], _CMS1_ and _CMS3_. The following code snippets load this multi-omics data from the companion package, starting with gene expression data from RNA-seq (see Chapter \@ref(rnaseqanalysis)):
 
+**Read RNA-seq data**:
 ```{r,moloadMultiomicsGE, tidy=FALSE}
 # read in the csv from the companion package as a data frame
 csvfile <- system.file("extdata", "multi-omics", "COREAD_CMS13_gex.csv", 
@@ -32,8 +29,13 @@ x1 <- read.csv(csvfile, row.names=1)
 # Fix the gene names in the data frame
 rownames(x1) <- sapply(strsplit(rownames(x1), "\\|"), function(x) x[1])
 # Output a table
-knitr::kable(head(t(head(x1))), caption="Gene expression data (head)")
+knitr::kable(head(t(head(x1))), caption="Example gene expression data (head)")
 ```
+Table \@ref(tab:moloadMultiomicsGE) shows the head of the gene expression matrix. The rows correspond to patients, referred to by their TCGA identifiers in the first column of the table. The columns correspond to genes, named by their gene symbols, and the values are RPKM expression values.
+The details of how these expression values are calculated are in Chapter \@ref(rnaseqanalysis).
+
+
+**Read mutation data**:
 ```{r,moloadMultiomicsMUT, tidy=FALSE}
 # read in the csv from the companion package as a data frame
 csvfile <- system.file("extdata", "multi-omics", "COREAD_CMS13_muts.csv", 
@@ -43,18 +45,21 @@ x2 <- read.csv(csvfile, row.names=1)
 # we only count one)
 x2[x2>0]=1
 # output a table
-knitr::kable(head(t(head(x2))), caption="Mutation data (head)")
+knitr::kable(head(t(head(x2))), caption="Example mutation data (head)")
 ```
+Table \@ref(tab:moloadMultiomicsMUT) shows the mutations of these tumors (mutations were introduced in Chapter \@ref(intro)). In the mutation matrix, each cell is a binary 1/0, indicating whether or not a tumor has a non-synonymous mutation in the gene indicated by the column. These types of mutations change the amino acid sequence, and are therefore likely to change the function of the protein.
+
+**Read copy number data**:
 ```{r,moloadMultiomicsCNV, tidy=FALSE}
 # read in the csv from the companion package as a data frame
 csvfile <- system.file("extdata", "multi-omics", "COREAD_CMS13_cnv.csv", 
                        package="compGenomRData")
 x3 <- read.csv(csvfile, row.names=1)
 # output a table
-knitr::kable(head(t(head(x3))), caption="Copy number data (head)")
+knitr::kable(head(t(head(x3))), 
+             caption="Example copy number data for CRC samples")
 ```
-
-Table  \@ref(tab:moloadMultiomicsGE) shows the head of the gene expression matrix. The rows correspond to patients, referred to by their TCGA identifier. Columns are gene names, and values are RPKM expression values. Similarly, table  \@ref(tab:moloadMultiomicsMUT) shows the mutations of these tumors (mutations were introduced in Chapter \@ref(intro)). In the mutation matrix, each cell is a binary 1/0, indicating whether or not a tumor has a non-synonymous mutation in the gene indicated by the column. Finally, table \@ref(tab:moloadMultiomicsCNV) shows GISTIC scores for copy number alterations in these tumors. During transformation from healthy cells to cancer cells, the genome sometimes undergoes large-scale instability; large segments of the genome might be replicated or lost. This will be reflected in each segment's "copy number". In this matrix, each column corresponds to a chromosome segment, and the value of the cell is a real-valued score indicating if this segment has been amplified (copied more) or lost, relative to a non-cancer control from the same patient.
+Finally, Table \@ref(tab:moloadMultiomicsCNV) shows GISTIC scores [@mermel2011gistic2] for copy number alterations in these tumors. During transformation from healthy cells to cancer cells, the genome sometimes undergoes large-scale instability; large segments of the genome might be replicated or lost. This will be reflected in each segment's "copy number". In this matrix, each column corresponds to a chromosome segment, and the value of the cell is a real-valued score indicating if this segment has been amplified (copied more) or lost, relative to a non-cancer control from the same patient.
 
-Each of the data types (gene expression, mutations, copy number variation) on its own, provides some signal which allows to somewhat separate the samples into the two different subtypes. In order to explore these relations, we must first obtain the subtypes of these tumors. The following code snippet reads these, also from the companion package:
+Each of the data types (gene expression, mutations, copy number variation), on its own, provides some signal which allows us to somewhat separate the samples into the two different subtypes. In order to explore these relations, we must first obtain the subtypes of these tumors. The following code snippet reads these, also from the companion package:
 
@@ -112,7 +117,7 @@ The next section will describe latent variable models for multi-omics integratio
 
 ## Latent variable models for multi-omics integration
 
-\index{unsupervised learning}Unsupervised multi-omics integration methods are methods that look for patterns within and across data types, in a label-agnostic fashion, i.e. without knowledge of the identity or label of the analyzed samples (e.g. cell type, tumor/normal). This chapter focuses on latent variable models, a form of dimensionality reduction technique (see Chapter \@ref(unsupervisedLearning)). Latent variable models make an assumption that the high dimensional data we observe (e.g. counts of tens of thousands of mRNA molecules) arise from a lower dimension description. The variables in that lower dimensional description are termed _Latent Variables_, as they are believed to be latent in the data, but not directly observable through experimentation. Therefore, there is a need for methods to infer the latent variables from the data. For instance, (see Chapter 8, RNA-seq analysis) the relative abundance of different mRNA molecules in a cell is largely determined by the cell type. There are other experiments which may be used to discern the cell type of cells (e.g. looking at them under a microscope), but an RNA-seq experiment does not, directly, reveal whether the analyzed sample was taken from one organ or another. A latent variable model would set the cell type as a latent variable, and the observable abundance of mRNA molecules to be dependent on the value of the latent variable (e.g. if the latent variable is "Regulatory T-cell", we would expect to find high expression of CD4, FOXP3, and CD25).
+\index{unsupervised learning}Unsupervised multi-omics integration methods look for patterns within and across data types in a label-agnostic fashion, i.e. without knowledge of the identity or label of the analyzed samples (e.g. cell type, tumor/normal). This chapter focuses on latent variable models, a form of dimensionality reduction technique (see Chapter \@ref(unsupervisedLearning)). Latent variable models make an assumption that the high dimensional data we observe (e.g. counts of tens of thousands of mRNA molecules) arise from a lower-dimensional description. The variables in that lower-dimensional description are termed _Latent Variables_, as they are believed to be latent in the data, but not directly observable through experimentation. Therefore, there is a need for methods to infer the latent variables from the data. For instance (see Chapter \@ref(rnaseqanalysis) for details of RNA-seq analysis), the relative abundance of different mRNA molecules in a cell is largely determined by the cell type. There are other experiments which may be used to discern the cell type of cells (e.g. looking at them under a microscope), but an RNA-seq experiment does not, directly, reveal whether the analyzed sample was taken from one organ or another. A latent variable model would set the cell type as a latent variable, and the observable abundance of mRNA molecules to be dependent on the value of the latent variable (e.g. if the latent variable is "Regulatory T-cell", we would expect to find high expression of CD4, FOXP3, and CD25).
   
 ## Matrix factorization methods for unsupervised multi-omics data integration
 
@@ -143,7 +148,7 @@ In Figure \@ref(fig:momatrixFactorization), the $5 \times 4$ data matrix $X$ is
 
 Figure \@ref(fig:moMFA) sketches a naive extension of PCA to a multi-omics context.
 
-```{r,moMFA,fig.cap="A naive extension of PCA to multi-omics; data matrices from different platforms are stacked, before applying PCA.",fig.align = 'center',out.width='75%',echo=FALSE}
+```{r,moMFA,fig.cap="A naive extension of PCA to multi-omics; data matrices from different platforms are stacked, before applying PCA.",fig.align = 'center',out.width='50%',echo=FALSE}
 knitr::include_graphics("images/mfa.png" )
 ```
 
@@ -203,7 +208,7 @@ ggplot2::geom_point() + ggplot2::ggtitle("Scatter plot of MFA")
-Figure \@ref(fig:momfascatterplot) shows remarkable separation between the cancer subtypes; it is easy enough to draw a line separating the tumors to CMS subtypes with good accuracy.
+Figure \@ref(fig:momfascatterplot) shows remarkable separation between the cancer subtypes; it is easy enough to draw a line separating the tumors into CMS subtypes with good accuracy.
 
 Another way to examine the MFA factors, which is also useful for factor models with more than two components, is a heatmap, as shown in Figure \@ref(fig:momfaheatmap), generated by the following code snippet:
-```{r,momfaheatmap,fig.cap="A heatmap of the two MFA components shows separation between the cancer subtypes."}
+```{r,momfaheatmap,fig.cap="A heatmap of the two MFA components shows separation between the cancer subtypes.",fig.height=3}
 pheatmap::pheatmap(t(mfa.h)[1:2,], annotation_col = anno_col,
                   show_colnames = FALSE,
                   main="MFA for multi-omics integration")
@@ -510,10 +515,10 @@ ggplot2::geom_point() +
 ggplot2::ggtitle("Scatter plot of iCluster+ factors")
 ```
 
-```{r,moiclusterplusheatmap,fig.cap="iCluster+ factors shown in a heatmap separate tumors into their subtypes well.", echo=FALSE}
+```{r,moiclusterplusheatmap,fig.cap="iCluster+ factors, shown in a heatmap, separate tumors into their subtypes well.", echo=FALSE,fig.height=3}
 pheatmap::pheatmap(t(icp_df[,1:2]),
                    annotation_col = anno_col, 
-                   show_colnames = FALSE,
+                   show_colnames = FALSE, border_color = NA,
                    main="Heatmap of iCluster+ factors")
 ```
 
@@ -541,7 +546,7 @@ A specific clustering method for NMF data is to assume each sample is driven by
 
 The two rows are the two latent variables, and the columns are the 72 tumors. We can observe that most tumors are indeed driven mainly by one of the factors, and not a combination of the two. We can use this to assign each tumor a cluster label based on its dominant factor, shown in the following code snippet, which also produces the heatmap in Figure \@ref(fig:moNMFClustering).
 
-```{r,moNMFClustering,fig.cap="Joint NMF factors with clusters, and molecular sub-types. One-hot clustering assigns one cluser per dimension, where each sample is assigned a cluster based on its dominant component. The clusters largely recapitulate the CMS sub-types."}
+```{r,moNMFClustering,fig.cap="Joint NMF factors with clusters, and molecular sub-types. One-hot clustering assigns one cluster per dimension, where each sample is assigned a cluster based on its dominant component. The clusters largely recapitulate the CMS sub-types.",fig.height=3}
 # one-hot clustering in one line of code:
 # assign each sample the cluster according to its dominant NMF factor
 # easily accessible using the max.col function
@@ -559,7 +564,7 @@ anno_nmf_cl <- data.frame(
 pheatmap::pheatmap(t(nmf.h[order(nmf.clusters),]),
   cluster_cols=FALSE, cluster_rows=FALSE,
   annotation_col = anno_nmf_cl,
-  show_colnames = FALSE,
+  show_colnames = FALSE, border_color = NA,
   main="Joint NMF factors\nwith clusters and molecular subtypes")
 ```
 
@@ -571,7 +576,7 @@ The one-hot clustering method does not lend itself very well to the other method
 
 K-means clustering was introduced in Chapter \@ref(unsupervisedLearning). K-means is a special case of the EM algorithm, and indeed iCluster was originally conceived as an extension of K-means from binary cluster assignments to real-valued latent variables. The iCluster algorithm, as it is so named, calls for application of K-means clustering on its latent variables, after the inference step. The following code snippet shows how to pull K-means clusters out of the iCluster results, and produces the heatmap in Figure \@ref(fig:moiClusterHeatmap), which shows how well these clusters correspond to cancer subtypes.
 
-```{r,moiClusterHeatmap,fig.cap="K-means clustering on iCluster+ factors largely recapitulates the CMS sub-types."}
+```{r,moiClusterHeatmap,fig.cap="K-means clustering on iCluster+ factors largely recapitulates the CMS sub-types.",fig.height=3}
 # use the kmeans function to cluster the iCluster H matrix (here, z)
 # using 2 as the number of clusters.
 icluster.clusters <- kmeans(icluster.z, 2)$cluster
@@ -588,7 +593,7 @@ pheatmap::pheatmap(
   t(icluster.z[order(icluster.clusters),]), # order z by the kmeans clusters
   cluster_cols=FALSE, # use cluster_cols and cluster_rows=FALSE
   cluster_rows=FALSE, # as we want the ordering by k-means clusters to hold
-  show_colnames = FALSE,
+  show_colnames = FALSE, border_color = NA,
   annotation_col = anno_icluster_cl,
   main="iCluster factors\nwith clusters and molecular subtypes")
 ```
@@ -652,9 +657,9 @@ The hypergeometric enrichment test is also referred to as _Fisher's one-sided ex
 
 #### Example in R
 
-In R, we can do this analysis using the `enrichR` package, which gives us access to many gene set libraries. In the example below, we'll find the genes associated with preferentially NMF factor 1 or NMF factor 2, by the contribution of those genes' expression values to the factor. Then, we'll use `enrichR` to query the Gene Ontology terms which might be overlapping:
+In R, we can do this analysis using the `enrichR` package, which gives us access to many gene set libraries. In the example below, we will find the genes preferentially associated with NMF factor 1 or NMF factor 2, based on the contribution of those genes' expression values to the factor. Then, we will use `enrichR` to query the Gene Ontology terms which might be overlapping:
 
-```{r,moenrichr}
+```{r,moenrichr,message=FALSE,warning=FALSE,results='hide',error=FALSE}
 require(enrichR)
 
 # select genes associated preferentially with each factor
@@ -673,13 +678,18 @@ go.factor.2 <- enrichR::enrichr(genes.factor.2,
 ```
 
 The top GO terms associated with NMF factor 2 are shown in Table \@ref(tab:moNMFGOTerms):
-```{r,moNMFGOTerms,caption="Top GO-terms associated with NMF factor 2.",echo=FALSE}
+```{r,moNMFGOTerms,caption="Top GO-terms associated with NMF factor 2.",echo=FALSE,results='asis'}
+library(kableExtra)
+
 go.factor.2$Genes <- gsub(";", "; ", go.factor.2$Genes)
 
-the.table <- knitr::kable(head(go.factor.2, 3)[,c("Term", "Adjusted.P.value", "Combined.Score", "Genes")],
-                 caption="GO-terms associated with NMF factor 2")
-the.table <- kableExtra::column_spec(the.table, 1, width="10em")
-the.table <- kableExtra::column_spec(the.table, 4, width="10em")
+# build a LaTeX table of the top 3 GO terms and scale it to fit the page
+the.table <- knitr::kable(head(go.factor.2, 3)[,c("Term", "Adjusted.P.value", "Combined.Score")],
+                 caption="GO-terms associated with NMF factor 2",
+                 format="latex")
+the.table <- kableExtra::kable_styling(the.table, latex_options = c("scale_down"))
 the.table
 ```
 
diff --git a/book.bib b/book.bib
index 828081955358749df2718b24d60b186f7b0076bf..049e3191a3895f07075d02dd9eff1e5e6b11202b 100755
--- a/book.bib
+++ b/book.bib
@@ -1,3 +1,14 @@
+@article{mermel2011gistic2,
+  title={GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers},
+  author={Mermel, Craig H and Schumacher, Steven E and Hill, Barbara and Meyerson, Matthew L and Beroukhim, Rameen and Getz, Gad},
+  journal={Genome Biology},
+  volume={12},
+  number={4},
+  pages={R41},
+  year={2011},
+  publisher={Springer}
+}
+
 @article{morris2014rise,
   title={The rise of regulatory RNA},
   author={Morris, Kevin V and Mattick, John S},