A series of scripts was developed as part of the Rationally designed microbiomes project (2022).
The core of the MiMiC workflow uses assembled metagenomic reads (e.g. assembled with metaSPAdes [1]). Genes are then predicted with Prodigal [2] and annotated with hmmscan [3] against the Pfam database [4]. These steps are followed by MiMiC scripts [5] that calculate, for each metagenomic sample, which functions are present and how these functions can be covered by a small number of bacterial isolates.
Steps
01 - Download metagenomic data from a public dataset
In this project, the starting point is metagenomic samples.
02-04 - Clean reads, assemble using metaSPAdes and annotate using Kraken2
Reads are trimmed, filtered for host contamination, and taxonomically classified with Kraken2 [6] to estimate the bacterial species composition. Finally, the reads are assembled using metaSPAdes [1].
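The order of operations in steps 02-04 can be sketched as a list of command lines, built in Python purely for illustration (the project's own scripts are not reproduced here). The flags are standard options of BBDuk/BBMap (BBTools), Kraken2 and metaSPAdes, but the chosen values, the adapter and host reference files, and the output layout are assumptions, not the settings used in this project.

```python
from pathlib import Path


def preprocessing_commands(sample, r1, r2, adapters, host_ref, kraken_db, threads=8):
    """Return the commands for one paired-end sample, in pipeline order.

    Hypothetical sketch: file names and parameter values are illustrative,
    not the settings used in this project.
    """
    out = Path(sample)
    trimmed = (out / "trim_R1.fq.gz", out / "trim_R2.fq.gz")
    clean = (out / "clean_R1.fq.gz", out / "clean_R2.fq.gz")
    return [
        # 1. Adapter (right-end k-mer) and quality trimming with BBDuk.
        ["bbduk.sh", f"in={r1}", f"in2={r2}",
         f"out={trimmed[0]}", f"out2={trimmed[1]}", f"ref={adapters}",
         "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo",
         "qtrim=rl", "trimq=10"],
        # 2. Host removal: map against the host genome with BBMap and keep
        #    only the unmapped read pairs (outu/outu2).
        ["bbmap.sh", f"ref={host_ref}", "nodisk",
         f"in={trimmed[0]}", f"in2={trimmed[1]}",
         f"outu={clean[0]}", f"outu2={clean[1]}"],
        # 3. Taxonomic classification of the cleaned read pairs with Kraken2.
        ["kraken2", "--db", str(kraken_db), "--threads", str(threads),
         "--paired", "--gzip-compressed",
         "--report", str(out / "kraken2.report"),
         "--output", str(out / "kraken2.out"), str(clean[0]), str(clean[1])],
        # 4. Assembly of the cleaned read pairs with metaSPAdes.
        ["metaspades.py", "-1", str(clean[0]), "-2", str(clean[1]),
         "-t", str(threads), "-o", str(out / "metaspades")],
    ]
```

Each command could then be executed in order, for example with `subprocess.run(cmd, check=True)`, so that the pipeline stops at the first failing step.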
05 - Conversion of Kraken results
The Kraken2 results are converted into BIOM format for downstream processing.
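kraken-biom, listed under the software used, is the tool designed for this conversion. The sketch below is not kraken-biom; it only illustrates which information is pulled from a Kraken2 report, assuming the default six-column, tab-separated report (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name). It keeps species-level clade read counts and combines them into a samples-by-species table held in plain dictionaries rather than a BIOM file.

```python
def species_counts(report_text):
    """Return {species name: clade read count} from a standard Kraken2 report."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, name = fields[3].strip(), fields[5].strip()  # names are space-indented
        if rank == "S":  # species rows only; sub-species ranks are S1, S2, ...
            counts[name] = int(fields[1])
    return counts


def count_table(reports):
    """Combine {sample: report text} into {sample: {species: count}}.

    Species that are absent from a sample are filled in with 0, so every
    sample has the same set of species keys.
    """
    per_sample = {s: species_counts(t) for s, t in reports.items()}
    all_species = sorted({sp for c in per_sample.values() for sp in c})
    return {s: {sp: c.get(sp, 0) for sp in all_species}
            for s, c in per_sample.items()}
```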
06 - Estimate taxonomic abundance and prevalence
A custom R script plots the taxonomic abundances and exports a list of the most prevalent and abundant bacteria.
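The R script itself is not shown here. As a hedged Python sketch of the underlying arithmetic: relative abundance is a species' read count divided by the sample total, and prevalence is the fraction of samples in which the species is present. The presence threshold, the prevalence cut-off and the number of species kept (`min_rel_abundance`, `min_prevalence`, `n`) are hypothetical parameters, not values taken from the project.

```python
def prevalence_abundance(table, min_rel_abundance=0.001):
    """Compute per-species prevalence and mean relative abundance.

    table: {sample: {species: read count}}.
    A species counts as present in a sample when its relative abundance
    there is above zero and at least min_rel_abundance (hypothetical).
    Returns {species: (prevalence fraction, mean relative abundance)}.
    """
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {sp: (n / total if total else 0.0) for sp, n in counts.items()}
    species = {sp for counts in rel.values() for sp in counts}
    n_samples = len(rel)
    stats = {}
    for sp in species:
        values = [rel[s].get(sp, 0.0) for s in rel]
        present = sum(1 for v in values if v > 0 and v >= min_rel_abundance)
        stats[sp] = (present / n_samples, sum(values) / n_samples)
    return stats


def top_species(stats, min_prevalence=0.5, n=20):
    """Species present in at least min_prevalence of the samples,
    ranked by mean relative abundance (highest first)."""
    keep = [sp for sp, (prev, _) in stats.items() if prev >= min_prevalence]
    return sorted(keep, key=lambda sp: stats[sp][1], reverse=True)[:n]
```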
07 - Download genomes from NCBI
MiMiC requires a database of bacterial isolate genomes. In this project, the genomes are downloaded from NCBI.
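One hypothetical way to link the species list from step 06 to the genome download is to pass the names to ncbi-genome-download, which is listed among the software used. The sketch only builds the command; the RefSeq section, the `complete` assembly level and the output folder are example choices, not this project's settings.

```python
def genome_download_command(species, out_dir="genomes", assembly_level="complete"):
    """Build an ncbi-genome-download command for a list of species names.

    Hypothetical sketch: section, assembly level and output folder are
    example choices, not necessarily those used in this project.
    """
    if not species:
        raise ValueError("species list is empty")
    return [
        "ncbi-genome-download",
        "--section", "refseq",
        "--formats", "fasta",
        "--assembly-levels", assembly_level,
        "--genera", ",".join(species),  # comma-separated genus/species names
        "--output-folder", out_dir,
        "--flat-output",
        "bacteria",  # taxonomic group to search
    ]
```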
08 - MiMiC part 1 (adjusted script)
Genes are predicted on the metagenomic assemblies with Prodigal and annotated with hmmscan against the Pfam database. The same tools are used in a similar way to annotate the bacterial isolate genomes.
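A sketch of the two annotation commands for one assembly or genome, assuming a Pfam HMM file named `Pfam-A.hmm` that has already been indexed with `hmmpress`. Using Prodigal's metagenomic mode (`-p meta`) for assemblies and its default single-genome mode for isolates, and filtering hits with Pfam's gathering thresholds (`--cut_ga`), are assumptions about sensible settings, not a description of the adjusted MiMiC script.

```python
def annotation_commands(fasta, prefix, pfam_hmm="Pfam-A.hmm", metagenome=True, cpus=4):
    """Gene prediction (Prodigal) followed by Pfam annotation (hmmscan).

    Hypothetical sketch. metagenome=True selects Prodigal's metagenomic mode
    for assemblies; isolate genomes use the single-genome mode.
    """
    proteins = f"{prefix}.faa"
    prodigal = ["prodigal", "-i", fasta, "-a", proteins,
                "-o", f"{prefix}.gff", "-f", "gff",
                "-p", "meta" if metagenome else "single", "-q"]
    hmmscan = ["hmmscan", "--cpu", str(cpus),
               "--cut_ga",                   # use Pfam gathering thresholds
               "--tblout", f"{prefix}.tbl",  # one line per protein/Pfam hit
               "-o", "/dev/null",            # discard the verbose main output
               pfam_hmm, proteins]
    return [prodigal, hmmscan]
```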
09 - MiMiC part 2 (adjusted script)
The results of the metagenomic functional annotation are summarized in a table. The same is done for the bacterial isolate annotations.
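The adjusted MiMiC script is not shown here. As a minimal sketch of what such a summary can contain, the code below counts, per sample, how many predicted proteins hit each Pfam family in an hmmscan `--tblout` file. In that format, lines starting with `#` are comments and the second whitespace-separated column holds the Pfam accession. Dropping the version suffix (`PF00005.30` becomes `PF00005`) is a design choice of this sketch.

```python
from collections import Counter


def pfam_counts(tblout_text):
    """Count how many proteins hit each Pfam family in an hmmscan --tblout file.

    Each non-comment line is one (Pfam model, protein) hit; column 2 is the
    Pfam accession, whose version suffix is dropped.
    """
    counts = Counter()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip comment and blank lines
        accession = line.split()[1]
        counts[accession.split(".")[0]] += 1
    return counts


def summary_table(samples):
    """{sample: tblout text} -> {sample: {Pfam accession: protein count}}"""
    return {s: dict(pfam_counts(t)) for s, t in samples.items()}
```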
10 - MiMiC part 3 (adjusted script)
The results of the metagenomic analysis are converted into another format. The same is done for the bacterial isolate analysis.
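The target format is not specified above. One plausible representation, assumed here rather than taken from the MiMiC scripts, is a presence/absence vector over one shared, ordered list of Pfam accessions, so that position i refers to the same Pfam in every metagenome and isolate profile.

```python
def to_binary_vectors(profiles, vocabulary=None):
    """Convert {name: {Pfam: count}} into presence/absence vectors.

    All profiles (metagenomes and isolates) share one ordered vocabulary.
    If none is given, the sorted union of all Pfams in the profiles is used.
    Returns (vocabulary, {name: [0/1 per Pfam]}).
    """
    if vocabulary is None:
        vocabulary = sorted(set().union(*(p.keys() for p in profiles.values())))
    vectors = {name: [1 if p.get(pf, 0) > 0 else 0 for pf in vocabulary]
               for name, p in profiles.items()}
    return vocabulary, vectors
```

Building the vocabulary once from metagenomes and isolates together avoids comparing vectors whose positions mean different Pfams.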
11 - MiMiC part 4 (adjusted script)
MiMiC functions are applied to the metagenomic and bacterial isolate data to determine, for each metagenomic sample, which functions are present and how they can be covered by a small number of bacterial isolates. A custom script then determines the minimal microbiome across all metagenomic samples, and further scripts visualize the results.
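MiMiC's own selection procedure is described in the MiMiC repository (Kumar et al., 2021) and is not reproduced here. To illustrate the idea of covering a sample's functions with few isolates, the sketch swaps in a plain greedy set-cover heuristic, which is a different and simpler method. The cross-sample step (counting in how many samples each isolate is selected) is likewise only a hypothetical stand-in for the custom minimal-microbiome script.

```python
from collections import Counter


def greedy_cover(sample_pfams, isolates, max_isolates=10):
    """Greedily pick isolates that cover the most still-uncovered sample Pfams.

    sample_pfams: set of Pfams present in one metagenomic sample.
    isolates: {isolate name: set of Pfams in its genome}.
    Stops when everything is covered, no isolate adds anything, or
    max_isolates is reached. Returns (chosen isolates, uncovered Pfams).
    """
    uncovered = set(sample_pfams)
    chosen = []
    while uncovered and isolates and len(chosen) < max_isolates:
        # Ties are broken by isolate name so the result is deterministic.
        best = max(sorted(isolates), key=lambda i: len(isolates[i] & uncovered))
        gain = isolates[best] & uncovered
        if not gain:
            break  # no isolate covers any remaining Pfam
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


def selection_frequency(samples, isolates, **kwargs):
    """Count in how many samples ({sample: Pfam set}) each isolate is selected."""
    freq = Counter()
    for pfams in samples.values():
        chosen, _ = greedy_cover(pfams, isolates, **kwargs)
        freq.update(chosen)
    return freq
```

Greedy set cover does not guarantee the smallest possible set of isolates, but it is a common, fast approximation.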
12 - KEGG annotation (adjusted script)
Gene functions are translated into KEGG functions (KEGG Orthology, KO, assignments) using KofamScan [7].
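KofamScan can write its results in a simple `mapper` format (`-f mapper`), in which each line holds a gene ID followed by the assigned KO, if any. A minimal parser for that format, and for the `mapper-one-line` variant where several KOs share one line, could look as follows; it is a sketch, not the adjusted script used in this project.

```python
from collections import defaultdict


def kos_per_gene(mapper_text):
    """Parse KofamScan output written with `-f mapper` or `-f mapper-one-line`.

    Each line is a gene ID followed by zero or more tab-separated KO
    identifiers; genes without a KO assignment are skipped.
    """
    assignments = defaultdict(set)
    for line in mapper_text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if len(fields) > 1:
            assignments[fields[0]].update(fields[1:])
    return dict(assignments)


def ko_set(mapper_text):
    """All KOs assigned in one metagenomic sample or isolate genome."""
    return set().union(*kos_per_gene(mapper_text).values())
```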
13 - Visualization KEGG pathways
To check that key pathways are covered functionally, KEGG functions are visualized side by side for the metagenomic samples and for the minimal microbiome.
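The visualization itself is not reproduced here. As a sketch of the comparison it supports, the code computes, for each KEGG pathway, the fraction of the pathway's KOs found in the metagenome and in the minimal microbiome. The pathway-to-KO mapping is an input the user must supply (for example, obtained from KEGG); it is an assumption of this sketch, as is the ordering by coverage gap.

```python
def pathway_coverage(pathways, metagenome_kos, community_kos):
    """Side-by-side coverage of KEGG pathways.

    pathways: {pathway id: set of KOs in that pathway} (supplied by the user).
    Returns rows (pathway id, metagenome fraction, minimal microbiome fraction),
    ordered so that pathways where the minimal microbiome falls furthest
    short of the metagenome come first.
    """
    rows = []
    for pid, kos in pathways.items():
        if not kos:
            continue  # avoid dividing by zero for empty pathway definitions
        meta = len(kos & metagenome_kos) / len(kos)
        comm = len(kos & community_kos) / len(kos)
        rows.append((pid, meta, comm))
    return sorted(rows, key=lambda r: (r[2] - r[1], r[0]))
```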
Versioning: Version 0.1.0
References
- [1] Sergey Nurk et al., 2017; https://github.com/ablab/spades
- [2] Doug Hyatt et al., 2010; https://github.com/hyattpd/Prodigal
- [3] Simon C. Potter et al., 2018; https://github.com/EddyRivasLab/hmmer
- [4] Jaina Mistry et al., 2021; http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/
- [5] Neeraj Kumar et al., 2021; https://github.com/ClavelLab/MiMiC
- [6] Derrick E. Wood et al., 2019; https://github.com/DerrickWood/kraken2
- [7] Takuya Aramaki et al., 2019; https://github.com/takaram/kofam_scan
Additionally, the following software is used:
- Brian Bushnell, 2013 (BBMap); https://sourceforge.net/projects/bbmap/
- Ole Tange, 2010 (GNU Parallel); https://github.com/martinda/gnu-parallel
- S.M. Dabdoub, 2016 (kraken-biom); https://github.com/smdabdoub/kraken-biom
Software list
prodigal spades hmmer kraken2 bbmap parallel kraken-biom kofamscan sra-tools entrez-direct ncbi-genome-download
R packages
ade4, colorspace, ComplexHeatmap, cowplot, data.table, dplyr, ggcorrplot, ggplot2, ggsignif, ggtext, graph, gridExtra, inflection, KEGGgraph, microbiome, optparse, pathview, pheatmap, phyloseq, plyr, RColorBrewer, readr, tidyverse, vegan, XML