This repository contains a series of scripts developed as part of the Rationally designed microbiomes project (2022).

The core part of the MiMiC workflow uses assembled metagenomic reads (e.g. assembled with metaSPAdes [1]), followed by gene discovery using Prodigal [2] and gene annotation using hmmscan [3] and the Pfam database [4]. These steps are followed by MiMiC scripts [5] that calculate, per metagenomic sample, which functions are present and how these functions can be covered by a small number of bacterial isolates.


Steps

01 - Download metagenomic data from a public dataset

In this project, the starting point is metagenomic samples.

02-04 - Clean reads, assemble using metaSPAdes and annotate using Kraken2

Reads are trimmed, filtered for host contamination and annotated to estimate the bacterial species composition using Kraken2 [6]. Finally, reads are assembled using metaSPAdes.
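
The classification and assembly calls in these steps might be driven from Python roughly as follows. This is a minimal sketch, not the project's own scripts: the file names, database path and thread count are placeholders, and the options actually used in this project are not stated here. The commands are only built and printed, not executed.

```python
import shlex


def kraken2_command(r1, r2, db, report, output, threads=8):
    """Build a Kraken2 command for paired-end reads (all paths are placeholders)."""
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--paired",
        "--report", report,   # per-taxon summary report
        "--output", output,   # per-read classification
        r1, r2,
    ]


def metaspades_command(r1, r2, outdir, threads=8):
    """Build a metaSPAdes command (spades.py in metagenomic mode) for paired-end reads."""
    return ["spades.py", "--meta", "-1", r1, "-2", r2, "-t", str(threads), "-o", outdir]


# Print the commands instead of running them, so they can be inspected first.
for cmd in (
    kraken2_command("clean_R1.fastq.gz", "clean_R2.fastq.gz",
                    "kraken2_db", "sample.report", "sample.kraken"),
    metaspades_command("clean_R1.fastq.gz", "clean_R2.fastq.gz", "assembly"),
):
    print(shlex.join(cmd))
```

To run a command for real, the same list could be passed to `subprocess.run(cmd, check=True)`.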

05 - Conversion of Kraken results

Taxonomic results are converted into BIOM format for downstream processing.
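
In this workflow the conversion itself is handled by kraken-biom. To illustrate what information is carried over, the sketch below reads species-level counts from a Kraken2 report, assuming the standard six-column, tab-separated report layout. The function name and the example read counts are made up for illustration.

```python
def species_counts(report_text):
    """Return {species name: clade read count} from Kraken2 report text.

    Assumed columns: percent, clade reads, taxon reads, rank code,
    NCBI taxid, indented scientific name.
    """
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank, name = fields[3], fields[5].strip()
        if rank == "S":  # keep species-level rows only
            counts[name] = int(fields[1])
    return counts


# Made-up example report (read counts are illustrative only).
example = (
    " 60.00\t600\t0\tG\t816\t    Bacteroides\n"
    " 40.00\t400\t400\tS\t817\t      Bacteroides fragilis\n"
    " 20.00\t200\t200\tS\t818\t      Bacteroides thetaiotaomicron\n"
)
print(species_counts(example))
```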

06 - Estimate taxonomic abundance and prevalence

Using a custom R script, taxonomic abundances are plotted and a list of the most prevalent and abundant bacteria is exported.
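
The project does this step in R. The Python sketch below only illustrates the two quantities involved, under assumed definitions: abundance is the mean relative abundance across samples, and prevalence is the fraction of samples in which a taxon reaches a minimum relative abundance. The threshold value and the function name are assumptions, not taken from the project's script.

```python
def abundance_and_prevalence(counts, min_rel_abundance=0.01):
    """Summarize taxa across samples.

    counts: {sample: {taxon: read count}}
    Returns {taxon: (mean relative abundance, prevalence)}.
    """
    if not counts:
        return {}
    taxa = sorted({taxon for sample in counts.values() for taxon in sample})
    n_samples = len(counts)
    summary = {}
    for taxon in taxa:
        rel = []
        for sample in counts.values():
            total = sum(sample.values())
            rel.append(sample.get(taxon, 0) / total if total else 0.0)
        mean_rel = sum(rel) / n_samples
        prevalence = sum(r >= min_rel_abundance for r in rel) / n_samples
        summary[taxon] = (mean_rel, prevalence)
    return summary


# Made-up counts for two samples.
counts = {"S1": {"A": 90, "B": 10}, "S2": {"A": 50, "C": 50}}
summary = abundance_and_prevalence(counts)

# Rank taxa by prevalence first, then by mean relative abundance.
for taxon, (mean_rel, prev) in sorted(
    summary.items(), key=lambda item: (item[1][1], item[1][0]), reverse=True
):
    print(f"{taxon}\t{mean_rel:.2f}\t{prev:.2f}")
```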

07 - Download genomes from NCBI

MiMiC requires a database of bacterial isolates. In this project, genomes are acquired from NCBI.

08 - MiMiC part 1 (adjusted script)

Metagenomic assemblies are annotated using Prodigal, hmmscan and the Pfam database. The same tools are applied in the same way to annotate the bacterial isolate database.
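
The result of this annotation is, per sample or isolate, a set of Pfam families. A minimal sketch of extracting that set from hmmscan `--domtblout` output is shown below. The E-value cutoff, the function name and the example rows are assumptions for illustration; the filtering actually applied by the MiMiC scripts is not described here.

```python
def pfams_from_domtblout(text, max_evalue=1e-5):
    """Collect Pfam accessions (version suffix removed) from hmmscan --domtblout text.

    In hmmscan output the target is the Pfam family: column 2 holds its
    accession and column 7 the full-sequence E-value.
    """
    pfams = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        cols = line.split()
        if len(cols) < 22:
            continue  # not a complete domain row
        accession, evalue = cols[1], float(cols[6])
        if evalue <= max_evalue:
            pfams.add(accession.split(".")[0])
    return pfams


# Made-up rows in domtblout layout (scores and coordinates are illustrative).
example = (
    "# target name accession tlen query name ...\n"
    "ABC_tran PF00005.30 137 gene_1 - 250 1.2e-30 105.1 0.0 1 1 2.0e-34 1.2e-30 "
    "105.1 0.0 1 137 20 160 20 160 0.95 ABC transporter\n"
    "Pkinase PF00069.28 264 gene_2 - 300 0.5 5.0 0.1 1 1 0.3 0.5 "
    "5.0 0.1 1 60 10 70 10 70 0.60 Protein kinase domain\n"
)
print(pfams_from_domtblout(example))
```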

09 - MiMiC part 2 (adjusted script)

The metagenomic function analysis is summarized into a table. This is repeated for the bacterial isolate analysis.
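
One way to picture such a table is a presence/absence matrix with one row per sample (or isolate) and one column per function. The sketch below assumes a binary layout and uses hypothetical sample names; the exact table format produced by the MiMiC scripts is not specified here.

```python
def presence_absence_table(pfams_by_sample):
    """Return (sorted function columns, {sample: [0/1, ...]}).

    pfams_by_sample: {sample name: set of function identifiers}
    """
    columns = sorted(set().union(*pfams_by_sample.values()))
    rows = {
        sample: [1 if pfam in pfams else 0 for pfam in columns]
        for sample, pfams in pfams_by_sample.items()
    }
    return columns, rows


# Hypothetical samples with a few Pfam accessions each.
columns, rows = presence_absence_table({
    "sample_1": {"PF00005", "PF00069"},
    "sample_2": {"PF00005", "PF00072"},
})

# Print as a tab-separated table.
print("\t".join(["sample"] + columns))
for sample, values in rows.items():
    print("\t".join([sample] + [str(v) for v in values]))
```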

10 - MiMiC part 3 (adjusted script)

The results of the metagenomic analysis are converted into another format. This is repeated for the bacterial isolate analysis.

11 - MiMiC part 4 (adjusted script)

Using both the metagenomic and bacterial data, MiMiC functions are used to determine, per metagenomic sample, which functions are present and how these functions can be covered by a small number of bacterial isolates. This is followed by a custom script that determines the minimal microbiome across all metagenomic samples, and by scripts that visualize the results.
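
The idea of covering a sample's functions with few isolates can be illustrated with a plain greedy set-cover heuristic. This is a stand-in for illustration only: MiMiC's own scoring is not described here and may differ. The isolate names and function sets below are hypothetical.

```python
def greedy_cover(target, isolates):
    """Pick isolates greedily until no isolate adds new target functions.

    target: set of functions found in one metagenomic sample
    isolates: {isolate name: set of functions}
    Returns (chosen isolate names in order, target functions left uncovered).
    """
    uncovered = set(target)
    chosen = []
    while uncovered:
        # Isolate covering the most still-uncovered functions;
        # ties are broken alphabetically because names are sorted first.
        best = max(
            sorted(isolates),
            key=lambda name: len(isolates[name] & uncovered),
            default=None,
        )
        if best is None or not isolates[best] & uncovered:
            break  # nothing left that any isolate can cover
        chosen.append(best)
        uncovered -= isolates[best]
    return chosen, uncovered


# Hypothetical sample and isolate function sets.
sample_functions = {"PF00001", "PF00002", "PF00003", "PF00004", "PF00005"}
isolate_functions = {
    "isolate_A": {"PF00001", "PF00002", "PF00003"},
    "isolate_B": {"PF00003", "PF00004"},
    "isolate_C": {"PF00004", "PF00005", "PF00009"},
}
chosen, uncovered = greedy_cover(sample_functions, isolate_functions)
print(chosen, uncovered)
```

Greedy selection does not guarantee the smallest possible set of isolates, but it is simple and usually close for this kind of problem.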

12 - KEGG annotation (adjusted script)

Using KofamScan [7], gene functions are translated into KEGG functions.
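
As an illustration of how such assignments might be read downstream, the sketch below parses KofamScan output written in its `mapper` format, where each line holds a gene identifier optionally followed by a tab and a KO identifier. Whether this project uses that output format is an assumption; the gene names and KO assignments in the example are made up.

```python
from collections import defaultdict


def kos_by_gene(mapper_text):
    """Return {gene id: set of KO identifiers} from KofamScan mapper-style text.

    Genes without any KO assignment (lines with only a gene id) are skipped.
    """
    assignments = defaultdict(set)
    for line in mapper_text.splitlines():
        fields = line.strip().split("\t")
        kos = [ko for ko in fields[1:] if ko]
        if kos:
            assignments[fields[0]].update(kos)
    return dict(assignments)


# Made-up example: gene_2 has no assignment, gene_3 has two.
example = "gene_1\tK00001\ngene_2\ngene_3\tK00845\ngene_3\tK12407\n"
print(kos_by_gene(example))
```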

13 - Visualization of KEGG pathways

To check that key pathways are functionally covered, KEGG functions are visualized side by side for the metagenomic samples and the minimal microbiome.
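
Before plotting, the same comparison can be expressed numerically as the fraction of each pathway's KOs present in the metagenome and in the minimal microbiome. The sketch below assumes a pathway-to-KO mapping is available; the pathway names and KO sets are hypothetical placeholders, not real KEGG pathway definitions.

```python
def pathway_coverage(kos, pathways):
    """Fraction of each pathway's KOs that are present in `kos`.

    kos: set of KO identifiers observed
    pathways: {pathway name: set of KO identifiers in that pathway}
    """
    return {
        name: len(members & kos) / len(members)
        for name, members in pathways.items()
        if members  # skip empty pathway definitions
    }


# Hypothetical pathway definitions and KO sets.
pathways = {
    "pathway_A": {"K00001", "K00002", "K00003", "K00004"},
    "pathway_B": {"K00010", "K00011"},
}
metagenome_kos = {"K00001", "K00002", "K00003", "K00010", "K00011"}
minimal_microbiome_kos = {"K00001", "K00002", "K00010"}

meta_cov = pathway_coverage(metagenome_kos, pathways)
mini_cov = pathway_coverage(minimal_microbiome_kos, pathways)

# Side-by-side listing: pathway, metagenome coverage, minimal microbiome coverage.
for name in sorted(pathways):
    print(f"{name}\t{meta_cov[name]:.2f}\t{mini_cov[name]:.2f}")
```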

Versioning: Version 0.1.0

References

  1. Sergey Nurk et al., 2017; https://github.com/ablab/spades
  2. Doug Hyatt et al., 2010; https://github.com/hyattpd/Prodigal
  3. Simon C. Potter et al., 2018; https://github.com/EddyRivasLab/hmmer
  4. Jaina Mistry et al., 2021; http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/
  5. Neeraj Kumar et al., 2021; https://github.com/ClavelLab/MiMiC
  6. Derrick E. Wood et al., 2019; https://github.com/DerrickWood/kraken2
  7. Takuya Aramaki et al., 2019; https://github.com/takaram/kofam_scan

Additionally, the following software is used:

  - Brian Bushnell, 2013; https://sourceforge.net/projects/bbmap/
  - Ole Tange, 2010; https://github.com/martinda/gnu-parallel
  - S.M. Dabdoub, 2016; https://github.com/smdabdoub/kraken-biom

Software list

prodigal, spades, hmmer, kraken2, bbmap, parallel, kraken-biom, kofamscan, sra-tools, entrez-direct, ncbi-genome-download

R packages

ade4, colorspace, ComplexHeatmap, cowplot, data.table, dplyr, ggcorrplot, ggplot2, ggsignif, ggtext, graph, gridExtra, inflection, KEGGgraph, microbiome, optparse, pathview, pheatmap, phyloseq, plyr, RColorBrewer, readr, tidyverse, vegan, XML