Pipeline for annotation of functional amplicons targeting BGC domains
The tool transfers annotations to amplicons based on their similarity to in silico amplicons derived from natural product databases.
Pre-parsed static versions of the databases are provided. Beware: updating the databases to a specific version can take considerable computational time.
An example of the command needed to run the pipeline is found in CMD_example.
To generate the amplicons [and in silico amplicons], use the hmmsearch tool with the HMM profile provided in this repo, or with a more recent/different one [if you do so, you may need to tweak the numbers in the parse_hmm.py script].
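As a rough illustration of the hmmsearch step, the sketch below assembles an hmmsearch command line from Python. The file names (`domain.hmm`, `proteins.faa`, `hits.domtblout`) and the helper `build_hmmsearch_cmd` are hypothetical placeholders, and the choice of `--domtblout` as the output format is an assumption: check which hmmsearch output parse_hmm.py actually expects before relying on it.

```python
import shlex
import subprocess


def build_hmmsearch_cmd(hmm_profile, sequences, domtblout, cpus=4):
    """Return the hmmsearch argument list (hypothetical helper).

    hmmsearch takes its options first, then the HMM file, then the
    sequence file. All paths here are placeholders.
    """
    return [
        "hmmsearch",
        "--cpu", str(cpus),          # number of worker threads
        "--domtblout", domtblout,    # per-domain tabular hit table
        hmm_profile,
        sequences,
    ]


if __name__ == "__main__":
    cmd = build_hmmsearch_cmd("domain.hmm", "proteins.faa", "hits.domtblout")
    print(shlex.join(cmd))
    # Uncomment to actually run hmmsearch (it must be installed and on PATH):
    # subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
```

Building the command as a list avoids shell quoting problems when file names contain spaces.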
To generate the phylogenetic tree, you can use any tool capable of producing a Newick file from an MSA [I used FastTree, but you are welcome to use any other tool of your choice: http://www.microbesonline.org/fasttree/].
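For the tree-building step, a minimal sketch of calling FastTree from Python might look like the following. The helper names and file names are hypothetical, and the executable name `FastTree` is an assumption (some installations expose it as `fasttree`). FastTree writes the Newick tree to standard output, so the sketch redirects that to a file.

```python
import subprocess


def build_fasttree_cmd(msa_path, nucleotide=False, executable="FastTree"):
    """Return the FastTree argument list (hypothetical helper).

    Protein alignments are the default; pass nucleotide=True to add -nt.
    """
    cmd = [executable]
    if nucleotide:
        cmd.append("-nt")
    cmd.append(msa_path)
    return cmd


def run_fasttree(msa_path, newick_path, nucleotide=False):
    """Run FastTree and save its standard output as a Newick file."""
    with open(newick_path, "w") as out:
        subprocess.run(
            build_fasttree_cmd(msa_path, nucleotide),
            stdout=out,
            check=True,
        )


if __name__ == "__main__":
    # Placeholder file names; FastTree must be installed and on PATH to run it.
    print(" ".join(build_fasttree_cmd("amplicons.aln.fasta")))
```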
dom2BGC can also attempt to regenerate the physical clustering of domains, which is lost during the amplicon creation process, using co-occurrence across different samples.
These putative clusters should be considered predictions that need to be validated with dedicated experiments, but they can provide additional insight into the biological mechanisms associated with their natural products.
Multiple samples and biological/technical replicates are needed to enable co-occurrence-based reconstruction of putative clusters. If this option is active, Spearman co-occurrence patterns above a user-set threshold are used to generate a network.
Clustering of the network results in "highly co-occurring amplicon hubs". Amplicons within the same hub that map to multiple domains of the same cluster [from antismash-database] result in a predicted cluster.
The network can be visualized in Cytoscape, where you can load the annotation and clustering results. Predicted clusters are found in separate cluster files.
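To make the co-occurrence idea concrete, here is a small standard-library sketch. It is not dom2BGC's own code. It computes Spearman correlations between amplicon abundance profiles, keeps pairs above a threshold as network edges, and groups them into hubs. The abundance values and the function names are made up for illustration. Using connected components to form hubs is an assumption standing in for whatever clustering the pipeline actually applies.

```python
from itertools import combinations
from math import sqrt


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(vx * vy)


def cooccurrence_edges(abundance, threshold):
    """Return (amplicon_a, amplicon_b, rho) for pairs with rho >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho >= threshold:
            edges.append((a, b, rho))
    return edges


def hubs(edges):
    """Group amplicons into connected components (assumed hub definition)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in edges:
        parent[find(a)] = find(b)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Made-up abundances: one list per amplicon, one value per sample.
    abundance = {
        "amp1": [1, 5, 9, 2, 7],
        "amp2": [2, 6, 10, 3, 8],
        "amp3": [9, 1, 4, 8, 2],
        "amp4": [8, 2, 3, 9, 1],
    }
    edges = cooccurrence_edges(abundance, threshold=0.7)
    for a, b, rho in edges:
        print(f"{a}\t{b}\t{rho:.2f}")  # tab-separated edge table
    print(hubs(edges))
```

The printed edge table (source, target, weight) is in a plain tabular form that network viewers such as Cytoscape can generally import.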
Database availability:
The antiSMASH database gbk files are not provided.
The MIBiG gbk database can be found at https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz
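If you prefer to fetch the MIBiG archive from Python, a minimal sketch using only the standard library could look like this. The helper names and the local file name are hypothetical, and the assumption that the archive members end in `.gbk` should be checked against the actual download.

```python
import tarfile
import urllib.request
from pathlib import Path

MIBIG_URL = "https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz"


def download(url, dest):
    """Download url to dest unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def list_gbk_members(archive):
    """Return the sorted names of regular .gbk files inside a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(
            m.name
            for m in tar.getmembers()
            if m.isfile() and m.name.endswith(".gbk")
        )


if __name__ == "__main__":
    archive = download(MIBIG_URL, "mibig_gbk_2.0.tar.gz")
    names = list_gbk_members(archive)
    print(f"{len(names)} GenBank files found")
```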