Commit bae0976e authored by Tracanna, Vittorio

Update README.md

parent 037f0854
# dom2BGC
Pipeline for annotation of functional amplicons targeting BGC domains. The tool is designed to transfer annotation to amplicons based on their similarity to in silico amplicons from natural product databases.
Pre-parsed static versions of the databases are provided. Beware: updating the databases to a specific version can require considerable computation time.
An example of the command needed to run the pipeline is found in CMD_example.
To generate the amplicons [and in silico amplicons], use the hmmsearch tool with the HMM profile provided in this repo [if you use a more recent or different profile, you may need to adjust the numeric values in the parse_hmm.py script]:
`hmmsearch -o /path/to/hmmsearch/output/and/filename /path/to/hmm_profile.hmm /path/to/protein/sequences.faa`
Then run the parse_hmm.py script on the hmmsearch output file:
`python hmm_profiles/parse_hmm.py /path/to/hmmsearch/output/and/filename /path/to/parsed/output/and/filename.faa`
To generate the phylogenetic tree, you can use any tool capable of producing a Newick file from an MSA. [I used fasttree, but you are welcome to use any other tool of your choice: http://www.microbesonline.org/fasttree/]
`fasttree /path/to/parsed/output/and/filename.faa > /path/to/parsed/output/and/filename.tree`
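The three commands above can be chained from Python. The following is only a minimal sketch: the helper names `build_commands` and `run_pipeline` and the intermediate file names are hypothetical and not part of dom2BGC, and the executables are assumed to be on the `PATH` under the names used in this README.

```python
import subprocess
from pathlib import Path


def build_commands(hmm_profile, proteins, workdir):
    """Return the three steps as (argv, stdout_path) pairs, in run order."""
    workdir = Path(workdir)
    hmm_out = workdir / "hmmsearch.out"   # hypothetical file name
    parsed = workdir / "amplicons.faa"    # hypothetical file name
    tree = workdir / "amplicons.tree"     # hypothetical file name
    return [
        (["hmmsearch", "-o", str(hmm_out), str(hmm_profile), str(proteins)], None),
        (["python", "hmm_profiles/parse_hmm.py", str(hmm_out), str(parsed)], None),
        # fasttree prints the tree to stdout, so its output is redirected to a file
        (["fasttree", str(parsed)], tree),
    ]


def run_pipeline(hmm_profile, proteins, workdir):
    """Run each step in order; check=True stops at the first failing step."""
    for argv, stdout_path in build_commands(hmm_profile, proteins, workdir):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


# Dry run: print the commands without executing anything
for argv, stdout_path in build_commands("profile.hmm", "proteins.faa", "out"):
    suffix = f" > {stdout_path}" if stdout_path else ""
    print(" ".join(argv) + suffix)
```

Calling `run_pipeline(...)` with real paths would execute the steps; the dry run at the bottom is useful for checking the paths first.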
dom2BGC can also attempt to regenerate the physical clustering of domains that is lost during the amplicon creation process, using co-occurrence across different samples.
These putative clusters should be considered predictions that need to be validated with dedicated experiments, but they can provide additional insight into biological mechanisms associated with their natural products.
Multiple samples and biological/technical replicates are needed to enable co-occurrence-based putative cluster reconstruction. If this option is active, Spearman co-occurrence correlations above a user-set threshold are used to generate a network.
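To illustrate the idea of a thresholded Spearman network, here is a minimal standard-library sketch. It is not the dom2BGC implementation: the function names, the toy abundance table and the threshold value are all assumptions made for the example. Spearman correlation is computed as the Pearson correlation of the rank vectors, with tied values sharing their average rank.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return cov / sqrt(var_x * var_y)


def cooccurrence_edges(abundance, threshold):
    """Keep amplicon pairs whose correlation across samples exceeds the threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho > threshold:
            edges.append((a, b, rho))
    return edges


# Toy table (hypothetical data): amplicon -> abundance in five samples
abundance = {
    "amp1": [0, 5, 9, 2, 7],
    "amp2": [1, 6, 8, 3, 7],
    "amp3": [9, 1, 0, 8, 2],
}
edges = cooccurrence_edges(abundance, threshold=0.8)
print(edges)
```

In this toy table only the pair `amp1`/`amp2` rises and falls together across samples, so it is the only edge kept; `amp3` follows the opposite pattern and is excluded.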
Clustering of the network results in "highly cooccurring amplicon hubs". Amplicons within the same hub mapping on multiple domains of the same cluster [from antismash-database] result in a predicted cluster.
The network can be visualized in Cytoscape, where you can load the annotation and clustering results. Predicted clusters are found in separate cluster files.
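The step from hubs to predicted clusters can be sketched as follows. This is an illustration under stated assumptions, not the method used by dom2BGC: connected components stand in for the network clustering step (the actual clustering algorithm is not described here), and the annotation format, the `min_domains` parameter and all identifiers are hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Group amplicons into hubs; a stand-in for the real clustering step."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, hubs = set(), []
    for node in sorted(neighbours):
        if node in seen:
            continue
        stack, hub = [node], set()
        while stack:
            current = stack.pop()
            if current in hub:
                continue
            hub.add(current)
            stack.extend(neighbours[current] - hub)
        seen |= hub
        hubs.append(hub)
    return hubs


def predict_clusters(hubs, annotation, min_domains=2):
    """Report clusters hit on at least min_domains distinct domains within one hub."""
    predictions = []
    for hub in hubs:
        domains_per_cluster = defaultdict(set)
        for amplicon in hub:
            if amplicon in annotation:
                cluster_id, domain = annotation[amplicon]
                domains_per_cluster[cluster_id].add(domain)
        for cluster_id, domains in sorted(domains_per_cluster.items()):
            if len(domains) >= min_domains:
                predictions.append((cluster_id, sorted(domains)))
    return predictions


# Hypothetical inputs: network edges and amplicon -> (cluster, domain) annotation
edges = [("amp1", "amp2"), ("amp2", "amp3"), ("amp4", "amp5")]
annotation = {
    "amp1": ("cluster_A", "domain_1"),
    "amp2": ("cluster_A", "domain_2"),
    "amp3": ("cluster_B", "domain_1"),
    "amp4": ("cluster_C", "domain_1"),
    "amp5": ("cluster_C", "domain_1"),
}
hubs = connected_components(edges)
predictions = predict_clusters(hubs, annotation)
print(predictions)
```

Here `cluster_A` is predicted because two amplicons in the same hub map to two different domains of it, while `cluster_C` is not, since both of its amplicons map to the same domain.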
Database availability:
Antismashdb gbk database is not provided
Mibig gbk database can be found at https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz