Update README.md

Commit bae0976e authored 5 years ago by Tracanna, Vittorio
--- a/README.md
+++ b/README.md
 # dom2BGC
-Pipeline for annotation of functional amplicons targeting BGC domains
+Pipeline for annotation of functional amplicons targeting BGC domains. The tool transfers annotations to amplicons based on their similarity to in silico amplicons from natural product databases.
+Pre-parsed static versions of the databases are provided. Beware: updating the databases to a specific version can take considerable computation time.
 An example of the command needed to run the pipeline is found in CMD_example.
-Antismashdb gbk database is not provided
-Mibig gbk database can be found at https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz
-To generate the amplicons [and in silico amplicons]. Use hmmsearch tool with the HMM profile provided in this repo or a more recent/different one [if you do so you may need to tweak numbers in the parse_hmm.py script].
+To generate the amplicons [and in silico amplicons], use the hmmsearch tool with the HMM profile provided in this repo:
+`hmmsearch -o /path/to/hmmsearch/output/and/filename /path/to/hmm_profile.hmm /path/to/protein/sequences.faa`
 Then run the parse_hmm.py script with the hmmsearch output file.
+`python hmm_profiles/parse_hmm.py /path/to/hmmsearch/output/and/filename /path/to/parsed/output/and/filename.faa`
 To generate the phylogeny tree you can use any tool capable of creating a newick file output from a MSA. [I used fasttree but you are welcome to use any other tool of your choice http://www.microbesonline.org/fasttree/]
+`fasttree /path/to/parsed/output/and/filename.faa > /path/to/parsed/output/and/filename.tree`
+dom2BGC can also attempt to regenerate the physical clustering of domains that is lost during the amplicon creation process, using co-occurrence across different samples.
+These putative clusters should be considered predictions that need to be validated with dedicated experiments, but they can provide additional insight into biological mechanisms associated with their natural products.
+Multiple samples and biological/technical replicates are needed to enable co-occurrence-based putative cluster reconstruction. If active, Spearman co-occurrence patterns above a user-set threshold are used to generate a network.
+Clustering of the network results in "highly co-occurring amplicon hubs". Amplicons within the same hub that map to multiple domains of the same cluster [from antismash-database] result in a predicted cluster.
+The network can be visualized in Cytoscape, where you can load annotation and clustering results. Predicted clusters are found in separate cluster files.
+Database availability:
+Antismashdb gbk database is not provided
+Mibig gbk database can be found at https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz
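
The co-occurrence step described in the added README text (Spearman correlation above a user-set threshold, a network, then hubs) can be illustrated with a minimal sketch. This is not dom2BGC's code: the function names, the abundance table, and the 0.8 threshold are hypothetical, and connected components stand in for whatever network clustering the pipeline actually applies, which the README does not specify.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # constant abundance profile: no usable signal
    return cov / (sx * sy)


def cooccurrence_hubs(abundance, threshold):
    """Link amplicons correlated above `threshold`, then return the
    connected components of that network as sorted lists of names.
    (Assumption: components are a simple stand-in for network clustering.)"""
    edges = {name: set() for name in abundance}
    for a, b in combinations(abundance, 2):
        if spearman(abundance[a], abundance[b]) > threshold:
            edges[a].add(b)
            edges[b].add(a)
    hubs, seen = [], set()
    for start in abundance:
        if start in seen:
            continue
        stack, hub = [start], set()
        while stack:
            node = stack.pop()
            if node in hub:
                continue
            hub.add(node)
            stack.extend(edges[node] - hub)
        seen |= hub
        hubs.append(sorted(hub))
    return hubs


# Hypothetical abundance table: amplicon name -> counts across five samples.
abundance = {
    "amp_A": [1, 5, 2, 8, 3],
    "amp_B": [2, 9, 4, 20, 5],
    "amp_C": [7, 1, 6, 0, 4],
}
hubs = cooccurrence_hubs(abundance, threshold=0.8)
```

In this toy table `amp_A` and `amp_B` rise and fall together across samples and end up in the same hub, while `amp_C` follows the opposite pattern and stays alone. With only a handful of samples such correlations are easy to obtain by chance, which is why the README asks for multiple samples and replicates.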