Mining (meta)genomes for Biosynthetic Gene Clusters (BGCs) that encode the production of secondary metabolites has become a key bioinformatic strategy for natural product discovery. At the single-genome level, this process is performed by tools such as antiSMASH.
When studying large sets of genomes and metagenomes, analyses must be performed at scale. BiG-SCAPE (Biosynthetic Gene Similarity Clustering and Prospecting Engine) is a tool that calculates distances between BGCs in order to map BGC diversity onto sequence similarity networks. These networks are then processed to automatically reconstruct Gene Cluster Families: groups of gene clusters that encode the biosynthesis of highly similar or identical molecules. BiG-SCAPE's interactive visualizations of these similarity networks allow effective exploration of BGC diversity and link BGCs to knowledge from reference data in the MIBiG repository.
How does it work in a nutshell?
BiG-SCAPE (recursively) reads BGC information stored as GenBank files from the input folder (which should preferably contain gene clusters identified with a tool like antiSMASH).
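The recursive input scan can be pictured with a minimal sketch. This is not BiG-SCAPE's own code: the function name `find_genbank_files` is hypothetical, and the `.gbk` extension is an assumption about how the GenBank files are named.

```python
from pathlib import Path


def find_genbank_files(input_dir, extensions=(".gbk",)):
    """Recursively collect GenBank files below input_dir.

    The default extension is an assumption for illustration; adjust it to
    match how your cluster files are actually named.
    """
    root = Path(input_dir)
    # rglob("*") walks every subfolder, mirroring a recursive read.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling `find_genbank_files("my_input_folder")` on a folder of antiSMASH results would return every matching file in that folder and all of its subfolders, sorted by path.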
BiG-SCAPE then uses the Pfam database and hmmscan from the HMMER suite to predict Pfam domains in each sequence, thus summarizing each BGC as a linear string of Pfam domains.
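To illustrate what "a linear string of Pfam domains" means, here is a small sketch that orders domain hits by gene and by position within the gene. The `DomainHit` type and its fields are hypothetical and do not reflect BiG-SCAPE's internal data structures or the hmmscan output format; the Pfam accessions are only example values.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One predicted Pfam domain (hypothetical structure for illustration)."""
    gene_index: int  # position of the gene within the cluster
    start: int       # start coordinate of the hit within the protein
    pfam_id: str     # Pfam accession, e.g. "PF00109"


def domain_string(hits):
    """Return the Pfam accessions of a BGC in linear order along the cluster."""
    ordered = sorted(hits, key=lambda hit: (hit.gene_index, hit.start))
    return [hit.pfam_id for hit in ordered]


# Hits may arrive in any order; the result follows gene order, then position.
hits = [
    DomainHit(gene_index=1, start=20, pfam_id="PF00550"),
    DomainHit(gene_index=0, start=250, pfam_id="PF02801"),
    DomainHit(gene_index=0, start=5, pfam_id="PF00109"),
]
print(domain_string(hits))  # ['PF00109', 'PF02801', 'PF00550']
```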
For each distance cutoff value, the pairwise distances between BGCs are then used to automatically define 'Gene Cluster Families' (GCFs) and 'Gene Cluster Clans' (GCCs).
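The role of a cutoff can be shown with a deliberately simplified sketch: BGC pairs whose distance is at or below the cutoff are linked, and linked BGCs are grouped together. This is plain connected-component grouping chosen for illustration, not BiG-SCAPE's actual family-calling procedure; the function name, BGC names, and distance values are all made up.

```python
def families_at_cutoff(names, distances, cutoff):
    """Group BGCs linked by distances <= cutoff (connected components).

    Simplified illustration only; BiG-SCAPE's real procedure may differ.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in distances.items():
        if distance <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise distances between four BGCs.
names = ["A", "B", "C", "D"]
distances = {("A", "B"): 0.2, ("B", "C"): 0.5, ("C", "D"): 0.25, ("A", "D"): 0.9}

print(families_at_cutoff(names, distances, 0.3))  # [['A', 'B'], ['C', 'D']]
print(families_at_cutoff(names, distances, 0.5))  # [['A', 'B', 'C', 'D']]
```

A stricter cutoff yields more, smaller groups; a more permissive cutoff merges them, which is why results are reported per cutoff value.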
By default, BiG-SCAPE uses the /product information of antiSMASH-processed GenBank files to separate the analysis into eight BiG-SCAPE classes. Each has different (tuned) sets of weights for the distance components. You can also choose to combine all BGC classes into a single network file (--mix) and deactivate the default classification (--no_classify). It is also possible to prevent analysis of any of the BiG-SCAPE classes by using the --banned_classes parameter.
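As a sketch of how these options combine on the command line, the helper below assembles an argument list that could be passed to `subprocess.run`. The helper `build_bigscape_command` is hypothetical. The `-i`/`-o` input and output flags, the space-separated form of `--banned_classes`, and the class name used in the usage lines are assumptions; confirm them with `python bigscape.py -h` before use.

```python
def build_bigscape_command(input_dir, output_dir, mix=False,
                           no_classify=False, banned_classes=()):
    """Assemble a BiG-SCAPE command line as a list of arguments.

    Assumes -i/-o select the input and output folders and that
    --banned_classes takes space-separated class names (both unverified here).
    """
    command = ["python", "bigscape.py", "-i", str(input_dir), "-o", str(output_dir)]
    if mix:
        command.append("--mix")          # also build one network with all classes
    if no_classify:
        command.append("--no_classify")  # skip the default per-class separation
    if banned_classes:
        command.append("--banned_classes")
        command.extend(banned_classes)
    return command


# One mixed network only, with no per-class analysis:
print(build_bigscape_command("input", "output", mix=True, no_classify=True))

# Default classification, skipping one class (class name is an assumed example):
print(build_bigscape_command("input", "output", banned_classes=["PKSI"]))
```

Building the command as a list avoids shell quoting problems when folder names contain spaces.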
Learn more about the BiG-SCAPE options with python bigscape.py -h or by going to the specific wiki page.
See the related pages in the wiki for more detailed information.