  1. 01 May, 2018 1 commit
  2. 30 Apr, 2018 3 commits
  3. 26 Apr, 2018 1 commit
  4. 13 Apr, 2018 1 commit
  5. 09 Apr, 2018 2 commits
    • Jorge Navarro Muñoz authored
      - Hybrids mode is now the default. Use --hybrids-off to turn it off
      - cutoffs now has a single default value: 0.3
      - Clan clustering is now activated by default. Use --clans-off to turn
      it off
      - clan_cutoff is now 0.3 0.7 (i.e. GCFs defined at 0.3 will be used, and
      GCCs will be formed from GCFs whose average distance is 0.7 or less)
    • MIBiG mode bugfix · a59c1eb9
      Jorge Navarro Muñoz authored
      Fixes issue #5. Some BGCs in MIBiG that don't have any domains were not
      being removed from the mibig_set
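The clan_cutoff rule above (GCFs defined at 0.3, grouped when their average distance is 0.7 or less) can be illustrated with a small Python sketch. It only lists which GCF pairs pass the 0.7 average-distance threshold; it does not reproduce how BiG-SCAPE actually builds GCCs from those pairs. The functions `average_gcf_distance` and `clan_candidate_pairs`, and the `distance` callable, are hypothetical names used for illustration.

```python
from itertools import combinations
from statistics import mean


def average_gcf_distance(members_a, members_b, distance):
    """Mean pairwise distance between the BGCs of two GCFs.

    `distance` is a hypothetical callable returning the distance between two BGC names.
    """
    return mean(distance(a, b) for a in members_a for b in members_b)


def clan_candidate_pairs(gcfs, distance, clan_cutoff=0.7):
    """Return pairs of GCF ids whose average inter-family distance is <= clan_cutoff.

    `gcfs` maps a GCF id to the BGC names assigned to it (e.g. families
    already defined at the 0.3 cutoff).
    """
    pairs = []
    for (id_a, members_a), (id_b, members_b) in combinations(sorted(gcfs.items()), 2):
        if average_gcf_distance(members_a, members_b, distance) <= clan_cutoff:
            pairs.append((id_a, id_b))
    return pairs


if __name__ == "__main__":
    # Toy distances between four hypothetical BGCs
    toy = {
        frozenset({"bgc1", "bgc2"}): 0.2,
        frozenset({"bgc1", "bgc3"}): 0.6,
        frozenset({"bgc2", "bgc3"}): 0.7,
        frozenset({"bgc1", "bgc4"}): 0.95,
        frozenset({"bgc2", "bgc4"}): 0.9,
        frozenset({"bgc3", "bgc4"}): 0.85,
    }
    families = {"GCF_1": ["bgc1", "bgc2"], "GCF_2": ["bgc3"], "GCF_3": ["bgc4"]}
    print(clan_candidate_pairs(families, lambda a, b: toy[frozenset({a, b})]))
    # [('GCF_1', 'GCF_2')]
```

Averaging over all member pairs means a single very similar BGC pair cannot, on its own, pull two otherwise distant families together.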
  6. 08 Apr, 2018 2 commits
  7. 05 Apr, 2018 1 commit
  8. 03 Apr, 2018 1 commit
  9. 01 Apr, 2018 1 commit
  10. 23 Mar, 2018 1 commit
    • New Feature: Query BGC · 1415b5b2
      Jorge Navarro Muñoz authored
      Use --query_bgc to designate a BGC (not necessarily in your --inputdir)
      and search your data set only for BGCs similar to it (up to
      max(cutoffs)).
      - TODO: after the first round of QBGC vs all, delete the distance
      information of all non-relevant pairs
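A minimal sketch of the query-BGC filter described above: keep only the data-set BGCs whose distance to the query is within max(cutoffs). The function `bgcs_near_query` and the `distance` callable are hypothetical; how BiG-SCAPE computes and stores those distances is not shown.

```python
def bgcs_near_query(query, candidates, distance, cutoffs):
    """Return the candidate BGCs within max(cutoffs) of the query BGC.

    `distance` is a hypothetical callable giving the distance between two BGC names.
    """
    limit = max(cutoffs)  # the widest cutoff bounds what counts as "similar"
    return [bgc for bgc in candidates
            if bgc != query and distance(query, bgc) <= limit]


if __name__ == "__main__":
    toy = {"bgcA": 0.1, "bgcB": 0.45, "bgcC": 0.8}
    hits = bgcs_near_query("query", list(toy), lambda q, b: toy[b], cutoffs=[0.3, 0.5])
    print(hits)  # ['bgcA', 'bgcB']
```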
  11. 21 Mar, 2018 1 commit
  12. 08 Mar, 2018 1 commit
  13. 07 Mar, 2018 1 commit
  14. 05 Mar, 2018 1 commit
    • Flex. CDS overlap, minor improvements (glocal mode, compute resources) · 2fe7ca8b
      Jorge Navarro Muñoz authored
      - Added another criterion to start glocal expansion: the seed slice
      contains a core gene
      - Exclude BGCs with unwanted classes very early on (so their domains are
      not calculated or aligned, their distances are not kept...)
      - If CDSs overlap, allow an overlap of up to 10% of the shortest CDS
      (this was causing trouble with some true-positive overlapping CDSs)
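The 10% overlap tolerance can be sketched as follows, assuming CDS coordinates are half-open (start, end) intervals on the same strand axis. The names `overlap_length` and `overlap_is_tolerated` are hypothetical, and what BiG-SCAPE does with a non-tolerated overlap is not covered here.

```python
def overlap_length(cds_a, cds_b):
    """Shared length of two CDSs given as half-open (start, end) intervals."""
    return max(0, min(cds_a[1], cds_b[1]) - max(cds_a[0], cds_b[0]))


def overlap_is_tolerated(cds_a, cds_b, max_fraction=0.1):
    """True if the CDSs overlap by at most `max_fraction` of the shorter one."""
    shortest = min(cds_a[1] - cds_a[0], cds_b[1] - cds_b[0])
    return overlap_length(cds_a, cds_b) <= max_fraction * shortest


if __name__ == "__main__":
    print(overlap_is_tolerated((0, 1000), (950, 2000)))  # True: 50 bp <= 100 bp
    print(overlap_is_tolerated((0, 1000), (800, 1500)))  # False: 200 bp > 70 bp
```

Measuring against the shorter CDS keeps the tolerance strict for small genes, where even a modest overlap covers a large part of the gene.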
  15. 01 Mar, 2018 1 commit
  16. 19 Feb, 2018 1 commit
  17. 16 Feb, 2018 1 commit
  18. 15 Feb, 2018 1 commit
    • Minor corrections on SVG output · 3e5426ea
      Jorge Navarro Muñoz authored
      - Disable gene colors (the Arrower script would generate random colors
      for named genes)
      - Disable gene categories (depending on the domains found in a gene, the
      Arrower script drew a shadow around it). These two features made the SVG
      output too busy and not very informative
      - Fix typo in BiG-SCAPE
  19. 09 Feb, 2018 3 commits
    • Bugfix and better visualization alignment when overlap == 1 gene · 3a8ba69f
      Jorge Navarro Muñoz authored
      - Fix a bug that was introduced in the previous commit
      - If the Longest Common Subcluster is only one gene, and there are
      multiple 1-gene matches, choose the one with the highest number of
    • Small improvement in visualization: better alignment of some BGCs · 21e0c562
      Jorge Navarro Muñoz authored
      If a pair of BGCs from the same GCF (exemplar + some other member) doesn't
      show a good positional alignment (which can happen when one of the BGCs is
      too short and there is a slight difference in domain content, e.g. one
      domain was not detected), the alignment length from the LCS algorithm is
      zero. In these (somewhat rare) cases, the pair of BGCs used to be aligned
      using the first gene of each. Now, the gene with the most domain content
      is chosen
    • Fix bug from previous commit · b8a0b4a4
      Jorge Navarro Muñoz authored
      If an overlapping CDS appeared more than once, it would also be marked for
      deletion more than once. The list of CDSs to be deleted was changed to a
      set.
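The two fixes above can be sketched in a few lines: a fallback alignment anchor chosen as the gene with the most domains, and a set that marks each overlapping CDS for deletion only once. Reading "domain content" as the number of domains per gene, and breaking ties by taking the first such gene, are assumptions; `fallback_anchor_gene` and `cds_to_delete` are hypothetical names.

```python
def fallback_anchor_gene(domains_per_gene):
    """Index of the gene with the most domains; the first such gene wins ties.

    `domains_per_gene` holds one list of domain names per gene, in BGC order.
    Assumes "domain content" means the number of domains in a gene.
    """
    return max(range(len(domains_per_gene)), key=lambda i: len(domains_per_gene[i]))


def cds_to_delete(overlapping_cds_names):
    """Collect overlapping CDSs in a set so each one is marked only once."""
    return set(overlapping_cds_names)


if __name__ == "__main__":
    print(fallback_anchor_gene([["PKS_KS"], ["PKS_KS", "PKS_AT"], ["ACP"]]))  # 1
    print(sorted(cds_to_delete(["cds3", "cds7", "cds3"])))  # ['cds3', 'cds7']
```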
  20. 08 Feb, 2018 1 commit
  21. 07 Feb, 2018 1 commit
    • Partial fix for failure when using BGCs with splicing events *WARNING* · 828361b6
      Jorge Navarro Muñoz authored
      If two or more CDSs share the same locus_tag (or gene_id if no locus_tag is
      present), BiG-SCAPE will only keep the longest CDS. This has some consequences:
      - If the user is re-using data that contains such splicing events (MIBiG
      dataset or eukaryotic BGCs), she will have to delete previous data and start a
      fresh run.
      - This will still cause trouble when dealing with GenBank files whose CDS
      features don't contain the locus_tag or gene_id qualifiers. A further fix will
      be implemented for that case
      - As the original GenBank file is used to draw the SVG figure, it will still
      contain all CDSs (but the ones not used will not show any domains)
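A minimal sketch of the "keep only the longest CDS per locus_tag (or gene_id)" rule. It assumes CDS features have already been reduced to plain dicts with optional 'locus_tag'/'gene_id' keys and 'start'/'end' coordinates; the function name is hypothetical. CDSs carrying neither identifier are all kept, which matches the case the commit says still needs a further fix.

```python
def keep_longest_per_tag(cds_records):
    """Keep only the longest CDS for each locus_tag (or gene_id if locus_tag is absent).

    Each record is a dict with optional 'locus_tag'/'gene_id' keys and
    'start'/'end' coordinates. CDSs carrying neither identifier are all kept.
    """
    longest = {}
    untagged = []
    for cds in cds_records:
        tag = cds.get("locus_tag") or cds.get("gene_id")
        if tag is None:
            untagged.append(cds)  # no identifier: cannot detect splice variants
            continue
        length = cds["end"] - cds["start"]
        current = longest.get(tag)
        if current is None or length > current["end"] - current["start"]:
            longest[tag] = cds
    return list(longest.values()) + untagged


if __name__ == "__main__":
    records = [
        {"locus_tag": "g1", "start": 0, "end": 300},
        {"locus_tag": "g1", "start": 0, "end": 900},  # longer splice variant of g1
        {"gene_id": "g2", "start": 1000, "end": 1600},
    ]
    print(keep_longest_per_tag(records))
```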
  22. 24 Jan, 2018 1 commit
  23. 23 Jan, 2018 1 commit
  24. 18 Dec, 2017 1 commit
    • GCFs: Singleton BGCs now put in their own family + dependencies + Clans bugfix · f00316e8
      Jorge Navarro Muñoz authored
      - BiG-SCAPE now uses networkx to separate connected subgraphs at the chosen
      cutoff. If a subgraph has fewer than 3 nodes, it is automatically assigned
      to its own family (in the previous commit, scikit-learn's implementation of
      Affinity Propagation would put all singletons in the same family). New
      dependency: networkx
      - Bugfix for Gene Cluster Clan (GCC) clustering
      - GCC clustering also uses scikit-learn's implementation of Affinity
      Propagation, which means that pySAPC is no longer required
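The component split described above can be sketched without any dependency. networkx's `connected_components` does the same job; this version uses a plain breadth-first search so it runs with the standard library alone. It assumes `edges` are the BGC pairs whose distance falls below the chosen cutoff, and it only separates small components (which become families directly) from larger ones; the Affinity Propagation step for the larger ones is not shown. Function names are hypothetical.

```python
from collections import defaultdict, deque


def split_components(nodes, edges):
    """Connected components of an undirected graph via breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def families_from_small_components(nodes, edges, min_size=3):
    """Components smaller than `min_size` become families directly; larger ones
    are returned for a further clustering step (not shown here)."""
    direct, to_cluster = [], []
    for component in split_components(nodes, edges):
        (direct if len(component) < min_size else to_cluster).append(component)
    return direct, to_cluster


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e", "f"]
    edges = [("a", "b"), ("c", "d"), ("d", "e")]
    print(families_from_small_components(nodes, edges))
    # ([['a', 'b'], ['f']], [['c', 'd', 'e']])
```

Splitting first also keeps each clustering call small, since no family can span two disconnected subgraphs anyway.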
  25. 06 Dec, 2017 3 commits
    • Bugfixes · 6b18c7e9
      Jorge Navarro Muñoz authored
      - Pass correct total size of BGC to visualization (for multi-record files)
      - When the two BGCs of a pair don't share any gene (domain-wise), the seed
      LCS is 0. Corrected the length for visualization
    • Fixed GCF visualization for some MIBiG BGCs. *BREAKS BACKWARD COMPATIBILITY* · 27989064
      Jorge Navarro Muñoz authored
      - Multi-record gbks are now parsed differently. Because every record had local
      coordinates, final positions could be mixed between ORFs of different records.
      This has been fixed by creating absolute positions for each record (second
      record will start 1kb after the end of the first one, etc.). The price to pay
      for this is that a large number of files in <outputfolder>/cache have to be
      reprocessed. There is now a warning for users who are reusing results from
      previous runs
      - Better alignment of BGCs whose LCS has the same length in both
      orientations (the first gene in the forward orientation is used as a guide)
      - GCF visualization: warning, rooting at midpoint would sometimes crash
      (apparently when sequences are exactly identical and all distances == 0)
      - Fixed a bug that appeared when using Python 3 and a BGC with no predicted
      domains had to be deleted from the run (Thanks to Wouter Lockhorst)
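The multi-record coordinate fix (each record starts 1 kb after the end of the previous one) can be sketched as an offset table. It assumes record-local coordinates start at 0; `record_offsets` and `to_absolute` are hypothetical names.

```python
def record_offsets(record_lengths, spacer=1000):
    """Start position of each record on a single absolute axis.

    Each record begins `spacer` bp (1 kb by default) after the end of the previous one.
    """
    offsets, position = [], 0
    for length in record_lengths:
        offsets.append(position)
        position += length + spacer
    return offsets


def to_absolute(record_index, local_position, offsets):
    """Convert a record-local coordinate to the shared absolute axis."""
    return offsets[record_index] + local_position


if __name__ == "__main__":
    offsets = record_offsets([5000, 3000, 2000])
    print(offsets)                          # [0, 6000, 10000]
    print(to_absolute(1, 250, offsets))     # 6250
```

The spacer guarantees that ORFs from different records can no longer share or interleave positions after conversion.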
  26. 04 Dec, 2017 3 commits
    • Bugfix for Positional alignments + change to GCF trees · a620bf40
      Jorge Navarro Muñoz authored
      - Cases where the pairwise positional alignment reported a BGC inversion
      were corrected
      - GCF Trees are now rooted at midpoint
    • Gene Cluster Family improved visualization and more · e9c1cfc8
      Jorge Navarro Muñoz authored
      - Visualization of GCFs also includes a) a phylogenetic tree using the "GCF
      core", comprised of the (aligned) sequences of all the most shared domains (that
      are also contained in the exemplar BGC). Concatenated domain sequences and
      Newick Tree files will be available in the <bgc class>/GCF_Trees folder
      - The tree is calculated using FastTree, so this is a new requirement
      - BGCs are also aligned within the GCF visualization (initial work)
      - Clustering (at the GCF level) is now done using scikit-learn's implementation
      of Affinity Propagation, so sklearn is a new requirement (maybe temporary?).
      Clustering at the Clan level is still done using pySAPC
      - Samples mode is marked for deletion. No output will be generated when turning
      it on (and the parameter will disappear altogether soon)
      - GCC: Added similarity value for the main diagonal (every GCF is equal to itself)
      - Changed run time to local time
      - Now outputs a full Networks Annotation file (so you can pinpoint which
      class a BGC of interest ends up in)
      - --mix results are now stored in their own folder
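One plausible reading of the "GCF core" above (the most shared domains that are also present in the exemplar BGC) is sketched below: count in how many member BGCs each domain occurs, restrict to the exemplar's domains, and keep those with the highest count. This interpretation, and the name `gcf_core_domains`, are assumptions; the commit does not spell out the exact selection rule, and the alignment and FastTree steps are not shown.

```python
from collections import Counter


def gcf_core_domains(exemplar_domains, member_domains):
    """Exemplar domains shared by the largest number of GCF members.

    `member_domains` holds one iterable of domain names per member BGC.
    Hypothetical interpretation of "most shared domains".
    """
    counts = Counter()
    for domains in member_domains:
        counts.update(set(domains))  # count each domain once per BGC
    candidates = set(exemplar_domains)
    top = max(counts[d] for d in candidates)
    return sorted(d for d in candidates if counts[d] == top)


if __name__ == "__main__":
    exemplar = ["PKS_KS", "PKS_AT", "ACP", "TE"]
    members = [
        {"PKS_KS", "PKS_AT", "ACP"},
        {"PKS_KS", "PKS_AT"},
        {"PKS_KS", "PKS_AT", "TE"},
    ]
    print(gcf_core_domains(exemplar, members))  # ['PKS_AT', 'PKS_KS']
```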
  27. 29 Nov, 2017 3 commits
  28. 28 Nov, 2017 1 commit