1. 12 Nov, 2018 1 commit
  2. 05 Oct, 2018 1 commit
  3. 01 Oct, 2018 1 commit
    • Input sanitation and update to included MIBiG BGCs · 48579954
      Jorge Navarro Muñoz authored
      - Colons in gene and protein names are now replaced with underscores
      to avoid problems downstream (colons are used internally to split the
      information contained in the sequence headers)
      - Added a new bundle of MIBiG BGCs (version 1.4). Version 1.3 is still
      kept for reproducibility
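The colon substitution described in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: `sanitize_name`, `build_header`, and the three-field header layout are hypothetical, since the commit only states that colons separate fields in the sequence header.

```python
def sanitize_name(name: str) -> str:
    """Replace colons so a name cannot collide with the header separator."""
    return name.replace(":", "_")


def build_header(bgc: str, gene: str, protein: str) -> str:
    # Hypothetical header layout (assumption): fields joined by colons.
    return ":".join([bgc, sanitize_name(gene), sanitize_name(protein)])


# Without sanitizing, "abc:1" would add an extra field when the header is split.
fields = build_header("BGC0000001", "abc:1", "prot:A").split(":")
print(fields)
```

Sanitizing at input time keeps every later `split(":")` on the header safe without changing the separator itself.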
  4. 03 Sep, 2018 1 commit
  5. 06 Aug, 2018 1 commit
  6. 19 Jun, 2018 1 commit
  7. 05 Jun, 2018 1 commit
  8. 23 May, 2018 2 commits
  9. 22 May, 2018 1 commit
    • New feature: domain whitelist · ddf670d0
      Jorge Navarro Muñoz authored
      - Include only BGCs that contain a user-defined list of domain accessions. In
      this case, the list is contained in the domain_whitelist.txt file (which
      already includes an example). Toggle with --domain_whitelist
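A rough sketch of how such a whitelist filter might work. Two things here are assumptions, not taken from the commit: the file format (one accession per line, `#` for comments, accession in the first tab-separated column) and the rule that a BGC passes when it contains at least one whitelisted domain. The function names are hypothetical.

```python
def load_whitelist(lines):
    """Collect accessions, skipping blank lines and '#' comments (assumed format)."""
    accessions = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        accessions.add(line.split("\t")[0])
    return accessions


def filter_bgcs(bgc_domains, whitelist):
    """Keep BGCs with at least one whitelisted domain (assumed rule)."""
    return {name: doms for name, doms in bgc_domains.items()
            if whitelist & set(doms)}


whitelist = load_whitelist(["# example", "PF00109\tketosynthase", "", "PF00501"])
kept = filter_bgcs({"bgc1": ["PF00109", "PF02801"], "bgc2": ["PF13193"]}, whitelist)
print(sorted(kept))
```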
  10. 04 May, 2018 1 commit
  11. 03 May, 2018 1 commit
  12. 01 May, 2018 2 commits
  13. 30 Apr, 2018 3 commits
  14. 26 Apr, 2018 1 commit
  15. 13 Apr, 2018 1 commit
  16. 09 Apr, 2018 2 commits
    • CHANGES IN DEFAULT PARAMETERS · 15c0ce69
      Jorge Navarro Muñoz authored
      - hybrids mode is now default. Use --hybrids-off to turn it off
      - cutoffs now has a single default value: 0.3
      - clans clustering is now activated by default. Use --clans-off to turn
      it off
      - clan_cutoff is now 0.3 0.7 (i.e. GCFs are defined at 0.3, and GCCs
      are formed from GCFs whose average distance is 0.7 or less)
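The new defaults could be expressed with `argparse` roughly as below. This is a sketch only: `--hybrids-off` and `--clans-off` come from the commit message, while the exact spelling of `--cutoffs` and `--clan_cutoff`, the `dest` names, and the value types are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the new defaults")
    # Features are on by default; the *-off flags switch them off.
    parser.add_argument("--hybrids-off", dest="hybrids", action="store_false",
                        default=True)
    parser.add_argument("--clans-off", dest="clans", action="store_false",
                        default=True)
    # Option spellings below are assumed, not confirmed by the commit.
    parser.add_argument("--cutoffs", nargs="+", type=float, default=[0.3])
    parser.add_argument("--clan_cutoff", nargs=2, type=float,
                        default=[0.3, 0.7])
    return parser


defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["--hybrids-off", "--cutoffs", "0.3", "0.5"])
print(defaults)
print(custom)
```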
    • MIBiG mode bugfix · a59c1eb9
      Jorge Navarro Muñoz authored
      Fixes issue #5. Some BGCs in MIBiG that don't have any domains were not
      being removed from the mibig_set
  17. 08 Apr, 2018 2 commits
  18. 05 Apr, 2018 1 commit
  19. 03 Apr, 2018 1 commit
  20. 01 Apr, 2018 1 commit
  21. 23 Mar, 2018 1 commit
    • New Feature: Query BGC · 1415b5b2
      Jorge Navarro Muñoz authored
      Use a designated BGC with --query_bgc (not necessarily in your
      --inputdir) and search only for similar (up to max(cutoffs)) BGCs in
      your data set.
      - TODO: after the first round of QBGC vs all, delete distance
      information from all non-relevant distances
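The "up to max(cutoffs)" selection could look like the sketch below. It assumes the distances from the query BGC to every other BGC are already available as a dictionary; the function name and data layout are hypothetical.

```python
def similar_to_query(distances, cutoffs):
    """Keep BGCs whose distance to the query is at most the largest cutoff."""
    limit = max(cutoffs)
    return {bgc: d for bgc, d in distances.items() if d <= limit}


# Hypothetical distances from the query BGC to three BGCs in the data set.
query_distances = {"bgcA": 0.12, "bgcB": 0.45, "bgcC": 0.80}
print(similar_to_query(query_distances, cutoffs=[0.3, 0.5]))
```

Using the largest cutoff keeps every BGC that could still appear in a network at any of the requested cutoffs.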
  22. 21 Mar, 2018 1 commit
  23. 08 Mar, 2018 1 commit
  24. 07 Mar, 2018 1 commit
  25. 05 Mar, 2018 1 commit
    • Flex. CDS overlap, minor improvmnts (glocal mode, compute resources) · 2fe7ca8b
      Jorge Navarro Muñoz authored
      - Added another criterion to start glocal expansion: the seed slice
      contains a core gene
      - Exclude BGCs with unwanted classes very early on (so don't
      calculate domains, align them, keep their distances...)
      - If CDSs overlap, allow for an overlap of up to 10% of the shortest CDS
      (was causing trouble with some true positive overlapping CDSs)
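The 10% overlap tolerance can be illustrated as follows. This is a sketch under assumptions: CDSs are represented as `(start, end)` tuples on the same coordinate system, and the function names are hypothetical.

```python
def overlap_length(a, b):
    """Number of positions shared by two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_is_tolerated(a, b, max_fraction=0.10):
    """True if the overlap is at most max_fraction of the shortest CDS."""
    shortest = min(a[1] - a[0], b[1] - b[0])
    return overlap_length(a, b) <= max_fraction * shortest


cds1 = (0, 1000)
print(overlap_is_tolerated(cds1, (950, 1500)))  # 50 shared, shortest is 550
print(overlap_is_tolerated(cds1, (900, 1500)))  # 100 shared, shortest is 600
```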
  26. 01 Mar, 2018 1 commit
  27. 19 Feb, 2018 1 commit
  28. 16 Feb, 2018 1 commit
  29. 15 Feb, 2018 1 commit
    • Minor corrections on SVG output · 3e5426ea
      Jorge Navarro Muñoz authored
      - Disable gene colors (Arrower script would generate random colors for
      genes that were named)
      - Disable gene categories (depending on domains found in gene, Arrower
      script drew a shadow surrounding each gene). These two made the SVG
      output too busy and not very informative
      - Fix typo in BiG-SCAPE
  30. 09 Feb, 2018 3 commits
    • Bugfix and better visualization alignment when overlap == 1 gene · 3a8ba69f
      Jorge Navarro Muñoz authored
      - Fix a bug that was introduced in the previous commit
      - If the Longest Common Subcluster is only one gene, and there are
      multiple 1-gene matches, choose the one with the highest number of
      domains
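Choosing among several 1-gene matches could be sketched like this. The data layout is an assumption: each match is a pair of gene indices (one per BGC), and `domains_a` lists the domains found in each gene of the first BGC. The function name is hypothetical.

```python
def pick_anchor(matches, domains_a):
    """Among 1-gene matches, pick the one whose gene has the most domains.

    Ties resolve to the earliest match, because max() keeps the first maximum.
    """
    return max(matches, key=lambda match: len(domains_a[match[0]]))


domains_a = [["PF00109"], ["PF00109", "PF02801", "PF00550"], ["PF00501", "PF00550"]]
matches = [(0, 2), (1, 0), (2, 1)]
print(pick_anchor(matches, domains_a))
```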
    • Small improvement in visualization: better alignment of some BGCs · 21e0c562
      Jorge Navarro Muñoz authored
      A pair of BGCs from the same GCF (exemplar + some other member) may
      not show a good positional alignment. This can happen when one of the
      BGCs is too short and there is a slight difference in domain content
      (perhaps one domain was not detected), so the alignment length from
      the LCS algorithm is zero. In these (somewhat rare) cases, the pair of
      BGCs used to be aligned using the first gene from each. Now, the gene
      with the most domain content is chosen
    • Fix bug from previous commit · b8a0b4a4
      Jorge Navarro Muñoz authored
      If an overlapping CDS appeared more than once, it would be marked for
      deletion more than once as well. Changed the collection of CDSs to be
      deleted from a list to a set.
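Why a set helps here can be shown with a small sketch. It assumes CDSs are marked for deletion by index; the function name and data are hypothetical.

```python
def remove_marked(cds_list, marked):
    """Drop marked CDSs; duplicates in `marked` collapse inside the set."""
    to_delete = set(marked)
    return [cds for i, cds in enumerate(cds_list) if i not in to_delete]


# Index 1 overlaps two neighbours, so it is marked twice.
print(remove_marked(["cdsA", "cdsB", "cdsC", "cdsD"], [1, 1, 3]))
```

With a plain list, deleting index 1 twice would remove an unrelated CDS on the second pass (or fail, if removal is by value).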
  31. 08 Feb, 2018 1 commit
  32. 07 Feb, 2018 1 commit
    • Partial fix for failure when using BGCs with splicing events *WARNING* · 828361b6
      Jorge Navarro Muñoz authored
      If two or more CDSs share the same locus_tag (or gene_id if no locus_tag is
      present), BiG-SCAPE will only keep the longest CDS. This has some consequences:
      - If the user is re-using data that contains such splicing events (MIBiG
      dataset or eukaryotic BGCs), she will have to delete previous data and start a
      fresh run.
      - This will still cause trouble when dealing with GenBank files whose CDS
      features don't contain the locus_tag or gene_id qualifiers. A further fix will
      be implemented for that case
      - As the original GenBank file is used to draw the SVG figure, it will still
      contain all CDSs (but the ones not used will not show any domains)
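The "keep only the longest CDS per locus_tag" rule might be sketched as below. The dictionary representation of a CDS feature (`locus_tag`, `gene_id`, `start`, `end`) is an assumption made for illustration; CDSs with neither identifier are passed through unchanged, mirroring the unresolved case the commit mentions.

```python
def keep_longest_per_locus(cds_features):
    """Keep the longest CDS for each locus_tag (or gene_id as a fallback)."""
    longest = {}
    unnamed = []  # no identifier available: cannot be grouped
    for cds in cds_features:
        key = cds.get("locus_tag") or cds.get("gene_id")
        if key is None:
            unnamed.append(cds)
            continue
        length = cds["end"] - cds["start"]
        best = longest.get(key)
        if best is None or length > best["end"] - best["start"]:
            longest[key] = cds
    return list(longest.values()) + unnamed


features = [
    {"locus_tag": "g1", "start": 0, "end": 300},
    {"locus_tag": "g1", "start": 0, "end": 900},   # longer variant of g1
    {"gene_id": "g2", "start": 1000, "end": 1600},
    {"start": 2000, "end": 2300},                  # no identifier
]
print(keep_longest_per_locus(features))
```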