To find the state of this project's repository at the time of any of these versions, check out the tags.

All notable changes to PanTools will be documented in this file.

[UNRELEASED]

Added

  • Available resources are now added to the log files and validated for some processes (!224 (merged)).
  • Added error catching for a set of pangenome node parameters that can cause issues down the line (!225 (merged)).
  • Added check for whether MCScanX is installed before running pantools calculate_synteny (!244 (merged)).
  • Added parsing of location information from functional annotation GFF files and writing it to the graph (!245 (merged)).

Changed

  • No longer accept tab characters in fasta input files (!232 (merged)).

Fixed

  • pantools blast no longer crashes if not all mrna nodes contain protein IDs (!226 (merged)).
  • pantools calculate_synteny now gives an exception if MCScanX runs out of memory (!226 (merged)).
  • pantools sequence_visualization now requires the "sequence" rule to be set for unphased pangenomes (!226 (merged)).
  • Fixed a number of bugs causing grouping results to slightly differ based on the order of genome or proteome input files (!223 (merged)).
  • Fixed pantools blast for TBLASTN and TBLASTX where the wrong output file was written (!241 (merged)).
  • Fixed pantools add_functions when protein names have special characters (!242 (merged)).
  • Fixed the KMC command needed for pantools add_genomes (!243 (merged)).

[4.3.1] - 08-12-2023

Changed

  • pantools pangenome_structure received a huge speed improvement and restored randomization (!216 (merged)).

Fixed

  • Fixed a bug where variable/informative positions were incomplete, affecting PanVA instances (!212 (merged)).
  • Fixed a bug that caused the pangenome growth curves to not show any randomization (!216 (merged)).

[4.3.0] - 23-11-2023

Added

  • pantools add_phasing for adding phasing information about a genome to the pangenome database (!148 (merged)).
  • pantools add_repeats and pantools repeat_overview for incorporating repeats in the pangenome database (!148 (merged)).
  • pantools calculate_synteny, pantools add_synteny, pantools synteny_overview for incorporating synteny in the pangenome database (!148 (merged)).
  • pantools calculate_dn_ds for calculating dn/ds values in a pangenome database (!148 (merged)).
  • pantools gene_retention and pantools sequence_visualization for visualizing gene retention and other visualizations of sequences in the pangenome (!148 (merged)).
  • pantools blast for doing a BLAST search against a pangenome database (!148 (merged)).
  • Added pal2nal, paml and r-cowplot dependencies to conda_linux.yaml and conda_macos.yaml (!148 (merged)).

Changed

  • Updated pantools busco_protein, pantools gene_classification, pantools kmer_classification, pantools group_info, pantools rename_phylogeny, pantools create_tree_template, pantools core_phylogeny and pantools rename_matrix to be compatible with phasing information that may have been added to a pangenome database (!148 (merged)).
  • busco_protein is no longer functional with BUSCO v3 and odb9 datasets (!146 (merged)).
  • conda_linux.yaml and conda_macos.yaml now unified in one conda.yaml file (!203 (merged)).
  • Updated KMC version to >=3.1.0 (!203 (merged)).
  • Increased performance for calculating var/inf positions in pantools msa (!202 (merged)).
  • move_grouping is now named deactivate_grouping to better reflect its function (!201 (merged)).
  • Reorganized the documentation for Read the Docs (!201 (merged)).

Fixed

  • pantools core_phylogeny and pantools consensus_tree now ignore MSAs that could not be trimmed (!197 (merged)).
  • The mode argument of pantools ani is now case-insensitive (!200 (merged)).
  • Fixed a bug where single occurrence (variable) positions in an alignment were counted as informative (!202 (merged)).
  • Fixed the seed option for pantools pangenome_structure when variants are included (!205 (merged)).
  • Fixed the order of the phenotype overview for consistency (!206 (merged)).
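
The variable/informative fix in !202 can be illustrated with a minimal sketch. It assumes the standard parsimony definition, in which an alignment column is informative only if at least two different characters each occur at least twice. The class name, method name and gap handling below are hypothetical and are not taken from the PanTools source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not PanTools code.
final class SiteClassifier {
    enum SiteType { CONSERVED, VARIABLE, INFORMATIVE }

    /** Classifies one alignment column, given as a string with one character per sequence. */
    static SiteType classify(String column) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : column.toCharArray()) {
            if (c == '-') {
                continue; // assumption: gap characters are ignored
            }
            counts.merge(Character.toUpperCase(c), 1, Integer::sum);
        }
        if (counts.size() < 2) {
            return SiteType.CONSERVED; // zero or one distinct character
        }
        // Count the characters that occur in at least two sequences.
        long repeated = counts.values().stream().filter(n -> n >= 2).count();
        // A character seen only once makes the column variable, not informative.
        return repeated >= 2 ? SiteType.INFORMATIVE : SiteType.VARIABLE;
    }
}
```

Under this definition a column such as AAAT is variable only, because T occurs once, while AATT is informative.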

[4.2.3] - 22-09-2023

Added

  • pantools pangenome_structure now has a --seed option to override the random seed (!179 (merged)).
  • pantools msa now has a --trim-using-proteins option for correct phylogenetics using nucleotide alignments (!195 (merged)).

Changed

  • Pinned python to version 3.7 for CI/CD pipeline and highlighted this in developer docs (!183 (merged)).
  • pantools remove_functions now has --mode as optional argument instead of positional argument (!189 (merged)).
  • pantools core_phylogeny and pantools consensus_tree don't run MSA under the hood anymore to prevent unwanted behaviour (!195 (merged)).
  • pantools msa --align-nucleotide now aligns CDS sequences instead of full (unspliced) mRNA sequences (!195 (merged)).

Fixed

  • pantools add_pavs and pantools add_variants are now consistent in their creation of variant nodes (!176 (merged)).
  • pantools msa now writes non-redundant headers in sequences.info files (!180 (merged)).
  • Renamed remaining mentions of core_snp_tree to core_phylogeny, except the command alias (!181 (merged)).
  • Fixed the trimming length of protein sequences with pantools msa (!186 (merged)).
  • pantools group_info now correctly interprets signalP information (!190 (merged)).
  • Fixed KMC running on the number of available cores regardless of the given number of threads; added --threads option to pantools add_genomes (!191 (merged)).
  • Restored the possibility of trimming nucleotide sequences based on protein alignment (!195 (merged)).
  • Fixed distinct kmer count for a subset of genomes using pantools kmer_classification (!196 (merged)).

[4.2.2] - 23-06-2023

Changed

  • pantools group_info defaults to all homology groups (!166 (merged)).
  • pantools map no longer accepts a genome numbers file and uses --include/--exclude instead (!170 (merged)).

Fixed

  • pantools consensus_tree now properly checks if groups were excluded based on trimming (!168 (merged)).

[4.2.1] - 05-06-2023

Added

  • Check if bcftools and tabix are installed before running them (!159 (merged)).
  • Developer guide in the documentation (!160 (merged)).
  • Update to CI/CD pipeline and pre-commit hooks to make sure no warnings or errors are thrown for the documentation (!160 (merged)).

Changed

  • Simplified user guide for installing PanTools (!160 (merged)).
  • pantools group now always requires relaxation settings to be specified (!161 (merged)).

Fixed

  • Fixed issue where FastTree was told protein sequences are nucleotide sequences (!157 (merged)).

[4.2.0] - 08-05-2023

Added

  • New flag --ignore-invalid-features for add_annotations to ignore GFF features that do not match the fasta (!133 (merged)).
  • Subcommands for handling VCF and PAV information for genes in a pangenome: add_variants, add_pavs, remove_variants, remove_pavs and variation_overview (!128 (merged)).
  • Flag --variants to msa, core_phylogeny and consensus_tree for including consensus gene sequences for MSA (!128 (merged)).
  • Flag --pavs to gene_classification, pangenome_structure, msa, core_phylogeny and consensus_tree for indicating presence/absence of genes (!128 (merged)).

Changed

  • Parameter -D for add_functions is changed to -F because of conflict with JVM settings (!129 (merged)).
  • Removed no longer used functional databases from code base (!129 (merged)).
  • add_annotations now validates all input files before adding annotations (!133 (merged)).
  • Changed the default behavior for interpreting GFF files without mRNA features for add_annotations (!132 (merged)).
  • Changed default behavior for msa: only one type of nucleotide/protein is aligned by default, depending on the database type (!142 (merged)).
  • The --phenotype argument in msa no longer requires an included value. Output is generated for all phenotype properties if the argument is included (!145 (merged)).

Fixed

  • msa is now backwards compatible with pangenome databases from before !60 (merged) (!134 (merged)).
  • The provided conda YAML files now work even when channel_priority: strict was set by the user (!135 (merged)).
  • Issue #54 (closed): functional annotations are now correctly linked with mRNA nodes for add_functions command (!136 (merged)).
  • Resolved issue in add_phenotype where all phenotype values were recognized as a String (!145 (merged)).
  • Fixed issue where the ITOL tree templates from msa --method=multiple-groups did not match the gene trees (!145 (merged)).

Removed

  • MLSA no longer finds phenotype specific positions. This can still be done via msa (!145 (merged)).
  • Removed support for genbank input files from add_annotations (!133 (merged)).

[4.1.1] - 30-01-2023

Added

  • Added sphinx-lint to both pre-commit hooks and CI/CD pipeline to ensure correctness of the documentation (!112 (merged) !116 (merged)).
  • Option to add read group to alignment files produced by pantools map (!102 (merged)).
  • Added more BLOSUM and PAM protein scoring matrix options (!123 (merged)).

Changed

  • add_functions can now specify a directory where functional databases are stored (!117 (merged)).
  • Parameter -v is now required for change_grouping (!124 (merged)).
  • Scoring matrices are now stored in the resources directory as readable files (!123 (merged)).

Fixed

  • msa now works correctly with 'alt_id' properties of GO nodes (!118 (merged)).

[4.1.0] - 23-12-2022

Added

  • New log4j2 logger with console and file appender (!85 (merged)).
  • New globally accessible flags to regulate console logging output: silent, quiet, debug and trace (!85 (merged)).
  • New feature, export_pangenome, to export a pangenome to a number of files for comparison with other pangenomes (!108 (merged)).
  • Added CI validation test for build_pangenome and map_reads (!108 (merged)).

Changed

  • Optimized localization for build_pangenome by making localize_nodes() parallel; this code was sanity-checked with two small yeast datasets (!95 (merged)).
  • remove_phenotype was renamed to remove_phenotypes to be consistent with add_phenotypes (!111 (merged)).

Fixed

  • group_info now retrieves the correct homology groups with -H/-G (!110 (merged)).
  • Resolved issue where add_phenotypes incorrectly binned columns when not every value was numeric (!111 (merged)).

[4.0.0] - 21-12-2022

Added

  • CI/CD pipeline that for now only runs mvn test (merge request !41 (merged)).
  • It is now possible to only add a specific functional annotation with add_functions --label (merge request !51 (merged)).
  • Versioned documentation is included with readthedocs (merge request !50 (merged)).
  • New function remove_phenotype allows for removal of phenotype nodes or properties on these nodes (merge request !58 (merged)).
  • New package cli containing command line interface classes for each pantools subcommand (merge request !43 (merged)).
  • Picocli argument parsing in command line interface classes (merge request !43 (merged)).
  • New package cli.mixins for reusable picocli options --threads, --include and --exclude (merge request !43 (merged)).
  • Bean hibernate validation for command line arguments (merge request !43 (merged)).
  • Custom Bean validation constraints, validators and payloads in cli.validation, cli.validation.validators and cli.validation.payloads respectively (merge request !43 (merged)).
  • Renamed many option flags to more conventional naming practices (merge request !43 (merged)).
  • Required file parameters are now positional parameters and no longer have command line flags (merge request !43 (merged)).
  • New global option --manual opens the Read the Docs manual in the local browser (merge request !43 (merged)).
  • New global options --force and --no-input to ignore user prompts (merge request !43 (merged)).
  • New function remove_functions allows removal of all function nodes and properties (merge request !43 (merged)).
  • New package for Junit5 unit tests (pantools.src.tests.java) (merge request !62 (merged)).
  • New Junit5 test class ProteomeLayerTest to test functions in ProteomeLayer.java (merge request !62 (merged)).
  • New custom assertion assertExits in ExitAssertions.java overriding System.exit for test purposes (merge request !62 (merged)).
  • New --node argument included for group_info (merge request !66 (merged), !69 (merged)).
  • Added log4j2 dependencies and configuration options (merge request !68 (merged)).
  • New flags --debug and --quiet to manage console log levels (merge request !68 (merged)).
  • New Junit5 test class ConstraintTest to test custom bean constraints made for argument validation (merge request !74 (merged)).

Changed

  • htsjdk version has been updated to 2.24.1.
  • add_annotations uses htsjdk instead of a custom implementation for parsing gff3 (merge request !38 (merged)).
  • Updated ASTER commit to ASTER v1.3 and adjusted code accordingly (merge request !54 (merged)).
  • Improved recognition of file types in add_functions (merge request !55 (merged)).
  • Argument parsing and initial validation moved from Pantools.java to cli package (merge request !43 (merged)).
  • The default number of threads for functions that allow --threads is now the number of cores or 8, whichever is lower (merge request !43 (merged)).
  • Versioned documentation now matches the new command line interface (merge request !43 (merged)).
  • Function remove_nodes no longer removes function nodes in groups (see Added section) (merge request !43 (merged)).
  • Function remove_nodes now allows removal of all nodes; dangerous nodes require user confirmation (merge request !43 (merged)).
  • Split conda.yaml file into conda_linux.yml and conda_macos.yml (merge request !77 (merged)).
  • Removed ASTER submodule and added ASTER v1.3 to conda yml files (merge request !77 (merged)).
  • Grouping now requires setting the relaxation or all its sub-values (commit id 72bc5a8d).
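
The new default for --threads described above (the number of cores or 8, whichever is lower) amounts to a single minimum. The sketch below is only an illustration of that rule; the class, constant and method names are hypothetical and do not come from the PanTools code base.

```java
// Hypothetical illustration, not PanTools code.
final class DefaultThreads {
    /** Cap on the default thread count, as stated in the changelog entry. */
    static final int MAX_DEFAULT_THREADS = 8;

    /** Returns the lower of the available core count and the cap. */
    static int defaultThreads(int availableCores) {
        return Math.min(availableCores, MAX_DEFAULT_THREADS);
    }

    public static void main(String[] args) {
        // availableProcessors() reports the cores visible to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Default threads: " + defaultThreads(cores));
    }
}
```

On a 4-core machine this gives 4 threads and on a 64-core machine 8, unless --threads is set explicitly.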

Fixed

  • mlsa functionalities have been updated to work with the updated add_annotations implementation (merge request !44 (merged)).
  • Header names of kmer_classification_overview.csv are now correct (merge request !45 (merged)).
  • Translation of mRNA to protein is now correct (merge request !46 (merged)).
  • Inconsistency between the header and body of matrices generated in kmer_classification (merge request !61 (merged)).
  • add_annotations has an updated and more informative output (and runs faster on fragmented genomes) (merge request !60 (merged)).
  • add_antismash no longer crashes when identifiers of antiSMASH output do not match in the database (merge request !66 (merged)).
  • Resolved an issue where go_enrichment crashed if COG functions were included in the pangenome (merge request !66 (merged)).
  • Replaced method for Fisher exact test to correctly deal with larger values (merge request !66 (merged)).
  • go_enrichment no longer crashes when antiSMASH geneclusters were added to the pangenome (merge request !78 (merged)).
  • add_annotations now handles co-features correctly (merge request !79 (merged)).
  • Distinguish between homology groups as file or as list for msa, core_snp_tree and consensus_tree (merge request !86 (merged)).
  • build_panproteome no longer creates inconsistent 'header' and 'protein_ID' properties (merge request !89 (merged)).

[3.4.0] - 2022-05-04

Added

  • Version and commit ID are reported when PanTools is initialized.
  • --allow-polytomies argument for consensus_tree.
  • Included option to bin numerical values in add_phenotypes.

Changed

  • msa_group, msa_of_multiple_groups and msa_of_regions are reorganised into the new function msa.
  • pangenome_structure uses a colorblind-friendly palette.
  • create_tree_templates uses a colorblind-friendly palette when using 8 phenotypes or fewer.
  • add_antismash now only works with antiSMASH versions >= 6.0.

Fixed

  • Changed the orientation of the 'has_busco' relationship

[3.3.0] - 2021-12-23

Changed

  • Migrate to Maven
  • Executable .jar file moved from pantools/dist to pantools/target

Fixed

  • Reading gzip-compressed input files

[3.2.0] - 2021-11-25

Added

  • busco_protein can now also use BUSCO v5.
  • add_functions can now read custom functional annotation files
  • add_functions can now also read SignalP 5.0 output
  • consensus_tree, a new method to create a consensus tree by combining gene trees with ASTRAL-properties
  • reroot_phylogeny, new function that can reroot trees using the Ape package

Changed

  • The interpro.xml file must now first be downloaded manually when adding InterproScan output to the pangenome
  • Improved the k-mer classification method to scale to a higher number of genomes

[3.1.0] - 2021-03-31

Added

  • core_snp_tree can now be run on protein sequences.
  • group can be run using only the longest transcript of a gene
  • add_annotations is able to use GFF files that only have 'CDS' properties

Changed

  • --version argument instead of --reference in change_grouping and remove_grouping.
  • File names and extension of several output files

Fixed

  • Including -raf with map no longer results in a crash
  • Single-end read mapping with alignment mode of 0 or higher no longer results in a crash
  • 'accessory_combinations.csv' no longer misses the first group for a genome combination
  • Several issues that caused incorrect SAM output in map
  • Removed code and tools that were used for development

[3.0.0] - 2021-03-08

Added

  • Gene classification
  • Functional annotations
  • Phylogenetic methods
  • Optimal homology grouping using BUSCO

Changed

Fixed

  • Improved similarity calculation in homology grouping

[2.0.0] - 2019-10-11

Added

  • Read mapping functionality