
Replace manual GFF parsing with htsjdk GFF parsing for add_annotations

Workum, Dirk-Jan van requested to merge add_gff_parser into master

Essentially what the title says. This means updating htsjdk to 2.24.1 and replacing the old GFF parser completely.

Some points to address BEFORE we merge this:

  • we should remove the old (commented-out) code; I left it in for testing, but it should be removed once this is final
  • we should extensively check all downstream functions
  • we should check in the Neo4j browser whether the result is identical to what the old parser produced (one difference I can already point out is the way attributes are displayed: with [] around them; this is because only the ID in a GFF attribute is unique, everything else can in principle be present multiple times)
  • add informative print statements
  • find out why "Protein ID = asdf is not started/ended with start/stop codon." is reported for all proteins
  • add informative output to the two log files (pangenome_DB/annotation_overview.txt and pangenome_DB/log/annotation.log)
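
To illustrate the [] difference mentioned above: a minimal sketch, assuming the new parser hands back every attribute as a list of values (so even a single value is wrapped in a list). The class name, the helper formatValues and the attribute values are made up for illustration and are not part of this merge request; no htsjdk call is used here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GffAttributeDisplay {

    // Hypothetical helper: join the values of one attribute into a single
    // string, so a single-valued attribute is shown without brackets.
    static String formatValues(List<String> values) {
        return String.join(",", values);
    }

    public static void main(String[] args) {
        // Assumed shape of parsed attributes: key -> list of values.
        // Only ID is guaranteed to be unique; other keys may repeat.
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("ID", List.of("gene1"));
        attributes.put("Dbxref", List.of("GeneID:123", "HGNC:456"));

        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            // Printing the list directly uses List.toString(), hence the [].
            System.out.println(e.getKey() + " = " + e.getValue());
            // Joining the values first gives the bracket-free form.
            System.out.println(e.getKey() + " = " + formatValues(e.getValue()));
        }
    }
}
```

Whether we want to join the values like this or keep them as lists is something to decide while checking the downstream functions.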
Edited by Jonkheer, Eef
