Replace manual GFF parsing with htsjdk GFF parsing for add_annotations
Essentially what the title says. This means updating htsjdk to 2.24.1 and completely replacing the old GFF parser.
Some points BEFORE we merge this:
- We should remove the old (commented-out) code. I left it in for testing, but it should be removed once this is final.
- We should extensively check all downstream functions.
- We should check in the Neo4j browser whether the result is identical. One difference I can already report is the way attributes are displayed: they now have [] around them. This is because only the ID in a GFF attribute is unique; everything else can, in principle, be present multiple times.
- Add informative print statements.
- Look at why "Protein ID = asdf is not started/ended with start/stop codon." is reported for all proteins.
- Add useful information to the two log files (pangenome_DB/annotation_overview.txt and pangenome_DB/log/annotation.log).
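
To illustrate the [] difference mentioned above, here is a minimal sketch in plain Java (no htsjdk needed). It assumes the parsed attributes arrive as a map from key to a list of values, which would explain the brackets if the list is stored or printed as-is. The class name `AttributeDisplay`, the helpers `renderRaw` and `renderFlat`, and the sample values are hypothetical and not part of this merge request.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttributeDisplay {

    // Hypothetical helper: renders the value list as-is, which yields
    // the bracketed form such as "[gene0001]".
    static String renderRaw(List<String> values) {
        return String.valueOf(values);
    }

    // Hypothetical helper: joins the values with commas, so a single value
    // looks like the old plain-string display and multiple values stay visible.
    static String renderFlat(List<String> values) {
        return String.join(",", values);
    }

    public static void main(String[] args) {
        // Assumed shape of the parsed attributes: key -> list of values.
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("ID", List.of("gene0001"));
        attributes.put("Dbxref", List.of("GeneID:1", "UniProt:P12345"));

        for (Map.Entry<String, List<String>> entry : attributes.entrySet()) {
            System.out.println(entry.getKey()
                    + " raw=" + renderRaw(entry.getValue())
                    + " flat=" + renderFlat(entry.getValue()));
        }
    }
}
```

Whether to keep the bracketed form or flatten it depends on what the downstream functions expect, so this is something to settle while checking them.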
Edited by Jonkheer, Eef