Improving kmer index I/O speed by reading into memory
Since the kmer index is relatively small (typically less than 10 GB, and still under 100 GB for huge pangenomes), let's try reading the entire index into memory instead of relying on disk I/O.
Overview of the indices in order of size:
- [UNCHANGED] Prefix index from KMC stores kmer prefixes; this was already kept in memory because it is a tiny index.
- [CHANGED] Suffix index from KMC stores kmer suffixes; this was a read-only `MappedByteBuffer` but is now a read-only `ByteBuffer` that is loaded fully into memory.
- [CHANGED] Pointer index from PanTools `build_pangenome` stores the kmer -> node relationships; this was a read/write `MappedByteBuffer` and is still a read/write `MappedByteBuffer`, but it is now fully loaded into memory.
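A minimal Java sketch of the two loading strategies above, assuming the indices are plain files read with `java.nio`; the class and method names (`IndexBuffers`, `loadReadOnly`, `mapAndLoad`) are hypothetical and not taken from PanTools. The suffix index is read into a heap `ByteBuffer` and exposed through a read-only view, while the pointer index stays a read/write `MappedByteBuffer` on which `load()` is called. One `ByteBuffer` holds at most `Integer.MAX_VALUE` bytes (about 2 GB), so a larger index would have to be split over several buffers; this sketch simply rejects such files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not the PanTools implementation.
final class IndexBuffers {

    /** Reads the whole file into a heap buffer and returns a read-only view of it. */
    static ByteBuffer loadReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // A single ByteBuffer is int-indexed, so it can hold at most Integer.MAX_VALUE bytes.
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for one ByteBuffer: " + size + " bytes");
            }
            ByteBuffer buf = ByteBuffer.allocate((int) size);
            // read() may return fewer bytes than requested, so loop until the buffer is full.
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    throw new IOException("Unexpected end of file: " + file);
                }
            }
            buf.flip();
            return buf.asReadOnlyBuffer();
        }
    }

    /** Maps the whole file read/write and asks the OS to make it resident in physical memory. */
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            buf.load(); // best effort: the OS may still evict pages under memory pressure
            return buf; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path suffix = Files.createTempFile("suffix", ".idx");
        Files.write(suffix, new byte[] {1, 2, 3, 4});
        ByteBuffer ro = loadReadOnly(suffix);
        System.out.println(ro.remaining() + " " + ro.isReadOnly() + " " + ro.get(2));

        Path pointer = Files.createTempFile("pointer", ".idx");
        Files.write(pointer, new byte[8]);
        MappedByteBuffer rw = mapAndLoad(pointer);
        rw.putLong(0, 42L);
        rw.force(); // write the change back to the file
        System.out.println(rw.getLong(0));
    }
}
```

Running `main` prints `4 true 3` followed by `42`. Writes to the pointer index still go to the mapped file, which is why it can stay a `MappedByteBuffer` instead of becoming a heap copy.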
Foreseen issues:
- Not enough memory to hold the indices. In principle, the indices are loaded when a connection to the database is opened and closed at the very end; therefore, an `OutOfMemoryError` caused by an index not fitting into memory can be caught before anything crucial happens.
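A minimal sketch of failing early when the connection is opened, assuming the index is loaded into a heap `ByteBuffer`; `IndexSession`, `open` and `suffixIndex` are hypothetical names, not PanTools code. It first compares the file size with `Runtime.maxMemory()` and then catches an `OutOfMemoryError` from the single large allocation, turning it into a clear error before any database work starts. A `MappedByteBuffer` is backed by the OS page cache rather than the Java heap, so this sketch does not cover the pointer index.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical holder that loads an index once, when the database connection is opened.
final class IndexSession implements AutoCloseable {
    private final ByteBuffer suffixIndex;

    private IndexSession(ByteBuffer suffixIndex) {
        this.suffixIndex = suffixIndex;
    }

    static IndexSession open(Path suffixFile) throws IOException {
        try (FileChannel ch = FileChannel.open(suffixFile, StandardOpenOption.READ)) {
            long size = ch.size();
            long maxHeap = Runtime.getRuntime().maxMemory();
            // Cheap pre-check: refuse right away if the file cannot fit in one buffer or the heap.
            if (size > Integer.MAX_VALUE || size > maxHeap) {
                throw new IOException("Index " + suffixFile + " (" + size + " bytes) does not fit in memory");
            }
            ByteBuffer buf;
            try {
                buf = ByteBuffer.allocate((int) size);
            } catch (OutOfMemoryError e) {
                // Nothing has been modified yet, so failing here leaves the database untouched.
                throw new IOException("Not enough heap to load " + suffixFile + "; increase -Xmx", e);
            }
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    throw new IOException("Unexpected end of file: " + suffixFile);
                }
            }
            buf.flip();
            return new IndexSession(buf.asReadOnlyBuffer());
        }
    }

    /** Returns an independent read-only view so callers do not share position/limit. */
    ByteBuffer suffixIndex() {
        return suffixIndex.duplicate();
    }

    @Override
    public void close() {
        // A heap buffer is reclaimed by the garbage collector once it is no longer referenced.
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("suffix", ".idx");
        Files.write(file, new byte[] {9, 8, 7});
        try (IndexSession session = IndexSession.open(file)) {
            System.out.println(session.suffixIndex().get(0));
        }
    }
}
```

Running `main` prints `9`. Loading everything inside `open` keeps the failure point in one place, at the start of a run, which is the property the paragraph above relies on.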
Edited by Workum, Dirk-Jan van