
Improving kmer index I/O speed by reading into memory

Since the kmer index is fairly small (typically under 10 GB, and still under 100 GB even for huge pangenomes), let's try to read the entire index into memory instead of relying on disk I/O for every lookup.

Overview of the indices in order of size:

  • [UNCHANGED] Prefix index from KMC, which stores the kmer prefixes. This was already held in memory since it is a tiny index.
  • [CHANGED] Suffix index from KMC, which stores the kmer suffixes. This was a read-only MappedByteBuffer but is now a read-only ByteBuffer that is loaded fully into memory.
  • [CHANGED] Pointer index from PanTools build_pangenome, which stores the kmer -> node relationships. This was a read/write MappedByteBuffer and still is, but the mapping is now fully loaded into memory.
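The difference between the two [CHANGED] approaches can be sketched with the standard java.nio API. This is a minimal illustration, not the PanTools code: the class name `IndexLoading` and the methods `readFully` and `mapAndLoad` are hypothetical. It also assumes each file fits in one buffer; a single ByteBuffer cannot exceed Integer.MAX_VALUE bytes (about 2 GB), so a real index of 10 GB or more would have to be split over several buffers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
public class IndexLoading {

    // Suffix-index style: copy the whole file into a heap buffer and
    // return a read-only view of it.
    static ByteBuffer readFully(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for one buffer: " + size);
            }
            ByteBuffer buf = ByteBuffer.allocate((int) size);
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    break; // unexpected end of file
                }
            }
            buf.flip();
            return buf.asReadOnlyBuffer();
        }
    }

    // Pointer-index style: keep a read/write mapping, but ask the OS to
    // bring its pages into physical memory up front. load() is best effort.
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            map.load();
            return map; // the mapping stays valid after the channel is closed
        }
    }
}
```

Writes to the buffer returned by `mapAndLoad` still reach the underlying file, while the buffer from `readFully` is a detached in-memory copy.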

Possible foreseen issues:

  • Not enough memory to hold the indices. In principle the indices are loaded when a connection to a database is opened and released at the very end. An out-of-memory error caused by an index not fitting into memory should therefore surface at startup, where it can be caught before any crucial work has been done.
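A fail-early check along these lines might look like the sketch below. The class `IndexOpenGuard` and the method `tryAllocate` are hypothetical names, not part of PanTools. The allocator is passed in as a parameter so the failure path can be exercised without actually exhausting the heap.

```java
import java.nio.ByteBuffer;
import java.util.Optional;
import java.util.function.IntFunction;

// Hypothetical helper class, for illustration only.
public class IndexOpenGuard {

    // Try to allocate the index buffer when the database is opened.
    // An empty result tells the caller to abort (or fall back to a
    // memory-mapped index) before any work on the database has started.
    static Optional<ByteBuffer> tryAllocate(int bytes,
                                            IntFunction<ByteBuffer> allocator) {
        try {
            return Optional.of(allocator.apply(bytes));
        } catch (OutOfMemoryError e) {
            // Heap allocation such as ByteBuffer.allocate throws
            // OutOfMemoryError when the request does not fit.
            return Optional.empty();
        }
    }
}
```

A caller would use something like `IndexOpenGuard.tryAllocate(size, ByteBuffer::allocate)` while opening the database and stop with a clear message if the result is empty.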
Edited by Workum, Dirk-Jan van
