
Unexpected grouping behavior

We identified unexpected behavior in grouping: grouping proteins with relaxation settings 3, 5, and 7 (in that order) and then with relaxation setting 4 gives a different result than starting from a clean pangenome/panproteome and grouping directly at relaxation setting 4.

The likely cause is that is_similar_to relationships are never removed from the graph. More of these relationships exist at more relaxed grouping settings, so grouping at a stricter setting after a more relaxed one unknowingly reuses the relationships created under the relaxed setting's intersection rate and similarity threshold. Importantly, the clustering settings should not be affected. Optimal grouping is also not affected, because it proceeds from strict to relaxed.
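To make the suspected mechanism concrete, here is a minimal toy sketch in Python. It is not the actual grouping code: the protein names, similarity scores, the two thresholds, and all function names are invented for illustration, and plain connected components stand in for the real clustering step. The only thing it models is that similarity edges are added but never removed between runs.

```python
# Hypothetical pairwise similarity scores (percent) between four proteins.
SIMILARITY = {
    frozenset({"A", "B"}): 90,
    frozenset({"B", "C"}): 50,
    frozenset({"C", "D"}): 85,
}
PROTEINS = ["A", "B", "C", "D"]
STRICT, RELAXED = 80, 40  # made-up thresholds, not real relaxation settings


def add_similarity_edges(edges, threshold):
    """Add an edge for every pair scoring at least `threshold`; never remove any."""
    for pair, score in SIMILARITY.items():
        if score >= threshold:
            edges.add(pair)


def groups_from_edges(proteins, edges):
    """Connected components over the stored edges (stand-in for real clustering)."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for pair in edges:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    components = {}
    for p in proteins:
        components.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in components.values())


def group(proteins, edges, threshold):
    """One grouping run: add edges for this threshold, then cluster on ALL edges."""
    add_similarity_edges(edges, threshold)
    return groups_from_edges(proteins, edges)


# Clean graph, strict setting only.
clean_edges = set()
clean_result = group(PROTEINS, clean_edges, STRICT)

# Same strict setting, but on a graph that was grouped at a relaxed setting first.
reused_edges = set()
group(PROTEINS, reused_edges, RELAXED)
reused_result = group(PROTEINS, reused_edges, STRICT)

print("clean :", clean_result)
print("reused:", reused_result)
```

In this toy model the clean run keeps A/B and C/D as separate groups, while the reused graph still holds the B-C edge from the relaxed run and merges all four proteins into one group. Clearing the stored edges (or filtering them by the current threshold) before each run would make the two results agree in the sketch; whether that is the right fix for the real graph is a separate question.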