[feature] Evaluate sharded hashing through kun-peng #38

@cdiener

Memory usage is becoming an ongoing issue, especially with the newer databases and the self-classification step during DB construction. One recent strategy is the sharded hash implementation in Kun-Peng:

https://github.com/eric9n/Kun-peng

https://www.biorxiv.org/content/10.1101/2024.12.19.629356v1

Some initial tests looked good and the unique syncmer assignment also improved the MEDI classifications a bit. This issue tracks the addition of sharded hashing into the workflow.

Open Steps

  • change download scripts to also download the decoys previously contained in the Kraken2 standard DB (bacteria, archaea, virus, plasmid, vectors)
  • figure out how to lay out the database to make it work with Kun-Peng
  • benchmark the DB construction (slower with Kun-Peng)
  • decide on the level of fragmentation (shard size)
  • benchmark the classification speed with the sharded hash
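
As a starting point for the shard-size decision, a rough back-of-the-envelope helper might look like the following. This is a sketch under stated assumptions: `shard_plan` is a hypothetical name, only one shard is assumed to be resident at a time, and per-shard overhead and read buffers are ignored. The sizes in the usage note are placeholders, not measurements.

```python
def shard_plan(total_bytes: int, shard_bytes: int) -> tuple[int, int]:
    """Return (number of shards, approximate peak resident hash bytes).

    Assumes a single shard is held in memory at a time and ignores
    any bookkeeping overhead.
    """
    if total_bytes <= 0 or shard_bytes <= 0:
        raise ValueError("sizes must be positive")
    n_shards = -(-total_bytes // shard_bytes)  # integer ceiling division
    peak_bytes = min(total_bytes, shard_bytes)
    return n_shards, peak_bytes
```

For example, with a hypothetical 100 GiB hash table and 4 GiB shards, `shard_plan(100 * 2**30, 4 * 2**30)` gives 25 shards and a peak of about 4 GiB. Smaller shards lower the peak but mean more load passes per run, which is what the benchmarks above would need to quantify.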


Labels: kraken2 (Issue related to bugs/instabilities in Kraken2.)
