vectome is a Python package for deterministic vectorization of genomes.
You can install the precompiled version directly using pip.
$ pip install vectome
Alternatively, to install from source, clone the repository and cd into it, then run:
$ pip install -e .
vectome has a command-line interface.
$ vectome --help
You can generate vector embeddings from species or strain names, or from taxon IDs, one per line.
$ vectome embed <(printf "Mycobacterium tuberculosis\n83333\nEscherichia coli CFT073")
The resulting vectors are based on MinHash sketches from sourmash, which are then folded into a 4096-dimensional vector using the CountSketch method. You can make a shorter vector with the -n option, e.g. -n 1024.
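To illustrate the general idea of sketching and folding, here is a minimal sketch, not vectome's implementation. It assumes a bottom-k MinHash over raw k-mers (sourmash itself uses MurmurHash3, canonical k-mers and its own sketch parameters; here blake2b from hashlib is a stand-in and reverse complements are ignored), followed by CountSketch folding, where each retained hash is sent to one of n buckets with a random ±1 sign. The names `hash64`, `minhash_bottom_k` and `countsketch_fold` are hypothetical.

```python
import hashlib
from typing import List


def hash64(data: bytes, salt: bytes = b"") -> int:
    """Deterministic 64-bit hash; blake2b stands in for sourmash's MurmurHash3."""
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_bottom_k(sequence: str, k: int = 21, num: int = 500) -> List[int]:
    """Bottom-k MinHash: keep the `num` smallest distinct k-mer hashes."""
    seq = sequence.upper()
    kmer_hashes = {hash64(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)}
    return sorted(kmer_hashes)[:num]


def countsketch_fold(hashes: List[int], n: int = 4096) -> List[float]:
    """Fold hashes into an n-dimensional vector with CountSketch.

    Each hash h goes to bucket b(h) in [0, n) with sign s(h) in {-1, +1},
    and each bucket accumulates the signed contributions.
    """
    vec = [0.0] * n
    for h in hashes:
        key = h.to_bytes(8, "big")
        bucket = hash64(key, salt=b"bucket") % n  # independent hash for the bucket
        sign = 1.0 if hash64(key, salt=b"sign") & 1 else -1.0  # independent hash for the sign
        vec[bucket] += sign
    return vec


if __name__ == "__main__":
    genome = "ATGACCGATTACAGGCTTAACGGATCCTAGGCATCGATCGGCTAACGTTAGC" * 3  # toy sequence
    sketch = minhash_bottom_k(genome, k=21, num=100)
    embedding = countsketch_fold(sketch, n=1024)  # analogous to vectome's -n 1024
    print(len(embedding), sum(abs(x) for x in embedding))
```

The random signs mean that hash collisions in a bucket tend to cancel rather than pile up, which is why CountSketch preserves inner products between sketches approximately, even at smaller n.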
You can also deterministically project into a dense vector.
$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --projection 16
Change the seed with, e.g., --seed 0.
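One common way to make a deterministic dense projection is a seeded Gaussian random projection. The sketch below shows that technique under the assumption that vectome does something similar; its actual projection method may differ. `projection_matrix` and `project` are hypothetical names, and the seed plays the same role as --seed: the same seed always gives the same matrix, and therefore the same output.

```python
import math
import random
from typing import List


def projection_matrix(n_in: int, n_out: int, seed: int) -> List[List[float]]:
    """Seeded Gaussian matrix with entries ~ N(0, 1/n_out); same seed -> same matrix."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def project(vec: List[float], n_out: int, seed: int) -> List[float]:
    """Dense n_out-dimensional projection of `vec` (a matrix-vector product)."""
    matrix = projection_matrix(len(vec), n_out, seed)
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


if __name__ == "__main__":
    sparse = [0.0] * 4096
    sparse[7], sparse[1000], sparse[3001] = 1.0, -1.0, 2.0  # toy sparse input
    dense = project(sparse, n_out=16, seed=0)  # analogous to --projection 16 --seed 0
    print([round(x, 3) for x in dense])
```

Scaling the entries by 1/sqrt(n_out) keeps squared norms unchanged in expectation (the Johnson-Lindenstrauss idea), so distances between embeddings are roughly preserved after projection.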
If you need a more interpretable vector, you can generate one based on Jaccard distances to landmark species.
$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --method landmark
Several landmark groups are available. You can set the group with, e.g., --group 0, and get information about each group with vectome info.
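To make the landmark idea concrete, here is a minimal sketch, not vectome's code, of how Jaccard distances to a fixed set of landmarks can serve as an embedding: each coordinate is the distance from the query to one landmark, so each dimension has a direct meaning. For simplicity it computes exact Jaccard distances on hash sets, whereas sourmash estimates Jaccard from MinHash sketches. `jaccard_distance`, `landmark_embedding` and the landmark names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def landmark_embedding(query: Set[int], landmarks: Dict[str, Set[int]]) -> List[float]:
    """One coordinate per landmark, in sorted name order so the layout is stable."""
    return [jaccard_distance(query, landmarks[name]) for name in sorted(landmarks)]


if __name__ == "__main__":
    # Toy hash sets; real sketches would hold many k-mer hashes per genome.
    landmarks = {"landmark_a": {1, 2, 3, 4}, "landmark_b": {3, 4, 5, 6}, "landmark_c": {7, 8}}
    query = {1, 2, 3, 5}
    print(landmark_embedding(query, landmarks))
```

Fixing the order of the landmarks is what keeps the output comparable across runs: the i-th coordinate always refers to the same landmark within a group.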
$ vectome info
vectome version 0.0.1:
group-0: {'landmarks': 113, 'manifest file': '.../vectome/vectome/data/landmarks/group-0/manifest.json', 'built': True}
group-1: {'landmarks': 4, 'manifest file': '.../vectome/vectome/data/landmarks/group-1/manifest.json', 'built': True}
group-2: {'landmarks': 1, 'manifest file': '.../vectome/vectome/data/landmarks/group-2/manifest.json', 'built': False}
meta: {'cache location': '.../vectome/vectome/data/landmarks', 'cache exists': True}
To report problems or make suggestions, add to the issue tracker.
Full documentation is to come at ReadTheDocs.