scbirlab/vectome

🧬 vectome

vectome is a Python package for deterministic vectorization of genomes.

Installation

The easy way

You can install the precompiled version directly using pip.

$ pip install vectome

From source

Clone the repository, then cd into it. Then run:

$ pip install -e .

Command-line interface

vectome has a command-line interface.

$ vectome --help

You can generate vector embeddings from species or strain names, or from taxon IDs, passing one query per line.

$ vectome embed <(printf "Mycobacterium tuberculosis\n83333\nEscherichia coli CFT073")

The resulting vectors are built from sourmash MinHash sketches, which are folded into a 4096-dimensional vector using the CountSketch method. To get a shorter vector, set the length with e.g. -n 1024.
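To make the folding step concrete, here is a minimal sketch of a CountSketch over a set of 64-bit hash values, as a MinHash sketch would provide. It is not vectome's actual implementation: the function names, the use of BLAKE2b for the bucket and sign hashes, and the seed handling are all assumptions made for illustration.

```python
import hashlib


def _hash64(value: int, salt: bytes) -> int:
    """Deterministically rehash a 64-bit integer under a given salt (illustrative helper)."""
    data = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, key=salt).digest()
    return int.from_bytes(digest, "little")


def count_sketch(hashes, n_buckets=4096, seed=0):
    """Fold an iterable of integer hashes into a fixed-length signed count vector.

    Hypothetical helper: each hash picks one bucket and one sign (+1 or -1),
    and the signs are summed per bucket.
    """
    salt_bucket = f"bucket-{seed}".encode()
    salt_sign = f"sign-{seed}".encode()
    vec = [0] * n_buckets
    for h in hashes:
        bucket = _hash64(h, salt_bucket) % n_buckets
        sign = 1 if _hash64(h, salt_sign) & 1 else -1
        vec[bucket] += sign
    return vec


if __name__ == "__main__":
    toy_hashes = [11, 22, 33, 44, 55]  # stand-ins for MinHash values
    vec = count_sketch(toy_hashes, n_buckets=16)
    print(vec)
```

Because the bucket and sign depend only on the hash value and the seed, the same genome sketch always folds to the same vector, which is the property the "deterministic" in the package description refers to.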

You can also deterministically project into a dense vector.

$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --projection 16

Change the seed with e.g. --seed 0.
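One common way to get a deterministic dense projection is a seeded Gaussian random projection, sketched below. This is an assumption about the general technique, not a description of what vectome does internally; the function name `project` and its parameters are hypothetical.

```python
import math
import random


def project(vec, n_out=16, seed=0):
    """Project a (sparse) count vector to n_out dense dimensions.

    Hypothetical helper: the projection matrix is regenerated from the seed
    on every call, so the same input and seed always give the same output.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    out = [0.0] * n_out
    for x in vec:
        # Draw the row even when x == 0 so the matrix does not depend on the input.
        row = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        if x:
            for j, r in enumerate(row):
                out[j] += x * r * scale
    return out


if __name__ == "__main__":
    folded = [1, 0, -1, 2, 0, 0, 1, -1]  # toy stand-in for a folded sketch
    print(project(folded, n_out=4, seed=0))
    print(project(folded, n_out=4, seed=1))  # a different seed gives a different projection
```

Fixing the seed fixes the projection matrix, so vectors computed at different times remain comparable as long as the seed and dimensions are unchanged.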

If you need a more interpretable vector, you can generate one based on Jaccard distances to landmark species.

$ vectome embed <(printf "Mycobacterium tuberculosis H37Rv") --method landmark
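The idea behind a landmark embedding can be sketched as follows: each vector component is the Jaccard distance between the query genome's hash set and one landmark's hash set. The code below uses exact sets for clarity; the helper names and the toy landmark data are hypothetical, and vectome's own landmark handling may differ.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of hashes."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def landmark_vector(query_hashes, landmarks):
    """Embed a query as its distances to each landmark, in sorted name order.

    `landmarks` is a hypothetical mapping of landmark name -> set of hashes.
    """
    return [jaccard_distance(query_hashes, landmarks[name]) for name in sorted(landmarks)]


if __name__ == "__main__":
    toy_landmarks = {
        "landmark-a": {1, 2, 3},
        "landmark-b": {3, 4},
    }
    print(landmark_vector({1, 2, 3}, toy_landmarks))  # [0.0, 0.75]
```

Each position in such a vector corresponds to a named landmark, which is what makes it easier to interpret than a hashed or randomly projected embedding.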

Several landmark groups are available. You can set the group with --group 0, and get information about each one with vectome info.

$ vectome info
vectome version 0.0.1:
        group-0: {'landmarks': 113, 'manifest file': '.../vectome/vectome/data/landmarks/group-0/manifest.json', 'built': True}
        group-1: {'landmarks': 4, 'manifest file': '.../vectome/vectome/data/landmarks/group-1/manifest.json', 'built': True}
        group-2: {'landmarks': 1, 'manifest file': '.../vectome/vectome/data/landmarks/group-2/manifest.json', 'built': False}
        meta: {'cache location': '.../vectome/vectome/data/landmarks', 'cache exists': True}

Issues, problems, suggestions

Please report them on the issue tracker.

Documentation

(To come at ReadTheDocs.)

Generated from scbirlab/py-template