BioSymphony GeneCluster is the control plane that lets your agent find biosynthetic gene clusters and assemble pathway evidence across genomes and transcriptomes. It decides which evidence route is defensible, prepares run contracts, validates outputs, and turns tool results into a reviewable evidence package. The same artifact contracts work whether a solo agent is running on a laptop or a multi-agent Linear DAG is fanning out across cloud GPUs.
- `genecluster_campaign_preflight.py` ranks data readiness, relevance, novelty, and seed-query maturity.
- `genecluster_species_scout.py` searches for plausible comparator species across NCBI, SRA, NGDC/GWH, KEGG hints, and local catalog memory.
- `genecluster_source_scout.py` turns source availability into route-readable ledgers.
- `genecluster_annotation_scout.py` chooses annotation-direct, transcript-first, genome-context, synteny, transcriptome-only, rescue, or next-experiment-design routes.
- `genecluster_preflight.py` validates manifests, ledgers, launch bundles, route claims, and generated artifacts.
- Candidate search: BLAST-style anchors, MMseqs2 iterative search, Foldseek/ProstT5 structural similarity.
- Genome context: GFF/proteome parsing, neighborhood extraction, Pfam and SwissProt annotation, coordinate-aware cluster windows.
- BGC callers: plantiSMASH, antiSMASH, DeepBGC, MIBiG cross-reference, and cblaster/clinker re-entry recipes.
- Comparative genomics: JCVI MCScan, synteny/dotplot outputs, OrthoFinder/GENESPACE-style normalized contracts.
- Enzyme/function: P450Rdb, KEGG/KAAS, EnzymeMap, DiffPaSS, DeepEC/ECPred, HIT-EC/CLEAN re-entry paths.
- Reporting: Quarto books, Cytoscape.js pathway viewers, igv-reports, pyGenomeTracks, workbook postprocessing.
- Local contracts for cheap validation and dry runs.
- Docker build contexts for GeneCluster runner images.
- Cloud-portable dispatch templates for RunPod, AWS, GCP, Vast.ai, and Lambda Labs.
- Provider handoff manifests that keep heavy compute outside source control while preserving versions, hashes, and expected outputs.
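
As a rough illustration of the last point, a handoff manifest could record tool versions, input hashes, and expected outputs in one small JSON document. The sketch below is hypothetical: the `build_handoff_manifest` helper and the field names (`tool_versions`, `inputs`, `expected_outputs`) are assumptions made for illustration, not GeneCluster's actual schema.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_handoff_manifest(inputs, tool_versions, expected_outputs):
    """Assemble a hypothetical handoff manifest (field names are assumptions)."""
    entries = []
    for item in inputs:
        path = Path(item)
        entries.append(
            {
                "path": str(path),
                "sha256": sha256_of(path),
                "bytes": path.stat().st_size,
            }
        )
    return {
        "tool_versions": dict(tool_versions),
        "inputs": entries,
        "expected_outputs": list(expected_outputs),
    }


def write_manifest(manifest, destination):
    """Serialize the manifest as sorted, indented JSON so diffs stay readable."""
    Path(destination).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Only the manifest would be committed in this arrangement; the hashed inputs and the heavy outputs stay with the compute provider, and the recorded digests let a later validation step confirm that the same files were used.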
A mature run should produce:
- `source-ledger.tsv`
- `query-resolution-ledger.tsv`
- `route-decision.json`
- `cluster_calls.tsv`
- `bgc_consensus.tsv`
- `protein_function_votes.tsv`
- `protein_function_jury.tsv`
- `comparative_atlas/review_surface_manifest.json`
- `claim-ledger.tsv`
- Quarto HTML/PDF review surfaces
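
One way to check a run directory against this list is a short script like the one below. It is a sketch, not part of GeneCluster: the `missing_artifacts` helper and the assumption that all artifacts sit under a single run directory are illustrative, and the Quarto review surfaces are left out because their file names are not given here.

```python
from pathlib import Path

# Artifact names copied from the list above as written. Placing them all
# under one run directory is an assumption made for this sketch.
EXPECTED_ARTIFACTS = [
    "source-ledger.tsv",
    "query-resolution-ledger.tsv",
    "route-decision.json",
    "cluster_calls.tsv",
    "bgc_consensus.tsv",
    "protein_function_votes.tsv",
    "protein_function_jury.tsv",
    "comparative_atlas/review_surface_manifest.json",
    "claim-ledger.tsv",
]


def missing_artifacts(run_dir):
    """Return the expected artifacts that are absent or empty in run_dir."""
    root = Path(run_dir)
    missing = []
    for name in EXPECTED_ARTIFACTS:
        target = root / name
        # Treat zero-byte files as missing: an empty ledger is not evidence.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty return value means every listed file exists and is non-empty; it says nothing about whether the contents are valid, which the source assigns to `genecluster_preflight.py`.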
The current inventory is in biosymphony-tooling-status.md: 25 validated tools, 3 parked tools with re-entry recipes, 8 shelved-but-testable tools, and 2 gated tools with alternatives.
Use skills/genecluster-superpowers/SKILL.md when extending the atlas with a new tool so the work starts from the existing validation record instead of re-running discovery.