Problem
The `MartsAdapter` relies on live HTTP queries to Ensembl BioMart, which is a known source of flakiness: server outages, rate limiting, and malformed responses (see #103). This is especially problematic for downstream tools like nf-core/epitopeprediction, where `epaa.py` depends on `MartsAdapter` for two calls:

- `get_transcript_information()`: CDS, strand, and gene name retrieval (blocking; no peptides without it)
- `get_protein_ids_from_transcripts()`: transcript → protein/RefSeq/UniProt ID mapping (annotation)

While #103 proposes migrating to the Ensembl REST API, that still requires live network queries and is subject to rate limits (15 req/s).
Proposal
Add a `PyEnsemblAdapter` that implements `ADBAdapter` using PyEnsembl (by OpenVax). PyEnsembl downloads the Ensembl GTF and FASTA files once and indexes them locally (the GTF annotation goes into a SQLite database), so all subsequent queries are entirely offline.

This covers the critical path:
| `ADBAdapter` method | PyEnsembl equivalent |
|---|---|
| `get_transcript_information()` | `transcript.coding_sequence`, `transcript.strand`, `gene.name` |
| `get_transcript_sequence()` | `transcript.coding_sequence` |
| `get_product_sequence()` | `genome.protein_sequence()` |

What it does not cover:
Advantages over REST/BioMart
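With `MartsAdapter`, every lookup is an HTTP round trip to BioMart. The central property of the proposal is different: the annotation is fetched once and then queried locally. The toy sketch below illustrates that pattern with Python's built-in `sqlite3`. It parses transcript rows from GTF-formatted text once, stores them in a SQLite table, and then answers lookups without touching the network. This is an illustration under stated assumptions, not PyEnsembl's implementation. The GTF records, the table layout, and the helper names `parse_attributes`, `build_index`, and `lookup` are all hypothetical.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical toy records in GTF layout (NOT real Ensembl annotation).
# Columns: seqname, source, feature, start, end, score, strand, frame, attributes
TOY_GTF = (
    '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENE_A";\n'
    '1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENE_A";\n'
    '2\ttoy\ttranscript\t500\t1500\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; gene_name "GENE_B";\n'
)


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'key "value"; key2 "value2";'."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def build_index(gtf_text: str) -> sqlite3.Connection:
    """One-time step (hypothetical): load transcript rows into a local SQLite table."""
    # In-memory for the demo; a real cache would be a file on disk reused across runs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transcript ("
        "transcript_id TEXT PRIMARY KEY, gene_name TEXT, strand TEXT)"
    )
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if cols[2] != "transcript":
            continue  # this toy only indexes transcript-level rows
        attrs = parse_attributes(cols[8])
        conn.execute(
            "INSERT INTO transcript VALUES (?, ?, ?)",
            (attrs["transcript_id"], attrs.get("gene_name"), cols[6]),
        )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, transcript_id: str) -> Optional[Dict[str, str]]:
    """Offline query (hypothetical): a local indexed lookup, no HTTP request."""
    row = conn.execute(
        "SELECT gene_name, strand FROM transcript WHERE transcript_id = ?",
        (transcript_id,),
    ).fetchone()
    return None if row is None else {"gene_name": row[0], "strand": row[1]}
```

With the toy data, `lookup(build_index(TOY_GTF), "T2")` returns `{'gene_name': 'GENE_B', 'strand': '-'}`, and an unknown ID returns `None`. In PyEnsembl itself, the one-time step corresponds to downloading and indexing a release (e.g. `Genome.download()` and `Genome.index()`), and lookups go through methods such as `transcript_by_id()`.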
Scope
A new `epytope/IO/PyEnsemblAdapter.py` implementing the three `ADBAdapter` abstract methods:

- `get_product_sequence()`
- `get_transcript_sequence()`
- `get_transcript_information()`

PyEnsembl would be an optional dependency (e.g. `pip install epytope[pyensembl]`).

Notes