
feat(g2p): add gget g2p module for the Genomics 2 Proteins portal (#138) #220

Merged
lauraluebbert merged 2 commits into scverse:dev from Elarwei001:feature/g2p-module on Jun 21, 2026

@Elarwei001 (Contributor)

Summary

Resolves #138. Adds a new module gget g2p that queries the Genomics 2 Proteins (G2P) portal (Broad Institute; Kwon, Safer, Nguyen et al., Nature Methods 2024) to link genes/proteins to residue-level structural and functional annotations.

The G2P REST API serves tab-separated values, which the module parses into a pandas DataFrame — fitting gget's "query a database → DataFrame in one line" idiom (cf. gget pdb, gget bgee).
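A minimal sketch of the TSV → DataFrame step, assuming the response body is plain tab-separated text with a header row. The helper name tsv_to_dataframe and the sample columns are hypothetical and not taken from gget/gget_g2p.py; with requests, the input would be the response body (response.text).

```python
import io

import pandas as pd


def tsv_to_dataframe(tsv_text: str) -> pd.DataFrame:
    """Parse a tab-separated response body into a DataFrame.

    Hypothetical helper illustrating the TSV -> DataFrame idea,
    not the actual gget implementation.
    """
    # read_csv accepts any file-like object; StringIO wraps the in-memory text
    return pd.read_csv(io.StringIO(tsv_text), sep="\t")


# Toy payload with made-up column names, standing in for a real G2P response
sample = "residue\taa\tplddt\n1\tM\t35.2\n2\tD\t41.7\n"
df = tsv_to_dataframe(sample)
print(df.shape)  # (2, 3)
```

Reusing pandas' own parser keeps the dependency footprint unchanged, which matches the "no new heavy dependency" note under Changes.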

What it does

gget.g2p(gene, uniprot_id, resource="features"|"map"|"alignment", isoform=None, save=False, verbose=True)

  • features (default): per-residue feature table (AlphaFold pLDDT, UniProt sites, secondary structure, predicted pockets, PTMs, …) — 140+ columns.
  • map: gene → transcript → protein isoform → structure map (UniProt / Ensembl / RefSeq / PDB identifiers).
  • alignment: residue-level sequence alignment between two isoforms (requires isoform; uniprot_id is taken as the canonical isoform).

gget g2p BRCA1 -u P38398                 # per-residue features (JSON)
gget g2p BRCA1 -u P38398 -r map --csv    # isoform/structure map (CSV)
gget g2p LDLR -u P01130-1 -r alignment -i P01130-2   # isoform-to-isoform alignment
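The CLI examples above print JSON by default and CSV when --csv is passed. Below is a sketch of how a result DataFrame could be serialized under that convention. The render helper and the column names are hypothetical; gget's real output path may differ (for example in JSON orientation or formatting).

```python
import pandas as pd


def render(df: pd.DataFrame, csv: bool = False) -> str:
    """Serialize a result table: JSON records by default, CSV on request.

    Hypothetical helper mirroring the JSON/--csv behaviour shown in the
    CLI examples, not gget's actual output code.
    """
    if csv:
        # index=False drops the pandas row index from the CSV output
        return df.to_csv(index=False)
    # orient="records" emits one JSON object per row
    return df.to_json(orient="records")


df = pd.DataFrame({"uniprot_id": ["P38398"], "gene": ["BRCA1"]})
print(render(df))  # [{"uniprot_id":"P38398","gene":"BRCA1"}]
print(render(df, csv=True))
```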

Changes

  • gget/gget_g2p.py: new module (direct REST calls via requests, TSV → DataFrame; no new heavy dependency, and the g2papi client is not vendored).
  • gget/main.py: parser_g2p + dispatch (positional gene, -u/--uniprot_id, -r/--resource, -i/--isoform, -o/--out, -csv, -q).
  • gget/__init__.py: export g2p; gget/constants.py: G2P_API.
  • tests/test_g2p.py: live integration tests (asserting on stable columns and identifiers, since the feature table is wide and its values can change) plus network-free argument-validation tests.
  • docs/src/en/g2p.md + docs/src/en/updates.md.
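The flags listed for gget/main.py map naturally onto an argparse subcommand. This is an illustrative sketch only: the dest names, defaults, help strings and the choice to mark -u as required are assumptions, not copied from parser_g2p. The source lists only the short form -q, so it is given dest="quiet" here rather than an invented long flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `gget g2p` subcommand using the flags listed above."""
    parser = argparse.ArgumentParser(prog="gget")
    sub = parser.add_subparsers(dest="command")
    g2p = sub.add_parser("g2p", help="Query the Genomics 2 Proteins portal.")
    g2p.add_argument("gene", help="Gene symbol, e.g. BRCA1.")
    # uniprot_id is required; the help text points users to `gget info`
    g2p.add_argument("-u", "--uniprot_id", required=True,
                     help="UniProt accession (see `gget info` to find it).")
    g2p.add_argument("-r", "--resource", default="features",
                     choices=["features", "map", "alignment"])
    g2p.add_argument("-i", "--isoform", default=None,
                     help="Second isoform; needed for -r alignment.")
    g2p.add_argument("-o", "--out", default=None, help="Output file path.")
    # argparse accepts single-dash multi-letter options such as -csv
    g2p.add_argument("-csv", "--csv", action="store_true",
                     help="Return CSV instead of JSON.")
    g2p.add_argument("-q", dest="quiet", action="store_true",
                     help="Suppress progress messages.")
    return parser


args = build_parser().parse_args(
    ["g2p", "LDLR", "-u", "P01130-1", "-r", "alignment", "-i", "P01130-2"]
)
print(args.resource, args.isoform)  # alignment P01130-2
```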

Testing

All tests pass locally (Python 3.11), and the CLI was exercised for all three resources:

$ pytest tests/test_g2p.py -v
...
6 passed
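The testing strategy noted under Changes (assert on stable columns and identifiers, not on values that can change) can be illustrated offline. In this sketch the column set, the helper check_stable_schema and the inline TSV are hypothetical stand-ins; the real tests in tests/test_g2p.py query the live API.

```python
import io

import pandas as pd

# Columns treated as stable in this sketch; hypothetical names for illustration
STABLE_COLUMNS = {"uniprot_id", "residue", "aa"}


def check_stable_schema(df: pd.DataFrame, uniprot_id: str) -> None:
    """Assert on structure rather than values, since the feature table
    is wide and its values can change."""
    missing = STABLE_COLUMNS - set(df.columns)
    assert not missing, f"missing columns: {sorted(missing)}"
    assert len(df) > 0
    assert (df["uniprot_id"] == uniprot_id).all()


def test_stable_schema_offline():
    # Offline stand-in for a live response; extra columns such as plddt are tolerated
    tsv = "uniprot_id\tresidue\taa\tplddt\nP38398\t1\tM\t35.2\n"
    df = pd.read_csv(io.StringIO(tsv), sep="\t")
    check_stable_schema(df, "P38398")
```

Under pytest, test_stable_schema_offline is collected automatically; checking a subset of columns keeps the test from breaking when the portal adds fields.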

Notes

  • Endpoints and fields were confirmed against the live G2P API: the g2papi README suggests JSON responses, but the API actually returns TSV, which the module parses accordingly.
  • uniprot_id is required; the help text and error message point users to gget info to find it. Auto-resolving UniProt IDs from a gene symbol could be a follow-up.
  • Rate limits are undocumented; the module makes a single request per call.
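The argument rules stated above (uniprot_id required, three allowed resources, isoform needed for alignment) lend themselves to a network-free check that runs before the single request. The sketch below is a hypothetical helper, not gget's implementation; the error wording is assumed.

```python
VALID_RESOURCES = ("features", "map", "alignment")


def validate_g2p_args(uniprot_id, resource="features", isoform=None):
    """Hypothetical pre-request validation for a g2p-style call.

    Raises ValueError before any network access, so these rules can be
    tested without hitting the G2P API.
    """
    if not uniprot_id:
        # Point users to gget info, as the PR's help text does
        raise ValueError("uniprot_id is required; use `gget info` to find it.")
    if resource not in VALID_RESOURCES:
        raise ValueError(
            f"resource must be one of {VALID_RESOURCES}, got {resource!r}."
        )
    if resource == "alignment" and isoform is None:
        raise ValueError("resource='alignment' requires isoform.")


validate_g2p_args("P01130-1", "alignment", isoform="P01130-2")  # passes silently
```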

Elarwei001 and others added 2 commits June 16, 2026 21:13
New module querying the Genomics 2 Proteins (G2P) portal (https://g2p.broadinstitute.org/) for residue-level protein structure/function annotations. The API serves TSV, parsed into a pandas DataFrame.

- gget.g2p(gene, uniprot_id, resource='features'|'map'|'alignment', isoform=None, save=False, verbose=True)
- 'features': per-residue table (AlphaFold pLDDT, UniProt sites, pockets, PTMs); 'map': gene->transcript->isoform->structure identifiers; 'alignment': residue-level isoform alignment
- CLI parser + dispatch in main.py, export in __init__.py, G2P_API in constants.py
- Tests (live integration + network-free validation) + docs (g2p.md, updates.md)

Resolves scverse#138

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lauraluebbert merged commit cd89d5c into scverse:dev on Jun 21, 2026
1 of 4 checks passed
lauraluebbert added a commit that referenced this pull request Jun 21, 2026
Dev -> main: scverse packaging modernization (#215), gget g2p module (#220),
scverse URL migration (#219), tests/coverage badges, CI consolidation
(single pytest_results.txt, dynamic latest-Python gating), gget search NaN→None
normalization, gget mutate pyarrow-empty-slice guard, scanpy>=1.10 pin in the
cellxgene extra, gdrive backup overwrite, and updates.md notes.
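The "gget search NaN→None normalization" item refers to a change outside this PR. As a rough illustration of the general technique (not the actual gget search change), NaN can be swapped for None so that missing values serialize as JSON null; the column names here are made up.

```python
import numpy as np
import pandas as pd


def nan_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Replace NaN/None with Python None in every column.

    Illustrative sketch only. Casting to object first matters: a float
    column cannot hold None and would coerce it back to NaN.
    """
    return df.astype(object).where(df.notna(), None)


df = pd.DataFrame({"gene_id": ["ENSG01", None], "score": [1.5, np.nan]})
out = nan_to_none(df)
print(out.to_dict(orient="records"))
```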

Conflicts resolved:
- README.md: kept dev's dynamic tests/coverage badges; dropped the stale
  static `Coverage-83%` badge.
- docs/src/en/introduction.md: removed duplicate `# Welcome!` heading and
  whitespace-only conflict block.
- docs/src/es/introduction.md: same — removed duplicate `# ¡Bienvenidos!`.
- tests/pytest_results_py3.12.txt: accepted dev's delete (superseded by
  tests/pytest_results.txt under the new CI consolidation).
lauraluebbert added a commit that referenced this pull request Jun 21, 2026
PR #220 inserted a new import block before the existing alphabetical list
without removing the originals further down, leaving six modules
(alphafold/archs4/enrichr/gpt/pdb/setup) imported twice. Harmless at
runtime (Python caches the modules in sys.modules, so the repeats only rebind the same names), but trips ruff F811 once the pre-commit
hooks land. Restore alphabetical order with g2p in its slot.
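The duplicated import block described in this commit can be caught mechanically. Below is a rough sketch using the standard-library ast module; it approximates only the part of ruff's F811 (redefinition of unused name) that matters here, ignoring scopes and usage, and the sample __init__.py content is illustrative.

```python
import ast
from collections import Counter


def duplicate_imports(source: str) -> list[str]:
    """Return names bound more than once by import statements in `source`."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` and `from m import x as y` bind `y`
                counts[alias.asname or alias.name.split(".")[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


init_py = """
from .gget_g2p import g2p
from .gget_pdb import pdb
from .gget_alphafold import alphafold
from .gget_pdb import pdb
"""
print(duplicate_imports(init_py))  # ['pdb']
```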

2 participants