feat(g2p): add gget g2p module for the Genomics 2 Proteins portal (#138) by Elarwei001 · Pull Request #220 · scverse/gget

Elarwei001 · 2026-06-16T13:14:58Z

Summary

Resolves #138. Adds a new module gget g2p that queries the Genomics 2 Proteins (G2P) portal (Broad Institute; Kwon, Safer, Nguyen et al., Nature Methods 2024) to link genes/proteins to residue-level structural and functional annotations.

The G2P REST API serves tab-separated values, which the module parses into a pandas DataFrame — fitting gget's "query a database → DataFrame in one line" idiom (cf. gget pdb, gget bgee).
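The TSV → DataFrame step can be sketched in a few lines. This assumes the response body has already been fetched as text and starts with a header row. The helper `parse_g2p_tsv` and the sample columns are hypothetical and are not taken from `gget_g2p.py`:

```python
import io

import pandas as pd


def parse_g2p_tsv(text: str) -> pd.DataFrame:
    """Parse a tab-separated response body (header row first) into a DataFrame."""
    if not text.strip():
        # An empty body becomes an empty DataFrame rather than a parser error.
        return pd.DataFrame()
    return pd.read_csv(io.StringIO(text), sep="\t")


# Hypothetical two-row payload; the real G2P column names differ.
sample = "residue_index\tresidue\tplddt\n1\tM\t45.2\n2\tD\t51.8\n"
features = parse_g2p_tsv(sample)
print(features.shape)  # (2, 3)
```

Keeping the parser separate from the HTTP request means it can be unit-tested without network access.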

What it does

gget.g2p(gene, uniprot_id, resource="features"|"map"|"alignment", isoform=None, save=False, verbose=True)

- features (default): per-residue feature table (AlphaFold pLDDT, UniProt sites, secondary structure, predicted pockets, PTMs, …), 140+ columns.
- map: gene → transcript → protein isoform → structure map (UniProt / Ensembl / RefSeq / PDB identifiers).
- alignment: residue-level sequence alignment between two isoforms (requires isoform; uniprot_id supplies the canonical isoform).

```bash
gget g2p BRCA1 -u P38398                 # per-residue features (JSON)
gget g2p BRCA1 -u P38398 -r map --csv    # isoform/structure map (CSV)
gget g2p LDLR -u P01130-1 -r alignment -i P01130-2   # isoform-to-isoform alignment
```
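The argument rules above (three accepted resources, `isoform` needed for `alignment`) and the required `uniprot_id` can be enforced before any request is sent, which is also what makes network-free validation tests possible. The following is a hypothetical sketch: the function name and error messages are illustrative, not the module's actual code:

```python
VALID_RESOURCES = ("features", "map", "alignment")


def validate_g2p_args(gene, uniprot_id, resource="features", isoform=None):
    """Raise ValueError for argument combinations the G2P query cannot serve."""
    if not gene:
        raise ValueError("A gene symbol is required.")
    if not uniprot_id:
        # Point users to gget info, as the PR's help text does.
        raise ValueError("uniprot_id is required; 'gget info' can be used to find it.")
    if resource not in VALID_RESOURCES:
        raise ValueError(f"resource must be one of {VALID_RESOURCES}, got {resource!r}.")
    if resource == "alignment" and not isoform:
        raise ValueError("resource='alignment' requires isoform (the isoform to align).")


# Passes silently: the LDLR alignment example from the CLI above.
validate_g2p_args("LDLR", "P01130-1", resource="alignment", isoform="P01130-2")
```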

Changes

- gget/gget_g2p.py: new module (direct REST via requests, TSV → DataFrame; no new heavy dependency, and the g2papi client is not vendored).
- gget/main.py: parser_g2p + dispatch (positional gene, -u/--uniprot_id, -r/--resource, -i/--isoform, -o/--out, -csv, -q).
- gget/__init__.py: exports g2p; gget/constants.py: adds G2P_API.
- tests/test_g2p.py: live integration tests (asserting on stable columns and identifiers, since the feature table is wide and its values can change) plus network-free argument-validation tests.
- docs/src/en/g2p.md and docs/src/en/updates.md.
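The flags listed for `parser_g2p` map onto `argparse` roughly as follows. This is a stand-alone sketch, and several details are assumptions: the function name, the help strings, `required=True` on `-u`, and the `--quiet` long form. In gget, `parser_g2p` is registered as a subcommand in `main.py` and may differ:

```python
import argparse


def build_g2p_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser mirroring the flags listed for parser_g2p."""
    parser = argparse.ArgumentParser(prog="gget g2p")
    parser.add_argument("gene", help="Gene symbol, e.g. BRCA1.")
    parser.add_argument(
        "-u", "--uniprot_id", required=True, help="UniProt accession (find it with gget info)."
    )
    parser.add_argument(
        "-r", "--resource", choices=["features", "map", "alignment"], default="features"
    )
    parser.add_argument(
        "-i", "--isoform", default=None, help="Second isoform; needed for -r alignment."
    )
    parser.add_argument("-o", "--out", default=None, help="Path for saving the results.")
    # Single-dash multi-letter flag as written in the PR; argparse takes dest from --csv.
    parser.add_argument("-csv", "--csv", action="store_true", help="Output CSV instead of JSON.")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress progress messages.")
    return parser


args = build_g2p_parser().parse_args(["BRCA1", "-u", "P38398", "-r", "map", "--csv"])
print(args.resource, args.csv)  # map True
```

Using `choices` for `-r` lets argparse reject an unknown resource before the dispatch code runs.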

Testing

All tests pass locally (Python 3.11), and the CLI was exercised for all three resources:

```
$ pytest tests/test_g2p.py -v
...
6 passed
```
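The feature table has 140+ columns, and its values can change upstream, so the live tests assert on stable columns and identifiers rather than on whole tables. One way to express that check is sketched below with hypothetical column names (G2P's real headers may differ):

```python
def missing_columns(columns, required):
    """Return the names in `required` that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical stable subset; in a live test, `columns` would be df.columns.
REQUIRED = ["residue_index", "residue", "plddt"]
observed = ["residue_index", "residue", "plddt", "pocket_id", "ptm_site"]
assert missing_columns(observed, REQUIRED) == []
```

Returning the missing names instead of a bare boolean means a failing assertion reports exactly which columns disappeared.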

Notes

Endpoints/fields were confirmed against the live G2P API (the g2papi README suggested JSON, but the API actually returns TSV — verified and handled accordingly).
uniprot_id is required; the help text and error message point users to gget info to find it. Auto-resolving UniProt IDs from a gene symbol could be a follow-up.
Rate limits are undocumented; the module makes a single request per call.

New module querying the Genomics 2 Proteins (G2P) portal (https://g2p.broadinstitute.org/) for residue-level protein structure/function annotations. The API serves TSV, parsed into a pandas DataFrame.

- gget.g2p(gene, uniprot_id, resource='features'|'map'|'alignment', isoform=None, save=False, verbose=True)
- 'features': per-residue table (AlphaFold pLDDT, UniProt sites, pockets, PTMs); 'map': gene->transcript->isoform->structure identifiers; 'alignment': residue-level isoform alignment
- CLI parser + dispatch in main.py, export in __init__.py, G2P_API in constants.py
- Tests (live integration + network-free validation) + docs (g2p.md, updates.md)

Resolves scverse#138

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Dev -> main: scverse packaging modernization (#215), gget g2p module (#220), scverse URL migration (#219), tests/coverage badges, CI consolidation (single pytest_results.txt, dynamic latest-Python gating), gget search NaN→None normalization, gget mutate pyarrow-empty-slice guard, scanpy>=1.10 pin in the cellxgene extra, gdrive backup overwrite, and updates.md notes.

Conflicts resolved:
- README.md: kept dev's dynamic tests/coverage badges; dropped the stale static `Coverage-83%` badge.
- docs/src/en/introduction.md: removed duplicate `# Welcome!` heading and whitespace-only conflict block.
- docs/src/es/introduction.md: same; removed duplicate `# ¡Bienvenidos!`.
- tests/pytest_results_py3.12.txt: accepted dev's delete (superseded by tests/pytest_results.txt under the new CI consolidation).

PR #220 inserted a new import block before the existing alphabetical list without removing the originals further down, leaving six modules (alphafold/archs4/enrichr/gpt/pdb/setup) imported twice. This is harmless at runtime (Python caches modules in sys.modules, so the repeat import only rebinds the same names), but it trips ruff F811 once the pre-commit hooks land. Restore alphabetical order with g2p in its slot.
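Repeated imports like these can be caught mechanically. Ruff's F811 flags a name that is redefined before it is used. The sketch below is a simpler stand-in built on the standard-library `ast` module: it only counts names bound by more than one top-level import statement. The module names in the sample are illustrative:

```python
import ast
from collections import Counter


def duplicate_imports(source: str) -> list[str]:
    """Return names bound by more than one top-level import statement, sorted."""
    counts: Counter[str] = Counter()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # `import a.b` binds `a`; `from x import y as z` binds `z`.
                counts[alias.asname or alias.name.split(".")[0]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative __init__.py fragment with one module imported twice.
src = """
from .gget_alphafold import alphafold
from .gget_g2p import g2p
from .gget_alphafold import alphafold
"""
print(duplicate_imports(src))  # ['alphafold']
```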

Elarwei001 and others added 2 commits June 16, 2026 21:13

Merge branch 'dev' into feature/g2p-module

f06d6c7

lauraluebbert merged commit cd89d5c into scverse:dev Jun 21, 2026
1 of 4 checks passed

lauraluebbert merged 2 commits into scverse:dev from Elarwei001:feature/g2p-module
