Fix download.yaml: metatraits GDrive entries + comment out inactive sources #528

Merged
realmarcin merged 4 commits into master from download-yaml-inactive-sources
Mar 20, 2026

crocodile27 (Collaborator) commented Mar 19, 2026

Summary

  • Metatraits fix: Replace two duplicate placeholder blocks (gdrive:<metatraits_folder_id> / metatraits.tsv) and a manual-placement comment with the three actual GDrive entries recovered from commit e7e2508 on metatraits-isolated:
    • metatraits/ncbi_species_summary.jsonl.gz (GDrive ID: 1vL9wujvty4Xh2ZG2HHquX0RE7968Dkbm)
    • metatraits/ncbi_genus_summary.jsonl.gz (GDrive ID: 1WL6VhD2I6MGuI3iHKbCQYTncG1UB-u6e)
    • metatraits/ncbi_family_summary.jsonl.gz (GDrive ID: 1pgOugA5jQG96GbCktkTJN9FGn6rypTOI)
  • Inactive sources: Comment out download entries for transforms not active in DATA_SOURCES: uniprot_proteomes.tar.gz, uniprot_human.tar.gz, CTD_chemicals_diseases.tsv.gz, fermentation_explorer.csv, disbiome.json, wallen_etal.xlsx
  • upa.owl remains active (used by OntologiesTransform)

Test results

  • 128 passed, 25 skipped, 5 warnings in 13.81s (poetry run pytest tests/ -q)
  • poetry run kg download runs without errors
  • data/raw/metatraits/ is populated with the three JSONL files after download (requires GDrive access via antheaguo@berkeley.edu)
  • Inactive sources are no longer downloaded
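
The third check above (that `data/raw/metatraits/` holds the three JSONL files after download) could be automated with a small script. This is a minimal sketch, not part of the PR: the function name `missing_metatraits_files` and the constant `EXPECTED_METATRAITS_FILES` are hypothetical, and the `data/raw` default is assumed from the path quoted in the test results.

```python
from pathlib import Path

# File names are taken from the PR summary; the constant name is hypothetical.
EXPECTED_METATRAITS_FILES = (
    "ncbi_species_summary.jsonl.gz",
    "ncbi_genus_summary.jsonl.gz",
    "ncbi_family_summary.jsonl.gz",
)


def missing_metatraits_files(raw_dir: Path = Path("data/raw")) -> list[str]:
    """Return the expected MetaTraits file names absent from raw_dir/metatraits."""
    metatraits_dir = Path(raw_dir) / "metatraits"
    return [
        name
        for name in EXPECTED_METATRAITS_FILES
        if not (metatraits_dir / name).is_file()
    ]


if __name__ == "__main__":
    # Assumes the script runs from the repository root after the download step.
    missing = missing_metatraits_files()
    print("all present" if not missing else f"missing: {missing}")
```

An empty list means all three files are present. The script only checks for existence, so it would not catch a truncated or corrupt download.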

🤖 Generated with Claude Code

…e sources

- Replace duplicate metatraits placeholder blocks with the three actual GDrive
  entries for ncbi_species_summary.jsonl.gz, ncbi_genus_summary.jsonl.gz, and
  ncbi_family_summary.jsonl.gz (recovered from commit e7e2508)
- Comment out download entries for inactive transforms: uniprot_proteomes,
  uniprot_human, CTD, fermentation_explorer, disbiome, wallen_etal
- upa.owl (used by OntologiesTransform) remains active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

Updates the project’s download.yaml configuration to align downloads with the currently active transforms and to restore the correct MetaTraits Google Drive file references.

Changes:

  • Replaced placeholder/duplicate MetaTraits Google Drive entries with three concrete GDrive IDs for species/genus/family JSONL summaries.
  • Commented out download entries for sources whose transforms are currently not enabled in DATA_SOURCES.

Comment thread download.yaml Outdated
turbomam requested review from realmarcin and turbomam March 19, 2026 20:06
crocodile27 and others added 2 commits March 19, 2026 13:14
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The download.yaml entries use local_name: metatraits/ncbi_*_summary.jsonl.gz,
placing files in data/raw/metatraits/. Update the transform's input_base to
append the metatraits/ subdir, and update the test and runbook accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
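
The path change described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's transform code: `METATRAITS_SUBDIR`, `metatraits_input_base`, and `summary_files` are hypothetical names. Only the `metatraits/` subdirectory and the `ncbi_*_summary.jsonl.gz` pattern come from the commit message.

```python
from pathlib import Path

# Matches the "metatraits/" prefix used in the local_name values (per the commit message).
METATRAITS_SUBDIR = "metatraits"


def metatraits_input_base(raw_dir: Path) -> Path:
    """Return the directory a MetaTraits transform would read from (hypothetical helper)."""
    return Path(raw_dir) / METATRAITS_SUBDIR


def summary_files(raw_dir: Path) -> list[Path]:
    """List the downloaded summary files under the metatraits subdirectory, sorted by path."""
    return sorted(metatraits_input_base(raw_dir).glob("ncbi_*_summary.jsonl.gz"))
```

With this layout, files left directly in `data/raw/` are not picked up, which is presumably why the test and runbook had to be updated along with the transform.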

crocodile27 (Collaborator, Author) left a comment

Fixed in new commit "Fix metatraits transform to look in data/raw/metatraits/ subdirectory"

realmarcin (Collaborator) left a comment

A few more things are commented out in download.yaml now; some are for the biomedical version of the transform. But it's OK to have this smaller download set as the default. We can revisit on the next release, when multiple KGs are built.

realmarcin merged commit ec1cc22 into master Mar 20, 2026
3 checks passed
realmarcin deleted the download-yaml-inactive-sources branch March 20, 2026 03:56