Fix download.yaml: metatraits GDrive entries + comment out inactive sources #528
Merged
realmarcin merged 4 commits into master on Mar 20, 2026
…e sources

- Replace duplicate metatraits placeholder blocks with the three actual GDrive entries for ncbi_species_summary.jsonl.gz, ncbi_genus_summary.jsonl.gz, and ncbi_family_summary.jsonl.gz (recovered from commit e7e2508)
- Comment out download entries for inactive transforms: uniprot_proteomes, uniprot_human, CTD, fermentation_explorer, disbiome, wallen_etal
- upa.owl (used by OntologiesTransform) remains active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
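The recovered entries point at gzip-compressed JSON Lines files. Below is a minimal reader for such a file. It assumes one JSON object per line, as the .jsonl.gz extension implies; this PR does not describe the record fields. The helper name iter_jsonl_gz is hypothetical and is not part of the project.

```python
from __future__ import annotations

import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl_gz(path: str | Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line of a gzip-compressed JSONL file."""
    # "rt" opens the gzip stream in text mode so each iteration yields a decoded line.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

Usage: `for record in iter_jsonl_gz("data/raw/metatraits/ncbi_genus_summary.jsonl.gz"): ...`. Because it is a generator, records stream one at a time instead of loading the whole summary into memory.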
Pull request overview
Updates the project’s download.yaml configuration to align downloads with the currently active transforms and to restore the correct MetaTraits Google Drive file references.
Changes:
- Replaced placeholder/duplicate MetaTraits Google Drive entries with three concrete GDrive IDs for species/genus/family JSONL summaries.
- Commented out download entries for sources whose transforms are currently not enabled in DATA_SOURCES.
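The rule applied by hand here can be expressed as a small check: keep a download only if its transform is enabled in DATA_SOURCES. This is a sketch under assumptions. The entry shape (a dict with a local_name key), the source_of mapping from local_name to a data-source key, and the helper name split_by_active_source are all hypothetical, not the project's actual schema.

```python
from __future__ import annotations

from typing import Iterable


def split_by_active_source(
    entries: Iterable[dict[str, str]],
    source_of: dict[str, str],
    active_sources: set[str],
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Partition download entries into (keep, comment_out) by owning data source.

    source_of maps an entry's local_name to the data-source key that consumes it;
    active_sources plays the role of the enabled keys in DATA_SOURCES.
    """
    keep: list[dict[str, str]] = []
    comment_out: list[dict[str, str]] = []
    for entry in entries:
        owner = source_of.get(entry["local_name"])
        # Unknown owners stay enabled so a gap in the mapping never drops a download silently.
        if owner is None or owner in active_sources:
            keep.append(entry)
        else:
            comment_out.append(entry)
    return keep, comment_out
```

Entries with no known owner are deliberately kept. A file like upa.owl, which an always-on transform needs, should never disappear just because the mapping is incomplete.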
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The download.yaml entries use local_name: metatraits/ncbi_*_summary.jsonl.gz, placing files in data/raw/metatraits/. Update the transform's input_base to append the metatraits/ subdir, and update the test and runbook accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
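Here is a minimal sketch of the path relationship this commit restores. It assumes downloads land at <raw_dir>/<local_name> and that the raw directory is data/raw, as the PR's test checklist suggests. The helper names download_target, metatraits_input_base, and missing_inputs are hypothetical; the real transform sets input_base inside its own code.

```python
from __future__ import annotations

from pathlib import Path

# local_name values for the three metatraits entries, as listed in this PR.
LOCAL_NAMES = (
    "metatraits/ncbi_species_summary.jsonl.gz",
    "metatraits/ncbi_genus_summary.jsonl.gz",
    "metatraits/ncbi_family_summary.jsonl.gz",
)


def download_target(raw_dir: str | Path, local_name: str) -> Path:
    """Assumed download layout: each entry is written to <raw_dir>/<local_name>."""
    return Path(raw_dir) / local_name


def metatraits_input_base(raw_dir: str | Path) -> Path:
    """Transform-side base directory with the metatraits/ subdir appended."""
    return Path(raw_dir) / "metatraits"


def missing_inputs(raw_dir: str | Path = "data/raw") -> list[str]:
    """Return the expected file names that are not present under input_base."""
    base = metatraits_input_base(raw_dir)
    return [
        Path(name).name
        for name in LOCAL_NAMES
        if not (base / Path(name).name).is_file()
    ]
```

With the subdirectory appended, download_target(raw_dir, name) and metatraits_input_base(raw_dir) / Path(name).name resolve to the same file. Without it, the transform would look one directory too high.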
crocodile27 (Collaborator, Author) commented on Mar 19, 2026:
Fixed in new commit "Fix metatraits transform to look in data/raw/metatraits/ subdirectory".
realmarcin (Collaborator) approved these changes on Mar 20, 2026:
A few more things are commented out in download.yaml now; some are for the biomedical version of the transform. But it's OK to have this smaller download set as the default, and we can revisit this on the next release when multiple KGs are built.
Summary
- … gdrive:<metatraits_folder_id>/metatraits.tsv) and a manual-placement comment with the three actual GDrive entries recovered from commit e7e2508 on metatraits-isolated:
  - metatraits/ncbi_species_summary.jsonl.gz (GDrive ID: 1vL9wujvty4Xh2ZG2HHquX0RE7968Dkbm)
  - metatraits/ncbi_genus_summary.jsonl.gz (GDrive ID: 1WL6VhD2I6MGuI3iHKbCQYTncG1UB-u6e)
  - metatraits/ncbi_family_summary.jsonl.gz (GDrive ID: 1pgOugA5jQG96GbCktkTJN9FGn6rypTOI)
- … DATA_SOURCES: uniprot_proteomes.tar.gz, uniprot_human.tar.gz, CTD_chemicals_diseases.tsv.gz, fermentation_explorer.csv, disbiome.json, wallen_etal.xlsx
- upa.owl remains active (used by OntologiesTransform)

Test results
- … (poetry run pytest tests/ -q)
- poetry run kg download runs without errors
- data/raw/metatraits/ is populated with the three JSONL files after download (requires GDrive access via antheaguo@berkeley.edu)

🤖 Generated with Claude Code