Vocabulary + SHACL shapes for representing VCF files, headers, records, and per-sample calls in RDF.
This repository is intentionally VCF-centric (file + header metadata + row/call provenance), and is designed to link out to established ontologies for representing the sequence alteration itself.
- VCF is the de-facto interchange format for variant catalogs.
- Existing semantic models (e.g., SB/gvar) focus on variants as Linked Data, not a complete RDF rendering of VCF files.
We therefore model the VCF artifact, header lines, and call-level fields here, and enable alignment to SB/gvar (and optionally HERO).
Target persistent namespace:
https://w3id.org/vcf-rdfizer/vocab#
(You can use it immediately; later you can register it via w3id.org and configure redirects.)
Recommended base for VCF instance resources:
file://{vcfFilePath}
Recommended templates (also formalized in ontology via vcfr:iriTemplate):
VCFFile file://{vcfFilePath}
VCFHeader file://{vcfFilePath}#header
HeaderLine file://{vcfFilePath}#header/line/{lineId}
VCFRecord file://{vcfFilePath}#record/{recordId}
VariantCall file://{vcfFilePath}#call/{recordId}
SampleCall file://{vcfFilePath}#sample/{recordId}/{sampleId}
InfoFieldValue file://{vcfFilePath}#call/{recordId}/info/{fieldKey}
FormatFieldValue file://{vcfFilePath}#sample/{recordId}/{sampleId}/fmt/{fieldKey}
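As a rough illustration of how the templates above expand, the following sketch fills them in with plain string formatting. The helper name mint_iri, the IRI_TEMPLATES dictionary, and the percent-encoding of path and identifier parts are assumptions made for this example; they are not part of the vocabulary or of any converter in this repository.

```python
from urllib.parse import quote

# Templates copied from the list above. The dictionary and helper below are
# illustrative only and are not defined by the vocabulary itself.
IRI_TEMPLATES = {
    "VCFFile": "file://{vcfFilePath}",
    "VCFHeader": "file://{vcfFilePath}#header",
    "HeaderLine": "file://{vcfFilePath}#header/line/{lineId}",
    "VCFRecord": "file://{vcfFilePath}#record/{recordId}",
    "VariantCall": "file://{vcfFilePath}#call/{recordId}",
    "SampleCall": "file://{vcfFilePath}#sample/{recordId}/{sampleId}",
    "InfoFieldValue": "file://{vcfFilePath}#call/{recordId}/info/{fieldKey}",
    "FormatFieldValue": "file://{vcfFilePath}#sample/{recordId}/{sampleId}/fmt/{fieldKey}",
}


def mint_iri(kind, vcf_file_path, **parts):
    """Fill the template for `kind` with the given parts.

    Percent-encoding is an assumption of this sketch: slashes are kept in
    the file path, while every other part is encoded in full.
    """
    values = {"vcfFilePath": quote(vcf_file_path, safe="/")}
    values.update({key: quote(str(value), safe="") for key, value in parts.items()})
    return IRI_TEMPLATES[kind].format(**values)


print(mint_iri("VCFRecord", "/data/cohort.vcf", recordId="r17"))
print(mint_iri("SampleCall", "/data/cohort.vcf", recordId="r17", sampleId="NA12878"))
```

With an absolute path such as /data/cohort.vcf, the first call yields file:///data/cohort.vcf#record/r17. A missing template part (for example, omitting sampleId for a SampleCall) raises KeyError rather than producing a partial IRI.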
- vcfr:VCFFile – a VCF file artifact (a dataset distribution)
- vcfr:VCFHeader – container for header lines
- Header line types (subclasses of vcfr:HeaderLine):
  - vcfr:FileFormatHeaderLine for ##fileformat
  - vcfr:FileDateHeaderLine for ##fileDate
  - vcfr:SourceHeaderLine for ##source
  - vcfr:ReferenceHeaderLine for ##reference
  - vcfr:ContigHeaderLine for ##contig
  - vcfr:INFOHeaderLine for ##INFO=<...>
  - vcfr:FORMATHeaderLine for ##FORMAT=<...>
  - vcfr:FILTERHeaderLine for ##FILTER=<...>
  - vcfr:ALTHeaderLine for ##ALT=<...>
- vcfr:VCFRecord – one row of a VCF (variant observation statement)
- vcfr:VariantCall – call-level representation (QUAL/FILTER/INFO/FORMAT + sample calls)
- vcfr:SampleCall – per-sample call values (GT/DP/AD/…)
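The header line subclasses listed above map directly onto the key that follows ## in a VCF meta-information line. A minimal sketch of that mapping is shown below. The function name classify_header_line is hypothetical, and falling back to the generic vcfr:HeaderLine for unlisted keys is an assumption of this example, not something the vocabulary prescribes.

```python
import re

# Header key -> class, as enumerated in the list above.
HEADER_LINE_CLASSES = {
    "fileformat": "vcfr:FileFormatHeaderLine",
    "fileDate": "vcfr:FileDateHeaderLine",
    "source": "vcfr:SourceHeaderLine",
    "reference": "vcfr:ReferenceHeaderLine",
    "contig": "vcfr:ContigHeaderLine",
    "INFO": "vcfr:INFOHeaderLine",
    "FORMAT": "vcfr:FORMATHeaderLine",
    "FILTER": "vcfr:FILTERHeaderLine",
    "ALT": "vcfr:ALTHeaderLine",
}


def classify_header_line(line):
    """Return the class (as a prefixed name) for one '##key=value' line.

    Keys that are not in the table fall back to the generic superclass;
    that fallback is an assumption of this sketch.
    """
    match = re.match(r"##([^=]+)=", line)
    key = match.group(1) if match else None
    return HEADER_LINE_CLASSES.get(key, "vcfr:HeaderLine")


print(classify_header_line("##fileformat=VCFv4.3"))
print(classify_header_line('##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">'))
```

Keys are matched case-sensitively, following the spelling used in the list above (fileDate, INFO, FORMAT).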
This vocabulary:
- can link a vcfr:VCFRecord / vcfr:VariantCall to SB/gvar’s so:0001059 (SequenceAlteration) representation using vcfr:asSequenceAlteration.
- Missing VCF token . is modeled as a typed literal: "."^^vcfr:Null.
- This avoids using plain "."^^xsd:string and keeps missingness explicit in RDF.
SB/gvar reference:
- Docs: https://swat4hcls-2025-genomic-variation.github.io/genomic-variant-schema/
- Schema source: https://github.com/swat4hcls-2025-genomic-variation/genomic-variant-schema/blob/main/gvar-schema.yaml
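The missing-value convention described earlier (the VCF token . becomes a literal typed vcfr:Null rather than an xsd:string) can be sketched as a small serializer that emits N-Triples-style literals. The function name vcf_token_to_literal and the default xsd:string datatype for non-missing tokens are assumptions of this example. The vcfr namespace is the target namespace stated above.

```python
VCFR = "https://w3id.org/vcf-rdfizer/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def vcf_token_to_literal(token, datatype=XSD + "string"):
    """Render one VCF field token as an N-Triples literal.

    The missing token "." is typed as vcfr:Null so that missingness stays
    explicit. Any other token gets the caller-supplied datatype.
    """
    if token == ".":
        return f'"."^^<{VCFR}Null>'
    # Escape backslashes and double quotes. VCF tokens cannot contain raw
    # newlines or tabs, so this sketch does not handle them.
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


print(vcf_token_to_literal("."))
print(vcf_token_to_literal("42", XSD + "integer"))
```

Because the datatype differs, a consumer can separate missing values from genuine strings by checking the literal's datatype, without inspecting its lexical form.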
SHACL shapes are provided in shacl/vcf-rdfizer-vocabulary.shacl.ttl.
- Landing page (GitHub Pages): docs/index.html
- Vocabulary reference (classes, properties, external alignments): docs/ontology-reference.html
- Per-term HTML documentation pages: docs/terms/index.html
- Interactive relationship diagram: docs/ontology-graph.html
- Serialized graph data: docs/assets/ontology-graph-data.json
- Serialized relationship overview: docs/assets/ontology-relationships-overview.json
- Static graph export (SVG): docs/assets/ontology-graph-static.svg
Graph and export files are generated from ontology/vcf-rdfizer-vocabulary.ttl by scripts/build-ontology-graph-data.mjs and scripts/export-ontology-graph-svg.mjs (both invoked by scripts/sync-docs-assets.sh).
Per-term HTML pages are generated by scripts/build-term-pages.mjs (also invoked by scripts/sync-docs-assets.sh).
The formatted example graph (examples/example.ttl) is generated from examples/example.nt by scripts/convert-example-nt-to-ttl.mjs (also invoked by scripts/sync-docs-assets.sh).
See:
- examples/example-headers.ttl
- examples/example-minimal-record.ttl
- examples/example.ttl (formatted from example.nt)
- examples/example.nt
- examples/example.vcf
- Host this repo with GitHub Pages or GitLab Pages for HTML docs (optional).
- Register w3id redirect:
  - Desired path: /vcf-rdfizer/
  - Redirect to your hosted ontology + docs.
- CC BY 4.0 (see LICENSE)