ecrum19/VCF-RDFizer-vocabulary

VCF-RDFizer-vocabulary

Vocabulary + SHACL shapes for representing VCF files, headers, records, and per-sample calls in RDF.

This repository is intentionally VCF-centric (file + header metadata + row/call provenance), and is designed to link out to established ontologies for representing the sequence alteration itself.

Why this exists

  • VCF is the de facto interchange format for variant catalogs.
  • Existing semantic models (e.g., SB/gvar) focus on variants as Linked Data, not on a complete RDF rendering of VCF files.
    This vocabulary therefore models the VCF artifact, header lines, and call-level fields, and enables alignment to SB/gvar (and optionally HERO).

Namespace

Target persistent namespace:

  • https://w3id.org/vcf-rdfizer/vocab#

(You can use it immediately; later you can register it via w3id.org and configure redirects.)

Canonical IRI Pattern

Recommended base for VCF instance resources:

  • file://{vcfFilePath}

Recommended templates (also formalized in the ontology via vcfr:iriTemplate):

VCFFile          file://{vcfFilePath}
VCFHeader        file://{vcfFilePath}#header
HeaderLine       file://{vcfFilePath}#header/line/{lineId}
VCFRecord        file://{vcfFilePath}#record/{recordId}
VariantCall      file://{vcfFilePath}#call/{recordId}
SampleCall       file://{vcfFilePath}#sample/{recordId}/{sampleId}
InfoFieldValue   file://{vcfFilePath}#call/{recordId}/info/{fieldKey}
FormatFieldValue file://{vcfFilePath}#sample/{recordId}/{sampleId}/fmt/{fieldKey}
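
As a minimal sketch of how these templates could be filled in, the Python below copies the table into a dictionary and substitutes the placeholders. The function name mint_iri, the percent-encoding of placeholder values, and the sample identifiers are illustrative assumptions, not part of the vocabulary or the VCF-RDFizer tool.

```python
from urllib.parse import quote

# Templates copied from the table above; the {placeholders} are filled at mint time.
IRI_TEMPLATES = {
    "VCFFile": "file://{vcfFilePath}",
    "VCFHeader": "file://{vcfFilePath}#header",
    "HeaderLine": "file://{vcfFilePath}#header/line/{lineId}",
    "VCFRecord": "file://{vcfFilePath}#record/{recordId}",
    "VariantCall": "file://{vcfFilePath}#call/{recordId}",
    "SampleCall": "file://{vcfFilePath}#sample/{recordId}/{sampleId}",
    "InfoFieldValue": "file://{vcfFilePath}#call/{recordId}/info/{fieldKey}",
    "FormatFieldValue": "file://{vcfFilePath}#sample/{recordId}/{sampleId}/fmt/{fieldKey}",
}


def mint_iri(kind, **values):
    """Fill the template for `kind` (hypothetical helper).

    Assumption: values are percent-encoded so that each identifier stays a
    single path segment; "/" is kept only inside the file path.
    """
    encoded = {
        name: quote(str(value), safe="/" if name == "vcfFilePath" else "")
        for name, value in values.items()
    }
    # A missing placeholder raises KeyError instead of producing a partial IRI.
    return IRI_TEMPLATES[kind].format(**encoded)


# Example with made-up identifiers.
sample_call_iri = mint_iri(
    "SampleCall", vcfFilePath="/data/example.vcf", recordId="r1", sampleId="NA00001"
)
```

How recordId and lineId are derived is not specified here, so the values above are placeholders only.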

Key concepts

VCF file and headers

  • vcfr:VCFFile – a VCF file artifact (a dataset distribution)
  • vcfr:VCFHeader – container for header lines
  • Header line types (subclasses of vcfr:HeaderLine):
    • vcfr:FileFormatHeaderLine for ##fileformat
    • vcfr:FileDateHeaderLine for ##fileDate
    • vcfr:SourceHeaderLine for ##source
    • vcfr:ReferenceHeaderLine for ##reference
    • vcfr:ContigHeaderLine for ##contig
    • vcfr:INFOHeaderLine for ##INFO=<...>
    • vcfr:FORMATHeaderLine for ##FORMAT=<...>
    • vcfr:FILTERHeaderLine for ##FILTER=<...>
    • vcfr:ALTHeaderLine for ##ALT=<...>

VCF records and calls

  • vcfr:VCFRecord – one row of a VCF (variant observation statement)
  • vcfr:VariantCall – call-level representation (QUAL/FILTER/INFO/FORMAT + sample calls)
  • vcfr:SampleCall – per-sample call values (GT/DP/AD/…)
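
To illustrate the three levels, the sketch below splits one tab-separated VCF data line into the parts that would feed a record, a call, and its sample calls. The QUAL/FILTER/INFO/FORMAT grouping follows the description above; placing CHROM through ALT on the record, the function name split_row, and the example row are assumptions for illustration.

```python
RECORD_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT"]   # assumed record-level fields
CALL_COLUMNS = ["QUAL", "FILTER", "INFO", "FORMAT"]     # call-level fields named above


def split_row(line, sample_ids):
    """Split one VCF data line into record, call, and per-sample values."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    record = dict(zip(RECORD_COLUMNS, fields[:5]))
    # FORMAT is optional, so zip stops early when the column is absent.
    call = dict(zip(CALL_COLUMNS, fields[5:9]))
    format_keys = call["FORMAT"].split(":") if "FORMAT" in call else []
    sample_calls = {
        sample_id: dict(zip(format_keys, column.split(":")))
        for sample_id, column in zip(sample_ids, fields[9:])
    }
    return record, call, sample_calls


# Made-up row with two samples.
row = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14\tGT:DP\t0|0:1\t1|0:8"
record, call, sample_calls = split_row(row, ["NA00001", "NA00002"])
```

All values stay as strings here; typing them and attaching them to RDF resources is left out of the sketch.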

Alignment

This vocabulary can link a vcfr:VCFRecord or vcfr:VariantCall to SB/gvar’s so:0001059 (SequenceAlteration) representation using vcfr:asSequenceAlteration.
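
A minimal sketch of such a link, written as one N-Triples statement: the helper name alignment_triple and both example IRIs are hypothetical, and the example.org IRI merely stands in for whatever resource an SB/gvar dataset would provide.

```python
VCFR = "https://w3id.org/vcf-rdfizer/vocab#"


def alignment_triple(source_iri, alteration_iri):
    """Link a record or call IRI to a sequence alteration IRI (hypothetical helper)."""
    return f"<{source_iri}> <{VCFR}asSequenceAlteration> <{alteration_iri}> ."


# Both IRIs below are made up for illustration.
triple = alignment_triple(
    "file:///data/example.vcf#record/r1",
    "https://example.org/alteration/1",
)
```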

Missing values (.)

  • The VCF missing-value token . is modeled as a typed literal: "."^^vcfr:Null.
  • This avoids a plain "."^^xsd:string literal and keeps missingness explicit in RDF.
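
The rule above can be sketched as a small serializer that emits N-Triples literals. Only the "."^^vcfr:Null case comes from this vocabulary; the function name literal_for and the choice to type every other token as xsd:string are simplifying assumptions for the example.

```python
VCFR = "https://w3id.org/vcf-rdfizer/vocab#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def literal_for(token):
    """Serialize a VCF field token as an N-Triples literal (hypothetical helper)."""
    if token == ".":
        # Missing value: keep the token but mark it with the vcfr:Null datatype.
        return f'"."^^<{VCFR}Null>'
    # Assumption: everything else is emitted as xsd:string for simplicity.
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD_STRING}>'
```

With this split, a consumer can select missing values by datatype rather than by comparing string content.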

SB/gvar reference:

Validation

SHACL shapes are provided in shacl/vcf-rdfizer-vocabulary.shacl.ttl.

Documentation

The documentation assets are generated by the following scripts, each invoked by scripts/sync-docs-assets.sh:

  • scripts/build-ontology-graph-data.mjs and scripts/export-ontology-graph-svg.mjs generate the graph and export files from ontology/vcf-rdfizer-vocabulary.ttl.
  • scripts/build-term-pages.mjs generates the per-term HTML pages.
  • scripts/convert-example-nt-to-ttl.mjs generates the formatted example graph (examples/example.ttl) from examples/example.nt.

Quick example

See:

  • examples/example-headers.ttl
  • examples/example-minimal-record.ttl
  • examples/example.ttl (formatted from example.nt)
  • examples/example.nt
  • examples/example.vcf

Publishing

  • Host this repo with GitHub Pages or GitLab Pages for HTML docs (optional).
  • Register a w3id redirect:
    • Desired path: /vcf-rdfizer/
    • Redirect to your hosted ontology + docs.

License

  • CC BY 4.0 (see LICENSE)

About

A repository to contain the semantic vocabulary used for the VCF-RDFizer tool.
