Analysis: Slot URI mapping coverage and alignment with Fairscape/ROCrate #130

@cmungall

Summary

An analysis of how D4D schema slots map to standard vocabularies, and how those mappings align (or conflict) with the Fairscape/ROCrate Pydantic models that already have a D4D conversion layer (fairscape_models/conversion/mapping/d4d.py).

D4D Slot URI Coverage

Of ~414 domain-specific attributes across the D4D modules, approximately 28% are mapped to standard vocabulary URIs via slot_uri:

Module               Mapped/Total   Coverage
Distribution         3/3            100%
Ethics               7/7            100%
Motivation           8/9            88%
Base (core)          41/65          63%
Maintenance          7/13           53%
Uses                 4/10           40%
Variables            9/27           33%
Composition          20/62          32%
Preprocessing        6/22           27%
Collection           5/20           25%
Data Governance      6/39           15%
Evaluation Summary   0/122          0%
Human                0/14           0%
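
The overall figure can be re-derived from the per-module rows. The sketch below copies the counts from the table and recomputes each percentage and the total; the names COVERAGE, coverage_pct and summarize are illustrative and not part of D4D. The rows sum to 116 mapped out of 413, consistent with the "~414" and "approximately 28%" quoted above, and the per-module percentages in the table appear to be truncated rather than rounded (8/9 is 88.9%).

```python
# (module, mapped, total) rows copied from the coverage table in this issue.
COVERAGE = [
    ("Distribution", 3, 3),
    ("Ethics", 7, 7),
    ("Motivation", 8, 9),
    ("Base (core)", 41, 65),
    ("Maintenance", 7, 13),
    ("Uses", 4, 10),
    ("Variables", 9, 27),
    ("Composition", 20, 62),
    ("Preprocessing", 6, 22),
    ("Collection", 5, 20),
    ("Data Governance", 6, 39),
    ("Evaluation Summary", 0, 122),
    ("Human", 0, 14),
]


def coverage_pct(mapped, total):
    """Return mapped/total as a percentage (0.0 for an empty module)."""
    return 100.0 * mapped / total if total else 0.0


def summarize(rows):
    """Return (mapped, total, percent) summed over all modules."""
    mapped = sum(m for _, m, _ in rows)
    total = sum(t for _, _, t in rows)
    return mapped, total, coverage_pct(mapped, total)


if __name__ == "__main__":
    # Print modules from least to most covered, then the overall figure.
    for name, m, t in sorted(COVERAGE, key=lambda r: coverage_pct(r[1], r[2])):
        print(f"{name:<20}{m:>3}/{t:<4}{coverage_pct(m, t):5.1f}%")
    mapped, total, pct = summarize(COVERAGE)
    print(f"{'Overall':<20}{mapped:>3}/{total:<4}{pct:5.1f}%")
```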

Vocabularies used in D4D

  • dcterms: (Dublin Core) — 70 mappings (dominant)
  • schema: (Schema.org) — 33
  • dcat: (Data Catalog) — 9
  • prov:, skos:, qudt:, DUO: — 1 each

Notable gaps

  • Evaluation Summary (122 slots, 0 mapped) — the largest module with zero URI mappings
  • Human (14 slots, 0 mapped) — human subjects data with no standard vocab links
  • Data Governance (39 slots, 15%) — governance/consent terms could map to DUO/ODRL
  • No mappings to DATS, OBI, IAO, or other biomedical metadata standards

Fairscape URI Approach

Fairscape uses JSON-LD with @vocab: "https://schema.org/" as the default namespace, plus evi: "https://w3id.org/EVI#" for extensions and rai: for responsible-AI fields. Key mappings in Fairscape:

Fairscape field    Effective URI                  Standard
name               schema:name                    Schema.org
description        schema:description             Schema.org
@id                schema:identifier              Schema.org
dateCreated        schema:dateCreated             Schema.org
dateModified       schema:dateModified            Schema.org
contentUrl         schema:contentUrl              Schema.org
license            schema:license                 Schema.org
author             schema:author                  Schema.org
contentSize        schema:contentSize             Schema.org
evi:formats        https://w3id.org/EVI#formats   EVI (custom)
rai:dataUseCases   RAI namespace                  Custom
rai:dataBiases     RAI namespace                  Custom
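
To make "effective URI" concrete, here is a minimal sketch of how a compact key resolves against such a context. It is a deliberately simplified stand-in for JSON-LD term expansion, not Fairscape's code or a full JSON-LD processor. The CONTEXT dict holds only the two namespaces quoted above; the rai: namespace IRI is not given in this issue, so it is left out and rai: terms raise an error here.

```python
# Namespaces quoted in this issue. The rai: IRI is not stated, so it is omitted.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "evi": "https://w3id.org/EVI#",
}


def expand_term(term, context=CONTEXT):
    """Expand a compact key to a full IRI (simplified JSON-LD-style expansion)."""
    if term.startswith("@"):
        # JSON-LD keywords such as @id are not vocabulary terms; leave them as-is.
        return term
    if "://" in term:
        # Already an absolute IRI.
        return term
    prefix, sep, local = term.partition(":")
    if sep:
        if prefix not in context:
            raise KeyError(f"prefix {prefix!r} is not declared in the context")
        return context[prefix] + local
    # Unprefixed keys fall back to the default vocabulary.
    return context["@vocab"] + term


if __name__ == "__main__":
    for key in ("name", "dateCreated", "evi:formats"):
        print(key, "->", expand_term(key))
```

Unprefixed Fairscape keys therefore land in Schema.org by default, which is why most rows in the table need no explicit prefix.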

D4D ↔ Fairscape Alignment at the URI Level

Both schemas use Schema.org for core metadata. Where they overlap:

D4D slot_uri         Fairscape JSON-LD      Alignment
schema:name          schema:name            ✅ Exact
schema:description   schema:description     ✅ Exact
schema:identifier    schema:identifier      ✅ Exact
schema:license       schema:license         ✅ Exact
schema:url           schema:url             ✅ Exact
dcterms:created      schema:dateCreated     ⚠️ Same concept, different vocab
dcterms:modified     schema:dateModified    ⚠️ Same concept, different vocab
dcterms:creator      schema:author          ⚠️ Same concept, different vocab
dcat:downloadURL     schema:contentUrl      ⚠️ Same concept, different vocab

The core tension

D4D leans on Dublin Core (dcterms:) for provenance and dates, while Fairscape uses Schema.org for everything. Both are valid standard vocabularies, but the split creates avoidable mapping friction: each shared concept needs an explicit cross-vocabulary equivalence. For example:

  • dcterms:created vs schema:dateCreated — semantically identical
  • dcterms:creator vs schema:author — nearly identical
  • dcat:downloadURL vs schema:contentUrl — same concept

Fairscape's existing D4D mapping

fairscape_models/conversion/mapping/d4d.py already contains a ROCRATE_TO_D4D_MAPPING dict that maps D4D field names to ROCrate/Fairscape source keys. However, this mapping operates at the field name level, not at the URI level. A formal URI-level alignment (e.g., via SSSOM) would:

  1. Make the mapping vocabulary-aware and auditable
  2. Capture the dcterms↔schema.org equivalences explicitly
  3. Identify D4D slots that have no Fairscape equivalent (and vice versa)
  4. Enable automated interoperability tooling
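
As a rough illustration of what a URI-level alignment could look like, the sketch below emits the four cross-vocabulary pairs from the alignment table as SSSOM-style TSV rows. The column names (subject_id, predicate_id, object_id, mapping_justification) follow the SSSOM format. The choice of skos:exactMatch versus skos:closeMatch and the semapv:ManualMappingCuration justification are assumptions made here for illustration, based on the "semantically identical" and "nearly identical" wording above. A complete SSSOM file would also carry a metadata header (for example a curie_map), which is omitted.

```python
import csv
import io

COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# Assumed justification: the pairs were aligned by hand in this issue.
JUSTIFICATION = "semapv:ManualMappingCuration"

# (D4D slot_uri, assumed predicate, Fairscape URI) from the alignment table.
ALIGNMENTS = [
    ("dcterms:created", "skos:exactMatch", "schema:dateCreated"),
    ("dcterms:modified", "skos:exactMatch", "schema:dateModified"),
    ("dcterms:creator", "skos:closeMatch", "schema:author"),
    ("dcat:downloadURL", "skos:exactMatch", "schema:contentUrl"),
]


def to_sssom_tsv(rows):
    """Serialize (subject, predicate, object) triples as SSSOM-style TSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for subject, predicate, obj in rows:
        writer.writerow([subject, predicate, obj, JUSTIFICATION])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_sssom_tsv(ALIGNMENTS), end="")
```

Keeping the predicate per row is what makes the mapping auditable: a reviewer can contest a single closeMatch without touching the field-name dict.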

Proposed Next Steps

  1. Add slot_uri mappings to the currently unmapped D4D modules, prioritizing Evaluation Summary (0%) and Human (0%)
  2. Consider harmonizing the dcterms vs schema.org choice — or at minimum, add exact_mappings cross-references between them
  3. Produce a formal SSSOM mapping between D4D slot URIs and Fairscape/ROCrate URIs
  4. Consider adding mappings to domain-relevant standards: DUO (consent), OBI (assays), IAO (information artifacts)
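
For step 1, a worklist of unmapped attributes can be generated mechanically. The sketch below assumes the schema has already been loaded into a plain dict with the usual LinkML layout (classes, then attributes, then an optional slot_uri per attribute). The class and attribute names in SCHEMA are hypothetical placeholders, not real D4D slots, and top-level slots definitions are not scanned.

```python
# Hypothetical fragment shaped like a loaded LinkML schema; names are placeholders.
SCHEMA = {
    "classes": {
        "ExampleDataset": {
            "attributes": {
                "title": {"slot_uri": "dcterms:title"},
                "sampling_notes": {"description": "free-text notes"},
                "reviewer": None,  # YAML key with no body loads as None
            }
        },
        "ExampleEmptyClass": None,
    }
}


def unmapped_attributes(schema):
    """Return (class_name, attribute_name) pairs that have no slot_uri."""
    missing = []
    for class_name, class_def in (schema.get("classes") or {}).items():
        attributes = (class_def or {}).get("attributes") or {}
        for attr_name, attr_def in attributes.items():
            if not (attr_def or {}).get("slot_uri"):
                missing.append((class_name, attr_name))
    return missing


if __name__ == "__main__":
    for class_name, attr_name in unmapped_attributes(SCHEMA):
        print(f"{class_name}.{attr_name}: no slot_uri")
```

Run per module, a list like this would show which Evaluation Summary and Human attributes to tackle first.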
