feat: Sample Chain Tracker — sampling relationships and lineage in the knowledge graph #201

@SimplicityGuy

Overview

Overlay sampling relationships onto the existing knowledge graph, revealing the hidden connections between tracks. "Amen, Brother by The Winstons has been sampled 4,532 times — here's the family tree." Sampling is one of music's most fascinating relationship types and a natural extension of the graph model.

Since Discogs itself has limited sampling data, this feature integrates with external sources (WhoSampled, MusicBrainz recording relationships) and maps them onto existing graph nodes. The result: a new SAMPLED relationship type that enables sample chain exploration, breakbeat archaeology, and producer lineage tracking.

Data Sources

Primary: WhoSampled (web scraping / manual import)

  • Richest sampling database (900K+ connections)
  • Maps "Track A sampled Track B" pairs, with timestamps and context
  • No public API — requires structured import (CSV/JSON) or community dataset
  • Licensing considerations: data used for graph relationships only, not reproduced verbatim
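
A structured import along these lines could start from a small CSV parser. This is a minimal sketch, not a confirmed format: the column names (release_from, release_to, track_from, track_to, year) and the SampleEdge type are hypothetical, and it assumes rows already carry resolved release IDs.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SampleEdge:
    """One sampling connection: track_from (on release_from) samples track_to."""
    release_from: int
    release_to: int
    track_from: str
    track_to: str
    year: Optional[int]
    source: str
    confirmed: bool = False  # imported rows start unconfirmed


def parse_sample_csv(text: str, source: str = "community_csv") -> List[SampleEdge]:
    """Parse CSV text into edges, skipping rows without two valid release IDs."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            release_from = int(row["release_from"])
            release_to = int(row["release_to"])
        except (KeyError, TypeError, ValueError):
            continue  # unresolved rows need manual matching, not guessing
        year_raw = (row.get("year") or "").strip()
        edges.append(SampleEdge(
            release_from=release_from,
            release_to=release_to,
            track_from=(row.get("track_from") or "").strip(),
            track_to=(row.get("track_to") or "").strip(),
            year=int(year_raw) if year_raw.isdigit() else None,
            source=source,
        ))
    return edges
```

Skipping unresolved rows rather than guessing keeps bad matches out of the graph; those rows could be routed to manual review instead.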

Secondary: MusicBrainz Recording Relationships

Tertiary: Community Contributions

  • Allow authenticated users to submit sampling relationships
  • Moderation queue before graph insertion
  • Links to evidence (timestamp in track, confirmation sources)
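
The submission flow above could be modeled roughly as follows. This is an illustrative sketch only: the Submission fields, the Status values, and the review helper are assumptions, not part of the existing codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """A user-submitted sampling relationship awaiting moderation."""
    release_from: int
    release_to: int
    track_from: str
    track_to: str
    evidence_url: str
    submitted_by: str
    status: Status = Status.PENDING


def review(submission: Submission, approve: bool) -> bool:
    """Record a moderator decision; return True if the edge may enter the graph."""
    if submission.status is not Status.PENDING:
        raise ValueError("submission has already been reviewed")
    submission.status = Status.APPROVED if approve else Status.REJECTED
    return submission.status is Status.APPROVED
```

Only submissions for which review returns True would be written to the graph; everything else stays in the queue's own store.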

Graph Model Extension

New Relationship

(r1:Release)-[:SAMPLED {track_from: "string", track_to: "string", year: int, confirmed: bool, source: "string"}]->(r2:Release)

New Node (optional, for track-level granularity)

(:Track {title: "string", position: "string", duration: "string"})
-[:APPEARS_ON]->(:Release)

Track nodes are optional in Phase 1 — sample relationships can connect at the Release level initially, with track metadata stored as relationship properties.
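
For the Phase 1 release-level model, writing one SAMPLED edge might look like the sketch below. It assumes Release nodes are keyed by an id property (an assumption about the existing schema) and uses a hypothetical helper, sampled_params, to build the query parameters.

```python
from typing import Any, Dict, Optional

# Assumption: Release nodes are keyed by an `id` property; adjust to the real schema.
# MERGE on the track pair keeps repeated imports of the same sample idempotent.
SAMPLED_MERGE = """
MATCH (r1:Release {id: $release_from})
MATCH (r2:Release {id: $release_to})
MERGE (r1)-[s:SAMPLED {track_from: $track_from, track_to: $track_to}]->(r2)
SET s.year = $year, s.confirmed = $confirmed, s.source = $source
"""


def sampled_params(
    release_from: int,
    release_to: int,
    track_from: str,
    track_to: str,
    year: Optional[int] = None,
    confirmed: bool = False,
    source: str = "musicbrainz",
) -> Dict[str, Any]:
    """Build the parameter map for SAMPLED_MERGE (r1 sampled r2)."""
    return {
        "release_from": int(release_from),
        "release_to": int(release_to),
        "track_from": track_from.strip(),
        "track_to": track_to.strip(),
        "year": year,
        "confirmed": bool(confirmed),
        "source": source,
    }
```

The query and parameters can then be passed to a driver call such as session.run(SAMPLED_MERGE, params) in the official Neo4j Python driver.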

Derived Relationships

  • (a1:Artist)-[:SAMPLED_ARTIST]->(a2:Artist) — materialized view for artist-level queries
  • Sample chain depth: how many generations of sampling separate two releases
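
Both derived values can be illustrated in memory before committing to a graph-side implementation. The sketch below assumes release-level edges arrive as (sampler_id, sampled_id) pairs and that a release-to-artists mapping is available; the function names are hypothetical, and ignoring self-sampling is a design assumption.

```python
from collections import defaultdict, deque


def derive_artist_edges(release_edges, release_artists):
    """Roll release-level edges up to (sampling_artist, sampled_artist) counts."""
    counts = defaultdict(int)
    for sampler, sampled in release_edges:
        for a1 in release_artists.get(sampler, ()):
            for a2 in release_artists.get(sampled, ()):
                if a1 != a2:  # assumption: self-sampling is not an artist-level edge
                    counts[(a1, a2)] += 1
    return dict(counts)


def chain_depth(release_edges, start, target, max_depth=5):
    """Generations of sampling from start back to target, or None if not reachable."""
    upstream = defaultdict(set)
    for sampler, sampled in release_edges:
        upstream[sampler].add(sampled)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # respect the traversal cap
        for nxt in upstream[node] - seen:
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return None
```

Breadth-first search returns the shortest chain, which matches the "how many generations separate two releases" reading of depth.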

Proposed Endpoints

API Endpoints (api/routers/samples.py)

| Endpoint | Description |
| --- | --- |
| GET /api/samples/{release_id} | Samples used in this release and releases that sample it |
| GET /api/samples/{release_id}/chain | Full sample chain: recursive upstream and downstream |
| GET /api/samples/{release_id}/tree | Tree visualization data for sample lineage |
| GET /api/samples/most-sampled | Most-sampled releases in the database |
| GET /api/samples/artist/{artist_id} | All sampling relationships for an artist's discography |
| GET /api/samples/search | Search for sampling connections between two entities |
| POST /api/samples/submit | Submit a new sampling relationship (authenticated, queued for moderation) |

Response Shape (example: release samples)

{
  "release_id": 456,
  "title": "It Takes a Nation of Millions to Hold Us Back",
  "artist": "Public Enemy",
  "samples_used": [
    {
      "release_id": 789,
      "title": "Funky Drummer",
      "artist": "James Brown",
      "track_from": "Bring the Noise",
      "track_to": "Funky Drummer",
      "source": "whosampled",
      "confirmed": true
    }
  ],
  "sampled_by": [
    {
      "release_id": 1011,
      "title": "...",
      "artist": "...",
      "track_from": "...",
      "track_to": "...",
      "year": 2005
    }
  ],
  "stats": {
    "total_samples_used": 45,
    "total_sampled_by": 12,
    "chain_depth_upstream": 3,
    "chain_depth_downstream": 2
  }
}

Response Shape (example: chain)

{
  "root": {
    "release_id": 789,
    "title": "Funky Drummer",
    "artist": "James Brown",
    "year": 1970
  },
  "generations": [
    {
      "depth": 1,
      "releases": [
        {"release_id": 456, "title": "It Takes a Nation...", "artist": "Public Enemy", "year": 1988},
        {"release_id": 790, "title": "3 Feet High and Rising", "artist": "De La Soul", "year": 1989}
      ]
    },
    {
      "depth": 2,
      "releases": [
        {"release_id": 1011, "title": "...", "artist": "...", "year": 2003}
      ]
    }
  ],
  "stats": {
    "total_descendants": 4532,
    "max_depth": 5,
    "most_active_decade": "1990s"
  }
}
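
A response of this shape could be assembled by walking downstream one generation at a time. The following is a sketch under assumptions: build_chain is a hypothetical helper, edges are (sampler_id, sampled_id) pairs, and releases maps each ID to its title, artist, and year.

```python
from collections import Counter, defaultdict


def build_chain(root_id, releases, edges, max_depth=5):
    """Group releases that directly or transitively sample root_id by generation."""
    sampled_by = defaultdict(set)
    for sampler, sampled in edges:
        sampled_by[sampled].add(sampler)

    seen = {root_id}
    frontier = [root_id]
    generations = []
    for depth in range(1, max_depth + 1):
        # Each release is reported once, at its shallowest generation.
        nxt = sorted({s for rid in frontier for s in sampled_by[rid]} - seen)
        if not nxt:
            break
        seen.update(nxt)
        generations.append({
            "depth": depth,
            "releases": [{"release_id": rid, **releases[rid]} for rid in nxt],
        })
        frontier = nxt

    descendants = sorted(seen - {root_id})
    decades = Counter(f"{releases[rid]['year'] // 10 * 10}s" for rid in descendants)
    return {
        "root": {"release_id": root_id, **releases[root_id]},
        "generations": generations,
        "stats": {
            "total_descendants": len(descendants),
            "max_depth": len(generations),
            "most_active_decade": decades.most_common(1)[0][0] if decades else None,
        },
    }
```

The seen set also makes the walk safe against cyclic data, which imported or user-submitted edges could contain.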

Explore UI — "Samples" Pane

Add a Samples pane to the Explore sidebar.

Layout

  1. Sample Search — autocomplete search for a release, shows upstream (what it samples) and downstream (what sampled it) as a bidirectional list
  2. Sample Tree Visualization — D3.js tree/radial layout showing the full sample chain from a root release. Nodes colored by decade, sized by how many times they were sampled
  3. Most Sampled Leaderboard — ranked list of the most-sampled releases in the database, with sample count and genre
  4. Submit Sample (authenticated) — form to submit a new sampling relationship with source release, target release, track names, and evidence URL
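
For the tree visualization in item 2, the /tree endpoint could return nested data that d3.hierarchy consumes directly, since its default accessor reads a children array. This is a sketch with hypothetical names (to_d3_tree, times_sampled) and the same assumed inputs as a plain edge list of (sampler_id, sampled_id) pairs.

```python
from collections import defaultdict


def to_d3_tree(root_id, releases, edges, max_depth=5):
    """Build nested {name, children} data for a D3 tree rooted at root_id."""
    sampled_by = defaultdict(list)
    for sampler, sampled in edges:
        sampled_by[sampled].append(sampler)

    def build(rid, depth, path):
        info = releases[rid]
        node = {
            "id": rid,
            "name": info["title"],
            "decade": info["year"] // 10 * 10,       # drives node color
            "times_sampled": len(sampled_by[rid]),   # drives node size
            "children": [],
        }
        if depth < max_depth:
            for child in sorted(sampled_by[rid]):
                if child not in path:  # guard against cyclic data
                    node["children"].append(build(child, depth + 1, path | {child}))
        return node

    return build(root_id, 0, frozenset({root_id}))
```

Because sampling forms a graph rather than a strict tree, a release that samples two ancestors appears once under each of them in this layout.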

Integration with Existing Explore Pane

  • When viewing a release/artist node in the graph, show a "Samples" badge if sampling relationships exist
  • "Show sample connections" toggle in the graph view to overlay sampling edges (distinct color/style)
  • Sample relationships included in path finder results

Integration Points

Implementation Notes

  • Phase 1: MusicBrainz recording relationships (free, API-accessible; builds on #168, feat: MusicBrainz API integration)
  • Phase 2: Structured import from community datasets (CSV format for WhoSampled-like data)
  • Phase 3: User submissions with moderation queue
  • Sample chain queries use variable-length path matching: MATCH path = (r)-[:SAMPLED*1..5]->(root) — cap depth to prevent expensive traversals
  • Relationship property index on SAMPLED (for example on source or confirmed) for performant lookups
  • Precompute "most sampled" leaderboard and cache in Redis (refresh daily)
  • Community submissions stored in PostgreSQL moderation queue before graph insertion
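
The depth cap can be enforced where the query text is built. To my knowledge the bounds of a classic variable-length pattern must be literals rather than query parameters, so this sketch validates and clamps the depth before interpolating it. The chain_query helper, the id property on Release, and the returned column names are assumptions.

```python
MAX_CHAIN_DEPTH = 5  # matches the *1..5 cap in the note above


def chain_query(depth: int = MAX_CHAIN_DEPTH, direction: str = "downstream") -> str:
    """Return a Cypher query for a capped sample chain around $release_id."""
    if isinstance(depth, bool) or not isinstance(depth, int):
        raise TypeError("depth must be an int")
    depth = max(1, min(depth, MAX_CHAIN_DEPTH))  # clamp, never trust client input
    if direction == "downstream":
        # Releases that sample the root, directly or transitively.
        pattern = f"(r:Release)-[:SAMPLED*1..{depth}]->(root:Release {{id: $release_id}})"
    elif direction == "upstream":
        # Releases the root samples, directly or transitively.
        pattern = f"(root:Release {{id: $release_id}})-[:SAMPLED*1..{depth}]->(r:Release)"
    else:
        raise ValueError("direction must be 'downstream' or 'upstream'")
    return (
        f"MATCH path = {pattern} "
        "RETURN r.id AS release_id, min(length(path)) AS depth "
        "ORDER BY depth"
    )
```

Only the validated integer is interpolated into the query string; the release ID still travels as a parameter.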

Acceptance Criteria

  • SAMPLED relationship type added to Neo4j schema
  • MusicBrainz sampling relationships imported during enrichment
  • Sample chain endpoint returns recursive upstream/downstream with depth limit
  • Most-sampled leaderboard computed and cached
  • Tree visualization renders in D3.js with decade coloring
  • Sample relationships visible as overlay in main graph view
  • User submission form with moderation queue
  • Path finder traverses SAMPLED edges
  • MCP server exposes sample chain tools
  • ≥80% test coverage

Metadata

Labels: enhancement (New feature or request)
