
feat: Collection Storyteller — AI-generated narratives from collection patterns #202

@SimplicityGuy

Description

Overview

Use the MCP server and LLM integration to generate rich, personalized narratives about a user's record collection. Instead of raw statistics, tell the story: "Your collection reads like a love letter to the British post-punk scene. It starts with Joy Division's Unknown Pleasures in 1979 and follows every thread — the Factory Records catalog, the 4AD ethereal wave, the Creation Records jangle — until you land squarely in shoegaze territory by 1991."

This transforms cold data into engaging prose that helps collectors see their own taste from a new perspective. The knowledge graph provides the structure; the LLM provides the voice.

Narrative Types

1. Collection Origin Story

  • "How your collection began" — trace the earliest releases and infer the starting point
  • Identify the foundational genres/labels/artists and how taste evolved over time
  • Highlight pivotal releases that mark genre transitions

2. Taste Evolution Timeline

  • Decade-by-decade narrative of how the collection shifted
  • "In the '80s you were all about synth-pop, but by the '90s you'd pivoted to trip-hop and downtempo"
  • Identify the bridge releases that connected eras

3. Hidden Connections

  • Surface non-obvious relationships in the collection using graph paths
  • "You might not realize it, but your Aphex Twin and your Miles Davis records are connected through 3 hops: ambient → fourth world → fusion → modal jazz"
  • Use path finder and collaboration data to find surprising links

4. Collector Profile

  • A shareable "about me" paragraph for a collector
  • Genre affinities, era focus, format preferences, label loyalties
  • Written in an engaging, magazine-style voice

5. Label Deep Dive

  • Narrative about the user's relationship with a specific label
  • "You own 47 of Blue Note's 1,200+ releases, spanning from 1958 to 2019. Your picks lean hard into the hard bop era..."

6. Gap Narrative

  • Turn gap analysis data into actionable storytelling
  • "You're 3 releases away from owning every Boards of Canada album. Here's what you're missing and why each one matters."

Proposed Endpoints

API Endpoints (api/routers/stories.py)

| Endpoint | Description |
| --- | --- |
| `GET /api/user/story/origin` | Collection origin story narrative |
| `GET /api/user/story/evolution` | Taste evolution timeline narrative |
| `GET /api/user/story/connections` | Hidden connections narrative |
| `GET /api/user/story/profile` | Shareable collector profile paragraph |
| `GET /api/user/story/label/{label_id}` | Label deep dive narrative |
| `GET /api/user/story/gaps` | Gap narrative for nearest completions |
| `POST /api/user/story/regenerate/{type}` | Force regeneration of a specific narrative |

Response Shape

```json
{
  "type": "origin",
  "narrative": "Your collection tells the story of someone who discovered music through...",
  "generated_at": "2026-03-25T10:30:00Z",
  "data_points": {
    "collection_size": 342,
    "earliest_release_year": 1967,
    "genres_covered": 18,
    "key_labels": ["Factory Records", "4AD", "Warp Records"],
    "key_artists": ["Joy Division", "Cocteau Twins", "Aphex Twin"],
    "era_focus": "1978-1995"
  },
  "shareable_url": "/story/abc123"
}
```
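The response shape above could be carried by a typed model. A minimal stdlib sketch (field names mirror the proposed JSON; the real `api/routers/stories.py` would more likely use a Pydantic model so FastAPI can serialize it):

```python
from dataclasses import dataclass, field
from typing import Optional

# Sketch of the proposed story response as a typed model.
# Field names mirror the JSON shape above; this is illustrative,
# not the committed schema.
@dataclass
class StoryResponse:
    type: str                      # "origin", "evolution", ...
    narrative: str                 # LLM-generated prose
    generated_at: str              # ISO-8601 timestamp
    data_points: dict = field(default_factory=dict)
    shareable_url: Optional[str] = None

story = StoryResponse(
    type="origin",
    narrative="Your collection tells the story of someone who...",
    generated_at="2026-03-25T10:30:00Z",
    data_points={"collection_size": 342, "genres_covered": 18},
    shareable_url="/story/abc123",
)
```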

Architecture

Data Assembly Layer (api/stories/assembler.py)

Gathers structured data from existing endpoints to build the LLM context:

  • Collection stats, timeline, evolution (existing endpoints)
  • Taste fingerprint, blindspots (existing endpoints)
  • Gap analysis data (existing endpoints)
  • Graph paths between key collection nodes (path finder)
  • Label DNA for top labels (existing endpoint)

Narrative Generation Layer (api/stories/generator.py)

  • Constructs a structured prompt with assembled data
  • Calls LLM via configurable provider (Anthropic Claude API preferred)
  • Prompt templates per narrative type, stored as Jinja2 templates
  • Response parsing and validation
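The issue proposes Jinja2 templates per narrative type; the same idea can be sketched self-contained with stdlib `string.Template` (template text and keys are illustrative). Note the context is serialized as JSON, never interpolated as raw user text, which is the prompt-injection stance from the Implementation Notes:

```python
import json
from string import Template

# Illustrative prompt templates, one per narrative type.
# Production would store versioned Jinja2 templates in
# api/stories/prompts/ instead.
PROMPT_TEMPLATES = {
    "origin": Template(
        "You are a knowledgeable music journalist.\n"
        "Write a collection origin story from this data:\n$data"
    ),
}

def build_prompt(story_type: str, context: dict) -> str:
    # Serialize the assembled context as JSON so collection data
    # enters the prompt as structured data, not free-form text.
    return PROMPT_TEMPLATES[story_type].substitute(
        data=json.dumps(context, sort_keys=True)
    )

prompt = build_prompt("origin", {"collection_size": 342})
```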

Caching Layer

  • Generated narratives cached in Redis (TTL: until next collection sync)
  • Invalidated on collection sync completion
  • Regeneration endpoint bypasses cache
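One way to get "TTL: until next collection sync" is to key entries by a sync version, so completing a sync naturally orphans every older entry. An in-memory sketch (names illustrative; production would use redis-py):

```python
# In-memory sketch of the caching strategy: narratives keyed by
# (user, story type, sync version). Bumping the sync version on
# sync completion invalidates all prior entries without explicit
# deletion. A Redis implementation would embed the version in the key.
class NarrativeCache:
    def __init__(self):
        self._store = {}

    def get(self, user_id, story_type, sync_version):
        return self._store.get((user_id, story_type, sync_version))

    def put(self, user_id, story_type, sync_version, narrative):
        self._store[(user_id, story_type, sync_version)] = narrative

cache = NarrativeCache()
cache.put("u1", "origin", 7, "Your collection begins...")
```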

Sharing Layer

  • Shareable narratives stored in PostgreSQL with unique tokens
  • Public endpoint to view shared narratives (no auth required)
  • Optional: OG meta tags for social media previews
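Token generation for the shareable URLs can lean on `secrets.token_urlsafe`, which yields URL-safe, unguessable tokens (function name and prefix are illustrative; the token and narrative would be persisted to PostgreSQL as described):

```python
import secrets

# Sketch of shareable-link token generation. 8 random bytes gives
# an 11-character URL-safe token; adjust length to taste.
def make_share_url(prefix: str = "/story/") -> str:
    return prefix + secrets.token_urlsafe(8)

url = make_share_url()
```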

LLM Integration Design

Provider Abstraction

```python
from typing import Protocol

class NarrativeProvider(Protocol):
    async def generate(self, prompt: str, context: dict) -> str: ...

class AnthropicProvider(NarrativeProvider): ...
class OpenAIProvider(NarrativeProvider): ...  # fallback
```
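Because `Protocol` checks structurally, any object with a matching async `generate()` satisfies the interface, which makes the generation layer trivially testable with a stub. A usage sketch (the protocol is repeated here so the snippet stands alone; `StubProvider` and `tell_story` are illustrative, not proposed names):

```python
import asyncio
from typing import Protocol

class NarrativeProvider(Protocol):
    async def generate(self, prompt: str, context: dict) -> str: ...

# Test double: satisfies NarrativeProvider structurally without
# touching any real LLM API.
class StubProvider:
    async def generate(self, prompt: str, context: dict) -> str:
        return f"[stub narrative for {context.get('type', 'unknown')}]"

async def tell_story(provider: NarrativeProvider, prompt: str, context: dict) -> str:
    return await provider.generate(prompt, context)

result = asyncio.run(
    tell_story(StubProvider(), "Write an origin story.", {"type": "origin"})
)
```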

Prompt Strategy

  • System prompt establishes voice: knowledgeable music journalist, warm but not sycophantic
  • User prompt includes structured collection data as JSON
  • Prompt templates are versioned and stored in api/stories/prompts/
  • Temperature ~0.7 for creative but grounded output
  • Max tokens capped per narrative type (origin: 500, evolution: 800, connections: 400, profile: 200)
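The per-type caps above can live in a simple lookup (values copied from the strategy; the helper name and default are assumptions):

```python
# Max-token caps per narrative type, mirroring the prompt strategy.
MAX_TOKENS = {"origin": 500, "evolution": 800, "connections": 400, "profile": 200}
DEFAULT_MAX_TOKENS = 500  # assumed fallback for types not listed

def max_tokens_for(story_type: str) -> int:
    return MAX_TOKENS.get(story_type, DEFAULT_MAX_TOKENS)
```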

Cost Control

  • Narratives generated once per sync cycle, not on every request
  • Cached aggressively in Redis
  • Rate limit: 1 regeneration per narrative type per hour
  • Admin config to enable/disable feature and set provider
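The "1 regeneration per narrative type per hour" rule can be sketched as a per-(user, type) window check. The injected clock is for testability; production would more likely track this in Redis next to the cached narratives:

```python
import time

# Minimal sketch of the regeneration rate limit: at most one
# regeneration per (user, story type) per window.
class RegenerationLimiter:
    def __init__(self, window_seconds: int = 3600, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._last = {}  # (user_id, story_type) -> last regeneration time

    def allow(self, user_id: str, story_type: str) -> bool:
        now = self._clock()
        key = (user_id, story_type)
        last = self._last.get(key)
        if last is not None and now - last < self._window:
            return False
        self._last[key] = now
        return True
```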

Explore UI — Stories Section

Add a My Story section to the Collection pane.

Layout

  1. Story Carousel — swipeable cards for each narrative type (origin, evolution, connections, profile)
  2. Active Story — full narrative display with pull-quotes and highlighted entities (clickable to explore in graph)
  3. Share Button — generate shareable link or copy-to-clipboard
  4. Regenerate Button — request fresh narrative (rate limited)
  5. Label Stories — dropdown to select a label for deep dive narrative

UI Details

  • Entity mentions in narratives are hyperlinked to the explore pane
  • Pull-quotes extracted from narrative for visual emphasis
  • Loading state shows "Crafting your story..." with animated writing indicator
  • Empty state for users without collections: "Connect your Discogs account to discover your collection's story"

Implementation Notes

  • LLM API key stored as environment variable (ANTHROPIC_API_KEY or OPENAI_API_KEY)
  • Feature gated behind config flag (STORIES_ENABLED=true)
  • Graceful degradation: if LLM unavailable, return structured data without narrative
  • Prompt injection prevention: collection data is structured JSON, never raw user input in prompt
  • Narrative quality validation: basic length/coherence checks before caching
  • Consider batching all narrative types in a single LLM call with structured output
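The graceful-degradation note can be sketched as a fallback wrapper: if the LLM call raises, the endpoint still returns the assembled data points with `narrative` set to `None` (function and parameter names are illustrative):

```python
import asyncio

# Sketch of graceful degradation: on any provider failure, return
# the structured data without a narrative instead of erroring out.
async def story_or_data(generate, data_points: dict) -> dict:
    try:
        narrative = await generate(data_points)
    except Exception:
        narrative = None  # LLM unavailable: fall back to structured data
    return {"narrative": narrative, "data_points": data_points}

async def broken_llm(_):
    raise RuntimeError("provider unavailable")

degraded = asyncio.run(story_or_data(broken_llm, {"collection_size": 342}))
```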

Acceptance Criteria

  • Data assembler gathers structured context from existing endpoints
  • At least 3 narrative types implemented (origin, evolution, profile)
  • LLM provider abstraction supports Anthropic Claude
  • Narratives cached in Redis, invalidated on collection sync
  • Shareable narrative URLs with public read access
  • Entity mentions in narratives hyperlinked to explore pane
  • Rate limiting on regeneration endpoint
  • Feature gated behind config flag
  • Graceful degradation when LLM unavailable
  • MCP server exposes collection story tool
  • ≥80% test coverage

Metadata

Labels: enhancement (New feature or request)