CSPAN Booknotes

This project has two main components:

Dataset Creation: A Python-based data pipeline that crawls, parses, and organizes transcripts from the classic C-SPAN show Booknotes.
Interactive Chat App: A web application that lets users "resume" historical Booknotes conversations using AI.

Project Structure

cspan-booknotes/
├── dataset/          # Python data pipeline & HuggingFace Hub management
└── app/              # Web application (Svelte + Bun + Neon DB)

Dataset

The /dataset directory contains tools for creating a unique dataset from the public archives of the wonderful C-SPAN program Booknotes. The dataset includes transcripts of conversations between host Brian Lamb and his more than 800 guests.

Available Datasets

Three datasets are available:

programs: Metadata for ~809 episodes, including title, description, and guest details
transcripts: Full conversation transcripts (~200 turns/conversation) between Brian Lamb and his guests
related_items: Related or recommended programs (~5) for each episode

The transcripts dataset is the core of the project; the other two provide additional context about each episode. Using the sequence, speaker_role, and text fields, the transcripts can be turned into a chat-like dataset of these long-form, often fascinating conversations.
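As a rough illustration of that idea, the sketch below turns the transcript rows for one episode into a list of chat messages. It assumes the host maps to the `user` role and the guest to the `assistant` role, skips any other speaker_role value, and merges consecutive turns by the same speaker. `to_chat` and `ROLE_MAP` are hypothetical names, not part of the dataset tooling.

```python
from typing import Iterable

# Assumed role mapping (not defined by the dataset): Brian Lamb asks,
# the guest answers. Rows with any other speaker_role are skipped.
ROLE_MAP = {"host": "user", "guest": "assistant"}


def to_chat(turns: Iterable[dict]) -> list[dict]:
    """Convert transcript rows (sequence, speaker_role, text) to chat messages."""
    messages: list[dict] = []
    # Order turns by their position in the conversation.
    for turn in sorted(turns, key=lambda t: t["sequence"]):
        role = ROLE_MAP.get(turn["speaker_role"])
        if role is None:
            continue  # unknown speaker role: skip it
        text = turn["text"].strip()
        if messages and messages[-1]["role"] == role:
            # Same speaker twice in a row: append to the previous message.
            messages[-1]["content"] += "\n\n" + text
        else:
            messages.append({"role": role, "content": text})
    return messages


if __name__ == "__main__":
    sample = [
        {"sequence": 1, "speaker_role": "guest", "text": "Yes."},
        {"sequence": 0, "speaker_role": "host",
         "text": "Christopher Hitchens, author of For the Sake of Argument..."},
    ]
    for message in to_chat(sample):
        print(f"{message['role']}: {message['content']}")
```

Because rows are sorted by sequence first, the out-of-order sample still yields the host's opening line as `user`, followed by the guest's "Yes." as `assistant`.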

Working with the Dataset

cd dataset
uv sync                    # Install dependencies
uv run scripts/parse_programs.py    # Parse episodes
uv run scripts/upload_to_hf.py      # Upload to HuggingFace Hub

See dataset/README.md for detailed documentation.

Web Application

The /app directory contains an interactive web application that lets users select historical Booknotes episodes and "resume" the conversations using AI. The application simulates both Brian Lamb and his guests continuing their discussions.
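The app itself is written in TypeScript (Bun + Hono), so the Python sketch below is not its implementation; it only illustrates one plausible way to resume an episode while simulating both sides. It assumes the next speaker is whoever did not speak last, picks a persona accordingly, and passes the most recent turns as context. `next_speaker`, `build_resume_prompt`, the 20-turn window, and the prompt wording are all hypothetical.

```python
# Hypothetical helpers sketching one way to "resume" a Booknotes episode.
# The default 20-turn window and the prompt wording are assumptions.


def next_speaker(turns: list[dict]) -> str:
    """Return 'host' or 'guest': whoever did not speak last."""
    if not turns:
        return "host"  # an empty conversation starts with Brian Lamb
    last = max(turns, key=lambda t: t["sequence"])
    return "guest" if last["speaker_role"] == "host" else "host"


def build_resume_prompt(
    program: dict, turns: list[dict], window: int = 20
) -> tuple[str, str]:
    """Build a (system prompt, recent history) pair for the next AI turn."""
    speaker = next_speaker(turns)
    if speaker == "host":
        persona = "Brian Lamb, host of C-SPAN's Booknotes"
    else:
        persona = f"{program['guest']}, author of {program['title']}"
    system = (
        f"You are {persona}. Continue the Booknotes conversation below "
        "in character, speaking only as yourself."
    )
    ordered = sorted(turns, key=lambda t: t["sequence"])
    # Keep only the last `window` turns; a non-positive window keeps none.
    recent = ordered[-window:] if window > 0 else []
    # speaker_name already ends with a colon, e.g. "BRIAN LAMB, HOST:".
    history = "\n".join(f"{t['speaker_name']} {t['text']}" for t in recent)
    return system, history
```

The returned pair could then be sent to a model such as Claude as a system prompt plus the conversation so far; the windowing keeps long (~200-turn) transcripts within a manageable context size.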

Tech Stack

Frontend: SvelteKit
Backend: Bun + Hono
Database: Neon DB (PostgreSQL)
AI: Claude (Anthropic)
Deployment: Railway

Development

cd app
bun install           # Install dependencies
bun run dev           # Start both frontend and backend

The frontend runs on http://localhost:5173 and the backend API on http://localhost:3000.

See app/README.md for detailed documentation.

Source JSON Schema

To understand how the three tables fit together, here is the source JSON for a single program:

{
  "id": "51559-1",
  "url": "https://booknotes.c-span.org/Watch/51559-1",
  "title": "For the Sake of Argument",
  "guest": "Christopher Hitchens",
  "description": "Mr. Hitchens discussed the recent publication...",
  "book_isbn": "0860914356",
  "air_date": "October 17, 1993",
  "transcript": [
    {
      "sequence": 0,
      "speaker_role": "host",
      "speaker_name": "BRIAN LAMB, HOST:",
      "text": "Christopher Hitchens, author of For the Sake of Argument..."
    },
    {
      "sequence": 1,
      "speaker_role": "guest",
      "speaker_name": "CHRISTOPHER HITCHENS:",
      "text": "Yes."
    }
  ],
  "related": [
    {
      "id": "55567-1",
      "url": "https://booknotes.c-span.org/Watch/55567-1",
      "author": "John Corry",
      "title": "My Times: Adventures in the News Trade"
    }
  ]
}
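A nested record like this could be split into the three flat tables roughly as sketched below: top-level fields become a programs row, each transcript turn becomes a transcripts row, and each related entry becomes a related_items row. The `program_id` and `related_id` column names and the `flatten_program` function are hypothetical; see dataset/README.md for the actual table layout.

```python
import json


def flatten_program(record: dict) -> tuple[dict, list[dict], list[dict]]:
    """Split one source JSON program into (program row, transcript rows, related rows)."""
    pid = record["id"]
    # programs: every top-level field except the nested lists.
    program = {k: v for k, v in record.items() if k not in ("transcript", "related")}
    # transcripts: one row per turn, tagged with the parent program id.
    transcripts = [{"program_id": pid, **turn} for turn in record.get("transcript", [])]
    # related_items: one row per recommendation; "id" is renamed so it
    # does not clash with the parent program's id.
    related = [
        {"program_id": pid, "related_id": item["id"],
         **{k: v for k, v in item.items() if k != "id"}}
        for item in record.get("related", [])
    ]
    return program, transcripts, related


if __name__ == "__main__":
    raw = """{"id": "51559-1", "guest": "Christopher Hitchens",
              "transcript": [{"sequence": 1, "speaker_role": "guest",
                              "speaker_name": "CHRISTOPHER HITCHENS:", "text": "Yes."}],
              "related": [{"id": "55567-1", "author": "John Corry"}]}"""
    program, transcripts, related = flatten_program(json.loads(raw))
    print(program)
    print(transcripts)
    print(related)
```

Keying every child row by the parent program id lets transcripts and related_items be joined back to programs with an ordinary join on that column.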

License

See LICENSE.md for details.

cldixon/cspan-booknotes

Folders and files

Latest commit

History

Repository files navigation

CSPAN Booknotes

Project Structure

Dataset

Available Datasets

Working with the Dataset

Web Application

Tech Stack

Development

Source JSON Schema

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages