Real-world DSPy workflows for pharma/medtech teams. This project provides a flexible multi-classification system for Ozempic-related text analysis. Currently supports three classification tasks:
- AE vs PC Detection - Distinguish Adverse Events from Product Complaints
- AE Category Classification - Categorize adverse events into specific medical categories
- PC Category Classification - Categorize product complaints into specific quality categories
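Together these tasks form a two-stage flow: route text to Adverse Event or Product Complaint first, then apply the matching category classifier. A minimal sketch of that routing (the classifier functions here are hypothetical keyword stand-ins, not this repo's tuned DSPy programs):

```python
# Two-stage routing sketch. The classifiers below are toy stand-ins
# for the tuned DSPy programs this project produces.
def classify_ae_pc(text: str) -> str:
    # Stage 1: crude keyword stand-in for the real AE-vs-PC model.
    ae_terms = ("nausea", "hives", "vomiting", "rash")
    return "Adverse Event" if any(t in text.lower() for t in ae_terms) else "Product Complaint"

def classify_category(text: str, kind: str) -> str:
    # Stage 2: dispatch to the category model matching the stage-1 label.
    if kind == "Adverse Event":
        return "Gastrointestinal disorders"  # placeholder category
    return "Device malfunction"              # placeholder category

def classify(text: str) -> dict:
    kind = classify_ae_pc(text)
    return {"classification": kind, "category": classify_category(text, kind)}

print(classify("Severe nausea after the first injection"))
```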
The framework shows how to:
- Programmatically optimize prompts with DSPy
- Support multiple classification tasks with dynamic signatures
- Track experiments with MLflow (SQLite backend for easy querying)
- Persist tuned artifacts to disk (separate from source)
- Serve classifiers via FastAPI with typed Pydantic contracts
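The "dynamic signatures" idea can be sketched without DSPy itself: each task contributes an instruction and a label set, and the pipeline builds the prompt contract from that spec. The names below are illustrative, not the repo's actual config:

```python
# Illustrative per-task signature specs; the real project keeps similar
# metadata in CLASSIFICATION_CONFIGS (src/common/classifier.py).
SIGNATURE_SPECS = {
    "ae-pc": {
        "instruction": "Decide whether the text describes an Adverse Event or a Product Complaint.",
        "labels": ["Adverse Event", "Product Complaint"],
    },
    "ae-category": {
        "instruction": "Assign the adverse event to a medical category.",
        "labels": ["Gastrointestinal disorders", "Pancreatitis", "Hypoglycemia"],
    },
}

def build_prompt(task: str, complaint: str) -> str:
    # Assemble a task-specific prompt from the shared spec.
    spec = SIGNATURE_SPECS[task]
    labels = ", ".join(spec["labels"])
    return f"{spec['instruction']}\nAllowed labels: {labels}\nText: {complaint}"

print(build_prompt("ae-pc", "Hives after injection"))
```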
- Python 3.13+
- `uv` for env + dependency management (no `pip`/`poetry`)
- OpenAI-compatible API key
- Default provider: OpenRouter using `nvidia/nemotron-3-nano-30b-a3b:free`
- Override via environment variables without touching code
| Variable | Description | Default |
|---|---|---|
| `OPENROUTER_API_KEY` | Primary key for OpenRouter | — |
| `OPENAI_API_KEY` / `DSPY_API_KEY` | Override for OpenAI/custom key | — |
| `DSPY_MODEL_NAME` | Model ID | `nvidia/nemotron-3-nano-30b-a3b:free` |
| `DSPY_LOCAL_BASE` | Base URL for local provider | `http://localhost:8080/v1` |
| `DSPY_HTTP_HEADERS` | JSON blob for extra HTTP headers | `{}` |
| `OPENROUTER_HTTP_REFERER`, `OPENROUTER_APP_TITLE` | OpenRouter analytics headers | — |
| `DSPY_RUN_ID` | Training run identifier | auto-generated |
| `DSPY_ARTIFACT_AUTO_UPDATE` | Auto-update artifact model metadata on load | `false` |
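A sketch of how the fallback chain implied by the table might be resolved in code. This is illustrative only — the actual logic lives in `src/common/config.py`, and the key precedence here is an assumption:

```python
import os

# Resolve settings with the precedence implied by the table:
# explicit DSPY_*/OpenAI overrides first, then the OpenRouter key, then defaults.
def resolve_settings(env: dict) -> dict:
    return {
        "api_key": env.get("DSPY_API_KEY") or env.get("OPENAI_API_KEY") or env.get("OPENROUTER_API_KEY", ""),
        "model": env.get("DSPY_MODEL_NAME", "nvidia/nemotron-3-nano-30b-a3b:free"),
        "local_base": env.get("DSPY_LOCAL_BASE", "http://localhost:8080/v1"),
    }

settings = resolve_settings(dict(os.environ))
```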
Copy `.env.example` and fill in whichever keys you need:

```bash
cp .env.example .env
```

Install dependencies and activate the environment:

```bash
uv sync --extra dev
source .venv/bin/activate
```

Generate the synthetic datasets:

```bash
uv run python scripts/datagen/adverse_event_sample_data.py
uv run python scripts/datagen/complaint_category_sample_data.py
uv run python scripts/datagen/ae_pc_classification_sample_data.py
```

This creates a clean layout:
```
.
├── artifacts/                        # Saved DSPy artifacts (git-tracked)
│   ├── ozempic_classifier_ae-pc_optimized.json
│   ├── ozempic_classifier_ae-category_optimized.json
│   └── ozempic_classifier_pc-category_optimized.json
├── data/                             # Synthetic train/test data organized by task
│   ├── ae-pc-classification/         # AE vs PC detection
│   │   ├── train.json
│   │   └── test.json
│   ├── ae-category-classification/   # AE category classification
│   │   ├── train.json
│   │   └── test.json
│   └── pc-category-classification/   # PC category classification
│       ├── train.json
│       └── test.json
├── mlflow/                           # MLflow experiment tracking (auto-created)
│   ├── mlflow.db                     # SQLite database for runs/metrics
│   └── artifacts/                    # Logged artifacts
├── scripts/
│   ├── datagen/                      # Data generation scripts
│   └── deploy/                       # Deployment scripts
├── src/
│   ├── api/                          # FastAPI app
│   ├── common/                       # Shared logic (config, datasets, classifier)
│   ├── pipeline/                     # Optimization pipeline
│   └── serving/                      # Pydantic request/response + helpers
└── inference_demo.py                 # Simple batch inference helper
```
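The layout maps each classification type to its files predictably; a sketch of that convention with `pathlib` (a hypothetical helper, not the repo's actual `src/common/paths.py`):

```python
from pathlib import Path

# Map a classification type to its data splits and tuned artifact,
# following the directory layout shown above.
def task_paths(classification_type: str, root: Path = Path(".")) -> dict:
    data_dir = root / "data" / f"{classification_type}-classification"
    return {
        "train": data_dir / "train.json",
        "test": data_dir / "test.json",
        "artifact": root / "artifacts" / f"ozempic_classifier_{classification_type}_optimized.json",
    }

paths = task_paths("ae-pc")
```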
This project uses Ruff for both formatting and linting (line length: 120).
Format and fix all issues:

```bash
uv run ruff format .       # Format all Python files
uv run ruff check --fix .  # Fix all auto-fixable linting issues
```

Check for issues without fixing:

```bash
uv run ruff check .           # Check for linting issues
uv run ruff format --check .  # Check formatting without changing files
```

Note: Ruff's formatter preserves triple-quoted strings (`"""`) as-is by design. For files with long triple-quoted strings (like data generation scripts), you may need to wrap them manually if desired.
VSCode users: Format on save is enabled by default using Ruff. Install the recommended extensions (Python, Ruff) when prompted.
Train a classifier for a specific task using the --classification-type flag:
```bash
uv run python -m src.pipeline.main --classification-type ae-pc
uv run python -m src.pipeline.main --classification-type ae-category
uv run python -m src.pipeline.main --classification-type pc-category
```

| Flag | Short | Description |
|---|---|---|
| `--classification-type` | `-t` | Classification type: `ae-pc`, `ae-category`, `pc-category` (default: `ae-pc`) |
| `--verbose` | `-v` | Show detailed output (per-example evaluation, MIPROv2 progress) |
| `--inspect` | `-i` | Show DSPy prompts/responses after optimization completes |
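These flags could be declared with `argparse` roughly as follows (a sketch; the repo's actual parser in `src/pipeline/main.py` may differ):

```python
import argparse

# CLI surface matching the flag table: a task selector plus two booleans.
parser = argparse.ArgumentParser(prog="src.pipeline.main")
parser.add_argument(
    "--classification-type", "-t",
    choices=["ae-pc", "ae-category", "pc-category"],
    default="ae-pc",
)
parser.add_argument("--verbose", "-v", action="store_true")
parser.add_argument("--inspect", "-i", action="store_true")

args = parser.parse_args(["-t", "ae-category", "-v"])
```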
```bash
# Quiet output (default) - just key progress messages
uv run python -m src.pipeline.main -t ae-pc

# Verbose - see evaluation details and optimizer progress
uv run python -m src.pipeline.main -t ae-pc --verbose

# Inspect prompts after training
uv run python -m src.pipeline.main -t ae-pc --inspect

# Both verbose and inspect
uv run python -m src.pipeline.main -t ae-pc -v -i
```

The run will:
- Configure DSPy with your provider settings.
- Load the appropriate `data/<type>-classification/train.json` and `test.json`.
- Evaluate the baseline classifier.
- Optimize via `MIPROv2` (with `auto="medium"`).
- Evaluate the optimized program.
- Write the artifact to `artifacts/ozempic_classifier_<type>_optimized.json`.
- Log params, metrics, and artifacts to MLflow (`mlflow/mlflow.db`).
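The baseline/optimized comparison in those steps reduces to a simple accuracy delta. A sketch with hypothetical helpers, mirroring the `baseline_accuracy`/`optimized_accuracy`/`improvement` metrics logged to MLflow:

```python
# Exact-match accuracy over (predicted, gold) label pairs, and the
# improvement figure reported alongside baseline/optimized accuracy.
def accuracy(predictions: list, labels: list) -> float:
    assert len(predictions) == len(labels)
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)

baseline = accuracy(["AE", "PC", "PC"], ["AE", "AE", "PC"])   # 2 of 3 correct
optimized = accuracy(["AE", "AE", "PC"], ["AE", "AE", "PC"])  # 3 of 3 correct
improvement = optimized - baseline
```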
Training runs are automatically tracked in a local SQLite database. Query your experiments:

```bash
sqlite3 mlflow/mlflow.db "
SELECT
  e.name as experiment,
  r.name as run_name,
  r.status,
  m.key,
  m.value
FROM runs r
JOIN experiments e ON r.experiment_id = e.experiment_id
LEFT JOIN metrics m ON r.run_uuid = m.run_uuid
ORDER BY r.start_time DESC;
"
```

Compare baseline vs. optimized accuracy per run:

```bash
sqlite3 mlflow/mlflow.db "
SELECT
  r.name,
  MAX(CASE WHEN m.key = 'baseline_accuracy' THEN m.value END) as baseline,
  MAX(CASE WHEN m.key = 'optimized_accuracy' THEN m.value END) as optimized,
  MAX(CASE WHEN m.key = 'improvement' THEN m.value END) as improvement
FROM runs r
JOIN metrics m ON r.run_uuid = m.run_uuid
GROUP BY r.run_uuid
ORDER BY r.start_time DESC;
"
```

Or launch the MLflow UI:
```bash
mlflow ui --backend-store-uri sqlite:///mlflow/mlflow.db
```

Serve the API locally:

```bash
uv run uvicorn src.api.app:app --reload
```

- API Root: `http://localhost:8000/` (shows available endpoints)
- Swagger/OpenAPI UI: `http://localhost:8000/docs`
- ReDoc UI: `http://localhost:8000/redoc`
- Health endpoint: `GET /health`
The API provides three classification endpoints:
- `POST /classify/ae-pc` - Classify as Adverse Event or Product Complaint (first-stage classification)
- `POST /classify/ae-category` - Classify adverse events into specific medical categories (e.g., Gastrointestinal disorders, Pancreatitis, Hypoglycemia)
- `POST /classify/pc-category` - Classify product complaints into quality/defect categories (e.g., Device malfunction, Packaging defect)
```bash
curl -X POST http://localhost:8000/classify/ae-pc \
  -H "Content-Type: application/json" \
  -d '{
    "complaint": "After injecting Ozempic I had severe hives and needed an EpiPen."
  }'
```

Response:

```json
{
  "classification": "Adverse Event",
  "justification": "Describes a systemic allergic reaction following Ozempic use.",
  "classification_type": "ae-pc"
}
```

```bash
curl -X POST http://localhost:8000/classify/ae-category \
  -H "Content-Type: application/json" \
  -d '{
    "complaint": "I experienced severe nausea and vomiting after taking Ozempic."
  }'
```

```bash
curl -X POST http://localhost:8000/classify/pc-category \
  -H "Content-Type: application/json" \
  -d '{
    "complaint": "The pen arrived with a cracked dose dial."
  }'
```

If an artifact is missing, the API returns `503 Service Unavailable` with instructions to rerun the pipeline.
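The missing-artifact behavior can be sketched as a small guard (hypothetical names and stub response; the real check lives in the serving layer):

```python
from pathlib import Path

# Return an HTTP-style (status, body) pair: 503 with remediation advice
# when the tuned artifact is absent, otherwise a stub classification.
def classify_or_503(artifact: Path, complaint: str):
    if not artifact.exists():
        return 503, {
            "detail": f"Artifact {artifact} not found. "
                      "Run: uv run python -m src.pipeline.main -t ae-pc"
        }
    # Placeholder result standing in for the tuned classifier's output.
    return 200, {"classification": "Adverse Event", "classification_type": "ae-pc"}

status, body = classify_or_503(Path("artifacts/missing.json"), "hives after injection")
```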
```python
from src.common.config import configure_lm
from src.serving.service import ComplaintRequest, get_classification_function

configure_lm()
predict = get_classification_function()
payload = ComplaintRequest(complaint="Pen arrived with a broken dose dial.")
result = predict(payload)
print(result.classification, result.justification)
```

Pass `model_path="artifacts/ozempic_classifier_optimized.json"` (or another artifact) to pin a different tuned model per tenant or use case.

```bash
uv run python inference_demo.py
```

Runs a few sample complaints through the classifier and shows the full DSPy prompt/response for each using `dspy.inspect_history()`. Useful for demos and understanding how DSPy translates to actual LLM requests.
```bash
docker build -t dspy-reference .
docker run --rm \
  --env-file .env \
  -p 8080:8080 \
  -v "$(pwd)/data:/data" \
  dspy-reference
```

- The image uses the pre-baked `.venv` from `uv sync --frozen --no-dev` and serves FastAPI on `0.0.0.0:8080`.
- Mount `$(pwd)/data` to `/data` when you need persistence (e.g., refreshed artifacts, uploads, SQLite files).
- Override the port by passing `-e PORT=9000`; the default command reads `PORT` and falls back to `8080`.
- Run portability smoke checks for Railway-like and Foundry-like runtimes:

```bash
bash scripts/test_docker_portability.sh
```
- Push this repo (with the Dockerfile) to GitHub and create a Railway project using the Docker template.
- In the Railway dashboard, set the required env vars (`OPENROUTER_API_KEY`, `DSPY_MODEL_NAME`, etc.). Railway automatically sets `PORT`; no build args are needed.
- Attach a persistent volume mounted at `/data` if you need on-disk artifacts or databases.
- Each deploy builds directly from the Dockerfile's multi-stage workflow; use `railway up` or manual deploys after committing changes.

The container always starts via `uvicorn src.api.app:app --host 0.0.0.0 --port ${PORT:-8080}`, matching the local dev commands.
This repo is configured to support a Foundry-friendly workflow: ship a Docker image that embeds an OpenAPI contract (as the server.openapi image label), then import FastAPI routes as Foundry functions via Detect from OpenAPI specification.
More context + screenshots: `docs/foundry-auto-deploy.md`
Generate and validate the Foundry-constrained OpenAPI artifact:

```bash
uv run python scripts/deploy/foundry_openapi.py --generate --spec-path openapi.foundry.json
uv run python scripts/deploy/foundry_openapi.py --spec-path openapi.foundry.json
```

The generated Foundry profile uses `servers: [{"url": "http://localhost:5000"}]`.

Validate both the spec and a built image (checks `linux/amd64`, numeric non-root user, and the `server.openapi` label):

```bash
uv run python scripts/deploy/foundry_openapi.py \
  --spec-path openapi.foundry.json \
  --image-ref "<registry>/<repo>/<image>:<tag>"
```

The full build/push/import sequence is in `docs/foundry-openapi-runbook.md`. GitHub workflow automation and required secrets/variables are documented in `docs/deploy-ci.md`.
To run a local LLM server using llama.cpp:

```bash
cd llama.cpp

# Build llama.cpp
cmake -B build
cmake --build build --config Release

# Download the model from Hugging Face (save to models directory)
# Visit the model page on HF for the curl command, e.g.:
# curl -L -o models/Nemotron-3-Nano-30B-A3B-UD-Q3_K_XL.gguf <HF_URL>
```

Start the server and point DSPy at it:

```bash
./serve.sh -m ~/llama.cpp/models/Nemotron-3-Nano-30B-A3B-UD-Q3_K_XL.gguf
export DSPY_LOCAL_BASE=http://localhost:8080/v1
export DSPY_MODEL_NAME=local-model
```

- Replace `data/*-classification/*.json` with real labeled datasets, or update `src/common/data_utils.py` to read from your storage systems.
- Add new classification types by:
  - Adding a new entry to `CLASSIFICATION_CONFIGS` in `src/common/classifier.py`
  - Adding a new entry to `CLASSIFICATION_TYPES` in `src/common/paths.py`
  - Creating training data scripts in `scripts/datagen/`
  - Training with `--classification-type <new-type>`
- Add additional pipelines (extraction, severity grading, etc.) by following the same pattern: shared logic in `src/common`, tuning flows in `src/pipeline`, serving code in `src/api`/`src/serving`.
- The LM client is OpenAI-compatible; switching to Anthropic, Azure OpenAI, or self-hosted proxies is just a matter of environment variables.
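As a sketch of the first step, a new config entry might look like this. The field names here are illustrative assumptions — check the existing entries in `src/common/classifier.py` for the real shape:

```python
# Hypothetical shape for a new classification type's config entry,
# e.g. a severity-grading task added alongside the existing three.
CLASSIFICATION_CONFIGS = {
    "severity": {
        "description": "Grade adverse event severity",
        "labels": ["Mild", "Moderate", "Severe"],
        "input_field": "complaint",
        "output_field": "classification",
    },
}
```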
MIT – see LICENSE for details.
Created by Anand Pant

