Getting Started with Serving Cards

This guide walks you through creating, validating, and using your first serving card.

Installation

# Clone the repo (PyPI package coming soon)
git clone https://github.com/zenprocess/servingcard
cd servingcard/packages/python
pip install -e .

This installs the servingcard CLI and the Python library.

Your First Serving Card

A serving card captures the exact configuration and benchmark results for serving a model on specific hardware. Start by creating a YAML file:

servingcard: "1.0"
model: llama4-scout
variant: fp16-baseline
hardware: nvidia-rtx4090
framework: vllm
author: your-name
created: "2026-03-26"
method: manual

serving:
  tensor_parallel_size: 1
  gpu_memory_utilization: 0.90
  max_model_len: 32768

benchmark:
  single_stream:
    tok_s: 38.5
    ttft_ms: 420

Save this as my-first-card.yaml.

Validate

Run the validator to check your card:

servingcard validate my-first-card.yaml

If the card is valid, the command prints:

VALID: my-first-card.yaml

If the card has errors, the validator lists each problem so you know what to fix:

INVALID: my-first-card.yaml
  - Missing required field: author
  - Missing benchmark section
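
To make the two outcomes above concrete, here is a minimal sketch of the kind of check that could produce them. It is not the servingcard validator: the function names `check_card` and `report` are hypothetical, and the list of required fields is an assumption copied from the example card in this guide, so the real schema may differ.

```python
# Hypothetical required fields, copied from the example card in this guide.
# The real servingcard schema may require a different set.
REQUIRED_FIELDS = (
    "servingcard", "model", "variant", "hardware",
    "framework", "author", "created", "method",
)


def check_card(card):
    """Return a list of problem messages for a card given as a plain dict."""
    problems = [
        f"Missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in card
    ]
    if "benchmark" not in card:
        problems.append("Missing benchmark section")
    return problems


def report(path, card):
    """Format the result in the same style as the output shown above."""
    problems = check_card(card)
    if not problems:
        return f"VALID: {path}"
    return "\n".join([f"INVALID: {path}"] + [f"  - {p}" for p in problems])


if __name__ == "__main__":
    # A card with no author and no benchmark section.
    incomplete = {
        "servingcard": "1.0",
        "model": "llama4-scout",
        "variant": "fp16-baseline",
        "hardware": "nvidia-rtx4090",
        "framework": "vllm",
        "created": "2026-03-26",
        "method": "manual",
    }
    print(report("my-first-card.yaml", incomplete))
```

Collecting every problem before reporting, rather than stopping at the first one, lets a card be fixed in a single pass.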

Inspect

View a summary of any serving card:

servingcard info my-first-card.yaml

Output:

Model:      llama4-scout
Variant:    fp16-baseline
Hardware:   nvidia-rtx4090
Framework:  vllm
Author:     your-name
Method:     manual

Benchmark:
  Single stream: 38.5 tok/s
  TTFT:          420 ms

Use in Code

Load a serving card in Python:

from servingcard.schema import ServingCard

card = ServingCard.from_yaml("my-first-card.yaml")

print(card.model)          # llama4-scout
print(card.hardware)       # nvidia-rtx4090

if card.benchmark and card.benchmark.single_stream:
    print(f"{card.benchmark.single_stream.tok_s} tok/s")
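
Once the benchmark numbers are loaded, they can feed simple calculations. The sketch below estimates how long a single-stream response of a given length might take. The helper `estimate_response_seconds` is hypothetical and not part of the library, and the model behind it is an assumption: time to first token plus a constant decode rate, which ignores prompt length, batching, and any other load on the GPU. It takes plain numbers so it runs without the library installed.

```python
def estimate_response_seconds(tok_s, ttft_ms, n_tokens):
    """Rough single-stream latency: time to first token plus decode time.

    Assumes a constant decode rate of tok_s tokens per second.
    """
    if tok_s <= 0:
        raise ValueError("tok_s must be positive")
    if n_tokens < 0:
        raise ValueError("n_tokens must not be negative")
    return ttft_ms / 1000.0 + n_tokens / tok_s


if __name__ == "__main__":
    # Values from the example card: 38.5 tok/s and 420 ms TTFT.
    seconds = estimate_response_seconds(tok_s=38.5, ttft_ms=420, n_tokens=256)
    print(f"about {seconds:.1f} s for 256 tokens")
```

With a loaded card, the same call would take `card.benchmark.single_stream.tok_s` as its first argument. Treat the result as a ballpark figure only.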

Browse the Registry

Search the registry for existing serving cards by model or by hardware:

servingcard search --model qwen3-coder
servingcard search --hardware nvidia-gb10
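
The same searches can be scripted from Python by calling the CLI as a subprocess. This is a sketch under assumptions: the helpers `build_search_command` and `search` are hypothetical, the output is returned as raw text because its format is not documented here, and combining `--model` with `--hardware` in one call is a guess rather than confirmed behavior.

```python
import subprocess


def build_search_command(model=None, hardware=None):
    """Return the argument list for a `servingcard search` call."""
    if model is None and hardware is None:
        raise ValueError("give at least one of model or hardware")
    # Assumption: when both filters are given, the CLI accepts them together.
    command = ["servingcard", "search"]
    if model is not None:
        command += ["--model", model]
    if hardware is not None:
        command += ["--hardware", hardware]
    return command


def search(model=None, hardware=None):
    """Run the search and return whatever the CLI wrote to stdout.

    Requires the servingcard CLI on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_search_command(model, hardware),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `search(model="qwen3-coder")` runs the first command shown above. Passing the arguments as a list avoids shell quoting problems with unusual model names.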

Next Steps