Extract dish data from a restaurant menu PDF and output it as structured, normalized JSON. Uses google-adk + Gemini for extraction.
Assuming you have Homebrew installed, install the local tooling:

    brew bundle

Install the asdf python/pdm plugins:

    asdf plugin add python
    asdf plugin add pdm

Install the pinned Python / PDM versions (.tool-versions):

    asdf install

Create and activate the venv, then install dependencies:

    pdm venv create
    source .venv/bin/activate
    pdm install -G dev

Copy the example file and add your Gemini API key:
    cp .env.example .env

Then set GOOGLE_API_KEY in .env (keys are issued by Google AI Studio).

Put one or more menu files (.pdf, .png, .jpg, .jpeg, .webp) into a directory —
each file is treated as one menu — then run:

    pdm run extract          # reads ./menus
    pdm run extract <dir>    # or a directory you choose

The menus/ folder already ships with a sample menu (espn_bet (1).pdf), so
pdm run extract works out of the box with no extra setup.
Every menu is processed concurrently; the extracted dishes are written to
output/<filename>.json.
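
As a rough illustration of the flow described above (one menu per file, all menus processed concurrently, one JSON file per menu), here is a minimal sketch. It is not the project's runner: extract_menu is a hypothetical stand-in for the agent call, and naming each output file after the input file's stem is an assumption.

```python
import asyncio
import json
from pathlib import Path

# Extensions the README lists as accepted menu files.
MENU_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}


def find_menus(menu_dir: Path) -> list[Path]:
    """Return every supported menu file in menu_dir, sorted by name."""
    return sorted(
        p for p in menu_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MENU_SUFFIXES
    )


async def extract_menu(path: Path) -> list[dict]:
    """Hypothetical placeholder for the agent call; returns no dishes."""
    await asyncio.sleep(0)
    return []


async def process_one(path: Path, out_dir: Path) -> Path:
    """Extract one menu and write its dishes to a JSON file."""
    dishes = await extract_menu(path)
    out_path = out_dir / f"{path.stem}.json"  # assumption: named after the stem
    out_path.write_text(json.dumps(dishes, indent=2), encoding="utf-8")
    return out_path


async def process_all(menu_dir: Path, out_dir: Path) -> list[Path]:
    """Process every menu in menu_dir concurrently."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tasks = [process_one(p, out_dir) for p in find_menus(menu_dir)]
    # gather() schedules all menus at once and returns results in input order.
    return await asyncio.gather(*tasks)
```

Calling asyncio.run(process_all(Path("menus"), Path("output"))) would then write one file per menu found.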
Each menu is a flat JSON array of Dish objects (src/schema.py):
- dish_id: stable id; other dishes reference it from their options.
- category: menu section (e.g. BURGERS).
- dish_name / description: multi-line descriptions are merged; description is null if absent.
- price: a float, or null for $X placeholders and price-less items (e.g. sauces).
- currency: a separate field holding an ISO 4217 code (e.g. USD); null when there is no price.
- options: list of OptionGroup ({category, dishes}); each choice is a light ChoiceDish ({dish_id, surcharge}).
Key idea: a single physical item (a side, a sauce) is stored once as a standalone
Dish; combos reference it by dish_id from their options. So "choice of side",
wings + sauces, and flights are modeled without duplicating data. The graph is
non-recursive (Dish → OptionGroup → ChoiceDish), which keeps it clean and usable as an
LLM output schema. Extra-cost choices carry a surcharge (e.g. "+$2").
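
To make the reference-by-id idea concrete, the sketch below models the three levels with standard-library dataclasses. The project's real schema is the Pydantic one in src/schema.py; the field types here (string ids, a numeric surcharge) and every dish value are assumptions chosen for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChoiceDish:
    dish_id: str                     # points at a standalone Dish
    surcharge: float | None = None   # assumption: extra cost stored as a number


@dataclass
class OptionGroup:
    category: str
    dishes: list[ChoiceDish] = field(default_factory=list)


@dataclass
class Dish:
    dish_id: str
    category: str
    dish_name: str
    description: str | None = None
    price: float | None = None
    currency: str | None = None
    options: list[OptionGroup] = field(default_factory=list)


# Illustrative data only: two sides stored once, referenced by a combo.
fries = Dish(dish_id="side-fries", category="SIDES", dish_name="Fries")
rings = Dish(dish_id="side-rings", category="SIDES", dish_name="Onion Rings")
burger = Dish(
    dish_id="burger-classic",
    category="BURGERS",
    dish_name="Classic Burger",
    price=12.0,
    currency="USD",
    options=[
        OptionGroup(
            category="CHOICE OF SIDE",
            dishes=[
                ChoiceDish(dish_id=fries.dish_id),
                ChoiceDish(dish_id=rings.dish_id, surcharge=2.0),
            ],
        )
    ],
)

# The menu stays a flat array; nesting stops at ChoiceDish.
menu_json = [asdict(d) for d in (fries, rings, burger)]
print(json.dumps(menu_json, indent=2))
```

Because a choice carries only an id and an optional surcharge, changing a side's name or description touches one object, not every combo that offers it.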
- Tool: Built end-to-end with Claude Code (Claude) — scaffolding (pdm/asdf/ruff), the Pydantic schema, the google-adk agent + tool, the runner, the CLI entry point, and the agent instructions.
- Approach: google-adk + Gemini (gemini-flash-latest). The agent reads the PDF (multimodal) and calls one tool, save_dishes, which validates each dish and writes it to session state; the CLI reads that state and dumps JSON, one file per menu.
- Adapted / why: the tool takes a list[str] of per-dish JSON (ADK reliably passes only simple types); per-dish validation + referential-integrity checks (option dish_ids must already exist) so one bad item doesn't sink the batch; currency is normalized/validated to ISO codes; the Dish JSON schema is injected into the tool description dynamically.
- Assumptions / edge cases: $X and price-less items → price=null; sides/sauces/rubs are stored as standalone dishes so combos can reference them by id; implicit combos (e.g. a name like "8 wings & 8 sauces") are modeled as option groups after an instruction fix for the FLIGHTS case the model first missed.
- Known gaps: no automated tests; validated against the single provided menu; output quality depends on the model; most drink prices are null because the menu prints $X.
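
The per-dish validation and referential-integrity idea from the notes above can be sketched roughly as follows. This is not the project's save_dishes: the plain dict stands in for ADK session state, the currency set is a small illustrative subset of ISO 4217, and the return shape is an assumption.

```python
import json

# Illustrative subset only, not the full ISO 4217 list.
ISO_CURRENCIES = {"USD", "EUR", "GBP"}


def save_dishes(dishes: list[str], state: dict) -> dict:
    """Validate JSON-encoded dishes one by one; keep good ones, report bad ones."""
    saved = state.setdefault("dishes", {})
    errors: list[str] = []
    for index, raw in enumerate(dishes):
        try:
            dish = json.loads(raw)
            if not isinstance(dish, dict):
                raise ValueError("dish must be a JSON object")
            dish_id = dish["dish_id"]

            # Normalize the currency code, then validate it.
            currency = dish.get("currency")
            if currency is not None:
                currency = str(currency).upper()
                if currency not in ISO_CURRENCIES:
                    raise ValueError(f"unknown currency {currency!r}")
            # No price means no currency either.
            dish["currency"] = currency if dish.get("price") is not None else None

            # Referential integrity: every option must point at a saved dish.
            for group in dish.get("options") or []:
                for choice in group.get("dishes") or []:
                    if choice["dish_id"] not in saved:
                        raise ValueError(f"unknown option dish_id {choice['dish_id']!r}")
        except (KeyError, TypeError, AttributeError, ValueError) as exc:
            # One bad item is recorded and skipped; the rest of the batch continues.
            errors.append(f"dish {index}: {exc}")
            continue
        saved[dish_id] = dish
    return {"saved": len(saved), "errors": errors}
```

Because option targets must already be saved, standalone items such as sides and sauces need to appear in the batch before the combos that reference them.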