FDE Simulation

Hands-on simulations of the Forward Deployed Engineer role. Run a full customer engagement on synthetic but realistic case studies — identify problems, scope wedges, build working agent prototypes, ship eval suites, hand off to production. Then score every artifact against the reference, detect the traps a senior FDE would catch, defend your build under a 5-minute hostile grill, and bundle the result as a portfolio piece.

About

Forward Deployed Engineering (and its equivalents — Solutions Architect, AI Strategist, Agent Strategist, Deployed PM) is customer-embedded technical product work. The day-to-day is: scope an enterprise engagement, identify the highest-value workflow to automate, design a multi-agent system, build a prototype, validate with stakeholders, hand off to production.

There's almost no public material that simulates this end-to-end. Most "AI engineering" tutorials skip the customer-engagement work; most product courses skip the agent architecture. This repo fills the gap with runnable simulations grounded in the actual shape of the job.

What you can do with it:

Run a 4-week customer engagement end-to-end on two fictional cases (insurance + finance), with synthetic data and working Python agent prototypes
Identify and scope problems using six discovery frameworks on a real-feeling customer brief (12-stakeholder political maps, kill-criteria framing, 4-source convergence)
Build agents by extending the prototype scaffolds — 5-7 specialized agents per case, hybrid deterministic + LLM, examiner-readable audit traces
Ship eval suites with pass^k=5 production thresholds, weighted by failure cost, including adversarial cases
Practice the customer-facing craft via Claude-roleplay stakeholder interviews and live customer-simulation rounds
Score your work against the reference solution per phase via a structured rubric (0-3 × 5 dimensions) with a JSON sidecar so progress across attempts is trackable
Detect expert traps — the 5 places per case where a candidate's first instinct produces a defensible-looking artifact that a senior FDE would reject
Defend the build verbally under a 5-minute hostile oral grill modeled on how companies such as OpenAI, Anthropic, Sierra, and Palantir interview for these roles
Bundle the engagement into a single submittable Markdown pack (memos + prototype tree + key files + eval output + grade summary) ready to attach to a take-home or share as a portfolio sample

The same material doubles as comprehensive interview prep for FDE-style roles at frontier AI labs and AI workforce platforms. That use is documented but secondary; the primary purpose is hands-on role simulation.

Quickstart

90 seconds from clone to your first agent run: see QUICKSTART.md.

git clone https://github.com/kalyvask/fde-simulation.git
cd fde-simulation
bash demo.sh        # macOS / Linux / WSL
.\demo.ps1          # Windows

The demo script runs both prototypes end-to-end at the pass^k=5 production threshold. No API key required.

Driving the simulations with Claude Code

If you use Claude Code as your operating environment, the repo is designed for it: CLAUDE.md auto-loads on session start with repo structure, stakeholder personas, and behavioral conventions (do/don't ghostwrite, grade harshly, stay in character). See USING_CLAUDE_CODE.md for the 8 patterns you'll use most — opening an engagement, stakeholder role-play, per-phase grading, extending prototypes, running the 60-min take-home Review, client-simulation scenarios, engagement state coaching, and frameworks-applied review.

The credential layer — what turns practice into proof

Five tools sit on top of the simulations to convert "I ran the sim" into "I scored 82 on Calder, walked into 1 of 5 expert traps, and have a defendable bundle":

Tool	What it does	Where
Coverage map	One-page matrix: each sim format → which interview signals it exercises, plus which company / role each format calibrates against	`COVERAGE_MAP.md`
Verifiable scoring	CLI that grades a phase artifact against the reference using a structured rubric (0-3 × 5 dimensions); produces JSON sidecar + Markdown report; prompt-cached for cheap iteration	`scoring/`
Expert traps + post-mortem	5 traps per case that a senior FDE catches; Claude prompt that reads your artifacts and tells you which traps you walked into	`simulations/1-full-engagement/<case>/EXPERT_TRAPS.md` + `POST_MORTEM_PROMPT.md`
5-min hostile oral grill	Claude prompt that runs a hostile post-take-home defense round; 8-10 grill questions, 5 ground rules, scoring rubric	`simulations/1-full-engagement/<case>/ORAL_GRILL.md`
Artifact bundle exporter	Walks your portfolio directory and emits a single submittable Markdown pack (memos + prototype tree + key files + eval output + grade summary)	`scoring/bundle.py`

End-to-end loop: run the engagement, score each phase with scoring.grade, run the post-mortem to detect traps, fix the highest-leverage gap, re-grade, run the oral grill, bundle for portfolio submission. See scoring/README.md for the exact commands.
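To make the rubric arithmetic concrete, here is a minimal sketch of a 0-3 × 5 dimension grade rolled up into a JSON sidecar. The dimension names, the percentage normalization, and the sidecar fields are assumptions made for illustration; the actual rubric definitions live in scoring/rubrics.py and may differ.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical dimension names; the repo's rubric may use different ones.
DIMENSIONS = ("problem_framing", "evidence", "scoping", "risk", "communication")
MAX_PER_DIMENSION = 3


@dataclass
class Grade:
    phase: str
    case: str
    scores: dict  # dimension name -> integer in 0..3
    total: int
    percent: float


def grade(phase: str, case: str, scores: dict) -> Grade:
    """Validate per-dimension scores and roll them up into a total."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name, value in scores.items():
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be in 0..{MAX_PER_DIMENSION}")
    total = sum(scores.values())
    percent = round(100 * total / (MAX_PER_DIMENSION * len(DIMENSIONS)), 1)
    return Grade(phase, case, dict(scores), total, percent)


def to_sidecar(g: Grade) -> str:
    """Serialize the grade so attempts can be compared over time."""
    return json.dumps(asdict(g), indent=2, sort_keys=True)


example = grade(
    "phase1",
    "calder",
    {"problem_framing": 3, "evidence": 2, "scoping": 2, "risk": 3, "communication": 2},
)
print(to_sidecar(example))
```

Keeping the sidecar as flat, sorted JSON makes it easy to diff two attempts at the same phase.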

Two case studies — full 4-week engagements

Case	Domain	Customer (fictional)	Wedge
Calder Insurance	Personal-lines auto insurance / FNOL	$300M GWP, 14 NE states, 600 claims/day	First-Notice-of-Loss agent workforce; replace one BPO contract + cut LAE ratio by 140bps
Helix Capital	Long-short equity hedge fund	$2.3B AUM, 12 investment professionals, 80-name coverage book	Citation-grounded morning-after-earnings note drafter; 4h analyst task → 30min review

Each case includes:

CASE_BRIEF.html — consulting-style case-study reading page. Stakeholders, headline numbers, kill-criteria, working hypothesis. Open in browser; print-friendly.
START_HERE.md — in-character engagement kickoff. The customer's email arrives at a specific timestamp; engagement clock starts.
00_brief.md — formal customer brief in markdown.
4-week structure — discovery → solution → validation → handoff. Each week has reference solutions you compare against AFTER your own attempt.
Working Python prototype — 5-7 agents (Calder: 5; Helix: 7) running end-to-end on synthetic data. Hybrid deterministic + LLM. Falls back to mock mode without an API key.
Weighted eval suite — pass^k=5 production threshold. Adversarial cases per major risk. Both prototypes verified at 100% weighted pass.
Stakeholder role-play prompts — 8 (Calder) and 7 (Helix) Claude prompts you paste in to conduct interactive discovery interviews.
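The weighted pass^k idea can be sketched in a few lines. This assumes the common reading of pass^k, where a case counts as passed only if all k repeated runs pass, and it assumes each case carries a weight reflecting its failure cost. The case names, weights, and helper functions are hypothetical; the prototypes' run_eval.py scripts may be organized differently.

```python
from typing import Callable, Dict, List

K = 5  # repeated runs per case, matching the pass^k=5 threshold


def passes_k(run_case: Callable[[], bool], k: int = K) -> bool:
    """A case counts as passed only if every one of k runs passes."""
    return all(run_case() for _ in range(k))


def weighted_pass_rate(cases: List[Dict]) -> float:
    """Weight each case by its failure cost and return the passed share."""
    total_weight = sum(c["weight"] for c in cases)
    if total_weight == 0:
        raise ValueError("eval suite has no weight")
    passed_weight = sum(c["weight"] for c in cases if passes_k(c["run"]))
    return passed_weight / total_weight


def replay(outcomes):
    """Return a runner that replays a fixed list of outcomes (demo only)."""
    it = iter(outcomes)
    return lambda: next(it)


suite = [
    {"name": "happy_path", "weight": 1.0, "run": lambda: True},
    {"name": "adversarial_injection", "weight": 3.0, "run": lambda: True},
    # One failure in five runs is enough to fail the case under pass^k.
    {"name": "flaky_extraction", "weight": 1.0,
     "run": replay([True, True, True, False, True])},
]

rate = weighted_pass_rate(suite)
print(f"weighted pass^{K}: {rate:.0%}")
```

The flaky case shows why this is stricter than a single-run pass rate: an agent that succeeds most of the time still fails the case.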

What's in the repo

Simulations

Simulation	Time	Output	Use
`1-full-engagement`	20-40 hrs over 2-4 weeks	Discovery memo, wedge proposal, working Python prototype, weighted eval suite, field memo, retrospective	The headline simulation. Run the FDE role end-to-end on Calder + Helix.
`2-take-home-5h`	6 hrs (5h build + 1h Review)	Repo + 4-slide deck + 5-min video + eval results	Compressed version. Build under a real clock, then defend the build live.
`3-recommendation-60min`	1 hr	Structured live conversation	The 60-min recommendation-only format. No artifact, just live structuring.
`4-client-simulation`	20-30 min × 5 scenarios	Live de-escalation transcript	Hostile-customer role-play (failing demo, SLA breach, scope dispute). Practice ownership language and composure.

Working prototypes

Both prototypes run end-to-end without an API key. Drafters fall back to deterministic mock output; set ANTHROPIC_API_KEY for real LLM calls.

# Calder
cd simulations/1-full-engagement/calder-insurance/02_week2_solution/prototype
python scripts/run_e2e.py    # processes one synthetic FNOL through 5 agents
python scripts/run_eval.py   # 5-case weighted eval at pass^k=5

# Helix
cd simulations/1-full-engagement/helix-finance/02_week2_solution/prototype
python scripts/run_e2e.py    # processes one synthetic earnings call through 7 agents
python scripts/run_eval.py   # 5-case weighted eval at pass^k=5

Each agent run produces an examiner-readable audit trace.
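The mock fallback and the audit trace can be pictured with a small sketch. It assumes a drafter that checks for ANTHROPIC_API_KEY and otherwise returns deterministic output, and it records which path produced the text. The function names, the trace fields, and the injected llm_call callable are hypothetical and are not taken from the prototypes.

```python
import json
import os
from datetime import datetime, timezone
from typing import Callable, Optional


def mock_draft(claim: dict) -> str:
    """Deterministic stand-in used when no LLM path is available."""
    return f"Claim {claim['claim_id']}: {claim['loss_type']} reported; routed for review."


def draft(claim: dict, llm_call: Optional[Callable[[str], str]] = None) -> dict:
    """Draft a note and record which path produced it."""
    has_key = bool(os.environ.get("ANTHROPIC_API_KEY"))
    use_llm = has_key and llm_call is not None
    if use_llm:
        # llm_call is injected by the caller; no real client is assumed here.
        text = llm_call(f"Summarize this FNOL: {json.dumps(claim, sort_keys=True)}")
    else:
        text = mock_draft(claim)
    return {
        "agent": "drafter",
        "mode": "llm" if use_llm else "mock",
        "input": claim,
        "output": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


trace = draft({"claim_id": "C-001", "loss_type": "rear-end collision"})
print(json.dumps(trace, indent=2))
```

Recording the mode in the trace lets a reviewer tell at a glance whether a given output came from the model or from the deterministic fallback.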

Frameworks (used across all simulations)

11 portable frameworks that show up in every FDE engagement:

4-source convergence — Buyer / Brief / Industry / Operator triangulation for discovery
3-lens scaffold — Customer / Product / Technical for any agent design; AI-capability-relationship segmentation
Outcome Risk Matrix — Value × Risk-of-irreversible-failure for wedge selection
Workflow decomposition — 5-step method for drawing agent boundaries from a manual process
Agent shapes catalog — 7 standard agent shapes (Extractor, Classifier, Synthesizer, Critic, Compliance critic, Router, Auditor)
Model-vs-application-layer — Tag every solution as model-layer vs application-layer vs both (sequenced)
4-dimensional testing — Static eval + Pass^k + Adversarial + Production observability
Behavioral story types — 5 required story templates for the customer-facing craft
Ownership language guide — Action verb + first person + specific outcome + business consequence
Company calibration — Employer archetypes + comp bands
Consulting & strategy frameworks — Trusted Advisor formula, the Delta Concept (Palantir-origin), Three Whys, Cost-of-Inaction, C.A.S.E. and DASME meta-structures
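As an illustration of the Outcome Risk Matrix, the sketch below places wedge candidates in a 2x2 of value against risk of irreversible failure. The 1-5 scales, the threshold, the quadrant labels, and the candidates are assumptions for illustration and are not the repo's reference scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wedge:
    name: str
    value: int  # 1 (low) .. 5 (high) business value
    risk: int   # 1 (low) .. 5 (high) risk of irreversible failure


def quadrant(w: Wedge, threshold: int = 3) -> str:
    """Place a candidate in a 2x2: value on one axis, irreversible risk on the other."""
    high_value = w.value >= threshold
    high_risk = w.risk >= threshold
    if high_value and not high_risk:
        return "start here"
    if high_value and high_risk:
        return "sequence later, with human review"
    if not high_value and not high_risk:
        return "nice to have"
    return "avoid"


candidates = [
    Wedge("FNOL intake and triage", value=4, risk=2),
    Wedge("Automated claim denial", value=5, risk=5),
    Wedge("Internal FAQ bot", value=2, risk=1),
]

for w in candidates:
    print(f"{w.name}: {quadrant(w)}")
```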

Tools

tools/agent_design_practice.html is an interactive 3-lens whiteboard. It is case-aware: select Calder or Helix to load the case study in character, with stakeholders and numbers populated, or pick one of 5 generic practice prompts. It is a single-file HTML page that opens offline.
- Deep-link: tools/agent_design_practice.html?case=calder or ?case=helix

Reading list

A curated set of books, papers, and blogs for deeper FDE craft: see READING.md. It covers books such as The Trusted Advisor (Maister), The Pyramid Principle (Minto), Designing Data-Intensive Applications (Kleppmann), and Good Strategy / Bad Strategy (Rumelt), plus papers (Attention Is All You Need, ReAct, Constitutional AI) and active blogs. The list is structured as a 30-day sprint or a 3-month / 6-month arc.

Repo structure

fde-simulation/
├── README.md                          # This file
├── QUICKSTART.md                      # 90-sec on-ramp
├── COVERAGE_MAP.md                    # Sim format <-> interview-signal matrix + role calibration
├── READING.md                         # Curated reading list
├── LICENSE                            # MIT
├── demo.sh / demo.ps1                 # One-command tour
│
├── scoring/                           # Credential-layer tooling (Python CLIs)
│   ├── grade.py                       # Numeric rubric grade per phase vs reference
│   ├── bundle.py                      # Walk portfolio dir, emit submittable Markdown pack
│   ├── rubrics.py                     # Machine-readable rubric definitions
│   └── README.md                      # Setup + end-to-end flow
│
├── simulations/
│   ├── 1-full-engagement/             # 20-40 hours, multi-week
│   │   ├── RETRO_TEMPLATE.md          # End-of-engagement retrospective
│   │   ├── PORTFOLIO_TEMPLATE.md      # How to package as a portfolio piece
│   │   ├── SKIP_AHEAD.md              # 15-hour senior path for second case
│   │   ├── GRADE_YOUR_WORK.md         # Paste-into-Claude grading prompts
│   │   ├── POST_MORTEM_PROMPT.md      # Trap-detection prompt; runs after all artifacts exist
│   │   ├── calder-insurance/          # Full 4-week engagement on the FNOL case
│   │   │   ├── EXPERT_TRAPS.md        # 5 traps a real FDE catches (do not read pre-attempt)
│   │   │   └── ORAL_GRILL.md          # 5-min hostile post-take-home defense Claude prompt
│   │   └── helix-finance/             # Same shape, with its own EXPERT_TRAPS + ORAL_GRILL
│   ├── 2-take-home-5h/                # 6 hours (5h build + 1h Review)
│   ├── 3-recommendation-60min/        # 1 hour live conversation
│   └── 4-client-simulation/           # Live customer-handling role-play
│
├── frameworks/                        # 11 portable FDE frameworks
└── tools/                             # Interactive 3-lens whiteboard

How the engagement actually plays out

For Calder (Helix follows the same shape with different numbers):

CASE_BRIEF.html opens in your browser — the case study. Maria's quote, 11 stakeholders, the 9-row metrics table, the working hypothesis.
START_HERE.md sets the clock: Maria's email arrives at 7:45 AM Tuesday, and the 9 AM kickoff is 75 minutes away.
Pre-fill your 3-lens whiteboard — open tools/agent_design_practice.html?case=calder, draft your stakeholder map and wedge hypothesis in Practice mode.
Run the kickoff with Claude playing Maria — paste Prompt 1 from STAKEHOLDER_INTERVIEWS.md into a Claude chat. Maria opens the meeting; you conduct discovery.
Run the other 7 interviews through the week — Greg, Priya, Tom, Marcus, Rachel, frontline adjuster, Anil. Claude plays each in character.
Synthesize a discovery memo of your own. Score it: python -m scoring.grade phase1 my_discovery.md --case calder --json-output grades/phase1.json (or use the paste-into-Claude prompts in GRADE_YOUR_WORK.md).
Week 2-3: design + build — apply the 3-lens scaffold, score wedge candidates on the Outcome Risk Matrix, decompose the workflow into 5-7 agents, extend the prototype scaffold, ship a weighted eval suite. Grade each phase via scoring.grade phase2 and scoring.grade phase3.
Week 4: validate + handoff — run the 20-draft review with Rachel/Janet (Claude plays them), the hostile-review with Carmen/Tom, the operational handoff to Aditya/Anil. Write the field memo and grade it via scoring.grade phase4.
Post-mortem: run POST_MORTEM_PROMPT.md against your artifacts + the case's EXPERT_TRAPS.md. Claude tells you which of the 5 traps you walked into, with quoted evidence and a fix-first list ordered by leverage.
Defend it verbally: paste the case's ORAL_GRILL.md into a fresh Claude chat. 5 minutes, hostile tone, 6-8 grill questions. Score yourself against the 5-dimension rubric at the bottom.
Bundle the engagement: python -m scoring.bundle ~/my-fde-portfolio/calder/ --include-grades ~/my-fde-portfolio/calder/grades/ -o calder_bundle.md. The output is a single Markdown pack you can attach to a take-home or share as a portfolio sample.
End of week 4: fill in RETRO_TEMPLATE.md; package via PORTFOLIO_TEMPLATE.md.

The deliverable is real: a working agent workforce, a weighted eval suite, a discovery memo, a field memo, and a retrospective — all numerically scored, trap-checked, and bundled. All on synthetic data, all reusable as a portfolio piece.
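The bundling step amounts to walking a portfolio directory and concatenating its files into one Markdown document. The sketch below shows that idea only; it does not reproduce the flags or output format of scoring/bundle.py, and the function name, heading layout, and demo files are hypothetical.

```python
from pathlib import Path
import tempfile


def bundle(portfolio_dir: Path, suffixes=(".md",)) -> str:
    """Concatenate matching files under portfolio_dir into one Markdown string."""
    parts = [f"# Portfolio bundle: {portfolio_dir.name}", ""]
    for path in sorted(portfolio_dir.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(portfolio_dir).as_posix()
            parts += [f"## {rel}", "", path.read_text(encoding="utf-8").strip(), ""]
    return "\n".join(parts)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "calder"
    (root / "memos").mkdir(parents=True)
    (root / "memos" / "discovery.md").write_text("Discovery memo body.", encoding="utf-8")
    (root / "field_memo.md").write_text("Field memo body.", encoding="utf-8")
    pack = bundle(root)

print(pack)
```

Sorting the paths keeps the output stable between runs, which makes two bundles of the same portfolio directly comparable.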

Why this is useful

For people doing FDE work today: a reference set of frameworks and reusable scaffolds. Both prototype patterns (Calder + Helix) are meant to carry over to other regulated-industry deployments.
For people learning the role: end-to-end simulations that show what the job actually involves, with reference solutions per phase that calibrate against senior performance.
For people interviewing for FDE roles: the credential layer (numeric scoring, trap detection, hostile oral grill, bundle exporter) converts the engagement output into a defendable portfolio piece. See scoring/README.md for the end-to-end flow and simulations/1-full-engagement/PORTFOLIO_TEMPLATE.md for how to package the work.

Contributing

Issues and PRs welcome. Particularly interested in:

Additional case studies in new domains (healthcare prior-auth, legal contract review, manufacturing change-management, customer-support automation)
Eval suite improvements for the prototype scaffolds (more adversarial cases, multi-tenant variance tests)
Better stakeholder personas for the discovery role-play
Cross-language translations of the frameworks

License

MIT. See LICENSE.

Acknowledgments

Built with Claude Code. The 3-lens framework draws on patterns common in product design and customer-discovery literature; this repo applies them to agent-workforce design. The case studies, prototype scaffolds, frameworks, and reference solutions are original to this repo.
