kalyvask/fde-simulation

FDE Simulation

Hands-on simulations of the Forward Deployed Engineer role. Run a full customer engagement on synthetic but realistic case studies — identify problems, scope wedges, build working agent prototypes, ship eval suites, hand off to production. Then score every artifact against the reference, detect the traps a senior FDE would catch, defend your build under a 5-minute hostile grill, and bundle the result as a portfolio piece.

About

Forward Deployed Engineering (and its equivalents — Solutions Architect, AI Strategist, Agent Strategist, Deployed PM) is customer-embedded technical product work. The day-to-day is: scope an enterprise engagement, identify the highest-value workflow to automate, design a multi-agent system, build a prototype, validate with stakeholders, hand off to production.

There's almost no public material that simulates this end-to-end. Most "AI engineering" tutorials skip the customer-engagement work; most product courses skip the agent architecture. This repo fills the gap with runnable simulations grounded in the actual shape of the job.

What you can do with it:

  • Run a 4-week customer engagement end-to-end on two fictional cases (insurance + finance), with synthetic data and working Python agent prototypes
  • Identify and scope problems using six discovery frameworks on a real-feeling customer brief (12-stakeholder political maps, kill-criteria framing, 4-source convergence)
  • Build agents by extending the prototype scaffolds — 5-7 specialized agents per case, hybrid deterministic + LLM, examiner-readable audit traces
  • Ship eval suites with pass^k=5 production thresholds, weighted by failure cost, including adversarial cases
  • Practice the customer-facing craft via Claude-roleplay stakeholder interviews and live customer-simulation rounds
  • Score your work against the reference solution per phase via a structured rubric (0-3 × 5 dimensions) with a JSON sidecar so progress across attempts is trackable
  • Detect expert traps — the 5 places per case where a candidate's first instinct produces a defensible-looking artifact that a senior FDE would reject
  • Defend the build verbally under a 5-minute hostile oral grill that mirrors how OpenAI / Anthropic / Sierra / Palantir actually interview for these roles
  • Bundle the engagement into a single submittable Markdown pack (memos + prototype tree + key files + eval output + grade summary) ready to attach to a take-home or share as a portfolio sample

The same material doubles as comprehensive interview prep for FDE-style roles at frontier AI labs and AI workforce platforms. That use is documented but secondary; the primary purpose is hands-on role simulation.

Quickstart

90 seconds from clone to your first agent run: see QUICKSTART.md.

git clone https://github.com/kalyvask/fde-simulation.git
cd fde-simulation
bash demo.sh        # macOS / Linux / WSL
.\demo.ps1          # Windows

The demo runs both prototypes end-to-end at the pass^k=5 production threshold. No API key is required.

Driving the simulations with Claude Code

If you use Claude Code as your operating environment, the repo is designed for it. CLAUDE.md auto-loads on session start with the repo structure, stakeholder personas, and behavioral conventions (don't ghostwrite, grade harshly, stay in character). See USING_CLAUDE_CODE.md for the 8 patterns you'll use most: opening an engagement, stakeholder role-play, per-phase grading, extending prototypes, running the 60-min take-home Review, client-simulation scenarios, engagement state coaching, and frameworks-applied review.

The credential layer — what turns practice into proof

Five tools sit on top of the simulations to convert "I ran the sim" into "I scored 82 on Calder, walked into 1 of 5 expert traps, and have a defendable bundle":

| Tool | What it does | Where |
|---|---|---|
| Coverage map | One-page matrix: each sim format → which interview signals it exercises, plus which company / role each format calibrates against | COVERAGE_MAP.md |
| Verifiable scoring | CLI that grades a phase artifact against the reference using a structured rubric (0-3 × 5 dimensions); produces JSON sidecar + Markdown report; prompt-cached for cheap iteration | scoring/ |
| Expert traps + post-mortem | 5 traps per case that a senior FDE catches; Claude prompt that reads your artifacts and tells you which traps you walked into | simulations/1-full-engagement/<case>/EXPERT_TRAPS.md + POST_MORTEM_PROMPT.md |
| 5-min hostile oral grill | Claude prompt that runs a hostile post-take-home defense round; 8-10 grill questions, 5 ground rules, scoring rubric | simulations/1-full-engagement/<case>/ORAL_GRILL.md |
| Artifact bundle exporter | Walks your portfolio directory and emits a single submittable Markdown pack (memos + prototype tree + key files + eval output + grade summary) | scoring/bundle.py |

End-to-end loop: run the engagement, score each phase with scoring.grade, run the post-mortem to detect traps, fix the highest-leverage gap, re-grade, run the oral grill, bundle for portfolio submission. See scoring/README.md for the exact commands.
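
As a rough illustration of the 0-3 × 5-dimension rubric and its JSON sidecar, here is a minimal sketch. The dimension names, the `total_score` and `to_sidecar` helpers, the sidecar fields, and the scaling to 0-100 are all assumptions made for illustration; the actual definitions live in scoring/rubrics.py and scoring/grade.py and may differ.

```python
import json

# Hypothetical dimension names; the real rubric lives in scoring/rubrics.py.
DIMENSIONS = ("problem_framing", "evidence", "scoping", "risk_handling", "communication")
MAX_PER_DIMENSION = 3  # each dimension is scored 0-3


def total_score(scores: dict[str, int]) -> int:
    """Validate per-dimension scores and scale the total to 0-100 (assumed scale)."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    raw = sum(scores.values())
    return round(100 * raw / (MAX_PER_DIMENSION * len(DIMENSIONS)))


def to_sidecar(case: str, phase: str, scores: dict[str, int]) -> str:
    """Serialize one grading attempt as a JSON document (assumed schema)."""
    payload = {"case": case, "phase": phase, "scores": scores, "total": total_score(scores)}
    return json.dumps(payload, indent=2, sort_keys=True)
```

Writing one such document per attempt is what would make progress comparable across attempts: the same keys appear every time, so two sidecars can be compared dimension by dimension.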

Two case studies — full 4-week engagements

| Case | Domain | Customer (fictional) | Wedge |
|---|---|---|---|
| Calder Insurance | Personal-lines auto insurance / FNOL | $300M GWP, 14 NE states, 600 claims/day | First-Notice-of-Loss agent workforce; replace one BPO contract + cut LAE ratio by 140bps |
| Helix Capital | Long-short equity hedge fund | $2.3B AUM, 12 investment professionals, 80-name coverage book | Citation-grounded morning-after-earnings note drafter; 4h analyst task → 30min review |

Each case includes:

  • CASE_BRIEF.html — consulting-style case-study reading page. Stakeholders, headline numbers, kill-criteria, working hypothesis. Open in browser; print-friendly.
  • START_HERE.md — in-character engagement kickoff. The customer's email arrives at a specific timestamp; engagement clock starts.
  • 00_brief.md — formal customer brief in markdown.
  • 4-week structure — discovery → solution → validation → handoff. Each week has reference solutions you compare against AFTER your own attempt.
  • Working Python prototype — 5-7 agents (Calder: 5; Helix: 7) running end-to-end on synthetic data. Hybrid deterministic + LLM. Falls back to mock mode without an API key.
  • Weighted eval suite — pass^k=5 production threshold. Adversarial cases per major risk. Both prototypes verified at 100% weighted pass.
  • Stakeholder role-play prompts — 8 (Calder) and 7 (Helix) Claude prompts you paste in to conduct interactive discovery interviews.
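
To make the eval threshold concrete: pass^k is commonly read as "a case counts as passed only if all k independent trials pass". The sketch below applies that reading with k=5 and weights each case by its failure cost. `EvalCase`, `passes_k`, and `weighted_pass_rate` are hypothetical names, and the repo's scripts/run_eval.py may compute the metric differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    weight: float              # failure cost; higher means more costly to get wrong
    trial: Callable[[], bool]  # runs the agent once and returns True on pass


def passes_k(case: EvalCase, k: int = 5) -> bool:
    """A case passes only if every one of k trials passes."""
    return all(case.trial() for _ in range(k))


def weighted_pass_rate(cases: list[EvalCase], k: int = 5) -> float:
    """Share of total failure-cost weight carried by cases that pass all k trials."""
    total = sum(c.weight for c in cases)
    if total <= 0:
        raise ValueError("total weight must be positive")
    passed = sum(c.weight for c in cases if passes_k(c, k))
    return passed / total
```

Under this reading, a single flaky trial out of five zeroes out the whole case, so intermittent failures on high-cost cases dominate the final number.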

What's in the repo

Simulations

| Simulation | Time | Output | Use |
|---|---|---|---|
| 1-full-engagement | 20-40 hrs over 2-4 weeks | Discovery memo, wedge proposal, working Python prototype, weighted eval suite, field memo, retrospective | The headline simulation. Run the FDE role end-to-end on Calder + Helix. |
| 2-take-home-5h | 6 hrs (5h build + 1h Review) | Repo + 4-slide deck + 5-min video + eval results | Compressed version. Build under a real clock, then defend the build live. |
| 3-recommendation-60min | 1 hr | Structured live conversation | The 60-min recommendation-only format. No artifact, just live structuring. |
| 4-client-simulation | 20-30 min × 5 scenarios | Live de-escalation transcript | Hostile-customer role-play (failing demo, SLA breach, scope dispute). Practice ownership language and composure. |

Working prototypes

Both prototypes run end-to-end without an API key. Drafters fall back to deterministic mock output; set ANTHROPIC_API_KEY for real LLM calls.

# Calder
cd simulations/1-full-engagement/calder-insurance/02_week2_solution/prototype
python scripts/run_e2e.py    # processes one synthetic FNOL through 5 agents
python scripts/run_eval.py   # 5-case weighted eval at pass^k=5

# Helix
cd simulations/1-full-engagement/helix-finance/02_week2_solution/prototype
python scripts/run_e2e.py    # processes one synthetic earnings call through 7 agents
python scripts/run_eval.py   # 5-case weighted eval at pass^k=5
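
The mock fallback described above can be pictured as a small factory that checks for the key before choosing a drafter. This is a minimal sketch, not the prototypes' actual code: `mock_draft` and `make_drafter` are hypothetical names, and `llm_draft` stands in for whatever function performs the real model call.

```python
import os
from typing import Callable, Mapping, Optional


def mock_draft(text: str) -> str:
    """Deterministic stand-in: the same input always yields the same output."""
    return f"[MOCK DRAFT] Summary of: {text[:60]}"


def make_drafter(
    llm_draft: Callable[[str], str],
    env: Optional[Mapping[str, str]] = None,
) -> Callable[[str], str]:
    """Return the LLM-backed drafter if an API key is set, else the mock."""
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return llm_draft
    return mock_draft
```

Passing `env` explicitly keeps the choice testable without touching the real process environment.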

Each agent run produces an examiner-readable audit trace.
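
One way to picture such a trace is an append-only list of steps, each naming the agent, whether a rule or the LLM made the call, and why. The `TraceStep` and `AuditTrace` classes, their fields, and the rendering format below are assumptions for illustration; the prototypes' real trace format may look different.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    agent: str
    decided_by: str  # "rule" for deterministic logic, "llm" for model output
    decision: str
    reason: str


@dataclass
class AuditTrace:
    record_id: str
    steps: list[TraceStep] = field(default_factory=list)

    def log(self, agent: str, decided_by: str, decision: str, reason: str) -> None:
        """Append one step; earlier steps are never modified."""
        self.steps.append(TraceStep(agent, decided_by, decision, reason))

    def render(self) -> str:
        """One numbered plain-text line per step, in execution order."""
        lines = [f"Audit trace for {self.record_id}"]
        for i, s in enumerate(self.steps, start=1):
            lines.append(f"{i}. [{s.agent}/{s.decided_by}] {s.decision} (because {s.reason})")
        return "\n".join(lines)
```

Recording `decided_by` on every step is what lets a reviewer separate deterministic decisions from model-generated ones in a hybrid pipeline.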

Frameworks (used across all simulations)

11 portable frameworks that show up in every FDE engagement:

  1. 4-source convergence — Buyer / Brief / Industry / Operator triangulation for discovery
  2. 3-lens scaffold — Customer / Product / Technical for any agent design; AI-capability-relationship segmentation
  3. Outcome Risk Matrix — Value × Risk-of-irreversible-failure for wedge selection
  4. Workflow decomposition — 5-step method for drawing agent boundaries from a manual process
  5. Agent shapes catalog — 7 standard agent shapes (Extractor, Classifier, Synthesizer, Critic, Compliance critic, Router, Auditor)
  6. Model-vs-application-layer — Tag every solution as model-layer vs application-layer vs both (sequenced)
  7. 4-dimensional testing — Static eval + Pass^k + Adversarial + Production observability
  8. Behavioral story types — 5 required story templates for the customer-facing craft
  9. Ownership language guide — Action verb + first person + specific outcome + business consequence
  10. Company calibration — Employer archetypes + comp bands
  11. Consulting & strategy frameworks — Trusted Advisor formula, the Delta Concept (Palantir-origin), Three Whys, Cost-of-Inaction, C.A.S.E. and DASME meta-structures
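
As one possible reading of the Outcome Risk Matrix (framework 3), the sketch below places wedge candidates on a value-by-risk grid and ranks them. The 1-5 scales, the cutoff, the quadrant labels, and the example candidates are all assumptions for illustration, not the framework's official definitions.

```python
from dataclasses import dataclass

# Quadrant labels are illustrative, listed from most to least attractive.
QUADRANTS = ("start here", "defer until guardrails exist", "low priority", "avoid")


@dataclass
class WedgeCandidate:
    name: str
    value: int  # business value, 1 (low) to 5 (high); assumed scale
    risk: int   # risk of irreversible failure, 1 (low) to 5 (high); assumed scale


def quadrant(c: WedgeCandidate, cutoff: int = 3) -> str:
    """Classify a candidate by whether value and risk reach the cutoff."""
    high_value = c.value >= cutoff
    high_risk = c.risk >= cutoff
    if high_value:
        return QUADRANTS[1] if high_risk else QUADRANTS[0]
    return QUADRANTS[3] if high_risk else QUADRANTS[2]


def rank(candidates: list[WedgeCandidate]) -> list[WedgeCandidate]:
    """Sort by quadrant attractiveness, then by value (highest first)."""
    return sorted(candidates, key=lambda c: (QUADRANTS.index(quadrant(c)), -c.value))
```

The ranking deliberately puts a moderately valuable, low-risk candidate ahead of a more valuable one whose failures cannot be undone, which is the trade-off the matrix is meant to surface.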

Tools

  • tools/agent_design_practice.html — interactive 3-lens whiteboard. Case-aware: select Calder or Helix to load the case study in-character with stakeholder + numbers populated. Or pick one of 5 generic practice prompts. Single-file HTML, opens offline.
    • Deep-link: tools/agent_design_practice.html?case=calder or ?case=helix

Reading list

A curated set of books, papers, and blogs for deeper FDE craft: see READING.md. It includes The Trusted Advisor (Maister), The Pyramid Principle (Minto), Designing Data-Intensive Applications (Kleppmann), and Good Strategy / Bad Strategy (Rumelt), plus papers (Attention Is All You Need, ReAct, Constitutional AI) and active blogs. The list is structured as a 30-day sprint or a 3-month / 6-month arc.

Repo structure

fde-simulation/
├── README.md                          # This file
├── QUICKSTART.md                      # 90-sec on-ramp
├── COVERAGE_MAP.md                    # Sim format <-> interview-signal matrix + role calibration
├── READING.md                         # Curated reading list
├── LICENSE                            # MIT
├── demo.sh / demo.ps1                 # One-command tour
│
├── scoring/                           # Credential-layer tooling (Python CLIs)
│   ├── grade.py                       # Numeric rubric grade per phase vs reference
│   ├── bundle.py                      # Walk portfolio dir, emit submittable Markdown pack
│   ├── rubrics.py                     # Machine-readable rubric definitions
│   └── README.md                      # Setup + end-to-end flow
│
├── simulations/
│   ├── 1-full-engagement/             # 20-40 hours, multi-week
│   │   ├── RETRO_TEMPLATE.md          # End-of-engagement retrospective
│   │   ├── PORTFOLIO_TEMPLATE.md      # How to package as a portfolio piece
│   │   ├── SKIP_AHEAD.md              # 15-hour senior path for second case
│   │   ├── GRADE_YOUR_WORK.md         # Paste-into-Claude grading prompts
│   │   ├── POST_MORTEM_PROMPT.md      # Trap-detection prompt; runs after all artifacts exist
│   │   ├── calder-insurance/          # Full 4-week engagement on the FNOL case
│   │   │   ├── EXPERT_TRAPS.md        # 5 traps a real FDE catches (do not read pre-attempt)
│   │   │   └── ORAL_GRILL.md          # 5-min hostile post-take-home defense Claude prompt
│   │   └── helix-finance/             # Same shape, with its own EXPERT_TRAPS + ORAL_GRILL
│   ├── 2-take-home-5h/                # 6 hours (5h build + 1h Review)
│   ├── 3-recommendation-60min/        # 1 hour live conversation
│   └── 4-client-simulation/           # Live customer-handling role-play
│
├── frameworks/                        # 11 portable FDE frameworks
└── tools/                             # Interactive 3-lens whiteboard

How the engagement actually plays out

For Calder (Helix follows the same shape with different numbers):

  1. CASE_BRIEF.html opens in your browser — the case study. Maria's quote, 11 stakeholders, the 9-row metrics table, the working hypothesis.
  2. START_HERE.md sets the clock — Maria's email arrives at 7:45 AM Tuesday. 9 AM kickoff in 75 minutes.
  3. Pre-fill your 3-lens whiteboard — open tools/agent_design_practice.html?case=calder, draft your stakeholder map and wedge hypothesis in Practice mode.
  4. Run the kickoff with Claude playing Maria — paste Prompt 1 from STAKEHOLDER_INTERVIEWS.md into a Claude chat. Maria opens the meeting; you conduct discovery.
  5. Run the other 7 interviews through the week — Greg, Priya, Tom, Marcus, Rachel, frontline adjuster, Anil. Claude plays each in character.
  6. Synthesize a discovery memo of your own. Score it: python -m scoring.grade phase1 my_discovery.md --case calder --json-output grades/phase1.json (or use the paste-into-Claude prompts in GRADE_YOUR_WORK.md).
  7. Week 2-3: design + build — apply the 3-lens scaffold, score wedge candidates on the Outcome Risk Matrix, decompose the workflow into 5-7 agents, extend the prototype scaffold, ship a weighted eval suite. Grade each phase via scoring.grade phase2 and scoring.grade phase3.
  8. Week 4: validate + handoff — run the 20-draft review with Rachel/Janet (Claude plays them), the hostile-review with Carmen/Tom, the operational handoff to Aditya/Anil. Write the field memo and grade it via scoring.grade phase4.
  9. Post-mortem: run POST_MORTEM_PROMPT.md against your artifacts + the case's EXPERT_TRAPS.md. Claude tells you which of the 5 traps you walked into, with quoted evidence and a fix-first list ordered by leverage.
  10. Defend it verbally: paste the case's ORAL_GRILL.md into a fresh Claude chat. 5 minutes, hostile tone, 6-8 grill questions. Score yourself against the 5-dimension rubric at the bottom.
  11. Bundle the engagement: python -m scoring.bundle ~/my-fde-portfolio/calder/ --include-grades ~/my-fde-portfolio/calder/grades/ -o calder_bundle.md. The output is a single Markdown pack you can attach to a take-home or share as a portfolio sample.
  12. End of week 4: fill in RETRO_TEMPLATE.md; package via PORTFOLIO_TEMPLATE.md.
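
The bundling in step 11 can be pictured as a directory walk that concatenates artifacts into one Markdown document. The sketch below is not the code in scoring/bundle.py: `bundle_markdown`, its `suffixes` parameter, and the output layout are hypothetical, and it leaves out the grade summary and eval output that the real exporter includes.

```python
from pathlib import Path


def bundle_markdown(portfolio_dir: Path, suffixes: tuple[str, ...] = (".md", ".py")) -> str:
    """Concatenate matching files under portfolio_dir into one Markdown string."""
    files = sorted(
        p for p in portfolio_dir.rglob("*") if p.is_file() and p.suffix in suffixes
    )
    # Header plus a table of contents listing every included file.
    parts = [f"# Portfolio bundle: {portfolio_dir.name}", "", "## Contents", ""]
    parts += [f"- {p.relative_to(portfolio_dir).as_posix()}" for p in files]
    # One section per file, titled with its path relative to the portfolio root.
    for p in files:
        rel = p.relative_to(portfolio_dir).as_posix()
        parts += ["", f"## {rel}", "", p.read_text(encoding="utf-8").rstrip()]
    return "\n".join(parts) + "\n"
```

Sorting the paths keeps the output stable between runs, so two bundles of the same portfolio can be diffed.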

The deliverable is real: a working agent workforce, a weighted eval suite, a discovery memo, a field memo, and a retrospective — all numerically scored, trap-checked, and bundled. All on synthetic data, all reusable as a portfolio piece.

Why this is useful

  • For people doing FDE work today: a reference set of frameworks and reusable scaffolds. Both prototype patterns (Calder + Helix) are designed to port to other regulated-industry deployments.
  • For people learning the role: end-to-end simulations that show what the job actually involves, with reference solutions per phase that calibrate against senior performance.
  • For people interviewing for FDE roles: the credential layer (numeric scoring, trap detection, hostile oral grill, bundle exporter) converts the engagement output into a defendable portfolio piece. See scoring/README.md for the end-to-end flow and simulations/1-full-engagement/PORTFOLIO_TEMPLATE.md for how to package the work.

Contributing

Issues and PRs welcome. Particularly interested in:

  • Additional case studies in new domains (healthcare prior-auth, legal contract review, manufacturing change-management, customer-support automation)
  • Eval suite improvements for the prototype scaffolds (more adversarial cases, multi-tenant variance tests)
  • Better stakeholder personas for the discovery role-play
  • Cross-language translations of the frameworks

License

MIT. See LICENSE.

Acknowledgments

Built with Claude Code. The 3-lens framework draws on patterns common in product design and customer-discovery literature; this repo applies them to agent-workforce design. The case studies, prototype scaffolds, frameworks, and reference solutions are original to this repo.
