feat(llm): context augmentation and position diagnosis #429

Merged: ejfn merged 7 commits into main from ejfn/llm-context-augmentation on Jun 13, 2026

@ejfn ejfn commented Jun 13, 2026

Owner

This PR introduces programmatic context augmentation and position diagnosis for the LLM strategic AI decision engine.

Changes Included:

  • Refactored LLM state/prompts: the LLM now receives high-level tactical facts and a point-flow diagnosis (anchored on the 80-point threshold) instead of raw rules.
  • Position diagnosis: added llmPositionDiagnosis.ts to compute contextual metrics such as point pressure, lead options, and honest framing of ruffing and trump-bleeding costs.
  • Tests: added tests for state signal construction in llmGamePrompt.test.ts.
  • Cleanup: removed obsolete design documents and proposals.

ejfn and others added 7 commits June 13, 2026 13:10
The LLM bots played worse than the rule-based AI: the prompt baked in
prescriptive strategy ("conserve bosses", "duck low") that a compliant
model obeyed dogmatically, misplaying the cases the rules never covered
(e.g. preserving an off-suit Ace instead of capturing 20 pts as 4th seat).

Shift the contract to "code computes facts + diagnosis; the LLM decides".
TypeScript does the cognition a lite model is bad at (counting unseen
cards, lookahead, structure extraction, over-ruff survival) and lays out,
for each legal play, what it costs and yields in POINTS — then stops. It
never ranks, scores, or picks a card.

- add llmPositionDiagnosis.ts: per-option-class consequences framed in
  point-flow (capture/concede/bank/protect/control), equivalent cards
  collapsed, ruff size / point-card cost / over-ruff risk graded.
- llmGamePrompt.ts: drop localBuildSeatGuidance and the lead Rule Score;
  render the diagnosis as ## Your Options / ## Lead Options.
- llmPromptTemplates.ts: trim STATIC_LLM_GAME_RULES to objective mechanics
  (points, strength, combos, kitty multiplier, follow-legality laws);
  remove prescriptive strategy; keep the anti-hallucination guard and a
  light one-sentence reasoning field (no multi-step CoT).
- add __tests__/ai/llmGamePrompt.test.ts (point-framed options, pair
  collapsing, banking, and absence of any recommendation/score string).
- delete the superseded context-augmentation proposal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
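
The "facts, not rankings" contract in the commit above can be illustrated with a minimal sketch. It assumes standard Shengji point values (5 = 5 pts, 10 and K = 10 pts) and treats only identical cards as equivalent; the `Card` and `OptionClass` types and the `collapseOptions` and `renderOptions` helpers are hypothetical and are not the shapes used in llmPositionDiagnosis.ts.

```typescript
// Hypothetical shapes for illustration; not the PR's real types.
type Card = { rank: string; suit: string };

interface OptionClass {
  label: string;  // e.g. "K♠"
  copies: number; // identical cards collapsed into this class
  points: number; // points the card carries
}

// Assumed standard Shengji scoring: 5 = 5 pts, 10 and K = 10 pts.
function cardPoints(card: Card): number {
  if (card.rank === "5") return 5;
  if (card.rank === "10" || card.rank === "K") return 10;
  return 0;
}

// Collapse identical cards into one option class, keeping hand order.
function collapseOptions(hand: Card[]): OptionClass[] {
  const byLabel = new Map<string, OptionClass>();
  for (const card of hand) {
    const label = `${card.rank}${card.suit}`;
    const existing = byLabel.get(label);
    if (existing) {
      existing.copies += 1;
    } else {
      byLabel.set(label, { label, copies: 1, points: cardPoints(card) });
    }
  }
  return Array.from(byLabel.values());
}

// Render facts only: no ranking, score, or recommendation is emitted.
function renderOptions(options: OptionClass[]): string {
  return options
    .map((o) => `- ${o.label} x${o.copies}: carries ${o.points} pts`)
    .join("\n");
}
```

The renderer stops at stating what each option class carries, which mirrors the commit's intent that the code never picks a card.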
From a live gemini-2.5-flash-lite game: the model led scarce trump-rank
pairs (2♠2♠, 2♥2♥) early because the lead diagnosis editorialised every
trump combo as "forces opponents to follow / bleeds their trump" — a
tactic nudge, not a fact. It also produced two invalid plays (a pair when
a single was required; a hallucinated pair from a singleton).

- llmPositionDiagnosis.ts (leading): state trump leads as cost facts —
  "wins unless a higher trump <kind> remains; spends trump (your
  ruff/control)", flagging trump-rank/joker combos as scarce — instead of
  a bleed-trump slogan. De-editorialise off-suit lines too (drop
  "probe/bleed/test"; state what beats each).
- llmPositionDiagnosis.ts (following): prepend an exact-count + two-copies
  reminder so a pair is only played when two copies are held.
- llmAIStrategy.ts: sharpen the card-mapping retry hint with the
  two-copies rule so a retry can recover from a hallucinated pair.
- tests: assert factual trump-lead framing (no bleed/force) and the
  count/two-copies lead-in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
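
The exact-count and two-copies checks described above could look roughly like the sketch below. `CardId`, `PlayCheck`, and `checkPlay` are hypothetical names, and the hint wording is illustrative; the real retry hint in llmAIStrategy.ts is not shown in this PR page.

```typescript
// Hypothetical card identifier, e.g. "2♠"; two decks mean ids can repeat.
type CardId = string;

interface PlayCheck {
  ok: boolean;
  hint?: string; // retry hint that could be sent back to the model
}

// Count how many copies of each card appear in a list.
function countCopies(cards: CardId[]): Map<CardId, number> {
  const counts = new Map<CardId, number>();
  for (const card of cards) {
    counts.set(card, (counts.get(card) ?? 0) + 1);
  }
  return counts;
}

// Reject plays with the wrong card count or more copies than the hand holds.
function checkPlay(hand: CardId[], play: CardId[], requiredCount: number): PlayCheck {
  if (play.length !== requiredCount) {
    return {
      ok: false,
      hint: `Play exactly ${requiredCount} card(s); you sent ${play.length}.`,
    };
  }
  const held = countCopies(hand);
  for (const [card, wanted] of Array.from(countCopies(play).entries())) {
    const have = held.get(card) ?? 0;
    if (wanted > have) {
      return {
        ok: false,
        hint: `You hold ${have} of ${card} but tried to play ${wanted}; a pair needs two copies in hand.`,
      };
    }
  }
  return { ok: true };
}
```

With a hand of `["2♠", "3♥", "3♥"]`, a proposed `["2♠", "2♠"]` is rejected as a hallucinated pair, while `["3♥", "3♥"]` passes.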
… lead facts

Follow-ups from a second gemini-2.5-flash-lite game.

Role-objective inversion (defender feeding points): the following options
stated concessions point-neutrally ("adds N pts to the opponents' trick"), so
a lite model that half-inverts the defender's goal read feeding a King as
consistent with "I'm defending". Anchor every concession to the 80 threshold,
role-aware — a defender now sees "adds N pts to the attackers' total (toward
their 80)", an attacker "gives the defenders N pts — lost from your 80"; banks
likewise. Harden the Current State role line to state the win-condition
directionality as fact (no "never feed" rule).
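
A minimal sketch of the role-aware concession wording, assuming a hypothetical `describeConcession` helper and `Role` type. The defender string follows the wording quoted above; the attacker string is a close paraphrase of it.

```typescript
type Role = "attacker" | "defender";

// The attackers' winning threshold that every concession is anchored to.
const ATTACKER_TARGET = 80;

// State a concession as a fact relative to the 80-point threshold,
// phrased from the point of view of the seat that is conceding.
function describeConcession(role: Role, points: number): string {
  return role === "defender"
    ? `adds ${points} pts to the attackers' total (toward their ${ATTACKER_TARGET})`
    : `gives the defenders ${points} pts (lost from your ${ATTACKER_TARGET})`;
}
```

The same 10-point King reads differently per role, so "I'm defending" can no longer be reconciled with feeding it to the attackers.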

Trump-strength over-leads (leading SJ/A♠ as "the strongest"): compute from
unseen-card accounting whether a trump single is the top live trump (wins) or
has a higher trump still out (beatable), instead of a vague "scarce" note.
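
The unseen-card accounting for trump singles might be sketched as follows. The `TrumpCard` shape and its numeric `strength` scale are hypothetical, and the sketch assumes that an equal-strength copy still out does not beat a card that was led first.

```typescript
// Hypothetical shape: higher strength beats lower strength.
interface TrumpCard {
  id: string;
  strength: number;
}

// Remove one copy per seen id. Two decks mean ids repeat, so filtering
// by id alone would wrongly drop both copies.
function unseenTrumps(allTrumps: TrumpCard[], seenIds: string[]): TrumpCard[] {
  const remaining = allTrumps.slice();
  for (const id of seenIds) {
    const i = remaining.findIndex((c) => c.id === id);
    if (i >= 0) remaining.splice(i, 1);
  }
  return remaining;
}

// A trump single is the top live trump only if no stronger trump is unseen.
// Assumption: an equal-strength copy played later does not beat the lead.
function trumpSingleStatus(
  lead: TrumpCard,
  unseen: TrumpCard[],
): "top live trump" | "beatable" {
  return unseen.some((c) => c.strength > lead.strength) ? "beatable" : "top live trump";
}
```

`seenIds` should include the player's own hand as well as every card already played, so the lead card itself is not counted as still out.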

Neutrality pass: winning a trick is never framed as intrinsically good — it is
stated as the points it captures plus what it costs. Drop "wins the lead"
reward-wording from trump leads, make empty-trick wins read as tempo-only, and
fix the off-suit "unbeatable" lines to "wins unless an opponent ruffs" (the
unbeatable flag is same-suit only, so a void opponent can still ruff).

Tests extended to pin all of the above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ejfn ejfn enabled auto-merge (squash) June 13, 2026 06:22
@ejfn ejfn merged commit 75175a2 into main Jun 13, 2026
1 check passed
@ejfn ejfn deleted the ejfn/llm-context-augmentation branch June 13, 2026 06:23
ejfn added a commit that referenced this pull request Jun 13, 2026
…ndations (#430)

Follow-up to #429, from live gemini-2.5-flash-lite games. Re-centres every
lead/follow fact on POINT yield/risk + resource cost (not trick-winning), keeps
it all facts/diagnosis (the LLM still decides), and fixes several mis-framings.

Following:
- Concessions anchored to the 80 threshold, role-aware, so "defending" can't be
  read as "feed the attackers" (a defender now sees "adds N pts to the attackers'
  total"; an attacker "gives the defenders N pts — lost from your 80").
- A future-boss card (e.g. your Ace under a led Ace) is listed apart from the
  trash instead of collapsed into it, so it isn't dumped.
- 3rd-seat overtake split: overtaking a SAFE teammate = "no gain"; an UNSAFE one
  = "shield the N pts from {opp}" — a cost fact, not auto-help.
- A 5 is treated as a point card in disposal (split from higher non-points).

Leading:
- Beatable point-card leads (K/10) flagged as FEEDING points; non-point J/Q as
  tempo-only.
- Trump high single (BJ) = wins but ≈no points + burns your top trump; low
  single = cheap concede; strong pairs = drain framing gated by dominance context
  (pairs held / trump still out / pairs already led).
- Off-suit "unbeatable" softened to "wins unless ruffed" (the flag is same-suit
  only); duplicate single-leads collapsed.

Other:
- Void list caveat: confirmed only by an off-suit discard, so absence ≠ proof.
- Neutrality pass: removed all recommendation-creep; only legality laws and the
  task instruction remain directive.

Tests extended to pin all of the above (736 total).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
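
The 3rd-seat overtake split listed under "Following" can be sketched as below. `TeammateLead` and `describeOvertake` are hypothetical names, and the strings only approximate the wording quoted in the commit message.

```typescript
// Hypothetical input: whether the teammate's winning card is safe from the
// opponent still to play, and how many points are currently in the trick.
interface TeammateLead {
  safe: boolean;
  trickPoints: number;
  threat?: string; // name of the opponent who could still beat the teammate
}

// Overtaking is stated as a cost fact: no gain when the teammate is safe,
// a shield on the trick's points when the teammate is not.
function describeOvertake(t: TeammateLead): string {
  if (t.safe) {
    return "overtaking your teammate: no gain";
  }
  const opp = t.threat ?? "the opponent still to play";
  return `overtaking your teammate: shields the ${t.trickPoints} pts from ${opp}`;
}
```

Neither branch tells the model whether to overtake; it only says what the overtake buys.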