feat(llm): context augmentation and position diagnosis by ejfn · Pull Request #429 · ejfn/Tractor

ejfn · 2026-06-13T06:22:29Z

This PR introduces programmatic context augmentation and position diagnosis for the LLM strategic AI decision engine.

Changes Included:

Refactoring LLM State/Prompts: Feeds high-level tactical facts and point-flow diagnosis to the LLM (anchored on the 80-point threshold) instead of raw rules.
Position Diagnosis: Added llmPositionDiagnosis.ts to compute contextual metrics (such as point pressure, leading hand options, ruffing/trump-bleeding honesty, etc.).
Tests: Added tests for state signal construction under llmGamePrompt.test.ts.
Cleanup: Removed obsolete design documents/proposals.
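The "anchored on the 80-point threshold" and "point pressure" ideas above can be illustrated with a small TypeScript sketch. It is not taken from llmPositionDiagnosis.ts. It assumes the usual Tractor point cards (5 = 5 pts, 10 = 10 pts, K = 10 pts), ignores the kitty multiplier, and uses hypothetical names (`cardPoints`, `pointPressure`, `PointPressure`). It only states facts and never recommends a play.

```typescript
// Hypothetical sketch of an 80-anchored point-pressure fact.
// Assumes standard Tractor point cards (5 = 5, 10 = 10, K = 10) and
// ignores the kitty multiplier; all names are illustrative.
type Rank =
  | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10"
  | "J" | "Q" | "K" | "A" | "SJ" | "BJ";

const ATTACKER_TARGET = 80;

function cardPoints(rank: Rank): number {
  if (rank === "5") return 5;
  if (rank === "10" || rank === "K") return 10;
  return 0;
}

interface PointPressure {
  neededTo80: number; // points the attackers still need (0 once reached)
  pointsOnTable: number; // points already in the current trick
  attackersReach80IfTaken: boolean; // true if the attackers capturing this trick reaches 80
}

function pointPressure(attackerPoints: number, trickRanks: Rank[]): PointPressure {
  const pointsOnTable = trickRanks.reduce((sum, r) => sum + cardPoints(r), 0);
  return {
    neededTo80: Math.max(0, ATTACKER_TARGET - attackerPoints),
    pointsOnTable,
    attackersReach80IfTaken: attackerPoints + pointsOnTable >= ATTACKER_TARGET,
  };
}
```

For example, `pointPressure(65, ["K", "5", "3"])` reports 15 points needed, 15 points on the table, and that the attackers would reach 80 by taking the trick; turning that into a choice is left to the LLM.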

…egic LLM rules

The LLM bots played worse than the rule-based AI: the prompt baked in prescriptive strategy ("conserve bosses", "duck low") that a compliant model obeyed dogmatically, misplaying the cases the rules never covered (e.g. preserving an off-suit Ace instead of capturing 20 pts as 4th seat). Shift the contract to "code computes facts + diagnosis; the LLM decides". TypeScript does the cognition a lite model is bad at (counting unseen cards, lookahead, structure extraction, over-ruff survival) and lays out, for each legal play, what it costs and yields in POINTS, then stops. It never ranks, scores, or picks a card.

- add llmPositionDiagnosis.ts: per-option-class consequences framed in point-flow (capture/concede/bank/protect/control), equivalent cards collapsed, ruff size / point-card cost / over-ruff risk graded.
- llmGamePrompt.ts: drop localBuildSeatGuidance and the lead Rule Score; render the diagnosis as ## Your Options / ## Lead Options.
- llmPromptTemplates.ts: trim STATIC_LLM_GAME_RULES to objective mechanics (points, strength, combos, kitty multiplier, follow-legality laws); remove prescriptive strategy; keep the anti-hallucination guard and a light one-sentence reasoning field (no multi-step CoT).
- add __tests__/ai/llmGamePrompt.test.ts (point-framed options, pair collapsing, banking, and absence of any recommendation/score string).
- delete the superseded context-augmentation proposal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
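The "per-option-class consequences ... equivalent cards collapsed" idea can be sketched for the simplest case, a single-card follow. This is a hypothetical illustration, not the code in llmPositionDiagnosis.ts: `Card`, `OptionClass`, `describeFollowOptions`, and the numeric `strength` field are assumptions, and it assumes an opponent currently holds the trick. Cards are grouped only by (beats the current winner?, own point value), which is deliberately cruder than the real diagnosis (for instance, it would not keep a future-boss card apart from the trash).

```typescript
// Hypothetical sketch: collapse equivalent single-card follows into option
// classes and state each class's point flow. No ranking or score is produced.
// Assumes an opponent is currently winning the trick.
interface Card {
  label: string; // e.g. "K♣"
  points: number; // 0, 5 or 10
  strength: number; // hypothetical comparable strength within the led suit
}

interface OptionClass {
  cards: string[]; // equivalent cards, in the order they were seen
  fact: string; // point-flow consequence, stated as a fact
}

function describeFollowOptions(
  legal: Card[],
  winningStrength: number,
  trickPoints: number,
): OptionClass[] {
  const classes = new Map<string, OptionClass>();
  for (const card of legal) {
    const wins = card.strength > winningStrength;
    const key = `${wins ? "win" : "lose"}:${card.points}`;
    let cls = classes.get(key);
    if (!cls) {
      const fact = wins
        ? `captures ${trickPoints + card.points} pts unless a later seat beats it`
        : card.points > 0
          ? `adds ${card.points} pts to the opponents' trick`
          : "concedes the trick; adds no points of your own";
      cls = { cards: [], fact };
      classes.set(key, cls);
    }
    cls.cards.push(card.label);
  }
  // Insertion order only: the LLM, not this code, decides between classes.
  return Array.from(classes.values());
}
```

With A♣, K♣, 7♣, 3♣ facing a winning Q♣ on a 20-point trick, this yields three classes; 7♣ and 3♣ collapse into one "concedes" line.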

From a live gemini-2.5-flash-lite game: the model led scarce trump-rank pairs (2♠2♠, 2♥2♥) early because the lead diagnosis editorialised every trump combo as "forces opponents to follow / bleeds their trump", a tactic nudge, not a fact. It also produced two invalid plays (a pair when a single was required; a hallucinated pair from a singleton).

- llmPositionDiagnosis.ts (leading): state trump leads as cost facts ("wins unless a higher trump <kind> remains; spends trump (your ruff/control)"), flagging trump-rank/joker combos as scarce, instead of a bleed-trump slogan. De-editorialise off-suit lines too (drop "probe/bleed/test"; state what beats each).
- llmPositionDiagnosis.ts (following): prepend an exact-count + two-copies reminder so a pair is only played when two copies are held.
- llmAIStrategy.ts: sharpen the card-mapping retry hint with the two-copies rule so a retry can recover from a hallucinated pair.
- tests: assert factual trump-lead framing (no bleed/force) and the count/two-copies lead-in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
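The two-copies rule above (a pair is only legal when two identical cards are held, since Tractor is played with two decks) can be expressed as a small check that a retry hint could quote. This is a hypothetical TypeScript sketch, not the code in llmAIStrategy.ts; `countCopies`, `pairsHeld`, `validatePair`, and the string card ids such as "2♠" are assumptions.

```typescript
// Hypothetical sketch of the "two copies" pair rule. Cards are plain string
// ids (e.g. "2♠"); a hand may hold up to two copies of each id.
function countCopies(hand: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const card of hand) counts.set(card, (counts.get(card) ?? 0) + 1);
  return counts;
}

// Cards the hand can legally play as a pair (two copies held).
function pairsHeld(hand: string[]): string[] {
  return Array.from(countCopies(hand).entries())
    .filter(([, n]) => n >= 2)
    .map(([card]) => card);
}

// Returns null when the pair is legal, otherwise a factual retry hint.
function validatePair(hand: string[], pair: [string, string]): string | null {
  const [a, b] = pair;
  if (a !== b) return `not a pair: ${a} and ${b} are different cards`;
  const held = countCopies(hand).get(a) ?? 0;
  if (held < 2) {
    return `cannot play ${a}${a}: you hold ${held} ${held === 1 ? "copy" : "copies"} of ${a}`;
  }
  return null;
}
```

With a hand of 2♠ 2♠ 2♥ K♣ 5♦, the pair 2♠2♠ passes, while a hallucinated 2♥2♥ gets the hint "cannot play 2♥2♥: you hold 1 copy of 2♥".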

… lead facts

Follow-ups from a second gemini-2.5-flash-lite game.

Role-objective inversion (defender feeding points): the following options stated concessions point-neutrally ("adds N pts to the opponents' trick"), so a lite model that half-inverts the defender's goal read feeding a King as consistent with "I'm defending". Anchor every concession to the 80 threshold, role-aware: a defender now sees "adds N pts to the attackers' total (toward their 80)", an attacker "gives the defenders N pts — lost from your 80"; banks likewise. Harden the Current State role line to state the win-condition directionality as fact (no "never feed" rule).

Trump-strength over-leads (leading SJ/A♠ as "the strongest"): compute from unseen-card accounting whether a trump single is the top live trump (wins) or has a higher trump still out (beatable), instead of a vague "scarce" note.

Neutrality pass: winning a trick is never framed as intrinsically good; it is stated as the points it captures plus what it costs. Drop "wins the lead" reward-wording from trump leads, make empty-trick wins read as tempo-only, and fix the off-suit "unbeatable" lines to "wins unless an opponent ruffs" (the unbeatable flag is same-suit only, so a void opponent can still ruff).

Tests extended to pin all of the above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
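The unseen-card accounting for trump singles described above can be sketched as: start from the full trump set (two copies of each card), remove your own hand and every card already played, then check whether any remaining trump is strictly stronger. Equal cards are ignored because, assuming the usual Tractor rule, the first of two identical cards played wins. This is a hypothetical sketch; `TrumpCard`, `unseenTrumps`, `trumpSingleFact`, and the numeric `strength` values are assumptions, not the repository's API.

```typescript
// Hypothetical sketch: is a trump single the top live trump, or is a
// higher trump still out? "Unseen" = full trump set minus hand minus played.
interface TrumpCard {
  label: string; // e.g. "BJ", "SJ", "A♠"
  strength: number; // hypothetical trump ordering, higher beats lower
}

function unseenTrumps(allTrumps: TrumpCard[], hand: string[], played: string[]): TrumpCard[] {
  // Count how many copies of each label are accounted for (two decks).
  const seen = new Map<string, number>();
  for (const label of [...hand, ...played]) seen.set(label, (seen.get(label) ?? 0) + 1);
  const unseen: TrumpCard[] = [];
  for (const card of allTrumps) {
    const remaining = seen.get(card.label) ?? 0;
    if (remaining > 0) {
      seen.set(card.label, remaining - 1); // this copy is accounted for
    } else {
      unseen.push(card);
    }
  }
  return unseen;
}

function trumpSingleFact(card: TrumpCard, unseen: TrumpCard[]): string {
  // Only strictly higher trumps can beat it; an equal copy played later loses.
  const higher = unseen.filter((u) => u.strength > card.strength);
  if (higher.length === 0) return `${card.label}: top live trump (wins as a single)`;
  const labels = Array.from(new Set(higher.map((h) => h.label)));
  return `${card.label}: beatable; higher trump still out: ${labels.join(", ")}`;
}
```

A higher trump held by a teammate still counts as "out" here, since the sketch only knows what is unseen, not who holds it.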

…ndations (#430)

Follow-up to #429, from live gemini-2.5-flash-lite games. Re-centres every lead/follow fact on POINT yield/risk + resource cost (not trick-winning), keeps it all facts/diagnosis (the LLM still decides), and fixes several mis-framings.

Following:
- Concessions anchored to the 80 threshold, role-aware, so "defending" can't be read as "feed the attackers" (a defender now sees "adds N pts to the attackers' total"; an attacker "gives the defenders N pts — lost from your 80").
- A future-boss card (e.g. your Ace under a led Ace) is listed apart from the trash instead of collapsed into it, so it isn't dumped.
- 3rd-seat overtake split: overtaking a SAFE teammate = "no gain"; an UNSAFE one = "shield the N pts from {opp}", a cost fact, not auto-help.
- A 5 is treated as a point card in disposal (split from higher non-points).

Leading:
- Beatable point-card leads (K/10) flagged as FEEDING points; non-point J/Q as tempo-only.
- Trump high single (BJ) = wins but ≈no points + burns your top trump; low single = cheap concede; strong pairs = drain framing gated by dominance context (pairs held / trump still out / pairs already led).
- Off-suit "unbeatable" softened to "wins unless ruffed" (the flag is same-suit only); duplicate single-leads collapsed.

Other:
- Void list caveat: confirmed only by an off-suit discard, so absence ≠ proof.
- Neutrality pass: removed all recommendation-creep; only legality laws and the task instruction remain directive.

Tests extended to pin all of the above (736 total).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
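The role-aware concession wording above can be sketched as a pair of phrasing helpers, so the same card reads differently to an attacker and a defender while staying a fact rather than advice. This is a hypothetical TypeScript sketch: `Role`, `concessionFact`, and `bankFact` are illustrative names, the concession strings adapt the quoted prompt lines (the attacker line uses a colon in place of the original dash), and the bank wording is an assumption.

```typescript
// Hypothetical phrasing helpers: concessions and banks anchored to the
// attackers' 80, worded per role. Wording adapted from the quoted prompt lines.
type Role = "attacker" | "defender";

function concessionFact(role: Role, pts: number): string {
  if (pts === 0) return "adds no points to either side";
  return role === "defender"
    ? `adds ${pts} pts to the attackers' total (toward their 80)`
    : `gives the defenders ${pts} pts: lost from your 80`;
}

function bankFact(role: Role, pts: number): string {
  return role === "attacker"
    ? `banks ${pts} pts toward your 80`
    : `keeps ${pts} pts away from the attackers' 80`;
}
```

Because the point value is identical in both branches, only the direction relative to 80 changes, which is the directionality the PR wants a lite model to read correctly.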

ejfn and others added 7 commits June 13, 2026 13:10

docs: define proposal for programmatic context augmentation and strat…

03947e9

…egic LLM rules

add a log

c379e3e

chore: rotate game logs to latest session state

e61b652

chore: remove obsolete game log file

2997777

ejfn enabled auto-merge (squash) June 13, 2026 06:22

ejfn merged commit 75175a2 into main Jun 13, 2026
1 check passed

ejfn deleted the ejfn/llm-context-augmentation branch June 13, 2026 06:23

ejfn mentioned this pull request Jun 13, 2026

fix(llm): points-centred leading/following facts (no recommendations) #430

Merged

ejfn merged 7 commits into main from ejfn/llm-context-augmentation

ejfn commented Jun 13, 2026


1 participant
