
Fix eval regression from PR#76: soften State Storage Rule #83

Open
pkosiec wants to merge 2 commits into main from pkosiec/fix-regression

pkosiec (Member) commented May 20, 2026

Summary

PR #76 introduced an aggressive State Storage Rule in skills/databricks-apps/SKILL.md that caused 16 app regressions in the May 19 nightly eval, 8 of them high-impact (score drop >0.3). Analytics apps such as property_search_app, host_onboarding_checklist, and cb_brickhouse_advanced were incorrectly pushed toward Lakebase and dropped to 0.00.
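For reviewers who want to reproduce the regression count, a comparison of two nightly runs could look like the minimal sketch below. It is an assumption for illustration only: the eval harness output format is not shown in this PR, so `find_regressions`, the dict-of-scores input, and the sample scores are all hypothetical. Only the 0.3 high-impact threshold comes from the description above.

```python
# Hypothetical helper: the real eval harness and its output format are not
# part of this PR. Scores are assumed to be floats in [0, 1] keyed by app name.
def find_regressions(before, after, high_impact_drop=0.3):
    """Return (all_regressions, high_impact), each a dict of app -> score drop."""
    drops = {
        app: before[app] - after[app]
        for app in before
        if app in after and after[app] < before[app]
    }
    # "High-impact" follows the PR description: a drop strictly greater than 0.3.
    high_impact = {app: d for app, d in drops.items() if d > high_impact_drop}
    return drops, high_impact


# Made-up scores for illustration; these are NOT the real nightly results.
before = {"app_a": 0.8, "app_b": 0.5, "app_c": 0.9}
after = {"app_a": 0.0, "app_b": 0.4, "app_c": 0.9}

drops, high_impact = find_regressions(before, after)
print(sorted(drops), sorted(high_impact))
```

With the made-up scores, `app_a` and `app_b` count as regressions and only `app_a` is high-impact, since `app_c` is unchanged.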

Root Cause

The State Storage Rule auto-detected a need for Lakebase in any app that mentioned state-like terms ("preferences", "bookmarks"), and its forceful language ("Do not wait for the user to ask", "This is not optional") removed user agency.
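The failure mode can be illustrated by modeling the rule as a keyword match. This is a deliberately simplified sketch, not how the rule works in practice: the actual rule is prose guidance in SKILL.md interpreted by an agent. The term sets, the `mentions` helper, and both sample prompts are hypothetical; only "preferences", "bookmarks", and "CRUD" are taken from this PR.

```python
# Hypothetical model of the trigger; the real rule is prose in SKILL.md.
BROAD_TERMS = {"preferences", "bookmarks", "crud"}       # assumed PR#76-style list
NARROW_TERMS = BROAD_TERMS - {"preferences", "bookmarks"}  # after this PR's removal


def mentions(prompt: str, terms: set[str]) -> bool:
    """Return True if any trigger term appears as a whole word in the prompt."""
    words = {w.strip(".,;:()\"'").lower() for w in prompt.split()}
    return bool(words & terms)


# Made-up prompts: one analytics app with an incidental state-like term,
# one app with a genuine CRUD need.
analytics_prompt = "Dashboard of listing prices; let users save bookmarks of charts."
crud_prompt = "CRUD app for managing onboarding tasks."

print(mentions(analytics_prompt, BROAD_TERMS))   # broad list fires on the analytics app
print(mentions(analytics_prompt, NARROW_TERMS))  # narrowed list does not
print(mentions(crud_prompt, NARROW_TERMS))       # genuine CRUD need still matches
```

Under these assumptions the broad list pushes the analytics prompt toward Lakebase because of a single incidental word, which is the behavior this PR removes.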

Changes (4 targeted edits in skills/databricks-apps/SKILL.md)

  1. Revert description metadata — remove "Auto-detects need for Lakebase when app stores state"
  2. Revert scaffolding phase reference — remove State Storage Rule mention from Required Reading table
  3. Replace the State Storage Rule with softer guidance:
    • Removed "preferences, bookmarks" from trigger list (too broad for analytics apps)
    • Changed "Do not wait for the user to ask" → "Ask the user" (restores user agency)
    • Removed "This is not optional" (was too forceful)
    • Added explicit exclusion for analytics/dashboard apps
    • Still recommends Lakebase for genuine CRUD/state storage needs
    • Still routes to Decision Gate for hybrid apps (analytics + state)
  4. Revert Decision Gate skip clause — restore original simpler wording
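The intent of the softened guidance in item 3 can be summarized as a small routing function. This is a sketch under stated assumptions, not the skill's implementation: the guidance is prose for an agent, and the `Route` enum, the `route` function, and the two boolean inputs are hypothetical names introduced here to make the three outcomes explicit.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical outcomes of the softened State Storage Guidance."""
    NO_LAKEBASE = "no_lakebase"      # no state need, e.g. analytics/dashboard apps
    ASK_USER = "ask_user"            # CRUD/state need: recommend Lakebase, ask first
    DECISION_GATE = "decision_gate"  # hybrid app: analytics + state


def route(is_analytics: bool, needs_state: bool) -> Route:
    """Map an app's (assumed, agent-determined) traits to a guidance outcome."""
    if is_analytics and needs_state:
        # Hybrid apps still go through the Decision Gate.
        return Route.DECISION_GATE
    if needs_state:
        # Lakebase is still recommended, but the user is asked rather than overridden.
        return Route.ASK_USER
    # Analytics/dashboard apps without state needs are explicitly excluded.
    return Route.NO_LAKEBASE


print(route(is_analytics=True, needs_state=False))
print(route(is_analytics=False, needs_state=True))
print(route(is_analytics=True, needs_state=True))
```

The key difference from the PR#76 behavior is that no branch selects Lakebase without either asking the user or passing through the Decision Gate.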

What's kept from PR#76

  • All databricks-lakebase/SKILL.md improvements (new references, JSON path table, pgvector)
  • All lakebase.md reference updates (Chat Persistence Pattern, onPluginsReady, naming conventions)
  • All model-serving.md changes (Model Serving apps actually improved)
  • Post-Deploy Verification section

Test plan

  • python3 scripts/skills.py validate passes
  • May 20 nightly eval confirms regression is fixed

Fixes: LKB-12991

This pull request and its description were written by Isaac.

pkosiec added 2 commits May 20, 2026 17:48
PR#76 introduced an aggressive State Storage Rule that auto-detected
Lakebase need for any app mentioning state-like terms (preferences,
bookmarks, etc.), causing 16 app regressions in the May 19 nightly eval.
Analytics apps like property_search_app and host_onboarding_checklist
were incorrectly pushed toward Lakebase, dropping to 0.00.

Changes:
- Replace aggressive auto-detect with softer guidance that asks the user
- Remove "preferences, bookmarks" from trigger list (too broad)
- Restore user agency ("Ask the user" vs "Do not wait for the user")
- Explicitly exclude analytics/dashboard apps from Lakebase push
- Revert description metadata and Decision Gate skip clause
- Still recommends Lakebase for genuine CRUD/state storage needs

Fixes: LKB-12991

Co-authored-by: Isaac

The previous commit removed the State Storage reference from the
Required Reading table entirely. This creates a discoverability gap:
agents jumping from the table to the Decision Gate skip the State
Storage Guidance section. Restore it with softer "review" language
(vs the old aggressive "evaluate the State Storage Rule") to preserve
the flow for CRUD apps that need Lakebase without re-triggering the
regression on analytics apps.

Co-authored-by: Isaac
@pkosiec pkosiec marked this pull request as ready for review May 20, 2026 16:07
@pkosiec pkosiec requested a review from a team as a code owner May 20, 2026 16:07
