Today's success rate (81.0%) is the highest in the past 7 days — up from last week's ~80.4–80.7% plateau.
Prompt Categories and Success Rates
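| Category | Total | Merged | Closed | Success Rate |
|---|---|---|---|---|
| 🧪 Test/Coverage | 34 | 32 | 2 | 94% |
| 📚 Documentation | 39 | 33 | 6 | 84% |
| 🐛 Bug Fix | 250 | 203 | 47 | 81% |
| ✨ Feature | 234 | 190 | 43 | 81% |
| 🔀 Other | 254 | 207 | 47 | 81% |
| 🔧 Update/Sync | 55 | 42 | 13 | 76% |
| ♻️ Refactor | 134 | 102 | 32 | 76% |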
Key finding: Test and documentation PRs have the highest merge rates (94% and 84%). They are lower in volume, but they set a high bar for quality and scope clarity.
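The success rates in the table are merged PRs divided by total PRs. The following is a minimal Python sketch of that arithmetic using the counts above. The dictionary layout and the `success_rate` and `rank_categories` helpers are illustrative assumptions, not the report's actual pipeline. Rates are printed to one decimal place, so they can differ slightly from the whole-number percentages in the table.

```python
# Counts copied from the category table: name -> (total, merged).
CATEGORY_COUNTS = {
    "Test/Coverage": (34, 32),
    "Documentation": (39, 33),
    "Bug Fix": (250, 203),
    "Feature": (234, 190),
    "Other": (254, 207),
    "Update/Sync": (55, 42),
    "Refactor": (134, 102),
}


def success_rate(total: int, merged: int) -> float:
    """Return merged / total as a percentage (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * merged / total


def rank_categories(counts):
    """Return (name, rate) pairs sorted by success rate, highest first."""
    rates = {name: success_rate(t, m) for name, (t, m) in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, rate in rank_categories(CATEGORY_COUNTS):
        print(f"{name:15s} {rate:5.1f}%")
```

Ranking by the unrounded rate also separates categories that the table shows as tied at 81% or 76%.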
Prompt Body Length Analysis
| Body Length | Total | Merged | Success Rate |
|---|---|---|---|
| Short (< 500 chars) | 46 | 32 | 69% |
| Medium (500–2k chars) | 706 | 592 | 83% ✅ |
| Long (2k–5k chars) | 163 | 128 | 78% |
| Very Long (5k+ chars) | 84 | 57 | 67% ⚠️ |
Average body size: merged PRs = 4,305 chars | closed PRs = 6,578 chars
The sweet spot is 500–2,000 characters (83% merge rate). Paradoxically, very long bodies (5k+) correlate with the lowest merge rate, suggesting over-specification, scope creep, or unresolved complexity.
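The bucketing behind this analysis can be sketched as follows. This is a minimal illustration, not the report's code: the function names are hypothetical, and the report does not say which bucket an exact boundary value (500, 2,000 or 5,000 characters) falls into, so the sketch assumes each boundary belongs to the larger bucket.

```python
from collections import defaultdict


def body_length_bucket(body: str) -> str:
    """Map a PR body to one of the report's four length buckets.

    Assumption: boundary values go to the larger bucket.
    """
    n = len(body)
    if n < 500:
        return "Short"
    if n < 2000:
        return "Medium"
    if n < 5000:
        return "Long"
    return "Very Long"


def merge_rate_by_bucket(prs):
    """Compute the merge rate (percent) per bucket.

    `prs` is an iterable of (body, was_merged) pairs.
    """
    totals = defaultdict(int)
    merged = defaultdict(int)
    for body, was_merged in prs:
        bucket = body_length_bucket(body)
        totals[bucket] += 1
        if was_merged:
            merged[bucket] += 1
    return {b: 100.0 * merged[b] / totals[b] for b in totals}


if __name__ == "__main__":
    # Synthetic bodies for illustration only, not real PR data.
    sample = [("a" * 100, False), ("a" * 800, True), ("a" * 900, True)]
    print(merge_rate_by_bucket(sample))
```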
✅ Successful Prompt Patterns
Characteristics of merged PR bodies:
Average body: ~4,305 chars (within the Long 2k–5k bucket, although the Medium 500–2k bucket has the highest merge rate)
Clear, specific scope with problem statement + change summary
Structured with headers and bullet points
References concrete files, functions, or test cases
View example successful prompts
PR #36494 - fix: correct GraphQL mutation name to markPullRequestReadyForReview → Merged
mark_pull_request_as_ready_for_review was calling markPullRequestAsReadyForReview (with "As"), a mutation that doesn't exist in the GitHub GraphQL schema, causing a hard failure on every invocation. ## Changes - mark_pull_request_as_ready_for_review.cjs - Rename mutation in the Graph...
PR #36521 - Align max-turns integration test with current frontmatter schema semantics → Merged
CI was failing in pkg/workflow because TestMaxTurnsValidation still treated a string literal max-turns value as valid. Current schema rules reject string literals for max-turns and only accept numeric values (or expression-based forms in allowed contexts). - Problem alignment: Update...
PR #36309 - Remove vestigial applyTo from parser validFields allowlist → Merged
applyTo was silently accepted by the parser's validFields allowlist but absent from the JSON schema and undocumented; any workflow with applyTo: would pass parser validation without ever being schema-validated. Changes: pkg/parser/include_processor.go - Remove "applyTo": true...
Common traits: Specific root cause identification, concrete file paths, before/after behavior described, structured change list.
❌ Unsuccessful Prompt Patterns
Characteristics of closed PR bodies:
Average body: ~6,578 chars (notably longer than merged!)
Often contain [WIP] prefix in title
Very long descriptions that signal unresolved scope or complexity
Broad refactoring without clear boundaries
View example closed prompts
PR #36512 — Fix CJS typecheck regression by removing nullable sdkPrompt flow in Copilot harness → Closed
Body length: 1,619 chars. Despite specific framing, this was closed — likely superseded or replaced by a different approach.
Body length: 56,610 chars — the longest closed PR. Massive auto-generated content inflates body without adding clarity.
Common traits: WIP status, very large auto-generated diff descriptions, broad refactoring scope, version bumps with lengthy changelogs.
Key Insights
Test PRs lead with 94% success — prompts that add test coverage or fix test failures are highly targeted and rarely fail. When writing prompts, framing a change as "add a regression test for X" signals clear, bounded scope.
The body length paradox: closed PRs average 53% more characters than merged PRs (6,578 vs 4,305 chars). Longer bodies often signal scope creep, over-engineered solutions, or auto-generated noise. Aim for 500–2,000 characters.
Refactoring and updates need tighter scope: both categories show 76% success vs the 81% baseline. Broad refactoring PRs ("fix all lint violations") tend to be closed more often than targeted ones ("extract buildX helper from compileY").
Recommendations
DO: Aim for 500–2,000 character bodies. State the problem, root cause, and key changes concisely. Avoid auto-generated verbose changelogs.
DO: Frame scope narrowly — reference specific files, functions, or test names. "Fix markPullRequestReadyForReview mutation name in update_activation_comment.cjs" outperforms "fix GraphQL issues".
DO: Add a regression test in the same PR as a bug fix. Test/Coverage is the highest-performing category (94% merge rate), although this report does not measure test+fix combinations separately.
AVOID: [WIP] prefixes — historically correlate with ~47% merge rate vs 82% for non-WIP.
AVOID: Very long bodies (5k+ chars). If your description exceeds 5,000 characters, the PR scope is likely too broad. Split it.
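The recommendations above can be turned into a simple pre-submission check. The sketch below is one possible reading of them, not a tool used by this report: `check_pr_prompt`, the warning texts, and the list of file extensions in `FILE_REF` are all assumptions chosen for illustration.

```python
import re
from typing import List

# Assumed heuristic for "references a specific file": a path-like token
# ending in one of a few extensions. The extension list is illustrative.
FILE_REF = re.compile(r"[\w./-]+\.(?:go|cjs|js|ts|py|md|ya?ml|json)\b")


def check_pr_prompt(title: str, body: str) -> List[str]:
    """Return warnings for a PR title/body, based on the DO/AVOID list."""
    warnings = []
    n = len(body)
    if title.strip().upper().startswith("[WIP]"):
        warnings.append("title has a [WIP] prefix")
    if n < 500:
        warnings.append(f"body is short ({n} chars)")
    elif n > 5000:
        warnings.append(f"body is very long ({n} chars); consider splitting")
    elif n > 2000:
        warnings.append(f"body is long ({n} chars); aim for 500-2,000")
    if not FILE_REF.search(body):
        warnings.append("body names no specific file")
    return warnings


if __name__ == "__main__":
    for warning in check_pr_prompt("[WIP] refactor", "fix GraphQL issues"):
        print("warning:", warning)
```

An empty list means none of the checks fired. It does not mean the PR will be merged, since the figures in this report are correlations.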
Historical Trends (Last 7 Days)
| Date | PRs | Success Rate | Top Category |
|---|---|---|---|
| 2026-06-02 | 1,000 | 81.0% ⬆️ | other/general |
| 2026-06-01 | 1,000 | 80.4% | bug_fix |
| 2026-05-30 | 1,000 | 80.5% | bug_fix |
| 2026-05-28 | 1,000 | 80.5% | bug_fix |
| 2026-05-27 | 1,000 | 80.7% | bug_fix |
| 2026-05-26 | 1,000 | 80.6% | bug_fix |
| 2026-05-25 | 1,000 | 80.6% | bug_fix |
Trend: Success rate has been stable at ~80.4–81.0% across the seven runs shown (2026-05-25 to 2026-06-02). Today's slight uptick (81.0%) is encouraging. Bug fix was the top category by volume on six of the seven days (other/general led today), but test and docs prompts continue to lead on success rate.
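The stability described here can be checked directly from the table. The sketch below recomputes the range of the daily rates and whether the latest run is the highest. The `summarize_trend` helper and its output keys are hypothetical names used only for this example.

```python
# Daily success rates (percent) copied from the Historical Trends table.
DAILY_RATES = {
    "2026-05-25": 80.6,
    "2026-05-26": 80.6,
    "2026-05-27": 80.7,
    "2026-05-28": 80.5,
    "2026-05-30": 80.5,
    "2026-06-01": 80.4,
    "2026-06-02": 81.0,
}


def summarize_trend(rates):
    """Return min, max, spread, and whether the latest date is the high."""
    dates = sorted(rates)  # ISO dates sort chronologically as strings
    values = [rates[d] for d in dates]
    low, high = min(values), max(values)
    return {
        "min": low,
        "max": high,
        "spread": round(high - low, 1),
        "latest_is_high": rates[dates[-1]] == high,
    }


if __name__ == "__main__":
    print(summarize_trend(DAILY_RATES))
```

A spread of well under one percentage point is what supports calling the rate stable.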
Summary
Analysis Period: Last 30 days | Run: §26849433675
Example successful prompts:
PR #36494 - fix: correct GraphQL mutation name to markPullRequestReadyForReview → Merged
PR #36521 - Align max-turns integration test with current frontmatter schema semantics → Merged
PR #36309 - Remove vestigial applyTo from parser validFields allowlist → Merged
Example closed prompts:
PR #36512 - Fix CJS typecheck regression by removing nullable sdkPrompt flow in Copilot harness → Closed
PR #36054 - Fix pkg/workflow function length violations (lint-monster) → Closed
PR #33490 - chore: bump default agentic CLI/MCP versions and regenerate compiled workflow artifacts → Closed
References: §26849433675