[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-05-30 #35903
Closed
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #36103.
Summary
NLP clustering analysis of 1,000 Copilot agent PRs in github/gh-aw from the last 30 days (2026-05-06 → 2026-05-24). Prompts were extracted from PR bodies, cleaned (markdown, code, and firewall warnings stripped), TF-IDF vectorized (unigrams and bigrams), and grouped with K-means. The cluster count k=8 was selected by silhouette score.
Key Findings
[WIP] Fix failing GitHub Actions job PRs are a distinct low-value cluster: 27 near-duplicate auto-generated attempts (74% merge, 0.9 reviews), many of them throwaway retries against the same failing lint/test/build jobs.
Methodology & Limitations
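The cleaning step named in the summary (strip markdown, code, and the firewall warning, then drop bodies with fewer than 30 usable characters) could look roughly like the sketch below. The regular expressions and the `clean_prompt` name are assumptions made for illustration; the report does not publish its exact rules.

```python
import re

# Assumed patterns: the report does not list the exact rules it applied.
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
# A "> [!WARNING]" alert line plus any quoted lines that follow it.
FIREWALL_ALERT = re.compile(r"^>\s*\[!WARNING\].*(?:\n>.*)*", re.MULTILINE)
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")
LIST_BULLET = re.compile(r"^\s*[-*+]\s+", re.MULTILINE)
MARKDOWN_MARKS = re.compile(r"[#*>|~]+")

MIN_CHARS = 30  # from the report: PRs with <30 chars of usable text were dropped


def clean_prompt(body):
    """Return the cleaned prompt text, or None if too little text survives."""
    text = FENCED_CODE.sub(" ", body)        # drop fenced code blocks
    text = FIREWALL_ALERT.sub(" ", text)     # drop the firewall warning alert
    text = INLINE_CODE.sub(" ", text)        # drop inline code spans
    text = MARKDOWN_LINK.sub(r"\1", text)    # keep link text, drop the URL
    text = LIST_BULLET.sub(" ", text)        # drop list markers
    text = MARKDOWN_MARKS.sub(" ", text)     # drop heading/emphasis markers
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text) >= MIN_CHARS else None
```

Removing the alert before vectorizing matters because text that repeats across many bodies would otherwise dominate the TF-IDF vocabulary.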
- /tmp/gh-aw/agent/prompt-cache/pr-full-data/pr-*.json (1,000 indexed PRs with full body/commits/reviews).
- Cleaning stripped the [!WARNING] Firewall rules blocked... boilerplate that appears in many bodies (it would otherwise dominate TF-IDF). 3 PRs were dropped for <30 chars of usable text.
- TfidfVectorizer(max_features=600, ngram_range=(1,2), min_df=3, max_df=0.6, stop_words='english').
- aw_info.json workflow metrics (turns/duration/cost) were not joined: these PRs originate from many different workflows and don't map 1:1 to retrievable run logs in this context. Commits, reviews, files-changed, and diff size are used as complexity proxies instead.
Cluster Analysis
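The clusters were produced by TF-IDF vectorization followed by K-means, with k chosen by silhouette score. A minimal sketch of that selection loop with scikit-learn is shown here. The vectorizer settings are the ones quoted in the methodology; the candidate range for k, `n_init`, the random seed, and the `cluster_prompts` name are assumptions, since the report does not state them.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def cluster_prompts(prompts, k_values=range(2, 13), min_df=3, max_df=0.6, seed=0):
    """Vectorize prompts and return (best_k, best_score, labels).

    k_values, seed, and n_init are assumptions; only the TF-IDF settings
    come from the report.
    """
    vectorizer = TfidfVectorizer(
        max_features=600,
        ngram_range=(1, 2),      # unigrams and bigrams
        min_df=min_df,
        max_df=max_df,
        stop_words="english",
    )
    X = vectorizer.fit_transform(prompts)

    best = None
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)  # higher means better-separated clusters
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best
```

On a small corpus the report's `min_df=3` and `max_df=0.6` can prune the whole vocabulary, which is why the sketch exposes them as parameters.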
All 8 clusters (sorted by size)
Cluster 5 — General feature + test changes (catch-all)
emoji frontmatter field", context-propagation fixes).
Cluster 4 — Workflow recompile / lock-file / shared imports
.lock.yml regeneration (e.g. "Recompile workflows..." +21,746). Frequently superseded → below-average merge.
Cluster 2 — Bug fixes
Cluster 1 — Prompt / agent / experiment tuning ⭐ highest success
Cluster 0 — PR-review / sous-chef bots 🔁 most iterative
Cluster 7 — AWF / firewall / version & golden-file bumps ⚠️ lowest success
Cluster 6 — Observability / OTLP spans & model config
Cluster 3 — [WIP] Fix failing GitHub Actions job
Success Rate by Cluster
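Per-cluster figures such as "74% merge, 0.9 reviews" can be derived with a simple aggregation. The sketch below assumes that "success" means the PR was merged and that each PR record carries hypothetical `cluster`, `merged`, and `reviews` fields; the actual schema of the cached PR JSON is not shown in this report.

```python
from collections import defaultdict


def summarize_clusters(prs):
    """Return {cluster: (count, merge_rate, mean_reviews)}.

    Each PR is a dict with hypothetical keys: 'cluster' (int),
    'merged' (bool), and 'reviews' (int).
    """
    counts = defaultdict(int)
    merged = defaultdict(int)
    reviews = defaultdict(int)
    for pr in prs:
        c = pr["cluster"]
        counts[c] += 1
        merged[c] += 1 if pr["merged"] else 0
        reviews[c] += pr["reviews"]
    return {
        c: (counts[c], merged[c] / counts[c], reviews[c] / counts[c])
        for c in counts
    }
```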
Sample PRs by cluster (top 3 by diff size)
create-check-run safe output type
aw-failure-investigator
aw_context fallbacks for prompt context
${{ experiments.* }} in runtime-import
emoji frontmatter field
Recommendations
- [WIP] Fix failing GitHub Actions job PRs (Cluster 3). Near-identical retries against the same job add noise (0.9 reviews, 26% closed). Gate these on "no existing open WIP PR for this job" before opening a new one.
- aw_info.json metrics would let us correlate iteration count with prompt cluster directly, replacing the commit/review proxies used here.
References: §26681361951
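The Cluster 3 gate ("no existing open WIP PR for this job") could be expressed as a small predicate that runs before a new PR is opened. This is only a sketch: the `job_key` field and the shape of the open-PR records are hypothetical, and how a PR is tied to the failing job it targets is left as an assumption.

```python
WIP_PREFIX = "[WIP] Fix failing GitHub Actions job"


def should_open_wip_pr(job_key, open_prs):
    """Return True only if no open WIP PR already targets this job.

    open_prs: list of dicts with hypothetical keys 'title' and 'job_key'
    (an identifier for the failing job, e.g. workflow name plus job name).
    """
    for pr in open_prs:
        is_wip_fix = pr["title"].startswith(WIP_PREFIX)
        if is_wip_fix and pr.get("job_key") == job_key:
            return False  # an attempt is already open; do not add a retry
    return True
```

Keying on the job rather than on the title alone lets fixes for different failing jobs proceed in parallel.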