campaign(2k-stars): community health + broader positioning + docs/ + GitHub Action#30
campaign(2k-stars): community health + broader positioning + docs/ + GitHub Action#30hoainho wants to merge 5 commits into
Conversation
…Action
- README: badges row, broader-positioning hero ('regression testing for LLM
agents'), hero GIF placeholder + docs/* learn-more links.
- .github/: CODE_OF_CONDUCT.md, SECURITY.md, FUNDING.yml, issue templates
(bug / feature / case-recipe), PR template, ISSUE_TEMPLATE/config.yml.
- docs/: concepts.md (4 core ideas), comparison.md (vs promptfoo /
DeepEval / Ragas / OpenAI Evals), runners.md (runner abstraction +
langgraph/claude-agent-sdk roadmap), why-not-promptfoo.md (direct
head-to-head), docs/README.md index, docs/assets/demo.tape (vhs script).
- .github/actions/eval-harness/: composite GitHub Action (action.yml +
README.md) — installs jq/yq/opencode/eval-harness, runs against
changed skills, posts job summary with 6-field FAIL, uploads runs/
artifact, exit 12 on regression. Action README with quickstart +
inputs/outputs + examples + pinning + marketplace publishing guide.
- .github/workflows/eval-example.yml: example PR/push integration.
No source-code changes. No test impact.
Part of the campaign/2k-stars roadmap. See [internal doc].
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive repository configuration, templates, and documentation, including issue templates, a pull request template, a Code of Conduct, a Security Policy, and detailed conceptual and comparison guides. It also adds a composite GitHub Action (.github/actions/eval-harness/action.yml) to run behavior-regression testing in CI. The review feedback highlights critical improvements for the GitHub Action: preventing pipeline failures when no skills are changed by replacing grep with awk filtering, handling empty BASE_SHA values on new branch pushes to avoid git diff crashes, and explicitly setting EVAL_STATE_DIR to ensure evaluation runs are written to the local workspace for step summaries and artifact uploads.
| CHANGED_SKILLS=$(git diff --name-only "$BASE_SHA" "$HEAD_SHA" \ | ||
| | grep -E '^.opencode/skills/[^/]+/' \ | ||
| | awk -F/ '{print $3}' \ | ||
| | sort -u \ | ||
| | paste -sd "," -) |
There was a problem hiding this comment.
Because set -euo pipefail is enabled, if grep finds no matching lines (which happens whenever a commit/PR does not touch any files in .opencode/skills/), it will exit with status 1, causing the entire pipeline and the workflow step to fail. We can avoid this and simplify the pipeline by performing the path filtering directly inside awk.
CHANGED_SKILLS=$(git diff --name-only "$BASE_SHA" "$HEAD_SHA" \\
| awk -F/ '$1 == ".opencode" && $2 == "skills" && $3 != "" {print $3}' \\
| sort -u \\
| paste -sd "," -)| if [[ "${{ github.event_name }}" == "pull_request" ]]; then | ||
| BASE_SHA="${{ github.event.pull_request.base.sha }}" | ||
| HEAD_SHA="${{ github.event.pull_request.head.sha }}" | ||
| else | ||
| BASE_SHA="${{ github.event.before }}" | ||
| HEAD_SHA="${{ github.sha }}" | ||
| fi |
There was a problem hiding this comment.
In a push event, if a new branch is pushed, github.event.before is set to 0000000000000000000000000000000000000000. Passing this to git diff will cause a fatal error (fatal: bad object ...) and crash the workflow. We should handle this by falling back to HEAD~1 if it exists, or to HEAD_SHA itself (resulting in an empty diff).
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
BASE_SHA="${{ github.event.pull_request.base.sha }}"
HEAD_SHA="${{ github.event.pull_request.head.sha }}"
else
BASE_SHA="${{ github.event.before }}"
HEAD_SHA="${{ github.sha }}"
if [[ "$BASE_SHA" == "0000000000000000000000000000000000000000" || -z "$BASE_SHA" ]]; then
if git rev-parse --verify HEAD~1 >/dev/null 2>&1; then
BASE_SHA="HEAD~1"
else
BASE_SHA="$HEAD_SHA"
fi
fi
fi| env: | ||
| ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }} | ||
| EVAL_BUDGET_USD: ${{ inputs.budget-usd }} | ||
| EVAL_CI: "1" |
There was a problem hiding this comment.
The action expects the evaluation runs to be written to the local ./runs/ directory for step summary generation and artifact uploading. However, since EVAL_STATE_DIR is not set in the environment, eval-harness defaults to writing runs to $HOME/.config/opencode/eval-harness/runs/. This causes the action to find no runs in the workspace, leading to empty step summaries and missing artifacts. Setting EVAL_STATE_DIR to . resolves this.
env:
ANTHROPIC_API_KEY: ${{ inputs.anthropic-api-key }}
EVAL_BUDGET_USD: ${{ inputs.budget-usd }}
EVAL_CI: "1"
EVAL_STATE_DIR: "."…dies, handoff doc - .campaign/posts/01..07: HN Show post, 3 Reddit posts (LocalLLaMA, ClaudeAI, mlops), 3 blog posts (4-class attribution, 6-field FAIL, flaky-tests), 8-tweet X thread. Each includes a response playbook for likely comments. - .campaign/awesome-pr-bodies/: per-list step-by-step submission guides + PR body templates. Pre-flight check found 4 dead/wrong-fit lists (Hannibal046/Awesome-LLM unmerged since 2025-07; visenger/awesome-mlops unmerged since 2024; e2b-dev/awesome-sdks-for-ai-agents dead since 2023; e2b-dev/awesome-ai-agents redirects tools elsewhere). 3 PRs opened today to active lists. - scripts/eval/tools/stars-kpi.sh: read-only weekly KPI snapshot. Captures stars, forks, watchers, contributors, unique authors (30d), traffic (views/clones 14d), top referrers, top paths. Appends to ~/.eval-harness/kpi-history.ndjson. Prints awesome-list star-floor milestone tracker (target deferred PR thresholds). - .campaign/README.md: layout + sequencing. - .campaign/CAMPAIGN.md: 12-month handoff. Critical path day-by-day for first 14 days, weekly cadence months 1-6, success milestones at 100/500/1000/2000 stars, anti-patterns to avoid, what the agent can do in follow-up sessions and what only humans can do. Awesome-list PRs opened today: - taishi-i/awesome-ChatGPT-repositories #150 - tensorchord/Awesome-LLMOps #538 - steven2358/awesome-generative-ai #830 No source-code changes.
…+ verified merge-rate table Signed-off-by: Hoài Nhớ <nhoxtvt@gmail.com>
…ct_ai 297-line example) Signed-off-by: Hoài Nhớ <nhoxtvt@gmail.com>
What does this PR do?
End-to-end execution of Phase 1 (foundation) + Phase 2 (awesome-list PR wave) + Phase 3 partial (launch content + GitHub Action + KPI script) of the 2000-star contributor-attraction campaign.
Pure additions + a README pitch rewrite — no source-code changes, no test impact.
Why?
The repo has world-class internals but very few discovery surfaces. This PR adds the surfaces (badges, docs, action, issue templates, KPI tracking) and broadens the README pitch from "opencode-skill testing" → "behavior-regression testing for LLM agents (opencode runner today, more coming)".
What's in this PR (2 commits, 35 files, ~3000 lines)
Commit 1 — community health + docs + GitHub Action
llm-evaluation,ai-agents,regression-testing,opencode,llmops,claude,anthropic, ...)docs/assets/demo.tape(vhs script) + README GIF placeholder.github/actions/eval-harnesscomposite action + example workflowCommit 2 — launch content + KPI + awesome-PR plans + handoff
.campaign/posts/.campaign/awesome-pr-bodies/scripts/eval/tools/stars-kpi.sh.campaign/CAMPAIGN.mdAwesome-list PRs opened today (outside this PR, under
nano-steporg)How is it tested?
docs-only+config-only+action-only+tooling-only.The existing 20 test suites under
scripts/eval/tests/are untouched and still green (verify locally withfor t in scripts/eval/tests/*.sh; do bash "$t"; done).The GitHub Action itself is a thin composite over the existing CLI; end-to-end smoke happens when consumed by a downstream repo.
The KPI script was test-run against the live repo and produced expected output (4 stars, 17 views/14d baseline).
Before / after evidence
See the table above. The repo went from "0 surfaces for discovery" → "every surface a stranger needs to evaluate, contribute to, or integrate with eval-harness."
Issues created alongside this PR
good first issues: Add anexpect_exact_linesvariant tokind: shell#31 (expect_exact_lines), Addeval-harness doctor— preflight diagnostics command #32 (doctor command, pinned), Document the case YAML with a JSON Schema (for editor autocomplete) #33 (JSON Schema), Pretty-printeval-harness statusoutput as a colored table #34 (status TTY table), Add--sincefilter toeval-harness trend(e.g. --since=7d, --since=2026-05-01) #35 (--since for trend)langgraph-noderunner for regression-testing LangGraph agents #36 (pinned,help wanted,runnerlabel)good first issue/help wanted/security/portability/ci-integration/documentationRecommended next step
Merge this PR. Then follow the 14-day critical path in
.campaign/CAMPAIGN.md. The campaign is set up; the next 12 months are execution.Checklist
score.shorattribute.shchanges — GNU/BSD grep check N/A