diff --git a/.agents/skills/autoreview/SKILL.md b/.agents/skills/autoreview/SKILL.md new file mode 100644 index 0000000..85d1939 --- /dev/null +++ b/.agents/skills/autoreview/SKILL.md @@ -0,0 +1,150 @@ +--- +name: autoreview +description: "Autoreview closeout: local dirty changes, PR branch vs main, parallel tests." +--- + +# Autoreview + +Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing. + +Codex native review mode performs best and is recommended. Non-Codex reviewers are fallback/second-opinion paths that receive a generated diff prompt, not the full Codex review-mode runtime. + +Use when: + +- user asks for Codex review / autoreview / second-model review +- after non-trivial code edits, before final/commit/ship +- reviewing a local branch or PR branch after fixes + +## Contract + +- Treat review output as advisory. Never blindly apply it. +- Verify every finding by reading the real code path and adjacent files. +- Read dependency docs/source/types when the finding depends on external behavior. +- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase. +- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class. +- Keep going until the selected review path returns no accepted/actionable findings. +- If a review-triggered fix changes code, rerun focused tests and rerun the review helper. +- Default to Codex review with no fallback. Prefer Codex for final closeout because it uses native review mode; non-Codex reviewers use a Codex-inspired generated diff prompt. Use `--fallback-reviewer auto|claude|pi|opencode|droid|copilot` only when a second-model fallback is explicitly wanted and authenticated. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior. +- Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording. +- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse. +- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know. +- If creating or updating a PR while rejecting any autoreview finding, record the rejected finding and reason in the PR description so later reviewers can distinguish intentional design decisions from missed review output. +- Do not push just to review. Push only when the user requested push/ship/PR update. +- For OpenClaw maintainers, keep autoreview validation Crabbox/Testbox-aware when maintainer validation mode is enabled (`OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`). A review pass may inspect files and run cheap non-Node probes, but it must not start local `pnpm`, Vitest, `tsgo`, `npm test`, or `node scripts/run-vitest.mjs` from a Codex/worktree review unless the operator explicitly requested local proof. For runtime proof, use existing evidence or route through Crabbox/Testbox and report the id. Do not apply this rule to ordinary contributors who do not have maintainer Testbox access. + +## Pick Target + +Dirty local work: + +```bash +codex review --uncommitted +``` + +Use this only when the patch is actually unstaged/staged/untracked in the +current checkout. For committed, pushed, or PR work, point Codex at the commit +or branch diff instead; do not force `--mode local` / `--uncommitted` just +because the helper docs mention dirty work first. A clean `--uncommitted` review +only proves there is no local patch. + +Branch/PR work: + +```bash +git fetch origin +codex review --base origin/main +``` + +Do not pass any prompt with `--base`, `--commit`, or `--uncommitted`. Codex CLI +review targets and custom review prompts are mutually exclusive: target modes +generate their own review prompt internally. Use plain target review for native +Codex closeout, or use custom prompt review (`codex review -`) only when you +intentionally want a generated diff prompt instead of native target review. + +If an open PR exists, use its actual base: + +```bash +base=$(gh pr view --json baseRefName --jq .baseRefName) +codex review --base "origin/$base" +``` + +Committed single change: + +```bash +codex review --commit HEAD +``` + +or with the helper: + +```bash +.agents/skills/autoreview/scripts/autoreview --mode commit --commit HEAD +``` + +Use commit review for already-landed or already-pushed work on `main`. Reviewing +clean `main` against `origin/main` is usually an empty diff after push. For a +small stack, review each commit explicitly or review the branch before merging +with `--base`. + +## Parallel Closeout + +Format first if formatting can change line locations. Then it is OK to run tests and review in parallel: + +```bash +.agents/skills/autoreview/scripts/autoreview --parallel-tests "" +``` + +Tradeoff: tests may force code changes that stale the review. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation. + +## Context Efficiency + +Codex review is usually noisy. Default to a subagent filter when subagents are available. Ask it to run the review and return only: + +- actionable findings it accepts +- findings it rejects, with one-line reason +- exact files/tests to rerun + +Run inline only for tiny changes or when subagents are unavailable. + +## Helper + +Bundled helper: + +```bash +.agents/skills/autoreview/scripts/autoreview --help +``` + +The helper: + +- chooses dirty `--uncommitted` first +- otherwise uses current PR base if `gh pr view` works +- otherwise uses `origin/main` for non-main branches +- auto-runs `PNPM_CONFIG_PM_ON_FAIL=ignore PNPM_CONFIG_VERIFY_DEPS_BEFORE_RUN=false PNPM_CONFIG_OFFLINE=true pnpm run check` in parallel when a repo has `package.json`, `pnpm-lock.yaml`, `node_modules`, and a `check` script; disable with `AUTOREVIEW_AUTO_TESTS=0` +- use `--mode commit --commit ` for already-committed work, especially clean `main` after landing +- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing +- supports `--reviewer codex|claude|pi|opencode|droid|copilot|auto`; `auto` means Codex first +- supports `--fallback-reviewer auto|claude|pi|opencode|droid|copilot|none`; default is `none` +- falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings +- writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set +- supports `--dry-run`, `--parallel-tests`, and commit refs +- runs nested review with `--dangerously-bypass-approvals-and-sandbox --sandbox danger-full-access` by default +- with `OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`, disables auto local `pnpm run check` and routes Codex through generated prompt review (`codex review -`) so the no-local-heavy-tests policy is included; native Codex target review cannot accept extra prompt text +- non-Codex reviewers receive the generated diff prompt and maintainer validation policy text when maintainer validation is active +- keeps accepting `--full-access`; use `--no-yolo` or `AUTOREVIEW_YOLO=0` to opt out +- still accepts legacy `CODEX_REVIEW_*` env vars when the matching `AUTOREVIEW_*` var is unset +- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0 + +## Final Report + +Include: + +- review command used +- tests/proof run +- findings accepted/rejected, briefly why +- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected + +Do not run another Codex review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean. + +## PR / CI Closeout + +- Prefer direct run/job APIs after CI starts: `gh run view --json jobs`; use PR rollup only for final mergeability. +- After rebase, compare `origin/main..HEAD`; drop CI-fix commits already upstream before pushing. +- For prompt snapshot CI failures, prove/generate with Linux Node 24 before rerunning the failed job. +- Update PR body once near the final head unless proof labels are missing or stale enough to block CI. diff --git a/.agents/skills/autoreview/scripts/autoreview b/.agents/skills/autoreview/scripts/autoreview new file mode 100755 index 0000000..92aaac7 --- /dev/null +++ b/.agents/skills/autoreview/scripts/autoreview @@ -0,0 +1,697 @@ +#!/usr/bin/env bash +set -euo pipefail + +usage() { + cat <<'EOF' +Usage: autoreview [options] + +Options: + --mode auto|local|branch|commit + Target selection. Default: auto. + --base REF Base ref for branch review. Default: PR base or origin/main. + --commit REF Commit ref for commit review. Default: HEAD. + --reviewer codex|claude|pi|opencode|droid|copilot|auto + Review engine. Default: Codex. + --fallback-reviewer auto|claude|pi|opencode|droid|copilot|none + Fallback when Codex is unavailable or exits nonzero. Default: none. + --codex-bin PATH Codex binary. Default: codex. + --claude-bin PATH Claude binary. Default: claude. + --pi-bin PATH Pi binary. Default: pi. + --opencode-bin PATH OpenCode binary. Default: opencode. + --droid-bin PATH Droid binary. Default: droid. + --copilot-bin PATH GitHub Copilot binary. Default: copilot. + --full-access Keep yolo/full-access mode enabled. Default. + --no-yolo Run nested Codex review with normal sandbox/approval prompts. + --output FILE Also save output to file. + --parallel-tests CMD Run review and test command concurrently. + Default: PNPM_CONFIG_PM_ON_FAIL=ignore PNPM_CONFIG_VERIFY_DEPS_BEFORE_RUN=false PNPM_CONFIG_OFFLINE=true pnpm run check when available. + --dry-run Print selected commands, do not run. + -h, --help Show help. + +Environment: + OPENCLAW_TESTBOX=1 or AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1 + Enable maintainer-only OpenClaw Crabbox/Testbox validation policy. + +Modes: + local codex review --uncommitted + branch codex review --base + commit codex review --commit + auto dirty tree -> local, else PR/current branch -> branch +EOF +} + +mode=auto +base_ref= +commit_ref=HEAD +reviewer=${AUTOREVIEW_REVIEWER:-${CODEX_REVIEW_REVIEWER:-auto}} +fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-none}} +codex_bin=${CODEX_BIN:-codex} +claude_bin=${CLAUDE_BIN:-claude} +pi_bin=${PI_BIN:-pi} +opencode_bin=${OPENCODE_BIN:-opencode} +droid_bin=${DROID_BIN:-droid} +copilot_bin=${COPILOT_BIN:-copilot} +codex_args=() +yolo=${AUTOREVIEW_YOLO:-${CODEX_REVIEW_YOLO:-1}} +output=${AUTOREVIEW_OUTPUT:-${CODEX_REVIEW_OUTPUT:-}} +parallel_tests= +parallel_tests_auto=false +dry_run=false +codex_review_prompt= +codex_review_prompt_file=false +openclaw_maintainer_validation=${AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION:-${OPENCLAW_TESTBOX:-0}} + +while [[ $# -gt 0 ]]; do + case "$1" in + --mode) + mode=${2:-} + shift 2 + ;; + --base) + base_ref=${2:-} + shift 2 + ;; + --commit) + commit_ref=${2:-} + shift 2 + ;; + --reviewer) + reviewer=${2:-} + shift 2 + ;; + --fallback-reviewer) + fallback_reviewer=${2:-} + shift 2 + ;; + --codex-bin) + codex_bin=${2:-} + shift 2 + ;; + --claude-bin) + claude_bin=${2:-} + shift 2 + ;; + --pi-bin) + pi_bin=${2:-} + shift 2 + ;; + --opencode-bin) + opencode_bin=${2:-} + shift 2 + ;; + --droid-bin) + droid_bin=${2:-} + shift 2 + ;; + --copilot-bin) + copilot_bin=${2:-} + shift 2 + ;; + --full-access) + yolo=1 + shift + ;; + --no-yolo) + yolo=0 + shift + ;; + --output) + output=${2:-} + shift 2 + ;; + --parallel-tests) + parallel_tests=${2:-} + shift 2 + ;; + --dry-run) + dry_run=true + shift + ;; + -h|--help) + usage + exit 0 + ;; + *) + usage >&2 + exit 2 + ;; + esac +done + +case "$yolo" in + 0|false|False|FALSE|no|No|NO|off|Off|OFF) ;; + *) codex_args+=(--dangerously-bypass-approvals-and-sandbox --sandbox danger-full-access) ;; +esac + +case "$mode" in + auto|local|branch|commit) ;; + *) + echo "invalid --mode: $mode" >&2 + exit 2 + ;; +esac + +case "$reviewer" in + auto|codex|claude|pi|opencode|droid|copilot) ;; + *) + echo "invalid --reviewer: $reviewer" >&2 + exit 2 + ;; +esac + +case "$fallback_reviewer" in + auto|claude|pi|opencode|droid|copilot|none) ;; + *) + echo "invalid --fallback-reviewer: $fallback_reviewer" >&2 + exit 2 + ;; +esac + +repo_root=$(git rev-parse --show-toplevel) +printf -v quoted_repo_root '%q' "$repo_root" + +has_package_check_script() { + command -v node >/dev/null 2>&1 || return 1 + node -e 'const { readFileSync } = require("node:fs"); const p = JSON.parse(readFileSync(process.argv[1], "utf8")); process.exit(p.scripts?.check ? 0 : 1)' \ + "$repo_root/package.json" \ + >/dev/null 2>&1 +} + +auto_tests_disabled() { + case "${AUTOREVIEW_AUTO_TESTS:-${CODEX_REVIEW_AUTO_TESTS:-1}}" in + 0|false|False|FALSE|no|No|NO|off|Off|OFF) return 0 ;; + *) return 1 ;; + esac +} + +current_branch=$(git branch --show-current 2>/dev/null || true) +dirty=false +if [[ -n "$(git status --porcelain)" ]]; then + dirty=true +fi + +pr_url= +if [[ -z "$base_ref" && "$mode" != local ]] && command -v gh >/dev/null 2>&1; then + if pr_lines=$(gh pr view --json baseRefName,url --jq '[.baseRefName, .url] | @tsv' 2>/dev/null); then + base_name=${pr_lines%%$'\t'*} + pr_url=${pr_lines#*$'\t'} + if [[ -n "$base_name" ]]; then + base_ref="origin/$base_name" + fi + fi +fi + +if [[ -z "$base_ref" ]]; then + base_ref=origin/main +fi + +review_kind= +if [[ "$mode" == local || ( "$mode" == auto && "$dirty" == true ) ]]; then + review_kind=local +elif [[ "$mode" == commit ]]; then + review_kind=commit +elif [[ "$mode" == branch || ( "$mode" == auto && -n "$current_branch" && "$current_branch" != "main" ) ]]; then + review_kind=branch +else + echo "no review target: clean main checkout and no forced mode" >&2 + exit 1 +fi + +if [[ "$review_kind" == local ]]; then + review_cmd=("$codex_bin" "${codex_args[@]}" review --uncommitted) +elif [[ "$review_kind" == commit ]]; then + review_cmd=("$codex_bin" "${codex_args[@]}" review --commit "$commit_ref") +else + review_cmd=("$codex_bin" "${codex_args[@]}" review --base "$base_ref") +fi + +case "$openclaw_maintainer_validation" in + 1|true|True|TRUE|yes|Yes|YES|on|On|ON) openclaw_maintainer_validation=1 ;; + *) openclaw_maintainer_validation=0 ;; +esac +if [[ -z "$parallel_tests" && "$openclaw_maintainer_validation" != 1 ]] && + ! auto_tests_disabled; then + if [[ -f "$repo_root/package.json" && -f "$repo_root/pnpm-lock.yaml" && -d "$repo_root/node_modules" ]] && + command -v pnpm >/dev/null 2>&1 && + has_package_check_script; then + parallel_tests="cd $quoted_repo_root && PNPM_CONFIG_PM_ON_FAIL=ignore PNPM_CONFIG_VERIFY_DEPS_BEFORE_RUN=false PNPM_CONFIG_OFFLINE=true pnpm run check" + parallel_tests_auto=true + fi +fi +if [[ "$openclaw_maintainer_validation" == 1 ]]; then + codex_review_prompt=$(cat <<'EOF' +OpenClaw maintainer autoreview validation policy: +- Review the diff by reading code, tests, and dependency contracts. +- Do not run local memory-heavy Node validation from review mode. This includes local pnpm checks/tests, Vitest, tsgo, npm test, and node scripts/run-vitest.mjs. +- If runtime proof is needed, use existing proof or route validation through Crabbox / Blacksmith Testbox and report the exact provider and id. +- If remote validation is not necessary for the finding, state the targeted proof that should be run instead of starting local tests. +EOF +) + review_cmd=("$codex_bin" "${codex_args[@]}" review -) + codex_review_prompt_file=true +fi + +printf 'autoreview target: %s\n' "$review_kind" +printf 'branch: %s\n' "${current_branch:-detached}" +if [[ -n "$pr_url" ]]; then + printf 'pr: %s\n' "$pr_url" +fi +if [[ "$reviewer" == auto ]]; then + printf 'reviewer: codex\n' +else + printf 'reviewer: %s\n' "$reviewer" +fi +case "$reviewer" in + codex|auto) ;; + *) + printf 'note: Codex native review mode is the recommended and best-supported review path; %s uses a generated diff prompt.\n' "$reviewer" + ;; +esac +if [[ "$reviewer" == auto || "$reviewer" == codex ]]; then + printf 'review:' + printf ' %q' "${review_cmd[@]}" + printf '\n' + if [[ "$codex_review_prompt_file" == true ]]; then + printf 'review policy: OpenClaw maintainer validation active; using generated prompt review because Codex target review cannot accept extra prompt text\n' + fi +else + printf 'review: %s prompt review\n' "$reviewer" +fi +if [[ -n "$parallel_tests" ]]; then + printf 'tests: %s' "$parallel_tests" + if [[ "$parallel_tests_auto" == true ]]; then + printf ' (auto)' + fi + printf '\n' +fi +if [[ "$review_kind" == branch ]]; then + printf 'fetch: git fetch origin --quiet\n' +fi +if [[ -n "$output" ]]; then + printf 'output: %s\n' "$output" +fi + +if [[ "$dry_run" == true ]]; then + exit 0 +fi + +if [[ "$review_kind" == branch ]]; then + git fetch origin --quiet || { + echo "warning: git fetch origin failed; reviewing with existing refs" >&2 + } +fi + +review_output=$output +review_output_is_temp=false +if [[ -z "$review_output" ]]; then + review_output=$(mktemp) + review_output_is_temp=true +fi +mkdir -p "$(dirname "$review_output")" +: > "$review_output" + +cleanup() { + if [[ "${review_output_is_temp:-false}" == true && -n "${review_output:-}" ]]; then + rm -f "$review_output" + fi + if [[ -n "${prompt_file:-}" ]]; then + rm -f "$prompt_file" + fi +} +trap cleanup EXIT + +run_review() { + local status=0 + mkdir -p "$(dirname "$review_output")" + if [[ "$codex_review_prompt_file" == true ]]; then + build_prompt_file || return + "${review_cmd[@]}" < "$prompt_file" 2>&1 | tee "$review_output" + status=${PIPESTATUS[0]} + rm -f "$prompt_file" + prompt_file= + return "$status" + fi + "${review_cmd[@]}" 2>&1 | tee "$review_output" +} + +diff_for_review() { + case "$review_kind" in + local) + git -C "$repo_root" diff --stat + git -C "$repo_root" diff --cached --stat + git -C "$repo_root" diff --find-renames + git -C "$repo_root" diff --cached --find-renames + while IFS= read -r untracked_file; do + [[ -n "$untracked_file" ]] || continue + git -C "$repo_root" diff --no-index -- /dev/null "$untracked_file" || true + done < <(git -C "$repo_root" ls-files --others --exclude-standard) + ;; + commit) + git -C "$repo_root" show --find-renames --stat --format=fuller "$commit_ref" + git -C "$repo_root" show --find-renames --format=medium "$commit_ref" + ;; + branch) + git -C "$repo_root" diff --find-renames --stat "$base_ref"...HEAD + git -C "$repo_root" diff --find-renames "$base_ref"...HEAD + ;; + esac +} + +build_prompt_file() { + prompt_file=$(mktemp) + { + cat <] Short title + File: path:line + Why: one sentence + Fix: one sentence +- If no accepted/actionable findings, output exactly: + autoreview clean: no accepted/actionable findings reported + +Diff: +EOF + diff_for_review + } > "$prompt_file" || return +} + +reviewer_output_has_clean_marker() { + local path=$1 + grep -Eq '^[^[:alnum:]]*autoreview clean: no accepted/actionable findings reported[[:space:]]*$' "$path" +} + +run_prompt_reviewer() { + local selected=$1 + local copilot_prompt= + local prompt_bytes=0 + local reviewer_output + local status=0 + + if ! build_prompt_file; then + rm -f "${prompt_file:-}" + prompt_file= + return 1 + fi + reviewer_output=$(mktemp) + mkdir -p "$(dirname "$review_output")" + + case "$selected" in + claude) + if ! command -v "$claude_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $claude_bin" >&2 + status=127 + elif printf 'fallback: claude -p\n' | tee -a "$review_output"; then + "$claude_bin" --tools "" --no-session-persistence -p < "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + pi) + if ! command -v "$pi_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $pi_bin" >&2 + status=127 + elif printf 'fallback: pi -p\n' | tee -a "$review_output"; then + "$pi_bin" --no-tools --no-session -p < "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + opencode) + if ! command -v "$opencode_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $opencode_bin" >&2 + status=127 + elif printf 'fallback: opencode run\n' | tee -a "$review_output"; then + "$opencode_bin" run --pure --dir "$repo_root" \ + "Review the attached prompt file. Do not modify files." \ + --file "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + droid) + if ! command -v "$droid_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $droid_bin" >&2 + status=127 + elif printf 'fallback: droid exec\n' | tee -a "$review_output"; then + "$droid_bin" exec --cwd "$repo_root" -f "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + else + status=$? + fi + ;; + copilot) + if ! command -v "$copilot_bin" >/dev/null 2>&1; then + echo "fallback reviewer unavailable: $copilot_bin" >&2 + status=127 + elif printf 'fallback: copilot\n' | tee -a "$review_output"; then + prompt_bytes=$(wc -c < "$prompt_file" | tr -d '[:space:]') + if (( prompt_bytes > 120000 )); then + echo "copilot reviewer unavailable: generated prompt is too large for copilot -p; use codex, droid, or another file/stdin-capable reviewer" \ + 2>&1 | tee -a "$review_output" "$reviewer_output" + status=1 + else + copilot_prompt=$(< "$prompt_file") + "$copilot_bin" -C "$repo_root" --available-tools=none --stream off --output-format text --silent \ + -p "$copilot_prompt" \ + 2>&1 | tee -a "$review_output" "$reviewer_output" + status=$? + fi + else + status=$? + fi + ;; + *) + echo "unsupported prompt reviewer: $selected" >&2 + status=2 + ;; + esac + if [[ "$status" == 0 ]]; then + if grep -Eq '\[P[0-3]\]' "$reviewer_output"; then + status=1 + elif ! grep -q '[^[:space:]]' "$reviewer_output"; then + status=1 + elif ! reviewer_output_has_clean_marker "$reviewer_output"; then + status=1 + fi + fi + rm -f "$reviewer_output" + rm -f "$prompt_file" + prompt_file= + return "$status" +} + +run_selected_review() { + local selected=$1 + case "$selected" in + codex) + if ! command -v "$codex_bin" >/dev/null 2>&1; then + echo "codex reviewer unavailable: $codex_bin" >&2 + return 127 + fi + run_review + ;; + claude|pi|opencode|droid|copilot) + run_prompt_reviewer "$selected" + ;; + *) + echo "unsupported reviewer: $selected" >&2 + return 2 + ;; + esac +} + +fallback_reviewer_is_available() { + local selected=$1 + case "$selected" in + claude) command -v "$claude_bin" >/dev/null 2>&1 ;; + pi) command -v "$pi_bin" >/dev/null 2>&1 ;; + opencode) command -v "$opencode_bin" >/dev/null 2>&1 ;; + droid) command -v "$droid_bin" >/dev/null 2>&1 ;; + copilot) command -v "$copilot_bin" >/dev/null 2>&1 ;; + *) return 1 ;; + esac +} + +run_auto_fallback_review() { + local selected + if [[ "$fallback_reviewer" != auto ]]; then + run_selected_review "$fallback_reviewer" + return $? + fi + + for selected in claude pi opencode droid copilot; do + if fallback_reviewer_is_available "$selected"; then + run_selected_review "$selected" + return $? + fi + done + + echo "fallback reviewer unavailable: no configured fallback CLI found" >&2 + return 127 +} + +run_auto_review() { + run_selected_review codex + local status=$? + if [[ "$status" == 0 ]]; then + return 0 + fi + if (( status > 128 && status < 192 )); then + return "$status" + fi + if review_output_has_findings; then + return "$status" + fi + if [[ "$fallback_reviewer" == none ]]; then + return "$status" + fi + if [[ "$fallback_reviewer" == auto ]]; then + printf 'autoreview warning: codex exited %s; trying configured fallback reviewers\n' "$status" >&2 + else + printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2 + fi + run_auto_fallback_review +} + +elapsed_since() { + local started_at=$1 + local finished_at + finished_at=$(date +%s) + printf '%s\n' "$((finished_at - started_at))" +} + +format_elapsed() { + local seconds=$1 + if (( seconds < 60 )); then + printf '%ss\n' "$seconds" + else + printf '%sm%ss\n' "$((seconds / 60))" "$((seconds % 60))" + fi +} + +review_output_empty() { + [[ ! -s "$review_output" ]] || ! grep -q '[^[:space:]]' "$review_output" +} + +review_findings_text() { + if grep -Fxq 'codex' "$review_output"; then + awk ' + $0 == "codex" { + capture = 1 + output = $0 ORS + next + } + capture { + output = output $0 ORS + } + END { + printf "%s", output + } + ' "$review_output" + return + fi + cat "$review_output" +} + +review_output_has_findings() { + review_findings_text | grep -Eq '\[P[0-3]\]' +} + +report_clean_review_or_fail() { + local elapsed_text + elapsed_text=$(format_elapsed "${review_elapsed_seconds:-0}") + + if review_output_has_findings; then + printf 'autoreview complete after %s\n' "$elapsed_text" + printf 'autoreview findings: accepted/actionable findings reported\n' + return 1 + fi + if review_output_empty; then + printf 'autoreview complete after %s; no output\n' "$elapsed_text" + return 1 + fi + printf 'autoreview complete after %s\n' "$elapsed_text" + printf 'autoreview clean: no accepted/actionable findings reported\n' +} + +if [[ -z "$parallel_tests" ]]; then + review_started_at=$(date +%s) + set +e + if [[ "$reviewer" == auto ]]; then + run_auto_review + else + run_selected_review "$reviewer" + fi + review_status=$? + review_elapsed_seconds=$(elapsed_since "$review_started_at") + set -e + if [[ "$review_status" == 0 ]]; then + report_clean_review_or_fail + exit $? + fi + exit "$review_status" +fi + +review_status_file=$(mktemp) +review_elapsed_file=$(mktemp) +tests_status_file=$(mktemp) + +( + set +e + review_started_at=$(date +%s) + if [[ "$reviewer" == auto ]]; then + run_auto_review + else + run_selected_review "$reviewer" + fi + status=$? + elapsed=$(elapsed_since "$review_started_at") + printf '%s\n' "$status" > "$review_status_file" + printf '%s\n' "$elapsed" > "$review_elapsed_file" +) & +review_pid=$! + +( + set +e + bash -lc "$parallel_tests" + status=$? + printf '%s\n' "$status" > "$tests_status_file" +) & +tests_pid=$! + +wait "$review_pid" || true +wait "$tests_pid" || true + +review_status=$(cat "$review_status_file") +review_elapsed_seconds=$(cat "$review_elapsed_file") +tests_status=$(cat "$tests_status_file") +rm -f "$review_status_file" "$review_elapsed_file" "$tests_status_file" + +printf 'autoreview exit: %s\n' "$review_status" +printf 'tests exit: %s\n' "$tests_status" + +if [[ "$review_status" != 0 || "$tests_status" != 0 ]]; then + exit 1 +fi + +report_clean_review_or_fail diff --git a/.agents/skills/crabbox/SKILL.md b/.agents/skills/crabbox/SKILL.md new file mode 100644 index 0000000..f81c0b5 --- /dev/null +++ b/.agents/skills/crabbox/SKILL.md @@ -0,0 +1,711 @@ +--- +name: crabbox +description: Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id. +--- + +# Crabbox + +Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests, +CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed +reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup. + +Crabbox is the transport/orchestration surface. The actual backend can be: + +- brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like + `cbx_...`, `syncDelegated=false` +- Blacksmith Testbox through Crabbox: delegated provider, + `provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true` + +For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the +Crabbox wrapper is acceptable and often preferred when the standing Testbox +rules apply. Do not describe those runs as "AWS Crabbox"; report them as +Testbox-through-Crabbox with the `tbx_...` id and Actions run. + +Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs +direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`, +`--full-resync`, environment forwarding, capture/download support, or provider +comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw +maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or +the user asks for Testbox/Blacksmith. + +## First Checks + +- Run from the repo root. Crabbox sync mirrors the current checkout. +- Check the wrapper and providers before remote work: + +```sh +command -v crabbox +../crabbox/bin/crabbox --version +../crabbox/bin/crabbox run --help | sed -n '1,120p' +../crabbox/bin/crabbox desktop launch --help +../crabbox/bin/crabbox webvnc --help +``` + +- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH + shim can be stale. +- Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider` + means brokered AWS today. +- The brokered AWS default is a Linux developer image in `eu-west-1`; the repo + config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply. + If warmup drifts well past the minute-scale path, verify image promotion, + region/AZ placement, and FSR state before blaming OpenClaw. +- For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with + `--provider blacksmith-testbox` or the repo Testbox helpers when the standing + Testbox policy applies. +- Always report the actual provider and id. `cbx_...` means AWS Crabbox; + `tbx_...` means Blacksmith Testbox through Crabbox. If the output only says + `blacksmith testbox list`, use `blacksmith testbox list --all` before + concluding no box exists. +- If a warm direct-provider lease smells stale, retry with `--full-resync` + (alias `--fresh-sync`) before replacing the lease. This resets the remote + workdir, skips the fingerprint fast path, reseeds Git when possible, and + uploads the checkout from scratch. +- For live/provider bugs, use the configured secret workflow before downgrading + to mocks. Copy only the exact needed key into the remote process environment + for that one command. Do not print it, do not sync it as a repo file, and do + not leave it in remote shell history or logs. If no secret-safe injection path + is available, say true live provider auth is blocked instead of silently using + a fake key. +- Prefer local targeted tests for tight edit loops. Broad gates belong remote. +- Do not treat inherited shell env as operator intent. In particular, + `OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission + to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or + lint/typecheck fan-out onto the laptop. +- Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly + asks for local proof in the current task. If Testbox is queued or capacity is + constrained, report the blocker and keep only targeted local edit-loop checks + running. + +## macOS And Windows Targets + +Use these only when the task needs an existing non-Linux host. OpenClaw broad +Linux validation uses the repo Crabbox config unless a provider is explicitly +requested. + +Native brokered Windows is available for Windows-specific proof. Use the AWS +developer image in `us-west-2` on demand; it has the expected OpenClaw developer +toolchain and Docker image cache. Keep broad Linux gates on Linux/Testbox unless +the bug is Windows-specific: + +```sh +../crabbox/bin/crabbox warmup \ + --provider aws \ + --target windows \ + --windows-mode normal \ + --region us-west-2 \ + --market on-demand \ + --timing-json +``` + +The hydrate workflow assumes Docker should already be baked into Linux images +and only installs it as a fallback. Do not add per-run Docker installs to proof +commands unless the image probe shows Docker is actually missing. + +When the user explicitly asks for brokered macOS runners, use Crabbox AWS +macOS only after confirming the deployed coordinator supports EC2 Mac host +lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota +and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host, +or run the no-spend preflight first: + +```sh +crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json +crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json +CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh +``` + +Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report +paid-host blockers as quota, IAM, coordinator deployment, or host availability +instead of falling back to local macOS. + +Crabbox supports static SSH targets: + +```sh +../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test +../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test" +../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test +``` + +- `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH, + bash, Git, rsync, and tar contract. +- Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar + archive transfer into `static.workRoot`. Direct native Windows runs support + `--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`. +- `crabbox actions hydrate/register` are Linux-only today; use plain + `crabbox run` loops for static macOS and Windows hosts. +- Live proof needs a reachable, operator-managed SSH host. Without one, verify + with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox + Go test suite. + +## Direct Brokered AWS Backend + +Use this when the task needs direct AWS Crabbox semantics rather than the +prepared Blacksmith Testbox CI environment. + +Changed gate: + +```sh +../crabbox/bin/crabbox run \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +``` + +Full suite: + +```sh +../crabbox/bin/crabbox run \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test" +``` + +Focused rerun: + +```sh +../crabbox/bin/crabbox run \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + --shell -- \ + "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test " +``` + +Read the JSON summary. Useful fields: + +- `provider`: `aws` +- `leaseId`: `cbx_...` +- `syncDelegated`: `false` +- `commandPhases`: populated when the command prints `CRABBOX_PHASE:` +- `commandMs` / `totalMs` +- `exitCode` + +Crabbox should stop one-shot AWS leases automatically after the run. Verify +cleanup when a run fails, is interrupted, or the command output is unclear: + +```sh +../crabbox/bin/crabbox list --provider aws +``` + +## Blacksmith Testbox Through Crabbox + +Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI +environment is the right proof surface: + +```sh +node scripts/crabbox-wrapper.mjs run \ + --provider blacksmith-testbox \ + --blacksmith-org openclaw \ + --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ + --blacksmith-job check \ + --blacksmith-ref main \ + --idle-timeout 90m \ + --ttl 240m \ + --timing-json \ + -- \ + CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm check:changed +``` + +Read the JSON summary and the Testbox line. Useful fields: + +- `provider`: `blacksmith-testbox` +- `leaseId`: `tbx_...` +- `syncDelegated`: `true` +- `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync +- Actions run URL/id from the Testbox output +- `exitCode` + +`blacksmith testbox list` may hide hydrating or ready boxes. Use: + +```sh +blacksmith testbox list --all +blacksmith testbox status +``` + +## Observability Flags + +Use these on debugging runs before inventing ad hoc logging: + +- `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd, + and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`, + `corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX + `sudo`/`apt`/`bubblewrap` and native Windows + `powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add + `--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo + `run.preflightTools` to replace the list. `default` expands built-ins; `none` + prints only the workspace summary. Preflight is diagnostic only; install + toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or + the run script. On `blacksmith-testbox`, this prints a delegated-unsupported + note because the workflow owns setup. +- `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct + providers and prints `set len=N secret=true` style summaries. On + `blacksmith-testbox`, env forwarding is unsupported; put secrets in the + Testbox workflow instead. +- `--env-from-profile ` plus `--allow-env NAME`: loads simple + `export NAME=value` / `NAME=value` lines from a local profile without + executing it, then forwards only allowlisted names. `--allow-env` is + repeatable and comma-separated. Profile values override ambient allowlisted + env values for that run. Direct POSIX, WSL2, and native Windows runs are + supported; delegated providers are not. Crabbox probes the uploaded profile + remotely and prints redacted presence/length metadata before the command. +- `--env-helper `: with `--env-from-profile` on POSIX SSH targets, + persists `.crabbox/env/` and `.crabbox/env/.env` so follow-up + commands on the same lease can run through `./.crabbox/env/ `. + Use only on leases you control; the profile stays until cleanup, lease reset, + or `--full-resync`. +- `--script ` / `--script-stdin`: upload a local script into + `.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute + directly on POSIX; scripts without a shebang run through `bash`. Native + Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1` + when needed. Arguments after `--` become script args. +- `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a + fresh remote checkout of the GitHub PR. Bare numbers use the current repo's + GitHub origin. Add `--apply-local-patch` only when the current local + `git diff --binary HEAD` should be applied on top of that PR checkout. +- `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir + before syncing. Use after sync fingerprints look wrong, SSH times out before + sync, or rsync watchdog output suggests it. It is redundant with + `--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated + providers. +- `--capture-stdout ` / `--capture-stderr `: write remote streams to + local files and keep binary/noisy output out of retained logs. Parent + directories must already exist. These are direct-provider only. +- `--capture-on-fail`: on non-zero direct-provider exits, downloads + `.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`, + `coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed. +- `--keep-on-failure`: leave a failed one-shot lease alive for live debugging + until idle/TTL expiry. Useful on direct providers and delegated one-shots. +- `--timing-json`: final machine-readable timing. Add + `echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell + commands; direct providers and Blacksmith Testbox both report them as + `commandPhases`. + +Live-provider debug template for direct AWS/Hetzner leases: + +```sh +mkdir -p .crabbox/logs +../crabbox/bin/crabbox run --provider aws \ + --preflight \ + --allow-env OPENAI_API_KEY,OPENAI_BASE_URL \ + --timing-json \ + --capture-stdout .crabbox/logs/live-provider.stdout.log \ + --capture-stderr .crabbox/logs/live-provider.stderr.log \ + --capture-on-fail \ + --shell -- \ + "echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live" +``` + +Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or +`--sync-only` to delegated providers. Also do not pass `--script*`, +`--fresh-pr`, `--full-resync`, or `--env-helper` there. Crabbox rejects these +because the provider owns sync or command transport. `--keep-on-failure` is OK +for delegated one-shots when you need to inspect a failed lease. + +## Efficient Bug E2E Verification + +Use the smallest Crabbox lane that proves the reported user path, not just the +touched code. Aim for one after-fix E2E proof before commenting, closing, or +opening a PR for a user-visible bug. + +When the user says "test in Crabbox", do not simply copy tests to the remote +box and run them there. Crabbox is for remote real-scenario proof: copy or +install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API +call that failed, and capture behavior from that entrypoint. For regressions or +bug reports, prove the broken state first when feasible, then run the same +scenario after the fix. + +Pick the lane by symptom: + +- Docker/setup/install bug: build a package tarball and run the matching + `scripts/e2e/*-docker.sh` or package script. This proves npm packaging, + install paths, runtime deps, config writes, and container behavior. +- Provider/model/auth bug: prefer true live E2E. Use the configured secret + workflow, then inject the single needed key into Crabbox if needed. Scrub + unrelated provider env vars in the child command so interactive defaults do + not drift to another provider. If only a dummy key is used, label the proof + narrowly, e.g. "UI/install path only; live provider auth not exercised." +- Channel delivery bug: use the channel Docker/live lane when available; include + setup, config, gateway start, send/receive or agent-turn proof, and redacted + logs. +- Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that + creates real state and inspects the resulting files/API output. +- Pure parser/config bug: targeted tests may be enough, but still run a + Crabbox command when OS, package, Docker, secrets, or service lifecycle could + change behavior. + +Efficient flow: + +1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint + when feasible. If the issue cannot be reproduced, capture the exact command + and observed behavior instead. +2. Patch locally and run narrow local tests for edit speed. +3. Run one Crabbox E2E command that starts from the user-facing entrypoint: + package install, Docker setup, onboarding, channel add, gateway start, or + agent turn as appropriate. +4. Record proof as: Testbox id, command, environment shape, redacted secret + source, and copied success/failure output. +5. If the issue says "cannot reproduce", ask for the missing config/log fields + that would distinguish the tested path from the reporter's path. + +Keep it efficient: + +- Reuse existing E2E scripts and helper assertions before writing ad hoc shell. +- Use `--script ` or `--script-stdin` for multi-line E2E commands instead + of quote-heavy `--shell` strings on direct SSH providers. +- Use `--fresh-pr ` when validating an upstream PR in isolation from the + local dirty tree. Add `--apply-local-patch` only when testing a local fixup on + top of that PR. +- Use `--full-resync` before replacing a warmed direct-provider lease when the + remote workdir or sync fingerprint appears stale. +- Use one-shot Crabbox for a single proof; use a reusable Testbox only when + several commands must share built images, installed packages, or live state. +- Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a + candidate tarball; prefer the repo's package helper instead of direct source + execution when the bug might be packaging/install related. +- Keep secrets redacted. It is fine to report key presence, source, and length; + never print secret values. +- Include `--timing-json` on broad or flaky runs when command duration or sync + behavior matters. + +Before/after PR proof on delegated Testbox: + +- For PRs that should prove "broken before, fixed after", compare base and PR + on the same Testbox when practical. Fetch both refs, create detached temp + worktrees under `/tmp`, install in each, then run the same harness twice. +- Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync + may leave the root dirty with local files; `git checkout` can abort or mix + proof state. +- Temp harness files under `/tmp` do not resolve repo packages by default. Put + the harness inside the worktree, or in ESM use + `createRequire(path.join(process.cwd(), "package.json"))` before requiring + workspace deps such as `@lydell/node-pty`. +- For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only + assertions. Use a real PTY, wait for visible lifecycle markers, send input, + then send control keys and assert process exit/stuck behavior. +- When validating a rebased local branch before push, remember delegated sync + usually validates synced file content on a detached dirty checkout, not a + remote commit object. Record the local head SHA, changed files, Testbox id, + and final success markers; after pushing, ensure the pushed SHA has the same + file content. +- If GitHub CI is still queued but the exact changed content passed Testbox + `pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is + reasonable to merge once required checks allow it. Note any still-running + unrelated shards in the proof comment instead of waiting forever. + +Interactive CLI/onboarding: + +- For full-screen or prompt-heavy CLI flows, run the target command inside tmux + on the Crabbox and drive it with `tmux send-keys`; capture proof with + `tmux capture-pane`, redacted through `sed`. +- Prefer deterministic arrow navigation over search typing for Clack-style + searchable selects. Raw `send-keys -l openai` may not trigger filtering in a + tmux pane; inspect option order locally or on-box and send exact Down/Enter + sequences. +- Isolate mutable state with `OPENCLAW_STATE_DIR=$(mktemp -d)`. Plugin npm + installs live under that state dir (`npm/node_modules/...`), not under + `OPENCLAW_CONFIG_DIR`. Verify downloads by checking the state dir, package + lock, and installed package metadata. +- To test automatic setup installs against local package artifacts, use + `OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES=1` plus + `OPENCLAW_PLUGIN_INSTALL_OVERRIDES='{"plugin-id":"npm-pack:/tmp/plugin.tgz"}'`. + Pack with `npm pack`, set an isolated `OPENCLAW_STATE_DIR`, and verify the + package under `npm/node_modules`. Overrides are test-only and must not be + treated as official/trusted-source installs. +- For OpenAI/Codex onboarding proof, the useful markers are the UI line + `Installed Codex plugin`, `npm/node_modules/@openclaw/codex`, and the + package-lock entry showing the bundled `@openai/codex` dependency. A dummy + OpenAI-shaped key can prove only UI/install behavior; it is not live auth. + +## Reuse And Keepalive + +For most Crabbox calls, one-shot is enough. Use reuse only when you need +multiple manual commands on the same hydrated box. + +If Crabbox returns a reusable id or you intentionally keep a lease: + +```sh +../crabbox/bin/crabbox run --id --no-sync --timing-json --shell -- "pnpm test " +``` + +Stop boxes you created before handoff: + +```sh +../crabbox/bin/crabbox stop +blacksmith testbox stop --id +``` + +## Interactive Desktop And WebVNC + +Prefer WebVNC for human inspection because the browser portal can preload the +lease VNC password and avoids a native VNC client's copy/paste/password dance. +Use native `crabbox vnc` only when WebVNC is unavailable, the browser portal is +broken, or the user explicitly wants a local VNC client. + +Common desktop flow: + +```sh +../crabbox/bin/crabbox warmup --provider hetzner --desktop --browser --class standard --idle-timeout 60m --ttl 240m +../crabbox/bin/crabbox desktop launch --provider hetzner --id --browser --url https://example.com --webvnc --open --take-control +``` + +Useful WebVNC commands: + +```sh +../crabbox/bin/crabbox webvnc --provider hetzner --id --open --take-control +../crabbox/bin/crabbox webvnc daemon start --provider hetzner --id --open --take-control +../crabbox/bin/crabbox webvnc daemon status --provider hetzner --id +../crabbox/bin/crabbox webvnc daemon stop --provider hetzner --id +../crabbox/bin/crabbox webvnc status --provider hetzner --id +../crabbox/bin/crabbox webvnc reset --provider hetzner --id --open --take-control +../crabbox/bin/crabbox desktop doctor --provider hetzner --id +../crabbox/bin/crabbox desktop click --provider hetzner --id --x 640 --y 420 +../crabbox/bin/crabbox desktop paste --provider hetzner --id --text "user@example.com" +../crabbox/bin/crabbox desktop key --provider hetzner --id ctrl+l +../crabbox/bin/crabbox artifacts collect --id --all --output artifacts/ +../crabbox/bin/crabbox artifacts publish --dir artifacts/ --pr +``` + +`desktop launch --webvnc --open` is usually the nicest one-shot: it starts the +browser/app inside the visible session, bridges the lease into the authenticated +WebVNC portal, and opens the portal. Keep browsers windowed for human QA; use +`--fullscreen` only for capture/video workflows. +For human handoff, include `--take-control` so the opened portal viewer gets +keyboard/mouse control automatically instead of landing as an observer. + +Human handoff preflight: + +- Do not assume a visible desktop or launched browser means the repo CLI/app is + installed, built, or on the interactive terminal's `PATH`. +- Before handing WebVNC to a human tester, prove the expected command from the + same kept lease and from a neutral directory such as `~`. +- If the handoff needs repo-local code, sync/build/link it explicitly on that + lease. Source-tree CLIs often need build output before a symlink works. +- Prefer a real `command -v && --version` + check over a repo-root-only `pnpm ...` command. + +Generic handoff repair pattern: + +```sh +../crabbox/bin/crabbox run --id --full-resync --shell -- \ + "set -euo pipefail + pnpm install --frozen-lockfile + pnpm build + sudo ln -sf \"\$PWD/\" /usr/local/bin/ + cd ~ + command -v + --version" +``` + +## If Crabbox Fails + +Keep the fallback narrow. First decide whether the failure is Crabbox itself, +the brokered AWS lease, Blacksmith/Testbox, repo hydration, sync, or the test +command. + +Fast checks: + +```sh +command -v crabbox +../crabbox/bin/crabbox --version +../crabbox/bin/crabbox run --help | sed -n '1,140p' +../crabbox/bin/crabbox doctor +command -v blacksmith +blacksmith --version +blacksmith testbox list +``` + +Common Crabbox-only failures: + +- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling + repo, or update/install Crabbox before retrying. +- Bad local config: inspect `.crabbox.yaml`, `crabbox config show`, and + `crabbox whoami`; normal OpenClaw proof should use brokered AWS without + asking for cloud keys. +- Slug/claim confusion: use the raw `cbx_...` / `tbx_...` id, or run one-shot + without `--id`. +- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the + printed Actions URL. Large sync warnings now include top source directories + by file count and a hint to update `.crabboxignore` / `sync.exclude`; inspect + those before reaching for `--force-sync-large`. Quiet rsync watchdogs and SSH + timeouts now print `next_action=` hints; follow them, usually `--full-resync` + first and a fresh lease second. +- Cleanup uncertainty: run `crabbox list --provider aws`; for explicit + Blacksmith runs, use `blacksmith testbox list` and stop only boxes you + created. +- Testbox queued/capacity pressure: do not retry Blacksmith repeatedly. Rerun + once without `--provider` so `.crabbox.yaml` routes to brokered AWS, or report + the Blacksmith blocker if Testbox itself is the requested proof. + +If brokered AWS cannot dispatch, sync, attach, or stop, retry once with +`--debug` and `--timing-json`: + +```sh +../crabbox/bin/crabbox run --debug --timing-json -- \ + CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed +``` + +Full suite: + +```sh +../crabbox/bin/crabbox run --debug --timing-json -- \ + CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test +``` + +Auth fallback, only when `blacksmith` says auth is missing: + +```sh +blacksmith auth login --non-interactive --organization openclaw +``` + +Raw Blacksmith footguns: + +- Run from repo root. The CLI syncs the current directory. +- Save the returned `tbx_...` id in the session. +- Reuse that id for focused reruns; stop it before handoff. +- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag. +- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable + queue. + +Use Blacksmith only when the task is specifically about Testbox, brokered AWS +is unavailable, or an explicit comparison is needed. If Blacksmith is down or +quota-limited, do not keep probing it; stay on brokered AWS and note the +delegated-provider outage. + +## Blacksmith Backend Notes + +Crabbox Blacksmith backend delegates setup to: + +- org: `openclaw` +- workflow: `.github/workflows/ci-check-testbox.yml` +- job: `check` +- ref: `main` unless testing a branch/tag intentionally + +The hydration workflow owns checkout, Node/pnpm setup, dependency install, +secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command +execution, timing, logs/results, and cleanup. + +Minimal Blacksmith-backed Crabbox run, from repo root: + +```sh +../crabbox/bin/crabbox run --provider blacksmith-testbox --timing-json -- \ + CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed +``` + +Use direct Blacksmith only when Crabbox is the broken layer and you are +isolating a Crabbox bug. Prefer direct `blacksmith testbox list` for cleanup +diagnostics, not as a reusable work queue. + +Important Blacksmith footguns: + +- Always run from repo root. The CLI syncs the current directory. +- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag. +- If auth is missing and browser auth is acceptable: + +```sh +blacksmith auth login --non-interactive --organization openclaw +``` + +## Brokered AWS + +Use AWS for normal OpenClaw remote proof. The repo `.crabbox.yaml` already +selects brokered AWS, so omit `--provider` unless you are testing a different +provider deliberately. + +```sh +../crabbox/bin/crabbox warmup --class beast --market on-demand --idle-timeout 90m +../crabbox/bin/crabbox hydrate --id +../crabbox/bin/crabbox run --id --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed" +../crabbox/bin/crabbox stop +``` + +Install/auth for owned Crabbox if needed: + +```sh +brew install openclaw/tap/crabbox +crabbox login --url https://crabbox.openclaw.ai --provider aws +``` + +New users should self-resolve broker auth before anyone asks for AWS keys: + +```sh +crabbox config show +crabbox doctor +crabbox whoami +``` + +- If broker auth is missing, run `crabbox login --url https://crabbox.openclaw.ai --provider aws`. +- If the CLI asks for `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or AWS + profile setup during normal OpenClaw validation, assume the agent selected + the wrong path. Use brokered `crabbox login` or an existing brokered lease + before asking the user for cloud credentials. +- Ask for AWS keys only for explicit direct-provider/account administration, + not for normal brokered OpenClaw proof. +- Trusted automation may still use + `printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`. + +macOS config lives at: + +```text +~/Library/Application Support/crabbox/config.yaml +``` + +It should include `broker.url`, `broker.token`, and usually `provider: aws` +for OpenClaw lanes. Let that config drive normal validation. + +### Interactive Desktop / WebVNC + +For human desktop demos, prefer `webvnc` over native `vnc` and keep the remote +desktop visible/windowed. Do not fullscreen the remote browser or hide the XFCE +panel/window chrome unless the explicit goal is video/capture output. After +launch, verify a screenshot shows the desktop panel plus browser title bar. If +Chrome is fullscreen, toggle it back with: + +```sh +crabbox run --id --shell -- 'DISPLAY=:99 xdotool search --onlyvisible --class google-chrome windowactivate key F11' +``` + +## Diagnostics + +```sh +crabbox status --id --wait +crabbox inspect --id --json +crabbox sync-plan +crabbox history --limit 20 +crabbox history --lease +crabbox attach +crabbox events --json +crabbox logs +crabbox results +crabbox cache stats --id +crabbox ssh --id +blacksmith testbox list +``` + +Use `--debug` on `run` when measuring sync timing. +Use `--timing-json` on warmup, hydrate, and run when comparing backends. +Use `--market spot|on-demand` only on AWS warmup/one-shot runs. + +## Failure Triage + +- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists + the provider selected by `.crabbox.yaml`; update Crabbox before falling back. +- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect + the hydration step. +- Sync failed: rerun with `--debug`; check changed-file count and whether the + checkout is dirty. +- Command failed: rerun only the failing shard/file first. Do not rerun a full + suite until the focused failure is understood. +- Cleanup uncertain: `crabbox list --provider aws`; for explicit Blacksmith + runs, use `blacksmith testbox list` and stop owned `tbx_...` leases you + created. +- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above, + then file/fix the Crabbox issue. + +## Boundary + +Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the +hydration workflow and keep Crabbox generic around lease, sync, command +execution, logs/results, timing, and cleanup. diff --git a/.crabbox.yaml b/.crabbox.yaml new file mode 100644 index 0000000..b9070dc --- /dev/null +++ b/.crabbox.yaml @@ -0,0 +1,49 @@ +profile: clawpatch-check +provider: aws +class: standard +capacity: + market: spot + strategy: most-available + fallback: on-demand-after-120s + hints: true + regions: + - eu-west-1 + - eu-west-2 + - eu-central-1 + - us-east-1 + - us-west-2 +actions: + workflow: .github/workflows/crabbox-hydrate.yml + job: hydrate + ref: main + runnerLabels: + - crabbox + - openclaw + - clawpatch + runnerVersion: latest + ephemeral: true +aws: + region: eu-west-1 + rootGB: 160 +sync: + delete: true + checksum: false + gitSeed: true + fingerprint: true + baseRef: main + exclude: + - .artifacts + - .codex + - .DS_Store + - coverage + - dist + - dist-test + - node_modules +env: + allow: + - CI + - NODE_OPTIONS + - PNPM_* +ssh: + user: crabbox + port: "2222" diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 7dabfaa..fbc569a 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -7,6 +7,9 @@ # Security automation and disclosure surfaces. /SECURITY.md @openclaw/openclaw-secops +/.agents/skills/ @openclaw/openclaw-secops +/.crabbox.yaml @openclaw/openclaw-secops +/.github/actionlint.yaml @openclaw/openclaw-secops /.github/codeql/ @openclaw/openclaw-secops /.github/dependabot.yml @openclaw/openclaw-secops /.github/workflows/ @openclaw/openclaw-secops diff --git a/.github/actionlint.yaml b/.github/actionlint.yaml new file mode 100644 index 0000000..fa23916 --- /dev/null +++ b/.github/actionlint.yaml @@ -0,0 +1,5 @@ +self-hosted-runner: + labels: + - crabbox + - openclaw + - clawpatch diff --git a/.github/workflows/crabbox-hydrate.yml b/.github/workflows/crabbox-hydrate.yml new file mode 100644 index 0000000..2daa12d --- /dev/null +++ b/.github/workflows/crabbox-hydrate.yml @@ -0,0 +1,118 @@ +name: Crabbox Hydrate + +on: + workflow_dispatch: + inputs: + crabbox_id: + description: "Crabbox lease ID" + required: true + type: string + ref: + description: "Git ref to hydrate" + required: false + type: string + crabbox_runner_label: + description: "Dynamic Crabbox runner label" + required: true + type: string + crabbox_job: + description: "Hydration job identifier expected by Crabbox" + required: false + default: "hydrate" + type: string + crabbox_keep_alive_minutes: + description: "Minutes to keep the hydrated job alive" + required: false + default: "90" + type: string + +permissions: + contents: read + +jobs: + hydrate: + name: hydrate + runs-on: [self-hosted, crabbox, openclaw, clawpatch, "${{ inputs.crabbox_runner_label }}"] + timeout-minutes: 120 + steps: + - uses: actions/checkout@v6 + with: + ref: ${{ inputs.ref || github.ref }} + + - uses: actions/setup-node@v6 + with: + node-version: 24 + + - name: Prepare pnpm workspace + shell: bash + run: | + set -euo pipefail + git fetch --no-tags --depth=50 origin "+refs/heads/main:refs/remotes/origin/main" + corepack enable + corepack prepare "$(node -p "require('./package.json').packageManager")" --activate + pnpm install --frozen-lockfile + node --version + pnpm --version + + - name: Mark Crabbox ready + shell: bash + env: + CRABBOX_ID: ${{ inputs.crabbox_id }} + CRABBOX_JOB: ${{ inputs.crabbox_job }} + run: | + set -euo pipefail + job="${CRABBOX_JOB}" + if [ -z "$job" ]; then job=hydrate; fi + case "$CRABBOX_ID" in + ''|*[!A-Za-z0-9._-]*) + echo "Invalid crabbox_id" >&2 + exit 2 + ;; + esac + mkdir -p "$HOME/.crabbox/actions" + state="$HOME/.crabbox/actions/${CRABBOX_ID}.env" + env_file="$HOME/.crabbox/actions/${CRABBOX_ID}.env.sh" + { + for key in CI GITHUB_ACTIONS GITHUB_WORKSPACE GITHUB_REPOSITORY GITHUB_RUN_ID GITHUB_RUN_NUMBER GITHUB_RUN_ATTEMPT GITHUB_REF GITHUB_REF_NAME GITHUB_SHA GITHUB_EVENT_NAME GITHUB_ACTOR RUNNER_OS RUNNER_ARCH RUNNER_TEMP RUNNER_TOOL_CACHE PATH; do + value="${!key-}" + if [ -n "$value" ]; then + printf 'export %s=%q\n' "$key" "$value" + fi + done + } > "${env_file}.tmp" + mv "${env_file}.tmp" "$env_file" + tmp="${state}.tmp" + { + echo "WORKSPACE=${GITHUB_WORKSPACE}" + echo "RUN_ID=${GITHUB_RUN_ID}" + echo "JOB=${job}" + echo "ENV_FILE=${env_file}" + echo "READY_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)" + } > "$tmp" + mv "$tmp" "$state" + + - name: Keep Crabbox job alive + shell: bash + env: + CRABBOX_ID: ${{ inputs.crabbox_id }} + CRABBOX_KEEP_ALIVE_MINUTES: ${{ inputs.crabbox_keep_alive_minutes }} + run: | + set -euo pipefail + case "$CRABBOX_ID" in + ''|*[!A-Za-z0-9._-]*) + echo "Invalid crabbox_id" >&2 + exit 2 + ;; + esac + minutes="${CRABBOX_KEEP_ALIVE_MINUTES}" + case "$minutes" in + ''|*[!0-9]*) minutes=90 ;; + esac + stop="$HOME/.crabbox/actions/${CRABBOX_ID}.stop" + deadline=$(( $(date +%s) + minutes * 60 )) + while [ "$(date +%s)" -lt "$deadline" ]; do + if [ -f "$stop" ]; then + exit 0 + fi + sleep 15 + done diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml new file mode 100644 index 0000000..46276bb --- /dev/null +++ b/.github/workflows/stale.yml @@ -0,0 +1,91 @@ +name: Stale + +on: + schedule: + - cron: "17 4 * * *" + workflow_dispatch: + +permissions: {} + +jobs: + stale: + permissions: + issues: write + pull-requests: write + runs-on: ubuntu-latest + steps: + - name: Ensure stale label + env: + GH_TOKEN: ${{ github.token }} + run: gh label create stale --description "Inactive issue or pull request" --color ededed --force + + - name: Mark stale unassigned issues and pull requests + uses: actions/stale@v10 + with: + days-before-issue-stale: 14 + days-before-issue-close: 7 + days-before-pr-stale: 14 + days-before-pr-close: 7 + stale-issue-label: stale + stale-pr-label: stale + exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale + exempt-pr-labels: maintainer,no-stale + operations-per-run: 1000 + ascending: true + exempt-all-assignees: true + remove-stale-when-updated: true + stale-issue-message: | + This issue has been automatically marked as stale due to inactivity. + Please add updated clawpatch details or it will be closed. + stale-pr-message: | + This pull request has been automatically marked as stale due to inactivity. + Please update it or it will be closed. + close-issue-message: | + Closing due to inactivity. + If this still affects clawpatch, open a new issue with current reproduction details. + close-issue-reason: not_planned + close-pr-message: | + Closing due to inactivity. + If this PR should be revived, reopen it with current context and validation. + + - name: Mark stale assigned issues + uses: actions/stale@v10 + with: + days-before-issue-stale: 30 + days-before-issue-close: 10 + days-before-pr-stale: -1 + days-before-pr-close: -1 + stale-issue-label: stale + exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale + operations-per-run: 1000 + ascending: true + include-only-assigned: true + remove-stale-when-updated: true + stale-issue-message: | + This assigned issue has been automatically marked as stale after 30 days of inactivity. + Please add an update or it will be closed. + close-issue-message: | + Closing due to inactivity. + If this still affects clawpatch, reopen or file a new issue with current evidence. + close-issue-reason: not_planned + + - name: Mark stale assigned pull requests + uses: actions/stale@v10 + with: + days-before-issue-stale: -1 + days-before-issue-close: -1 + days-before-pr-stale: 27 + days-before-pr-close: 7 + stale-pr-label: stale + exempt-pr-labels: maintainer,no-stale + operations-per-run: 1000 + ascending: true + include-only-assigned: true + ignore-pr-updates: true + remove-stale-when-updated: true + stale-pr-message: | + This assigned pull request has been automatically marked as stale after being open for 27 days. + Please add an update or it will be closed. + close-pr-message: | + Closing due to inactivity. + If this PR should be revived, reopen it with current context and validation.