chore(benchmarks): remove POC benchmark #3

Merged: mabry1985 merged 1 commit into main from chore/remove-poc-benchmark on May 17, 2026

Conversation

mabry1985 commented May 17, 2026

Summary

  • Remove benchmarks/roguelike-ai-poc/ (POC idea + reference artifacts).
  • Replace the example used in the first-plan tutorial and the data-model reference with a neutral contact-list-deduper example.
  • benchmarks/ directory stays as a convention; new benchmarks can be added per the layout in benchmarks/README.md.

Test plan

  • grep -rn roguelike returns nothing in the repo.
  • CI passes (unaffected; docs-only + benchmark-dir change).
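
The first check in the test plan can be scripted so it also runs outside a shell. This is a minimal sketch, not part of the PR: `find_term` is a hypothetical helper that approximates `grep -rn` by scanning file contents, and skipping `.git` is an assumption (plain `grep -rn` would search it too).

```python
import os


def find_term(root: str, term: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line under root that contains term."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: VCS metadata is skipped; old commits may still mention the term.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files be scanned without raising.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if term in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files are skipped rather than failing the whole scan.
                continue
    return hits
```

Run from the repository root, `find_term(".", "roguelike")` should return an empty list if the removal is complete.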

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Updated benchmark documentation to clarify expected directory structure and guidance.
    • Updated tutorial examples to feature a contact list deduplication CLI instead of previous examples.
    • Updated ID convention examples in reference documentation.
  • Chores

    • Removed legacy benchmark reference materials and associated documentation.

Replace the example used in the first-plan tutorial and data-model
reference with a neutral contact-list-deduper example. Benchmarks
directory stays in place as a convention; new benchmarks can be
added per the layout described in benchmarks/README.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 17, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9f62a3c7-5daf-4868-a0e0-d03fe0f1a9ed

📥 Commits

Reviewing files that changed from the base of the PR and between 08f763e and f2d0ef6.

⛔ Files ignored due to path filters (1)
  • README.md is excluded by !*.md
📒 Files selected for processing (32)
  • benchmarks/README.md
  • benchmarks/roguelike-ai-poc/prompt.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md
  • benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json
  • benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md
  • benchmarks/roguelike-ai-poc/reference/events.jsonl
  • benchmarks/roguelike-ai-poc/reference/index.html
  • benchmarks/roguelike-ai-poc/reference/project.json
  • benchmarks/roguelike-ai-poc/reference/project.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md
  • benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json
  • benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md
  • benchmarks/roguelike-ai-poc/run.sh
  • docs/README.md
  • docs/reference/data-model.md
  • docs/tutorials/your-first-plan.md

Walkthrough

This PR removes the roguelike-ai-poc benchmark and updates the benchmark documentation. It also reworks the tutorial and reference examples to use a contact-list deduplication CLI scenario in place of the roguelike benchmark.

Changes

Benchmark cleanup and documentation refactor

| Layer / File(s) | Summary |
|---|---|
| Benchmark structure documentation: benchmarks/README.md, benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md | Benchmark README documents the expected directory structure (benchmarks/<name>/prompt.md, reference/, run.sh) and clarifies that no public benchmarks are yet committed. The roguelike benchmark directory and its task/decision metadata are removed. |
| Tutorial and reference documentation updates: docs/README.md, docs/reference/data-model.md, docs/tutorials/your-first-plan.md | Tutorial is refactored to use a contact-list deduplication CLI example (fuzzy matching, CSV handling) instead of the roguelike benchmark. Sample wizard output, CLI arguments, and ID convention examples are updated. Lens review text clarifies that skeptic reviews appear only under mvp/full presets, not poc. |
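
The benchmark layout named in the summary above (benchmarks/<name>/prompt.md, reference/, run.sh) could be checked mechanically when a new benchmark is added. This is a sketch under assumptions: `missing_entries` is a hypothetical helper, and treating all three entries as mandatory is a guess; benchmarks/README.md remains the authority on what is required.

```python
from pathlib import Path

# Entries taken from the walkthrough summary; whether each is mandatory is an assumption.
REQUIRED_FILES = ("prompt.md", "run.sh")
REQUIRED_DIRS = ("reference",)


def missing_entries(benchmark_dir: str) -> list[str]:
    """List the expected files and directories that are absent from benchmark_dir."""
    root = Path(benchmark_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash to tell them apart from files.
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the directory matches the assumed layout; anything else names what still needs to be added.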

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

@mabry1985 mabry1985 merged commit 620986c into main May 17, 2026
2 of 3 checks passed