A modular development workflow suite for Claude Code inspired by Ashley Ha's workflow, adapted to work 100% locally with thoughts.
Read more: Tu CLAUDE.md no funciona sin Context Engineering ("Your CLAUDE.md doesn't work without Context Engineering", a Spanish article about Stepwise-dev)
Solves the context management problem: LLMs lose attention after 60% context usage.
Implements Research → Plan → Implement → Validate with frequent /clear and persistent thoughts/ storage.
- Keep context < 60% (attention threshold)
- Split work into phases
- Clear between phases, save to thoughts/
- Never lose research or decisions
More generated code = more risk if you don't have a solid feedback loop.
The faster AI generates code, the more these practices matter:
- Story Splitting: AI can produce a lot in little time. If scope isn't cut, chaos scales just as fast.
- Hamburger Method: Deliver value end-to-end continuously by slicing features into thin vertical layers.
- Small Safe Steps: Each step must be reversible. Speed of generation is not speed to production.
- Advanced testing: Mutation, acceptance, and architectural testing. The feedback loop must be solid. No more excuses.
This repository contains 4 independent plugins that can be installed separately based on your needs:
The foundation plugin with the complete Research → Plan → Implement → Validate cycle.
Includes:
- 13 skills (research-codebase, create-plan, iterate-plan, implement-plan, validate-plan, thoughts-management, bugmagnet, hamburger-method, small-safe-steps, story-splitting, test-desiderata, tdd, grill-me)
- 5 specialized agents (codebase exploration and thoughts management)
Clean git commit workflow without Claude attribution.
Includes:
- 1 skill (commit)
- Smart staging and commit message generation
Web search and research capabilities for external context.
Includes:
- 1 specialized agent (web-search-researcher)
- Deep web research with source citations
Advanced multi-agent research system with parallel web searches and synthesis.
Includes:
- 1 skill (deep-research, includes a generate-report script for structured reports)
- 3 specialized agents (research-lead, research-worker, citation-analyst)
- Comprehensive research reports with citations and metadata
claude plugin marketplace add https://github.com/nikeyes/stepwise-dev.git
# Install all plugins
claude plugin install stepwise-core@stepwise-dev
claude plugin install stepwise-git@stepwise-dev
claude plugin install stepwise-web@stepwise-dev
claude plugin install stepwise-research@stepwise-dev

# Add marketplace (SSH or HTTPS)
claude plugin marketplace add https://github.com/nikeyes/stepwise-dev.git
# Install only the core workflow
claude plugin install stepwise-core@stepwise-dev
# Optionally add git operations
claude plugin install stepwise-git@stepwise-dev
# Optionally add web research
claude plugin install stepwise-web@stepwise-dev
# Optionally add multi-agent deep research
claude plugin install stepwise-research@stepwise-dev

Restart Claude Code after installation.
Use --bare with --plugin-dir to load only your local plugin directories, skipping all installed/marketplace plugins:
claude --bare \
--plugin-dir /path/to/stepwise-dev/core \
--plugin-dir /path/to/stepwise-dev/git \
--plugin-dir /path/to/stepwise-dev/web \
--plugin-dir /path/to/stepwise-dev/research

--bare disables plugin sync (so installed plugins are ignored) but still loads the directories you pass via --plugin-dir. This means your local changes are tested in isolation without needing to reinstall anything.
Don't have a project to test with? Use stepwise-todo-api-test, a sample repository designed for testing these plugins.
After running thoughts-init (from stepwise-core) in a project:
<your-project>/
├── thoughts/
│   ├── nikey_es/        # Your personal notes (you write)
│   │   ├── tickets/     # Ticket documentation
│   │   └── notes/       # Personal notes
│   └── shared/          # Team-shared documents (Claude writes)
│       ├── research/    # Research documents
│       ├── plans/       # Implementation plans
│       └── prs/         # PR descriptions
└── ...
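If you want to inspect the layout before running anything, it can be reproduced by hand. This is a sketch only: thoughts-init may create more than this, and nikey_es stands in for your own username.

```shell
# Recreate the directory layout shown above (sketch; thoughts-init is authoritative)
mkdir -p thoughts/nikey_es/tickets thoughts/nikey_es/notes
mkdir -p thoughts/shared/research thoughts/shared/plans thoughts/shared/prs

# List what was created
find thoughts -type d | sort
```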
Key distinction:
- nikey_es/: Personal tickets/notes you create manually
- shared/: Formal docs Claude generates from commands
Use grep -r <pattern> thoughts/ to search across all documents.
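For example (a self-contained sketch; the note file and its contents are made up for illustration):

```shell
# Create a hypothetical research note, just to have something to search
mkdir -p thoughts/shared/research
echo "Decision: use token-bucket rate limiting" > thoughts/shared/research/2025-11-09-rate-limiting.md

# Find every document mentioning rate limiting
grep -r "rate limiting" thoughts/
```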
Use /clear between phases. Knowledge lives in thoughts/, not in the context window.
| Phase | Main command | Helpers (skills / agents) |
|---|---|---|
| Across all phases | /clear between phases | thoughts-management, thoughts-locator, thoughts-analyzer |
| Before (product side) | /story-splitting | Applied to the PRD / ticket / use case, not the code |
| Research | /research-codebase, /deep-research | codebase-locator, codebase-analyzer, codebase-pattern-finder, web-search-researcher, citation-analyst |
| Plan | /create-plan, /iterate-plan | /hamburger-method, /small-safe-steps, /grill-me (stress-test the plan) |
| Implement | /implement-plan, /commit | /tdd (test-first development), /test-desiderata (test quality), /bugmagnet <file> (edge-case & bug hunt) |
| Validate | /validate-plan | (none) |
| Any web lookup | "search the web for..." | web-search-researcher fires automatically |
/stepwise-core:research-codebase How does authentication work?

Spawns parallel agents, searches the codebase and thoughts/, and generates a comprehensive research document.
/stepwise-core:create-plan Add rate limiting to the API

Iterates with you 5+ times, creates detailed phases with verification steps. Use /grill-me to stress-test the plan before moving on: it interviews you on every assumption until the design is solid.
/stepwise-core:implement-plan @thoughts/shared/plans/2025-11-09-rate-limiting.md

Executes one phase at a time, validates before proceeding. Use /tdd to drive the implementation test-first (red-green-refactor). While implementing, lean on /test-desiderata to keep test quality high and /bugmagnet <file> to surface edge cases on a specific module.
/stepwise-core:validate-plan @thoughts/shared/plans/2025-11-09-rate-limiting.md

Systematically verifies the entire implementation.
/stepwise-git:commit

Creates clean commits without Claude attribution.
# Research (core)
/stepwise-core:research-codebase Where is user registration handled?
# /clear
# Plan (core)
/stepwise-core:create-plan Add OAuth login support
# /clear
# Implement (core)
/stepwise-core:implement-plan @thoughts/shared/plans/...md
# /clear
# Validate (core)
/stepwise-core:validate-plan @thoughts/shared/plans/...md
# Commit (git)
/stepwise-git:commit

# Research external best practices (web)
"What are the best practices for implementing rate limiting in REST APIs?"
# The web-search-researcher agent will be invoked automatically
# Research your codebase (core)
/stepwise-core:research-codebase Where do we handle API rate limiting?
# Continue with plan and implementation...

# Check versions
claude plugin list
# Update marketplace and all plugins
claude plugin marketplace update stepwise-dev
claude plugin update stepwise-core@stepwise-dev
claude plugin update stepwise-git@stepwise-dev
claude plugin update stepwise-web@stepwise-dev
claude plugin update stepwise-research@stepwise-dev

- Keep context under 60%; past that, accuracy drops.
- /clear between phases: knowledge lives in thoughts/, not in the context window.
- Read a 200-line plan before Claude writes 2,000 lines of code.
- Implement one phase at a time, with its own tests and its own commit.
- Delegate noisy work (web research, large codebase scans) to sub-agents so the parent context stays clean.
/context # Check current usage
/clear     # Clear between phases

Change Username: Set export THOUGHTS_USER=your_name or edit the thoughts-init script.
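For example, in bash or zsh (the variable name comes from the docs above; your_name is a placeholder):

```shell
# Set the username thoughts-init will use for your personal folder.
# Add this line to your shell profile to make it permanent.
export THOUGHTS_USER=your_name
echo "thoughts user: $THOUGHTS_USER"
```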
make test # Run all automated tests
make test-verbose # Run tests with debug output
make check # Run shellcheck on bash scripts
make ci           # Run full CI validation

Each skill has an eval suite in its <skill-name>-workspace/evals/ directory:
core/skills/bugmagnet-workspace/evals/
├── evals.json           # Eval definitions (prompts, assertions, grading guide)
├── files/               # Test fixtures (source files the skill analyzes)
├── iteration-1/         # Benchmark run results
│   ├── benchmark.json   # Machine-readable: per-eval pass rates, timing, tokens
│   ├── benchmark.md     # Human-readable summary table
│   └── eval-1-name/     # Per-eval evidence
│       └── eval_metadata.json
├── iteration-2/
└── ...
Running evals:
/skill-creator:skill-creator Run evals from <skill-name>-workspace/evals/evals.json

This runs each eval with-skill and without-skill, grades assertions, and writes results to a new iteration-N/ directory.
Reading benchmark results:
Open iteration-N/benchmark.md for a quick summary table, or benchmark.json for detailed per-assertion evidence. Key metrics:
- pass_rate: percentage of assertions passed (with_skill vs without_skill)
- delta: the skill's added value over baseline (higher is better)
- time_seconds / tokens: cost of using the skill
Compare across iterations to track skill improvements over time.
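In other words, delta is just the difference between the two pass rates. The snippet below sketches that arithmetic against a fabricated, flattened benchmark.json; the real file's schema and field names may differ.

```shell
# Fabricated example with only the two pass-rate fields (not the real schema)
cat > benchmark.json <<'EOF'
{
  "pass_rate_with_skill": 0.90,
  "pass_rate_without_skill": 0.60
}
EOF

# delta = with_skill - without_skill (higher means the skill adds more value)
awk -F': ' '/pass_rate_with_skill/ {w=$2+0}
            /pass_rate_without_skill/ {wo=$2+0}
            END {printf "delta: %.2f\n", w-wo}' benchmark.json
```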
Viewing detailed eval reports:
make eval-list # List skills with eval iterations
make eval-view SKILL=test-desiderata # View latest iteration
make eval-view SKILL=test-desiderata ITER=1 # View specific iteration
make eval-view SKILL=test-desiderata PREV=1   # Compare latest vs iteration-1

The viewer opens two tabs: Outputs (per-eval outputs, grading, and feedback) and Benchmark (aggregate pass rates, delta, timing, and token usage).
- Original Article: I mastered the Claude Code workflow by Ashley Ha
- HumanLayer: Original inspiration from HumanLayer's .claude directory
Test improvements in your workflow, document changes, and share with the community.
Apache License 2.0 - See LICENSE file for details.
Derived from HumanLayer's Claude Code workflow under Apache License 2.0.
Several skills are derived from Matt Pocock's skills (grill-me, tdd), eferro's skill-factory (hamburger-method, small-safe-steps, story-splitting, test-desiderata, and tdd/zombies reference) and Gojko Adzic's BugMagnet. See NOTICE for detailed attribution.
Major enhancements:
- Multi-plugin architecture for modular installation
- Specialized agent system
- Local-only thoughts/ management with Agent Skill
- Automated testing infrastructure
- Enhanced TDD-focused success criteria
Happy Coding!
Questions? Open an issue on GitHub.