nikeyes/stepwise-dev

Stepwise Dev - Multi-Plugin Suite

A modular development workflow suite for Claude Code inspired by Ashley Ha's workflow, adapted to work 100% locally with a thoughts/ directory.

📖 Read more: Tu CLAUDE.md no funciona sin Context Engineering ("Your CLAUDE.md doesn't work without Context Engineering"; Spanish article about Stepwise-dev)

🎯 What This Is

Solves the context management problem: LLM attention and accuracy degrade once context usage passes roughly 60%.

Implements Research → Plan → Implement → Validate with frequent /clear and persistent thoughts/ storage.

Philosophy

  • Keep context < 60% (attention threshold)
  • Split work into phases
  • Clear between phases, save to thoughts/
  • Never lose research or decisions

Why This Workflow With AI

More generated code = more risk if you don't have a solid feedback loop.

The faster AI generates code, the more these practices matter:

  • Story Splitting: AI can produce a lot in little time. If scope isn't cut, chaos scales just as fast.
  • Hamburger Method: Deliver value end-to-end continuously by slicing features into thin vertical layers.
  • Small Safe Steps: Each step must be reversible. Speed of generation is not speed to production.
  • Advanced testing: Mutation, acceptance, and architectural testing. The feedback loop must be solid. No more excuses.

📦 Available Plugins

This repository contains 4 independent plugins that can be installed separately based on your needs:

1. stepwise-core (Core Workflow)

The foundation plugin with the complete Research → Plan → Implement → Validate cycle.

Includes:

  • 13 skills (research-codebase, create-plan, iterate-plan, implement-plan, validate-plan, thoughts-management, bugmagnet, hamburger-method, small-safe-steps, story-splitting, test-desiderata, tdd, grill-me)
  • 5 specialized agents (codebase exploration and thoughts management)

→ Read more

2. stepwise-git (Git Operations)

Clean git commit workflow without Claude attribution.

Includes:

  • 1 skill (commit)
  • Smart staging and commit message generation

→ Read more

3. stepwise-web (Web Research)

Web search and research capabilities for external context.

Includes:

  • 1 specialized agent (web-search-researcher)
  • Deep web research with source citations

→ Read more

4. stepwise-research (Multi-Agent Deep Research)

Advanced multi-agent research system with parallel web searches and synthesis.

Includes:

  • 1 skill (deep-research, includes generate-report script for structured reports)
  • 3 specialized agents (research-lead, research-worker, citation-analyst)
  • Comprehensive research reports with citations and metadata

→ Read more

🚀 Installation

Option 1: Install All Plugins (Recommended for first-time users)

claude plugin marketplace add https://github.com/nikeyes/stepwise-dev.git

# Install all plugins
claude plugin install stepwise-core@stepwise-dev
claude plugin install stepwise-git@stepwise-dev
claude plugin install stepwise-web@stepwise-dev
claude plugin install stepwise-research@stepwise-dev
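
The four install commands above differ only in the plugin name, so they can be generated in a loop. The following is a minimal sketch: the plugin names and the stepwise-dev marketplace id come from the commands above, while the `plugin_commands` helper is a hypothetical name introduced here for illustration. It only prints commands and does not run them.

```shell
#!/usr/bin/env bash
# Plugin names and marketplace id are copied from the install commands above.
MARKETPLACE="stepwise-dev"
PLUGINS="stepwise-core stepwise-git stepwise-web stepwise-research"

# Hypothetical helper: print one "claude plugin <action>" command per plugin.
# Nothing is executed here; pipe the output to sh to actually run it.
plugin_commands() {
    action="${1:-install}"
    for plugin in $PLUGINS; do
        echo "claude plugin ${action} ${plugin}@${MARKETPLACE}"
    done
}

plugin_commands install
```

Running `plugin_commands install | sh` would execute the four installs in order. Printing first makes it easy to review what will run.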

Option 2: Install Only What You Need

# Add marketplace (SSH or HTTPS)
claude plugin marketplace add https://github.com/nikeyes/stepwise-dev.git

# Install only the core workflow
claude plugin install stepwise-core@stepwise-dev

# Optionally add git operations
claude plugin install stepwise-git@stepwise-dev

# Optionally add web research
claude plugin install stepwise-web@stepwise-dev

# Optionally add multi-agent deep research
claude plugin install stepwise-research@stepwise-dev

Restart Claude Code after installation.

Local Development (Testing Without Installing)

Use --bare with --plugin-dir to load only your local plugin directories, skipping all installed/marketplace plugins:

claude --bare \
    --plugin-dir /path/to/stepwise-dev/core \
    --plugin-dir /path/to/stepwise-dev/git \
    --plugin-dir /path/to/stepwise-dev/web \
    --plugin-dir /path/to/stepwise-dev/research

--bare disables plugin sync (so installed plugins are ignored) but still loads the directories you pass via --plugin-dir. This means your local changes are tested in isolation without needing to reinstall anything.
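
If the checkout lives somewhere other than /path/to/stepwise-dev, the command above can be assembled from the checkout root. This is a sketch only: `local_dev_command` is a hypothetical helper, it assumes the four subdirectories shown above (core, git, web, research), and it does not handle paths containing spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the local-development command for a checkout.
# Subdirectory names are taken from the --plugin-dir example above.
# Limitation: the path must not contain spaces.
local_dev_command() {
    repo_root="${1%/}"   # strip one trailing slash, if any
    cmd="claude --bare"
    for sub in core git web research; do
        cmd="$cmd --plugin-dir $repo_root/$sub"
    done
    echo "$cmd"
}

# Print the command for inspection instead of running it.
local_dev_command /path/to/stepwise-dev
```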

🧪 Try It Out

Don't have a project to test with? Use stepwise-todo-api-test, a sample repository designed for testing these plugins.

📁 Directory Structure

After running thoughts-init (from stepwise-core) in a project:

<your-project>/
├── thoughts/
│   ├── nikey_es/          # Your personal notes (you write)
│   │   ├── tickets/       # Ticket documentation
│   │   └── notes/         # Personal notes
│   └── shared/            # Team-shared documents (Claude writes)
│       ├── research/      # Research documents
│       ├── plans/         # Implementation plans
│       └── prs/           # PR descriptions
└── ...

Key distinction:

  • nikey_es/: Personal tickets/notes you create manually
  • shared/: Formal docs Claude generates from commands
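
To make the layout concrete, here is a sketch that creates the same tree by hand. It mirrors only the directories listed above. The real thoughts-init script may create additional files or behave differently, and `make_thoughts_layout` is a hypothetical name used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the thoughts/ tree shown above under a project root.
# This is not the thoughts-init script; it only reproduces the directory layout.
make_thoughts_layout() {
    root="$1"   # project root
    user="$2"   # personal directory name, e.g. nikey_es
    mkdir -p \
        "$root/thoughts/$user/tickets" \
        "$root/thoughts/$user/notes" \
        "$root/thoughts/shared/research" \
        "$root/thoughts/shared/plans" \
        "$root/thoughts/shared/prs"
}

# Usage (creates directories under the current project):
#   make_thoughts_layout . nikey_es
```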

Use grep -r "<pattern>" thoughts/ to search across all documents.
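
A small wrapper makes this search easier to repeat. The sketch below uses standard grep options (-r recursive, -n line numbers, -i case-insensitive). The `search_thoughts` name and the default directory are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search every document under thoughts/ for a pattern.
# Prints file:line:match for each hit; exits non-zero when nothing matches.
search_thoughts() {
    pattern="$1"
    dir="${2:-thoughts}"   # defaults to ./thoughts
    grep -rni -- "$pattern" "$dir"
}

# Usage:
#   search_thoughts "rate limiting"
#   search_thoughts "oauth" thoughts/shared/plans
```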

🔄 The Four-Phase Workflow

Use /clear between phases. Knowledge lives in thoughts/, not in the context window.

Quick reference

| Phase | Main command | Helpers (skills / agents) |
|---|---|---|
| Across all phases | /clear between phases | thoughts-management, thoughts-locator, thoughts-analyzer |
| Before (product side) | /story-splitting | Applied to the PRD / ticket / use case, not the code |
| 🔍 Research | /research-codebase, /deep-research | codebase-locator, codebase-analyzer, codebase-pattern-finder, web-search-researcher, citation-analyst |
| 🗺️ Plan | /create-plan, /iterate-plan | /hamburger-method, /small-safe-steps, /grill-me (stress-test the plan) |
| 🛠️ Implement | /implement-plan, /commit | /tdd (test-first development), /test-desiderata (test quality), /bugmagnet <file> (edge-case & bug hunt) |
| ✅ Validate | /validate-plan | - |
| 🌐 Any web lookup | "search the web for..." | web-search-researcher fires automatically |

Phase 1: Research (stepwise-core)

/stepwise-core:research-codebase How does authentication work?

Spawns parallel agents, searches codebase and thoughts/, generates comprehensive research document.

Phase 2: Plan (stepwise-core)

/stepwise-core:create-plan Add rate limiting to the API

Iterates with you 5+ times, creates detailed phases with verification steps. Use /grill-me to stress-test the plan before moving on: it interviews you on every assumption until the design is solid.

Phase 3: Implement (stepwise-core)

/stepwise-core:implement-plan @thoughts/shared/plans/2025-11-09-rate-limiting.md

Executes one phase at a time, validates before proceeding. Use /tdd to drive the implementation test-first (red→green→refactor). While implementing, lean on /test-desiderata to keep test quality high and /bugmagnet <file> to surface edge cases on a specific module.
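
The plan file passed to /implement-plan above follows a date-prefixed, slugged name (2025-11-09-rate-limiting.md). The sketch below builds such a path from a date and a title. The naming rule is inferred from that single example and is an assumption, not documented plugin behavior; `plan_path` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a plan path like the one used above.
# Assumption: plans are named <YYYY-MM-DD>-<lowercase-hyphenated-title>.md.
plan_path() {
    date_prefix="$1"
    # Lowercase the title, collapse every run of non-alphanumerics into one
    # hyphen, then trim a leading or trailing hyphen.
    slug=$(printf '%s' "$2" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs 'a-z0-9' '-' \
        | sed 's/^-//; s/-$//')
    echo "thoughts/shared/plans/${date_prefix}-${slug}.md"
}

# Usage:
#   plan_path 2025-11-09 "Rate Limiting"
```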

Phase 4: Validate (stepwise-core)

/stepwise-core:validate-plan @thoughts/shared/plans/2025-11-09-rate-limiting.md

Systematically verifies the entire implementation.

Commit (stepwise-git)

/stepwise-git:commit

Creates clean commits without Claude attribution.

💡 Usage Examples

Example 1: Complete Feature Development

# Research (core)
/stepwise-core:research-codebase Where is user registration handled?
# /clear

# Plan (core)
/stepwise-core:create-plan Add OAuth login support
# /clear

# Implement (core)
/stepwise-core:implement-plan @thoughts/shared/plans/...md
# /clear

# Validate (core)
/stepwise-core:validate-plan @thoughts/shared/plans/...md

# Commit (git)
/stepwise-git:commit

Example 2: Using Web Research

# Research external best practices (web)
"What are the best practices for implementing rate limiting in REST APIs?"
# The web-search-researcher agent will be invoked automatically

# Research your codebase (core)
/stepwise-core:research-codebase Where do we handle API rate limiting?

# Continue with plan and implementation...

🏷️ Version Management

# Check versions
claude plugin list

# Update marketplace and all plugins
claude plugin marketplace update stepwise-dev

claude plugin update stepwise-core@stepwise-dev
claude plugin update stepwise-git@stepwise-dev
claude plugin update stepwise-web@stepwise-dev
claude plugin update stepwise-research@stepwise-dev

Golden Rules

  1. Keep context under 60%: past that, accuracy drops.
  2. /clear between phases: knowledge lives in thoughts/, not in the context window.
  3. Read a 200-line plan before Claude writes 2,000 lines of code.
  4. Implement one phase at a time, with its own tests and its own commit.
  5. Delegate noisy work (web research, large codebase scans) to sub-agents so the parent context stays clean.

/context  # Check current usage
/clear    # Clear between phases
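
The 60% rule is simple arithmetic on two numbers: tokens used and window size. The sketch below assumes you read both values yourself (for example from /context) and pass them in; it does not parse /context output, whose format is not described here. `context_over_threshold` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when context usage has reached the
# threshold. Arguments: tokens used, window size, optional threshold percent.
# Uses integer arithmetic, so the percentage is rounded down.
context_over_threshold() {
    used="$1"
    total="$2"
    threshold="${3:-60}"
    [ $(( used * 100 / total )) -ge "$threshold" ]
}

# Usage:
#   if context_over_threshold 130000 200000; then
#       echo "Save to thoughts/ and run /clear"
#   fi
```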

🔧 Customization

Change Username: Set export THOUGHTS_USER=your_name or edit the thoughts-init script.
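
As an illustration of how an environment override like this typically works, the sketch below resolves the personal directory from THOUGHTS_USER. The fallback to `whoami` is an assumption made for this example; the actual default used by thoughts-init is not stated here and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the personal thoughts directory.
# Assumption: fall back to the current login name when THOUGHTS_USER is unset.
thoughts_user_dir() {
    user="${THOUGHTS_USER:-$(whoami)}"
    echo "thoughts/$user"
}

# Usage:
#   export THOUGHTS_USER=your_name
#   thoughts_user_dir
```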

🧪 Testing

make test          # Run all automated tests
make test-verbose  # Run tests with debug output
make check         # Run shellcheck on bash scripts
make ci            # Run full CI validation

Skill Evaluation

Each skill has an eval suite in its <skill-name>-workspace/evals/ directory:

core/skills/bugmagnet-workspace/evals/
├── evals.json              # Eval definitions (prompts, assertions, grading guide)
├── files/                  # Test fixtures (source files the skill analyzes)
├── iteration-1/            # Benchmark run results
│   ├── benchmark.json      # Machine-readable: per-eval pass rates, timing, tokens
│   ├── benchmark.md        # Human-readable summary table
│   └── eval-1-name/        # Per-eval evidence
│       └── eval_metadata.json
├── iteration-2/
└── ...

Running evals:

/skill-creator:skill-creator Run evals from <skill-name>-workspace/evals/evals.json

This runs each eval with-skill and without-skill, grades assertions, and writes results to a new iteration-N/ directory.
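
To see which iteration-N/ directory a new run would likely produce, you can scan the existing ones. The sketch below assumes the next number is simply the highest existing N plus one. How skill-creator actually chooses the number is not documented here, and `next_iteration` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the next iteration directory.
# Assumption: iterations are numbered iteration-1, iteration-2, ... without
# zero padding, and the next one is max(N) + 1.
next_iteration() {
    evals_dir="$1"
    last=0
    for d in "$evals_dir"/iteration-*/; do
        [ -d "$d" ] || continue                      # glob matched nothing
        n="${d%/}"
        n="${n##*iteration-}"
        case "$n" in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
        if [ "$n" -gt "$last" ]; then
            last="$n"
        fi
    done
    echo "iteration-$(( last + 1 ))"
}

# Usage:
#   next_iteration core/skills/bugmagnet-workspace/evals
```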

Reading benchmark results:

Open iteration-N/benchmark.md for a quick summary table, or benchmark.json for detailed per-assertion evidence. Key metrics:

  • pass_rate: percentage of assertions passed (with_skill vs without_skill)
  • delta: the skill's added value over baseline; higher is better
  • time_seconds / tokens: cost of using the skill

Compare across iterations to track skill improvements over time.
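
The metrics above can be checked by hand. The sketch below assumes pass_rate is assertions passed divided by total assertions, as a percentage, and that delta is the with_skill pass rate minus the without_skill pass rate. Both formulas are inferred from the descriptions above rather than taken from the benchmark code, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking benchmark numbers by hand.

# Percentage of assertions passed, to one decimal place.
pass_rate() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t > 0) ? 100 * p / t : 0 }'
}

# Assumed definition: with_skill pass rate minus without_skill pass rate.
skill_delta() {
    awk -v w="$1" -v b="$2" 'BEGIN { printf "%.1f\n", w - b }'
}

# Example: 9 of 10 assertions pass with the skill, 6 of 10 without it.
with_skill=$(pass_rate 9 10)      # 90.0
without_skill=$(pass_rate 6 10)   # 60.0
skill_delta "$with_skill" "$without_skill"   # prints 30.0
```

A positive delta means the skill improved results over the baseline on that eval set. A delta near zero suggests the skill adds cost without measurable benefit.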

Viewing detailed eval reports:

make eval-list                                # List skills with eval iterations
make eval-view SKILL=test-desiderata          # View latest iteration
make eval-view SKILL=test-desiderata ITER=1   # View specific iteration
make eval-view SKILL=test-desiderata PREV=1   # Compare latest vs iteration-1

The viewer opens two tabs: Outputs (per-eval outputs, grading, and feedback) and Benchmark (aggregate pass rates, delta, timing, and token usage).

🤝 Contributing

Test improvements in your workflow, document changes, and share with the community.

📄 License

Apache License 2.0 - See LICENSE file for details.

🔖 Attribution

Derived from HumanLayer's Claude Code workflow under Apache License 2.0.

Several skills are derived from Matt Pocock's skills (grill-me, tdd), eferro's skill-factory (hamburger-method, small-safe-steps, story-splitting, test-desiderata, and tdd/zombies reference) and Gojko Adzic's BugMagnet. See NOTICE for detailed attribution.

Major enhancements:

  • Multi-plugin architecture for modular installation
  • Specialized agent system
  • Local-only thoughts/ management with Agent Skill
  • Automated testing infrastructure
  • Enhanced TDD-focused success criteria

Happy Coding! 🚀

Questions? Open an issue on GitHub.
