haileyvictory/scout

Scout — E2E Experience Evaluation Framework

Scout is a structured, AI-powered framework for running end-to-end experience evaluations. Point it at any product experience — a web portal, VS Code extension, CLI tool, AI agent, or API — and it delivers an actionable report with prioritized recommendations backed by real evidence.

What Scout Does

Scout orchestrates a team of specialized AI agents. Three of them gather signals about your experience, and a fourth turns those signals into actionable next steps:

| Signal | Agent | What It Does |
|---|---|---|
| 🔍 Walkthrough | Experience Walker | Walks through your experience step by step, scoring against user-defined quality primitives |
| 💬 Research | Researcher | Gathers user sentiment from GitHub issues, forums, and community channels, plus deep competitive intelligence |
| 📊 Data | Data Analyst | Analyzes telemetry and funnel drop-offs (requires Kusto/ADX access; see note below) |
| 📋 Actionable Next Steps | Report Writer | Delivers a prioritized list of improvements based on the findings, each with the problem, a concrete fix, the impact it drives, and what to start on first |

These signals are synthesized by the Report Writer into a comprehensive evaluation report. The most valuable output is the prioritized next steps — a ranked list of recommendations (P0 → P3) where each one explains the problem, suggests a specific fix, and quantifies the impact so teams know exactly what to work on and why.
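
As a rough illustration of what a ranked recommendation list could look like in code, here is a minimal sketch. The `Recommendation` class, its field names, and the sample entries are all hypothetical; Scout's actual report format is Markdown and may be organized differently.

```python
from dataclasses import dataclass

# Hypothetical record shape for illustration only; not Scout's real schema.
@dataclass
class Recommendation:
    priority: str  # "P0" (most urgent) through "P3"
    problem: str
    fix: str
    impact: str

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def rank(recommendations):
    """Sort from P0 to P3; sorted() is stable, so input order is kept within a tier."""
    return sorted(recommendations, key=lambda r: PRIORITY_ORDER[r.priority])

# Made-up sample data.
recs = [
    Recommendation("P2", "Empty state has no guidance", "Add a starter checklist", "Fewer abandoned first sessions"),
    Recommendation("P0", "Sign-in loops on first launch", "Persist the auth token", "Unblocks onboarding"),
]
ranked = rank(recs)
for r in ranked:
    print(f"{r.priority}: {r.problem} -> {r.fix} ({r.impact})")
```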

📊 Data signal availability: The Data Analyst requires access to a Kusto/Azure Data Explorer cluster with telemetry data. Without it, Scout runs as a 2-signal framework (walkthrough + research) and delivers a Data Strategy Memo suggesting what to measure. The 2-signal report is still a complete, valuable evaluation — no fake data or empty sections.
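
The fallback described in this note can be sketched as a small decision function. The function name and the returned dictionary keys are assumptions made for illustration; only the behavior (two signals plus a Data Strategy Memo when Kusto/ADX access is missing) comes from the note itself.

```python
def plan_signals(has_kusto_access: bool) -> dict:
    """Decide which signals run and which data deliverable the report carries."""
    signals = ["walkthrough", "research"]  # always available
    if has_kusto_access:
        signals.append("data")
        data_deliverable = "Funnel Analysis"
    else:
        # No telemetry: suggest what to measure instead of inventing numbers.
        data_deliverable = "Data Strategy Memo"
    return {"signals": signals, "data_deliverable": data_deliverable}

print(plan_signals(has_kusto_access=False))
```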

What You Get

  • Primitives Scorecard — every quality dimension rated 1–5 with evidence
  • Funnel Analysis — where users drop off and why (requires Kusto/ADX access; gracefully degrades to a Data Strategy Memo)
  • Competitive Intelligence — how competitors approach the same problem, innovation signals, market direction
  • Cross-Signal Correlations — connections between walkthrough findings, user sentiment, and data (depth depends on available telemetry access; report adapts when unavailable)
  • Prioritized Recommendations — P0 through P3, each with problem + fix + impact
  • GitHub Issues — optionally file every recommendation as a tracked issue
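
To make the scorecard rule concrete (every dimension rated 1–5, always with evidence), here is a hedged sketch of a validated score entry. `PrimitiveScore` and `lowest_scoring` are hypothetical names, not part of Scout.

```python
from dataclasses import dataclass

# Hypothetical scorecard entry; field names are illustrative.
@dataclass
class PrimitiveScore:
    name: str
    score: int      # 1 (poor) to 5 (excellent)
    evidence: str   # e.g. a screenshot path or a quoted user comment

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError(f"{self.name}: score must be between 1 and 5")
        if not self.evidence.strip():
            raise ValueError(f"{self.name}: a score needs supporting evidence")

def lowest_scoring(scores, limit=3):
    """Return the weakest dimensions first, which is where fixes usually start."""
    return sorted(scores, key=lambda s: s.score)[:limit]

# Made-up sample data.
card = [
    PrimitiveScore("Discoverability", 4, "screenshots/home.png"),
    PrimitiveScore("Error recovery", 2, "walkthroughs/step-07.md"),
]
print([s.name for s in lowest_scoring(card)])
```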

Supported Experience Types

| Type | Examples | Primary Tools |
|---|---|---|
| Portal / Web UI | Azure Portal, dashboards, admin consoles | Playwright MCP |
| VS Code Extension | Extensions in VS Code desktop or web | Playwright MCP (web), user-assisted walkthrough (desktop) |
| CLI | az, gh, npm, developer CLIs | Terminal |
| Skill / Agent | AI agents, Copilot skills | Waza, Terminal |
| API / SDK | REST APIs, client SDKs | Terminal, HTTP tools |

Tools Included

Scout ships with tools pre-configured and recommends additional ones based on your evaluation:

Pre-Installed (ready out of the box)

| Tool | What It Does | How It's Configured |
|---|---|---|
| Playwright MCP | Browser automation: walks web UIs, takes screenshots, interacts with elements | npm dependency + MCP server config |
| Terminal | Command execution for CLI evaluations and data collection | Built into VS Code |

Recommended (prompted during setup)

| Tool | What It Does | How to Get It |
|---|---|---|
| Kusto Workbench | KQL queries against Azure Data Explorer for telemetry analysis | VS Code Extension (auto-recommended) |
| GitHub CLI | Issue filing (Phase 5), repo search, user feedback collection | System install: winget install GitHub.cli / brew install gh |
| Waza | Skill/Agent invocation and evaluation | Go binary (manual) |
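
If you want to check for command-line tools yourself before starting, a minimal sketch using Python's standard `shutil.which` looks like this. The helper name `missing_tools` is hypothetical, and this is not how Scout's own setup step is implemented; it only shows the idea of verifying that executables are on your PATH.

```python
import shutil

def missing_tools(commands):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]

# "gh" is the GitHub CLI executable; "npm" is needed for `npm install`.
missing = missing_tools(["gh", "npm"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All checked tools are available.")
```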

Built into GitHub Copilot

| Tool | What It Does |
|---|---|
| fetch_webpage | Fetches web pages for research: forums, docs, competitor sites |
| Semantic Search | Searches the codebase for relevant patterns |

Prerequisites

⚠️ Copilot Agent Mode is required. Scout's agents run entirely in GitHub Copilot's Agent Mode. Make sure you have an active Copilot subscription and that Agent Mode is enabled in your VS Code settings.

Workspace Trust: When you first open the Scout repo, VS Code may ask you to trust the workspace. Click "Yes, I trust the authors" — this is required for the auto-install task and MCP server to function.

🔐 Authentication for portals: For experiences that require login (Azure Portal, admin consoles, etc.), Scout will open the browser and ask you to log in once manually. After that, the agent takes over and navigates autonomously. Enterprise SSO/MFA is handled by you — the agent never touches credentials.

Getting Started

1. Clone the repo

git clone https://github.com/haileyhuber8/scout.git
cd scout

2. Install dependencies

npm install

This installs Playwright MCP and other dependencies. VS Code will also prompt to install recommended extensions.

3. Start an evaluation

Open the repo in VS Code, then open GitHub Copilot Chat and switch to Agent Mode using the mode dropdown at the top of the chat panel. The evaluation agent will automatically activate.

Just start talking:

"I want to evaluate the onboarding experience for my VS Code extension"

The Architect agent will guide you through:

  1. Defining what to test — experience type, scope, audience, evaluation question
  2. Discovering primitives — the quality dimensions to score against
  3. Reviewing the plan — full evaluation plan before launch
  4. Tool setup — verifying all needed tools are ready
  5. Running the evaluation — walkthrough + research + data analysis
  6. Delivering the report — synthesized findings with prioritized recommendations
  7. Filing issues — optionally create GitHub issues for each recommendation

Quick Start

When the Architect asks how you'd like to set up:

  • 🚀 Quick Start (~2 min) — smart defaults, review everything in one shot
  • 🔧 Full Customization (~10-15 min) — configure each setting individually

Most users start with Quick Start and customize from there.

See an Example

Check out examples/azure-portal/ for a completed evaluation — including the primitives spec and final report — to see what Scout produces.

How It Works

You → Architect (what to test) → Tool Installer (setup check)
                                        ↓
              ┌─────────────────────────┼─────────────────────────┐
              ↓                         ↓                         ↓
     Experience Walker           Researcher              Data Analyst
     (walks the experience)   (user sentiment +       (telemetry +
                               competitive intel)      funnel analysis)
              └─────────────────────────┼─────────────────────────┘
                                        ↓
                              Architect validates signals
                                        ↓
                              Report Writer synthesizes
                                        ↓
                              Prioritized report + issues
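
The flow in the diagram above can be sketched as a fan-out, validate, synthesize pipeline. Everything here is a stand-in: the stub functions, the dictionary shapes, and the validation rule (drop a signal that produced no findings) are assumptions for illustration, since the real agents are Copilot agents rather than Python functions.

```python
# Stub agents with made-up findings; each returns one signal.
def experience_walker(plan):
    return {"signal": "walkthrough", "findings": ["Step 3 has no progress indicator"]}

def researcher(plan):
    return {"signal": "research", "findings": ["Forum threads mention confusing setup"]}

def data_analyst(plan):
    # Simulates a run without Kusto/ADX access: no telemetry findings.
    return {"signal": "data", "findings": []}

def run_evaluation(plan, agents):
    # Fan out: every signal agent receives the same evaluation plan.
    signals = [agent(plan) for agent in agents]
    # Validation (assumed rule): keep only signals that produced findings.
    valid = [s for s in signals if s["findings"]]
    # Synthesis: merge what remains into a single report summary.
    return {
        "signals_used": [s["signal"] for s in valid],
        "finding_count": sum(len(s["findings"]) for s in valid),
    }

report = run_evaluation({"scope": "onboarding"}, [experience_walker, researcher, data_analyst])
print(report)
```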

Each evaluation creates a project folder with all artifacts:

my-experience/
├── PRIMITIVES.md          # What you're testing and why
├── README.md              # Evaluation plan
├── project.json           # Configuration
├── experience/
│   ├── walkthroughs/      # Step-by-step walkthrough notes
│   └── screenshots/       # Visual evidence
├── research/
│   ├── sentiment.md       # User feedback analysis
│   ├── competitor-analysis/
│   └── user-feedback/
├── data/
│   ├── telemetry/         # Raw data queries and results
│   ├── baselines/         # Comparison baselines
│   └── funnel-analysis/   # Funnel drop-off analysis
├── analysis/              # Cross-cutting analysis
├── output/
│   └── report.md          # THE REPORT — the primary deliverable
└── assets/                # Supporting files
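
For reference, the layout above can be reproduced with a short script. This is only a sketch of the folder structure: Scout itself creates projects from its _template/ directory, and the `scaffold` helper below is hypothetical. It writes empty placeholder files into a temporary directory so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Directories and files taken from the tree shown above.
SUBDIRS = [
    "experience/walkthroughs",
    "experience/screenshots",
    "research/competitor-analysis",
    "research/user-feedback",
    "data/telemetry",
    "data/baselines",
    "data/funnel-analysis",
    "analysis",
    "output",
    "assets",
]
FILES = ["PRIMITIVES.md", "README.md", "project.json", "research/sentiment.md", "output/report.md"]

def scaffold(root: Path) -> Path:
    """Create the evaluation folder layout with empty placeholder files."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch()
    return root

project = scaffold(Path(tempfile.mkdtemp()) / "my-experience")
print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*") if p.is_file()))
```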

Evaluation Reports

Reports adapt to your audience and available data:

| Audience | Depth | What's Included |
|---|---|---|
| Leadership | Executive (~150 lines) | Summary, scorecard, P0 recs, competitive headline |
| PM / Designer | Standard (~500 lines) | All sections, moderate detail |
| Engineering | Deep Dive (~900 lines) | Full analysis, all appendices, data queries |

Reports gracefully degrade when signals aren't available — a 2-signal (walkthrough + research) report is still a full, valuable report. No fake data, no empty sections.
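
The audience-to-depth mapping can be expressed as a simple lookup. The dictionary keys and the `report_depth` helper are hypothetical; the depth names and approximate line counts come from the table above.

```python
# Audience -> (depth name, approximate report length in lines).
REPORT_DEPTHS = {
    "leadership": ("Executive", 150),
    "pm / designer": ("Standard", 500),
    "engineering": ("Deep Dive", 900),
}

def report_depth(audience: str):
    """Look up the report depth for an audience, ignoring case and outer spaces."""
    key = audience.strip().lower()
    if key not in REPORT_DEPTHS:
        raise ValueError(f"Unknown audience: {audience!r}")
    return REPORT_DEPTHS[key]

name, approx_lines = report_depth("Engineering")
print(f"{name} report, roughly {approx_lines} lines")
```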

Project Structure

scout/
├── .github/
│   ├── agents/
│   │   └── evaluation.agent.md    # The evaluation framework spec (agent instructions)
│   └── copilot-instructions.md    # Fallback guidance for non-Agent-Mode users
├── .vscode/
│   ├── mcp.json               # Pre-configured MCP servers
│   ├── tasks.json             # Auto-install on folder open
│   └── extensions.json        # Recommended VS Code extensions
├── _template/                 # Template for new evaluations
│   ├── project.json
│   ├── PRIMITIVES.md
│   ├── README.md
│   └── (directory structure)
├── examples/                  # Reference evaluations
│   └── azure-portal/          # Completed example (primitives + report)
├── _meta/
│   ├── impact.json            # Framework usage tracking
│   └── config.json            # Framework configuration
├── package.json               # Dependencies (Playwright MCP)
└── README.md                  # This file

Impact Tracking

Scout records evaluation metadata in _meta/impact.json after each run — evaluations completed, findings count, and issues filed. This is lightweight tracking to help you measure Scout's value over time.

Note: Impact metrics are recorded automatically, but the dashboard view (_meta/IMPACT.md) is not yet generated for you. Check _meta/impact.json directly for raw metrics.
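
Until the dashboard exists, a few lines of Python can summarize the raw file. The schema of _meta/impact.json is not documented here, so the key names below (`evaluations_completed`, `findings_count`, `issues_filed`) are guesses based on the metrics listed above; adjust them to match your actual file. The sketch runs against a made-up sample file in a temporary directory.

```python
import json
import tempfile
from pathlib import Path

def summarize_impact(path) -> str:
    """Summarize an impact file; the key names are assumed, not confirmed."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return (
        f"{data.get('evaluations_completed', 0)} evaluations, "
        f"{data.get('findings_count', 0)} findings, "
        f"{data.get('issues_filed', 0)} issues filed"
    )

# Made-up sample standing in for _meta/impact.json.
sample = Path(tempfile.mkdtemp()) / "impact.json"
sample.write_text(
    json.dumps({"evaluations_completed": 3, "findings_count": 41, "issues_filed": 12}),
    encoding="utf-8",
)
print(summarize_impact(sample))
```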

Contributing

This is a private repository. To suggest changes, open an issue or submit a pull request.

License

Private — not for redistribution.
