From e5b396fbcdbc9458ceac0b9b3315e0aab3e2d0ec Mon Sep 17 00:00:00 2001
From: Ethan Dutton <46871249+ejdutton@users.noreply.github.com>
Date: Wed, 3 Jun 2026 15:32:09 -0400
Subject: [PATCH 1/3] feat(corpus): expand seed.yaml from 9 to 237 entries via new importer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Imports all unique plugins from anthropics/claude-plugins-official (205 of 209 raw entries kept) and anthropics/knowledge-work-plugins (30 of 60 — the latter is ~50% mirror entries of the former) via a new committed script at packages/dev-tools/src/import-marketplace.ts (bun run import-marketplace). Maps each upstream entry to a PluginEntry, deduplicates by source URL (preserved VAT-owned entries always win; otherwise alphabetical-first-name wins), and rewrites corpus/seed.yaml with a provenance header.

URL composition handles all five upstream source shapes; confidence is URL-based (anthropics owner → first-party, else curated; ./partner-built/ override → curated).

Issue #99 slice 1b — follows PR #111 (slice 1a).

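The deduplication and confidence rules described above could be sketched roughly as follows. This is a minimal illustration and not the code in import-marketplace.ts: the `PluginEntry` shape is inferred from the fields visible in corpus/seed.yaml, the helper names `confidenceFor`, `makeEntry`, and `dedupeBySource` are hypothetical, and it assumes that the partner-built override can be detected on the composed URL and that "alphabetical" means plain code-unit string comparison.

```typescript
// Hypothetical shape inferred from seed.yaml; the real PluginEntry may carry more fields.
type Confidence = "first-party" | "curated";

interface PluginEntry {
  source: string; // unique key, e.g. https://github.com/owner/repo.git#ref:path
  name: string;
  bucket: "official";
  confidence: Confidence;
  maturity: "production";
}

// URL-based confidence: anthropics-owned GitHub URLs are first-party unless the
// path follows the partner-built convention. Matching on the composed URL is an
// assumption of this sketch.
function confidenceFor(sourceUrl: string): Confidence {
  const isAnthropics = sourceUrl.startsWith("https://github.com/anthropics/");
  const isPartnerBuilt = sourceUrl.includes(":partner-built/");
  return isAnthropics && !isPartnerBuilt ? "first-party" : "curated";
}

// Builds an entry using the uniform bucket and maturity values from the commit message.
function makeEntry(name: string, source: string): PluginEntry {
  return {
    source,
    name,
    bucket: "official",
    confidence: confidenceFor(source),
    maturity: "production",
  };
}

// Deduplicates by source URL. Preserved (hand-curated) entries always win; within
// a cluster of imported duplicates, the alphabetically first name wins. Preserved
// entries stay at the top, followed by imported entries sorted by name.
function dedupeBySource(
  preserved: PluginEntry[],
  imported: PluginEntry[],
): PluginEntry[] {
  const preservedSources = new Set(preserved.map((e) => e.source));
  const winners = new Map<string, PluginEntry>();
  for (const entry of imported) {
    if (preservedSources.has(entry.source)) continue; // preserved entry wins
    const current = winners.get(entry.source);
    if (current === undefined || entry.name < current.name) {
      winners.set(entry.source, entry);
    }
  }
  const sorted = Array.from(winners.values()).sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
  return [...preserved, ...sorted];
}
```

Keying the map on `source` mirrors the seed file's "Source is the unique key" rule, so a mirror entry in the second catalog collapses onto the entry from the first.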
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 1 + corpus/seed.yaml | 1176 ++++++++++++++++- package.json | 1 + packages/dev-tools/src/import-marketplace.ts | 492 +++++++ .../dev-tools/test/import-marketplace.test.ts | 235 ++++ 5 files changed, 1886 insertions(+), 19 deletions(-) create mode 100644 packages/dev-tools/src/import-marketplace.ts create mode 100644 packages/dev-tools/test/import-marketplace.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 7854fbd6..dfec8b14 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Internal +- **Corpus seed expanded from 9 → 237 entries via a new committed importer at `packages/dev-tools/src/import-marketplace.ts` (`bun run import-marketplace`).** The script fetches `.claude-plugin/marketplace.json` from `anthropics/claude-plugins-official` (205 of 209 raw entries kept) and `anthropics/knowledge-work-plugins` (30 of 60 — the knowledge-work catalog turns out to be ≈50% mirror entries of the official catalog) via `gh api`, maps each upstream entry to a `PluginEntry`, deduplicates by `source` URL (preserved VAT-owned entries always win; otherwise alphabetical-first-name wins within each duplicate cluster), and rewrites `corpus/seed.yaml`. Mapping rules: `bucket: official` uniformly (both catalogs are anthropics-curated marketplaces — `bucket` is the *reporting posture* per slice 1a, not code provenance); `confidence: first-party` for catalog-internal string sources and `github.com/anthropics/...` object sources, else `curated`; the `./partner-built/` knowledge-work convention overrides to `curated`; `maturity: production` for all entries. URL composition handles all five upstream source shapes (string, `git-subdir` ± `ref`, `url` ± `path`, `github`), throwing on unknown discriminators. The seven sample entries from slice 1a are regenerated from upstream manifests on every re-import. 
Issue #99 slice 1b — follows the schema change from PR #111 (slice 1a). - **Empirical compatibility harness (`packages/dev-tools/src/compat-empirical/`).** Per-#100 research scaffold: a CLI (`predict`/`run`/`judge`/`report`/`all`) that runs candidate skills against claude-code, claude-cowork, and claude-chat, then joins VAT's static predictions with deterministic runtime observations and an LLM-judge semantic read into a reality-vs-prediction matrix. The output is an evidence artifact a follow-up PR will draw on to propose detector improvements, each citing specific (skill, runtime) cells. No detector code or `RUNTIME_PROFILES` changes here. Lives in the private `@vibe-agent-toolkit/dev-tools` package no adopter-facing surface. - **Empirical compat harness v2 (`packages/dev-tools/src/compat-empirical/`).** Foundations PR per [the v2 design](./docs/research/2026-05-23-compat-empirical-harness-v2-design.md). Probe coverage: multi-prompt + repeat-N with adaptive N=3→N=5 extension, mandatory positive+negative prompt pairing per corpus entry, and negative-prompt agreement inversion so false-positive triggers surface as `vat-optimistic`. Evidence quality: deterministic class widened from 6 to 9 values (splitting `error` into `install-failed`/`runtime-error`, `not-invoked` into `not-invoked-engaged`/`not-invoked-empty`, adding `refused`), judge prompt rewritten to v2 with a `refused` verdict. Report fidelity: coverage stats, per-bucket headline (own/official/community × ran/agree/optimistic/pessimistic/gray-zone), gray-zone (mixed-signal) and high-variance subsections, per-attempt variance rendered inline (`runtime-error (2/3) / failed (3/3)`). Judge replay: persisted `judge-calls/---.json` artifacts plus a new `re-judge` subcommand that re-executes them against an optionally different model or freshly-edited system prompt without re-spending operator hours on the runtime side. 
Two PR-#108 deferred bug fixes also landed: `git fetch --tags --force` before named-ref fetch (annotated tag refresh) and `setup()` teardown-first idempotency for the manual driver. Still private to `@vibe-agent-toolkit/dev-tools`; corpus authoring, first real run, and the docs deliverable are the downstream work. - **Cowork driver spike.** Added [`docs/contributing/cowork-driver-spike.md`](docs/contributing/cowork-driver-spike.md) — a time-boxed investigation (per §4a of the harness v2 design) of whether `claude-cowork` can be driven programmatically by the empirical compat harness today. Verdict: **not feasible**; cowork is a Claude Desktop app product with no public API/CLI surface. The `claude-cowork` runtime stays on `scripted-assisted` until Anthropic ships a Cowork CLI mode, Sessions API, or documented filesystem-import path. Adjacent finding (not a cowork replacement): the public-beta Skills API (`POST /v1/skills` + `container.skills[]` on `/v1/messages`) supports a fully-automatable *new* runtime — captured in the spike doc as a potential follow-up, gated on a separate design decision. diff --git a/corpus/seed.yaml b/corpus/seed.yaml index b06dfcd0..65c5e8d5 100644 --- a/corpus/seed.yaml +++ b/corpus/seed.yaml @@ -1,18 +1,20 @@ # Tracked plugins for `vat corpus scan`. -# See docs/superpowers/specs/2026-05-01-corpus-scan-phase-1-design.md # Source is the unique key. Each entry can carry an optional `validation:` # block with the same shape as `skills.defaults.validation` in # vibe-agent-toolkit.config.yaml — used to silence findings on this # plugin when we've decided the rule is wrong (or not yet right enough). -# All entries start with no validation block; overrides accumulate as -# evidence over scan runs. 
+# +# Last imported from upstream marketplaces on 2026-06-03 by +# packages/dev-tools/src/import-marketplace.ts +# +# Sources: +# anthropics/claude-plugins-official @ 4979da0 — 205 entries +# anthropics/knowledge-work-plugins @ f53ea6a — 30 entries +# +# Hand-curated entries (preserved on re-import): 2 at top. +# Re-import: bun run import-marketplace plugins: - # Policy: audit the published artifact (claude-marketplace branch or - # marketplace plugins/ path), not source. Source-tree audit can miss - # skills authored as flat-file siblings (e.g. VAT's resources/skills/ - # vat-*.md → dist/skills//SKILL.md), and source layouts can drift - # from what Claude Code actually receives at install time. - source: https://github.com/jdutton/vibe-agent-toolkit.git#claude-marketplace:plugins/vibe-agent-toolkit name: vibe-agent-toolkit bucket: official @@ -23,10 +25,683 @@ plugins: bucket: official confidence: first-party maturity: production - - # claude-plugins-official samples (verified via gh api 2026-05-01) - - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/skill-creator - name: skill-creator + - source: https://github.com/42Crunch-AI/claude-plugins.git#v1.5.5:plugins/api-security-testing + name: 42crunch-api-security-testing + bucket: official + confidence: curated + maturity: production + - source: https://github.com/adobe/skills.git#main:plugins/creative-cloud/adobe-for-creativity + name: adobe-for-creativity + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/agent-sdk-dev + name: agent-sdk-dev + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/SalesforceAIResearch/agentforce-adlc.git + name: agentforce-adlc + bucket: official + confidence: curated + maturity: production + - source: https://github.com/endorlabs/ai-plugins.git + name: ai-plugins + bucket: official + confidence: curated + maturity: 
production + - source: https://github.com/AikidoSec/aikido-claude-plugin.git + name: aikido + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Airtable/skills.git#main:plugins/airtable + name: airtable + bucket: official + confidence: curated + maturity: production + - source: https://github.com/gemini-cli-extensions/alloydb.git + name: alloydb + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/amazon-location-service + name: amazon-location-service + bucket: official + confidence: curated + maturity: production + - source: https://github.com/amplitude/mcp-marketplace.git#main:plugins/amplitude + name: amplitude + bucket: official + confidence: curated + maturity: production + - source: https://github.com/apolloio/apollo-mcp-plugin.git + name: apollo + bucket: official + confidence: curated + maturity: production + - source: https://github.com/apollographql/skills.git + name: apollo-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/appwrite/claude-plugin.git + name: appwrite + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/asana + name: asana + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/astronomer/agents.git + name: astronomer-data-agents + bucket: official + confidence: curated + maturity: production + - source: https://github.com/atlanhq/agent-toolkit.git + name: atlan + bucket: official + confidence: curated + maturity: production + - source: https://github.com/atlassian/atlassian-mcp-server.git + name: atlassian + bucket: official + confidence: curated + maturity: production + - source: https://github.com/BrainBlend-AI/atomic-agents.git#:claude-plugin/atomic-agents + name: atomic-agents + bucket: official + confidence: curated 
+ maturity: production + - source: https://github.com/auth0/agent-skills.git#main:plugins/auth0 + name: auth0 + bucket: official + confidence: curated + maturity: production + - source: https://github.com/aws/agent-toolkit-for-aws.git#main:plugins/aws-agents + name: aws-agents + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/aws-amplify + name: aws-amplify + bucket: official + confidence: curated + maturity: production + - source: https://github.com/aws/agent-toolkit-for-aws.git#main:plugins/aws-core + name: aws-core + bucket: official + confidence: curated + maturity: production + - source: https://github.com/aws/agent-toolkit-for-aws.git#main:plugins/aws-data-analytics + name: aws-data-analytics + bucket: official + confidence: curated + maturity: production + - source: https://github.com/aws-samples/sample-claude-code-plugins-for-startups.git#main:plugins/aws-dev-toolkit + name: aws-dev-toolkit + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/aws-serverless + name: aws-serverless + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/startups.git#main:advisor/plugins/aws-startup-advisor + name: aws-startup-advisor + bucket: official + confidence: curated + maturity: production + - source: https://github.com/microsoft/azure-skills.git + name: azure + bucket: official + confidence: curated + maturity: production + - source: https://github.com/AzureCosmosDB/cosmosdb-claude-code-plugin.git + name: azure-cosmos-db-assistant + bucket: official + confidence: curated + maturity: production + - source: https://github.com/base44/skills.git + name: base44 + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Bigdata-com/bigdata-plugins-marketplace.git#main:plugins/bigdata-com + name: bigdata-com + bucket: official + 
confidence: curated + maturity: production + - source: https://github.com/box/box-for-ai.git + name: box + bucket: official + confidence: curated + maturity: production + - source: https://github.com/brightdata/skills.git + name: brightdata-plugin + bucket: official + confidence: curated + maturity: production + - source: https://github.com/buildkite/skills.git + name: buildkite + bucket: official + confidence: curated + maturity: production + - source: https://github.com/carta/plugins.git#main:plugins/carta-cap-table + name: carta-cap-table + bucket: official + confidence: curated + maturity: production + - source: https://github.com/carta/plugins.git#main:plugins/carta-crm + name: carta-crm + bucket: official + confidence: curated + maturity: production + - source: https://github.com/carta/plugins.git#main:plugins/carta-investors + name: carta-investors + bucket: official + confidence: curated + maturity: production + - source: https://github.com/cap-js/mcp-server.git + name: cds-mcp + bucket: official + confidence: curated + maturity: production + - source: https://github.com/ChromeDevTools/chrome-devtools-mcp.git + name: chrome-devtools-mcp + bucket: official + confidence: curated + maturity: production + - source: https://github.com/circlefin/skills.git#master:plugins/circle + name: circle-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/circlebackai/claude-code-plugin.git + name: circleback + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/clangd-lsp + name: clangd-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/claude-code-setup + name: claude-code-setup + bucket: official + confidence: first-party + maturity: production + - source: 
https://github.com/anthropics/claude-plugins-official.git#main:plugins/claude-md-management + name: claude-md-management + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/ClickHouse/clickhouse-claude-code-plugin.git + name: clickhouse + bucket: official + confidence: curated + maturity: production + - source: https://github.com/ClickHouse/agent-skills.git + name: clickhouse-best-practices + bucket: official + confidence: curated + maturity: production + - source: https://github.com/gemini-cli-extensions/cloud-sql-postgresql.git + name: cloud-sql-postgresql + bucket: official + confidence: curated + maturity: production + - source: https://github.com/cloudflare/skills.git + name: cloudflare + bucket: official + confidence: curated + maturity: production + - source: https://github.com/cloudinary-devs/cloudinary-plugin.git + name: cloudinary + bucket: official + confidence: curated + maturity: production + - source: https://github.com/cockroachdb/claude-plugin.git + name: cockroachdb + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/code-modernization + name: code-modernization + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/code-review + name: code-review + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/code-simplifier + name: code-simplifier + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/coderabbitai/skills.git + name: coderabbit + bucket: official + confidence: curated + maturity: production + - source: https://github.com/CodSpeedHQ/codspeed.git + name: codspeed + bucket: official + confidence: curated + maturity: production + - source: 
https://github.com/anthropics/claude-plugins-official.git#main:plugins/commit-commands + name: commit-commands + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/context7 + name: context7 + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/get-convex/convex-backend-skill.git + name: convex + bucket: official + confidence: curated + maturity: production + - source: https://github.com/CrowdStrike/foundry-skills.git + name: crowdstrike-falcon-foundry + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/csharp-lsp + name: csharp-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/cwc-makers + name: cwc-makers + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/dash0hq/dash0-agent-plugin.git + name: dash0 + bucket: official + confidence: curated + maturity: production + - source: https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack.git + name: data-agent-kit-starter-pack + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/databases-on-aws + name: databases-on-aws + bucket: official + confidence: curated + maturity: production + - source: https://github.com/datadog-labs/claude-code-plugin.git + name: datadog + bucket: official + confidence: curated + maturity: production + - source: https://github.com/datahub-project/datahub-skills.git + name: datahub-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/datarobot-oss/datarobot-agent-skills.git + name: datarobot-agent-skills + bucket: official + confidence: curated + maturity: 
production + - source: https://github.com/microsoft/Dataverse-skills.git#main:.github/plugins/dataverse + name: dataverse + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/deploy-on-aws + name: deploy-on-aws + bucket: official + confidence: curated + maturity: production + - source: https://github.com/wonderwhy-er/DesktopCommanderMCP.git#main:plugins/claude + name: desktop-commander + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/discord + name: discord + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/dominodatalab/domino-claude-plugin.git + name: dominodatalab + bucket: official + confidence: curated + maturity: production + - source: https://github.com/duckdb/duckdb-skills.git + name: duckdb-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/DuendeSoftware/duende-skills.git + name: duende-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/exa-labs/exa-mcp-server.git + name: exa + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/explanatory-output-style + name: explanatory-output-style + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/expo/skills.git#main:plugins/expo + name: expo + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/fakechat + name: fakechat + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/fastly/fastly-agent-toolkit.git + name: fastly-agent-toolkit + bucket: official + confidence: curated + maturity: production + - 
source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/feature-dev + name: feature-dev + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/voxel51/fiftyone-skills.git + name: fiftyone + bucket: official + confidence: curated + maturity: production + - source: https://github.com/figma/mcp-server-guide.git + name: figma + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/firebase + name: firebase + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/firecrawl/firecrawl-claude-plugin.git + name: firecrawl + bucket: official + confidence: curated + maturity: production + - source: https://github.com/atlassian/forge-skills.git + name: forge-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/frontend-design + name: frontend-design + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/fullstorydev/fullstory-skills.git + name: fullstory + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/github + name: github + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/gitlab + name: gitlab + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/gopls-lsp + name: gopls-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/greptile + name: greptile + bucket: official + confidence: first-party + maturity: 
production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/hookify + name: hookify + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/huggingface/skills.git + name: huggingface-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/hunter-io/claude-plugin.git + name: hunter + bucket: official + confidence: curated + maturity: production + - source: https://github.com/heygen-com/hyperframes.git + name: hyperframes + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/imessage + name: imessage + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/intercom/claude-plugin-external.git + name: intercom + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/jdtls-lsp + name: jdtls-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/jfrog/claude-plugin.git + name: jfrog + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/kotlin-lsp + name: kotlin-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/laravel-boost + name: laravel-boost + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/learning-output-style + name: learning-output-style + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/legalzoom/claude-plugins.git#main:plugins/legalzoom + name: legalzoom + bucket: official + confidence: curated + maturity: 
production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/linear + name: linear + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/Shopify/liquid-skills.git#main:plugins/liquid-lsp + name: liquid-lsp + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Shopify/liquid-skills.git#main:plugins/liquid-skills + name: liquid-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/pydantic/skills.git#main:plugins/logfire + name: logfire + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/lua-lsp + name: lua-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/ory/lumen.git + name: lumen + bucket: official + confidence: curated + maturity: production + - source: https://github.com/lusha-oss/lusha-mcp-plugin.git + name: lusha + bucket: official + confidence: curated + maturity: production + - source: https://github.com/mapbox/mapbox-agent-skills.git + name: mapbox + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/math-olympiad + name: math-olympiad + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/modelcontextprotocol/ext-apps.git#main:plugins/mcp-apps + name: mcp-apps + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/mcp-server-dev + name: mcp-server-dev + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/mcp-tunnels + name: mcp-tunnels + bucket: official + confidence: first-party + maturity: production + - source: 
https://github.com/mercadopago/mercadopago-claude-marketplace.git#main:plugins/mercadopago + name: mercadopago + bucket: official + confidence: curated + maturity: production + - source: https://github.com/MicrosoftDocs/mcp.git + name: microsoft-docs + bucket: official + confidence: curated + maturity: production + - source: https://github.com/awslabs/startups.git#main:migrate/plugins/migration-to-aws + name: migration-to-aws + bucket: official + confidence: curated + maturity: production + - source: https://github.com/mintlify/mintlify-claude-plugin.git + name: mintlify + bucket: official + confidence: curated + maturity: production + - source: https://github.com/miroapp/miro-ai.git#main:claude-plugins/miro + name: miro + bucket: official + confidence: curated + maturity: production + - source: https://github.com/mongodb/agent-skills.git + name: mongodb + bucket: official + confidence: curated + maturity: production + - source: https://github.com/neondatabase/agent-skills.git#main:plugins/neon-postgres + name: neon + bucket: official + confidence: curated + maturity: production + - source: https://github.com/netlify/context-and-tools.git + name: netlify-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/oracle/netsuite-suitecloud-sdk.git#master:packages/agent-skills + name: netsuite-suitecloud + bucket: official + confidence: curated + maturity: production + - source: https://github.com/nvsecurity/nightvision-skills.git + name: nightvision + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Nimbleway/agent-skills.git + name: nimble + bucket: official + confidence: curated + maturity: production + - source: https://github.com/makenotion/claude-code-notion-plugin.git + name: notion + bucket: official + confidence: curated + maturity: production + - source: https://github.com/NVIDIA/skills.git#main:plugins/nvidia-skills + name: nvidia-skills + bucket: official + confidence: 
curated + maturity: production + - source: https://github.com/oracle-samples/oracle-aidp-samples.git#main:ai/claude-code-plugins/oracle-ai-data-platform-workbench-spark-connectors + name: oracle-ai-data-platform-workbench-spark-connectors + bucket: official + confidence: curated + maturity: production + - source: https://github.com/growthxai/output.git#main:coding_assistants/claude/plugins/outputai + name: outputai + bucket: official + confidence: curated + maturity: production + - source: https://github.com/PagerDuty/claude-code-plugins.git + name: pagerduty + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/php-lsp + name: php-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/gopigment/ai-plugins.git + name: pigment + bucket: official + confidence: curated + maturity: production + - source: https://github.com/pinecone-io/pinecone-claude-code-plugin.git + name: pinecone + bucket: official + confidence: curated + maturity: production + - source: https://github.com/planetscale/claude-plugin.git + name: planetscale + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/playground + name: playground + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/playwright + name: playwright bucket: official confidence: first-party maturity: production @@ -35,30 +710,493 @@ plugins: bucket: official confidence: first-party maturity: production - - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/code-review - name: code-review + - source: https://github.com/PostHog/ai-plugin.git + name: posthog bucket: official - confidence: first-party + confidence: curated + maturity: production + - source: 
https://github.com/gitroomhq/postiz-agent.git + name: postiz + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Postman-Devrel/postman-claude-code-plugin.git + name: postman + bucket: official + confidence: curated maturity: production - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/pr-review-toolkit name: pr-review-toolkit bucket: official confidence: first-party maturity: production - - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/feature-dev - name: feature-dev + - source: https://github.com/prisma/claude-plugin.git + name: prisma + bucket: official + confidence: curated + maturity: production + - source: https://github.com/pydantic/skills.git#main:plugins/ai + name: pydantic-ai + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/pyright-lsp + name: pyright-lsp bucket: official confidence: first-party maturity: production - - # knowledge-work-plugins samples (each plugin is at the repo root, not under plugins/) + - source: https://github.com/qdrant/skills.git + name: qdrant-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/qodo-ai/qodo-skills.git + name: qodo-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/TheQtCompanyRnD/agent-skills.git + name: qt-development-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/quarkusio/quarkus-agent-mcp.git + name: quarkus-agent + bucket: official + confidence: curated + maturity: production + - source: https://github.com/railwayapp/railway-skills.git#main:plugins/railway + name: railway + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/ralph-loop + name: ralph-loop + bucket: 
official + confidence: first-party + maturity: production + - source: https://github.com/RevenueCat/rc-claude-code-plugin.git#:revenuecat + name: rc + bucket: official + confidence: curated + maturity: production + - source: https://github.com/redis/agent-skills.git#main:plugins/redis-development + name: redis-development + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Digital-Process-Tools/claude-remember.git + name: remember + bucket: official + confidence: curated + maturity: production + - source: https://github.com/resend/resend-skills.git + name: resend + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Rootly-AI-Labs/rootly-claude-plugin.git + name: rootly + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/ruby-lsp + name: ruby-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/runwayml/skills.git + name: runway-api + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/rust-analyzer-lsp + name: rust-analyzer-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/awslabs/agent-plugins.git#main:plugins/sagemaker-ai + name: sagemaker-ai + bucket: official + confidence: curated + maturity: production + - source: https://github.com/sanity-io/agent-toolkit.git + name: sanity + bucket: official + confidence: curated + maturity: production + - source: https://github.com/SAP/open-ux-tools.git#main:packages/fiori-mcp-server + name: sap-fiori-mcp-server + bucket: official + confidence: curated + maturity: production + - source: https://github.com/SAP/mdk-mcp-server.git + name: sap-mdk-server + bucket: official + confidence: curated + maturity: production + - source: 
https://github.com/spotify/save-to-spotify.git#main:plugin + name: save-to-spotify + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/security-guidance + name: security-guidance + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/semgrep/mcp-marketplace.git#:plugin + name: semgrep + bucket: official + confidence: curated + maturity: production + - source: https://github.com/getsentry/sentry-for-claude.git + name: sentry + bucket: official + confidence: curated + maturity: production + - source: https://github.com/getsentry/cli.git#main:plugins/sentry-cli + name: sentry-cli + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/serena + name: serena + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/ServiceNow/sdk.git#master:providers/claude/plugin + name: servicenow-sdk + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/session-report + name: session-report + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/Shopify/shopify-plugins.git + name: shopify + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Shopify/Shopify-AI-Toolkit.git + name: shopify-ai-toolkit + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/skill-creator + name: skill-creator + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/slackapi/slack-mcp-plugin.git + name: slack + bucket: official + confidence: curated + maturity: production + - source: 
https://github.com/Snowflake-Labs/snowflake-ai-kit.git#main:plugins/cortex-code + name: snowflake-cortex-code + bucket: official + confidence: curated + maturity: production + - source: https://github.com/SonarSource/sonarqube-agent-plugins.git + name: sonarqube + bucket: official + confidence: curated + maturity: production + - source: https://github.com/sonatype/sonatype-guide-claude-plugin.git + name: sonatype-guide + bucket: official + confidence: curated + maturity: production + - source: https://github.com/sourcegraph-community/sourcegraph-claudecode-plugin.git + name: sourcegraph + bucket: official + confidence: curated + maturity: production + - source: https://github.com/spotify/ads-claude-plugin.git + name: spotify-ads-api + bucket: official + confidence: curated + maturity: production + - source: https://github.com/stripe/ai.git#main:providers/claude/plugin + name: stripe + bucket: official + confidence: curated + maturity: production + - source: https://github.com/sumup/sumup-skills.git#:providers/claude/plugin + name: sumup + bucket: official + confidence: curated + maturity: production + - source: https://github.com/supabase-community/supabase-plugin.git + name: supabase + bucket: official + confidence: curated + maturity: production + - source: https://github.com/obra/superpowers.git + name: superpowers + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/swift-lsp + name: swift-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/JetBrains/teamcity-cli.git + name: teamcity-cli + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/telegram + name: telegram + bucket: official + confidence: first-party + maturity: production + - source: 
https://github.com/anthropics/claude-plugins-official.git#main:external_plugins/terraform + name: terraform + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/togethercomputer/skills.git + name: togetherai-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/twilio/ai.git + name: twilio-developer-kit + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/claude-plugins-official.git#main:plugins/typescript-lsp + name: typescript-lsp + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/UI5/plugins-coding-agents.git#main:plugins/ui5 + name: ui5 + bucket: official + confidence: curated + maturity: production + - source: https://github.com/UI5/plugins-coding-agents.git#main:plugins/ui5-typescript-conversion + name: ui5-typescript-conversion + bucket: official + confidence: curated + maturity: production + - source: https://github.com/VantaInc/vanta-mcp-plugin.git + name: vanta-mcp-plugin + bucket: official + confidence: curated + maturity: production + - source: https://github.com/vercel/vercel-plugin.git + name: vercel + bucket: official + confidence: curated + maturity: production + - source: https://github.com/explorium-ai/vibeprospecting-plugin.git + name: vibe-prospecting + bucket: official + confidence: curated + maturity: production + - source: https://github.com/windsor-ai/claude-windsor-ai-plugin.git + name: windsor-ai + bucket: official + confidence: curated + maturity: production + - source: https://github.com/wix/skills.git + name: wix + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Automattic/claude-code-wordpress.com.git + name: wordpress-com + bucket: official + confidence: curated + maturity: production + - source: https://github.com/workos/skills.git#main:plugins/workos + name: workos + bucket: official + confidence: 
curated + maturity: production + - source: https://github.com/youdotcom-oss/agent-skills.git + name: youdotcom-agent-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/zapier/zapier-mcp.git#main:plugins/zapier + name: zapier + bucket: official + confidence: curated + maturity: production + - source: https://github.com/zilliztech/zilliz-plugin.git#:plugins/zilliz + name: zilliz + bucket: official + confidence: curated + maturity: production + - source: https://github.com/zoom/zoom-plugin.git + name: zoom-plugin + bucket: official + confidence: curated + maturity: production + - source: https://github.com/Zoominfo/zoominfo-mcp-plugin.git + name: zoominfo + bucket: official + confidence: curated + maturity: production + - source: https://github.com/zscaler/zscaler-mcp-server.git + name: zscaler + bucket: official + confidence: curated + maturity: production + - source: https://github.com/amekala/adspirer-mcp-plugin.git + name: knowledge-work-adspirer-ads-agent + bucket: official + confidence: curated + maturity: production + - source: techwolf-ai/ai-first-toolkit#main:plugins/ai-firstify + name: knowledge-work-ai-firstify + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:partner-built/apollo + name: knowledge-work-apollo + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:bio-research + name: knowledge-work-bio-research + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:partner-built/brand-voice + name: knowledge-work-brand-voice + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:partner-built/common-room + name: knowledge-work-common-room + bucket: official + confidence: 
curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:cowork-plugin-management + name: knowledge-work-cowork-plugin-management + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:customer-support + name: knowledge-work-customer-support + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/daloopa/plugin.git + name: knowledge-work-daloopa + bucket: official + confidence: curated + maturity: production - source: https://github.com/anthropics/knowledge-work-plugins.git#main:data name: knowledge-work-data bucket: official confidence: first-party maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:design + name: knowledge-work-design + bucket: official + confidence: first-party + maturity: production - source: https://github.com/anthropics/knowledge-work-plugins.git#main:engineering name: knowledge-work-engineering bucket: official confidence: first-party maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:enterprise-search + name: knowledge-work-enterprise-search + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:finance + name: knowledge-work-finance + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:human-resources + name: knowledge-work-human-resources + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:legal + name: knowledge-work-legal + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/LSEG-API-Samples/lseg-claude-plugin.git + name: knowledge-work-lseg + bucket: official + 
confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:marketing + name: knowledge-work-marketing + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:operations + name: knowledge-work-operations + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:pdf-viewer + name: knowledge-work-pdf-viewer + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:product-management + name: knowledge-work-product-management + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/Accoil/product-tracking-skills.git + name: knowledge-work-product-tracking-skills + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:productivity + name: knowledge-work-productivity + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:sales + name: knowledge-work-sales + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/searchfit/searchfit-seo.git + name: knowledge-work-searchfit-seo + bucket: official + confidence: curated + maturity: production + - source: https://github.com/ServiceNow/sdk.git#main:providers/claude/plugin + name: knowledge-work-servicenow-sdk + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:partner-built/slack + name: knowledge-work-slack-by-salesforce + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:small-business + name: 
knowledge-work-small-business + bucket: official + confidence: first-party + maturity: production + - source: https://github.com/kensho-technologies/spglobal-agent-skills.git#main:plugins/spglobal-plugin + name: knowledge-work-sp-global + bucket: official + confidence: curated + maturity: production + - source: https://github.com/anthropics/knowledge-work-plugins.git#main:partner-built/zoom-plugin + name: knowledge-work-zoom-plugin + bucket: official + confidence: curated + maturity: production diff --git a/package.json b/package.json index 89a122e9..f469b1aa 100644 --- a/package.json +++ b/package.json @@ -61,6 +61,7 @@ "determine-publish-tags": "tsx packages/dev-tools/src/determine-publish-tags.ts", "publish-with-rollback": "tsx packages/dev-tools/src/publish-with-rollback.ts", "extract-changelog": "tsx packages/dev-tools/src/extract-changelog.ts", + "import-marketplace": "tsx packages/dev-tools/src/import-marketplace.ts", "fix-workspace-deps": "tsx packages/dev-tools/src/fix-workspace-deps.ts", "resolve-workspace-deps": "tsx packages/dev-tools/src/resolve-workspace-deps.ts", "prepare-cli-bin": "cd packages/cli && tsx ../dev-tools/src/prepare-bin.ts", diff --git a/packages/dev-tools/src/import-marketplace.ts b/packages/dev-tools/src/import-marketplace.ts new file mode 100644 index 00000000..42fdfd76 --- /dev/null +++ b/packages/dev-tools/src/import-marketplace.ts @@ -0,0 +1,492 @@ +/** + * Importer for `corpus/seed.yaml` — generates entries from upstream marketplaces. + * + * Fetches `.claude-plugin/marketplace.json` from + * `anthropics/claude-plugins-official` and `anthropics/knowledge-work-plugins`, + * maps each plugin to a PluginEntry, and rewrites `corpus/seed.yaml`. + * + * Mapping rules and design are documented in + * `~/code/vat-issue-99-slice-1b-plan.md` (slice 1b of issue #99). 
+ * + * Usage: + * bun run import-marketplace + * + * Exit codes: + * 0 - success (seed.yaml written) + * 1 - failure (network, schema mismatch, name collision, unknown source shape) + */ + +/* eslint-disable security/detect-non-literal-fs-filename */ +// File paths derived from PROJECT_ROOT (controlled, not user input) + +import { writeFileSync } from 'node:fs'; +import { pathToFileURL } from 'node:url'; + +import { safeExecSync, safePath } from '@vibe-agent-toolkit/utils'; +import { z } from 'zod'; + +import { PROJECT_ROOT, log } from './common.js'; + +// --------------------------------------------------------------------------- +// Catalog config — both upstream catalogs use `main` as their default branch. +// --------------------------------------------------------------------------- + +interface Catalog { + /** GitHub owner */ + owner: string; + /** GitHub repo name */ + name: string; + /** Full clone URL */ + cloneUrl: string; + /** Default branch (used as `<ref>` for string-shape entries that live in this catalog) */ + ref: string; + /** Prefix applied to all names from this catalog (collision-avoidance) */ + namePrefix: string; +} + +export const CATALOG_OFFICIAL: Catalog = { + owner: 'anthropics', + name: 'claude-plugins-official', + cloneUrl: 'https://github.com/anthropics/claude-plugins-official.git', + ref: 'main', + namePrefix: '', +}; + +export const CATALOG_KNOWLEDGE_WORK: Catalog = { + owner: 'anthropics', + name: 'knowledge-work-plugins', + cloneUrl: 'https://github.com/anthropics/knowledge-work-plugins.git', + ref: 'main', + namePrefix: 'knowledge-work-', +}; + +// --------------------------------------------------------------------------- +// Upstream manifest schema — Postel's law: passthrough for external data. +// We only read `name`, `source`, and (optionally) `author`. Everything else +// the upstream carries (description, category, homepage, etc.) is silently +// discarded. 
+// --------------------------------------------------------------------------- + +const SourceObjectSchema = z + .object({ + source: z.string(), + }) + .passthrough(); + +const UpstreamEntrySchema = z + .object({ + name: z.string().min(1), + source: z.union([z.string(), SourceObjectSchema]), + author: z + .object({ name: z.string() }) + .passthrough() + .optional(), + }) + .passthrough(); + +const ManifestSchema = z + .object({ + plugins: z.array(UpstreamEntrySchema), + }) + .passthrough(); + +export type UpstreamEntry = z.infer<typeof UpstreamEntrySchema>; + +// --------------------------------------------------------------------------- +// Output entry shape — kept in sync with `PluginEntrySchema` in +// `packages/cli/src/commands/corpus/seed.ts`. We hand-write entries rather +// than importing the schema (avoids dev-tools → cli reverse dependency). +// The committed seed is validated by `loadSeedFile()` at downstream load time. +// --------------------------------------------------------------------------- + +interface PluginEntry { + source: string; + name: string; + bucket: 'official'; + confidence: 'first-party' | 'curated'; + maturity: 'production'; +} + +const NAME_REGEX = /^[A-Za-z0-9_-]+$/; +const FIRST_PARTY: PluginEntry['confidence'] = 'first-party'; +const CURATED: PluginEntry['confidence'] = 'curated'; + +// Hand-curated entries that pre-date the import — VAT-owned plugins that +// don't live in either upstream catalog. Re-emitted verbatim on every run. 
+const PRESERVED_ENTRIES: PluginEntry[] = [ + { + source: + 'https://github.com/jdutton/vibe-agent-toolkit.git#claude-marketplace:plugins/vibe-agent-toolkit', + name: 'vibe-agent-toolkit', + bucket: 'official', + confidence: FIRST_PARTY, + maturity: 'production', + }, + { + source: 'https://github.com/jdutton/vibe-validate.git#claude-marketplace', + name: 'vibe-validate', + bucket: 'official', + confidence: FIRST_PARTY, + maturity: 'production', + }, +]; + +// --------------------------------------------------------------------------- +// Fetch helpers — both use `gh api` via safeExecSync so this script inherits +// the same auth setup as ad-hoc `gh` commands. +// --------------------------------------------------------------------------- + +function ghFetch(args: string[]): string { + const out = safeExecSync('gh', args, { encoding: 'utf8' }); + return typeof out === 'string' ? out : out.toString('utf8'); +} + +function fetchManifest(catalog: Catalog): z.infer<typeof ManifestSchema> { + const raw = ghFetch([ + 'api', + `repos/${catalog.owner}/${catalog.name}/contents/.claude-plugin/marketplace.json`, + '-H', + 'Accept: application/vnd.github.raw', + ]); + return ManifestSchema.parse(JSON.parse(raw)); +} + +function fetchCatalogSha(catalog: Catalog): string { + const raw = ghFetch([ + 'api', + `repos/${catalog.owner}/${catalog.name}/commits/${catalog.ref}`, + '--jq', + '.sha', + ]); + return raw.trim().slice(0, 7); +} + +// --------------------------------------------------------------------------- +// Mapping primitives — exported so unit tests can exercise each rule directly. +// --------------------------------------------------------------------------- + +/** Replace anything outside the schema's name regex with `-`. */ +export function mungeName(name: string): string { + return name.replaceAll(/[^A-Za-z0-9_-]+/g, '-'); +} + +/** Compose the canonical `.git#:` source URL. 
*/ +export function composeSourceUrl(entry: UpstreamEntry, catalog: Catalog): string { + const src = entry.source; + + // String shape — entry lives inside the catalog repo. + if (typeof src === 'string') { + const stripped = src.replace(/^\.\//, ''); + return `${catalog.cloneUrl}#${catalog.ref}:${stripped}`; + } + + // Object shape — discriminated by `source.source`. `git-subdir` and `url` + // both carry an optional `path` and an optional `ref`; omit either when + // absent and let the audit clone fall back to the repo's default branch. + const disc = src.source; + if (disc === 'git-subdir' || disc === 'url') { + const url = readString(src, 'url', entry.name); + const ref = (src as Record<string, unknown>)['ref']; + const path = (src as Record<string, unknown>)['path']; + const refStr = typeof ref === 'string' ? ref : ''; + const pathStr = typeof path === 'string' ? path : ''; + if (refStr === '' && pathStr === '') return url; + return `${url}#${refStr}:${pathStr}`; + } + if (disc === 'github') { + const repo = readString(src, 'repo', entry.name); + return `https://github.com/${repo}.git`; + } + throw new Error( + `Entry "${entry.name}": unknown source discriminator "${disc}". ` + + `Update import-marketplace.ts to handle this shape.`, + ); +} + +function readString(obj: Record<string, unknown>, key: string, ownerName: string): string { + const v = obj[key]; + if (typeof v !== 'string' || v.length === 0) { + throw new Error( + `Entry "${ownerName}": expected string at source.${key}, got ${typeof v} (${JSON.stringify(v)})`, + ); + } + return v; +} + +/** + * Derive `confidence` from the upstream `source` URL. + * + * - String shape → `first-party` (both catalogs are anthropics-owned), unless + * the path starts with `./partner-built/` (knowledge-work convention for + * vendor-contributed plugins). + * - Object shape → `first-party` iff the resolved GitHub owner is `anthropics`, + * else `curated`. 
+ * + * The `author` field on upstream entries is NOT consulted (40% of entries + * lack it, and the field is inconsistent across catalogs). + */ +export function deriveConfidence(entry: UpstreamEntry): PluginEntry['confidence'] { + const src = entry.source; + + if (typeof src === 'string') { + return src.startsWith('./partner-built/') ? CURATED : FIRST_PARTY; + } + + const disc = src.source; + let url: string | undefined; + if (disc === 'github') { + const repo = (src as Record<string, unknown>)['repo']; + if (typeof repo === 'string') { + url = `https://github.com/${repo}.git`; + } + } else { + const u = (src as Record<string, unknown>)['url']; + if (typeof u === 'string') { + url = u; + } + } + + if (url === undefined) { + return CURATED; + } + const ownerMatch = /^https:\/\/github\.com\/([^/]+)\//.exec(url); + return ownerMatch?.[1] === 'anthropics' ? FIRST_PARTY : CURATED; +} + +export function mapEntry(entry: UpstreamEntry, catalog: Catalog): PluginEntry { + const name = mungeName(`${catalog.namePrefix}${entry.name}`); + if (!NAME_REGEX.test(name)) { + throw new Error( + `Entry "${entry.name}" → "${name}" still fails name regex after munging`, + ); + } + return { + source: composeSourceUrl(entry, catalog), + name, + bucket: 'official', + confidence: deriveConfidence(entry), + maturity: 'production', + }; +} + +// --------------------------------------------------------------------------- +// Deduplication & uniqueness checks +// +// `loadSeedFile()` treats `source` as the unique key (it throws on dupes). +// But the upstream catalogs intentionally list the same plugin under multiple +// presentation-name aliases (e.g. `data`, `data-engineering`, and +// `astronomer-data-agents` all resolve to the same `github.com/astronomer/agents` +// repo). For the seed — which represents *unique audit targets* — we +// deduplicate by source URL and keep the alphabetical-first name. 
+// --------------------------------------------------------------------------- + +interface CombineResult { + /** Final entry list in seed.yaml order: preserved, then official, then knowledge-work. */ + final: PluginEntry[]; + /** Count of official entries that survived dedup. */ + officialKept: number; + /** Count of knowledge-work entries that survived dedup. */ + kwKept: number; + /** Names that were dropped because their source URL had already been claimed. */ + droppedNames: string[]; +} + +/** + * Merge preserved + official + knowledge-work entries, dropping duplicates by + * source URL. Preserved entries always win; within imports, the first + * occurrence wins (callers pass alphabetically-sorted arrays, so the + * alphabetical-first name in each duplicate cluster lands in the seed). + */ +export function combineAndDedupe( + official: PluginEntry[], + kw: PluginEntry[], +): CombineResult { + const seen = new Set<string>(); + const final: PluginEntry[] = []; + const droppedNames: string[] = []; + + for (const e of PRESERVED_ENTRIES) { + seen.add(e.source); + final.push(e); + } + + let officialKept = 0; + for (const e of official) { + if (seen.has(e.source)) { + droppedNames.push(e.name); + continue; + } + seen.add(e.source); + final.push(e); + officialKept++; + } + + let kwKept = 0; + for (const e of kw) { + if (seen.has(e.source)) { + droppedNames.push(e.name); + continue; + } + seen.add(e.source); + final.push(e); + kwKept++; + } + + return { final, officialKept, kwKept, droppedNames }; +} + +function assertUniqueNames(entries: PluginEntry[]): void { + const names = new Set<string>(); + for (const e of entries) { + if (names.has(e.name)) { + throw new Error(`Duplicate name after mapping: ${e.name}`); + } + names.add(e.name); + } +} + +// --------------------------------------------------------------------------- +// YAML output — written by hand (rather than via `yaml.stringify`) to keep +// the file format byte-identical across runs with no upstream changes. 
+// --------------------------------------------------------------------------- + +interface ImportCounts { + official: number; + knowledgeWork: number; +} + +function buildHeader(officialSha: string, kwSha: string, counts: ImportCounts): string { + const date = new Date().toISOString().slice(0, 10); + return [ + `# Tracked plugins for \`vat corpus scan\`.`, + `# Source is the unique key. Each entry can carry an optional \`validation:\``, + `# block with the same shape as \`skills.defaults.validation\` in`, + `# vibe-agent-toolkit.config.yaml — used to silence findings on this`, + `# plugin when we've decided the rule is wrong (or not yet right enough).`, + `#`, + `# Last imported from upstream marketplaces on ${date} by`, + `# packages/dev-tools/src/import-marketplace.ts`, + `#`, + `# Sources:`, + `# anthropics/claude-plugins-official @ ${officialSha} — ${counts.official} entries`, + `# anthropics/knowledge-work-plugins @ ${kwSha} — ${counts.knowledgeWork} entries`, + `#`, + `# Hand-curated entries (preserved on re-import): ${PRESERVED_ENTRIES.length} at top.`, + `# Re-import: bun run import-marketplace`, + ``, + ``, + ].join('\n'); +} + +function stringifyEntries(entries: PluginEntry[]): string { + return entries + .map( + e => + ` - source: ${e.source}\n` + + ` name: ${e.name}\n` + + ` bucket: ${e.bucket}\n` + + ` confidence: ${e.confidence}\n` + + ` maturity: ${e.maturity}\n`, + ) + .join(''); +} + +// --------------------------------------------------------------------------- +// Main +// --------------------------------------------------------------------------- + +function run(): void { + log('Fetching upstream manifests via gh CLI…', 'cyan'); + + const official = fetchManifest(CATALOG_OFFICIAL); + const kw = fetchManifest(CATALOG_KNOWLEDGE_WORK); + const officialSha = fetchCatalogSha(CATALOG_OFFICIAL); + const kwSha = fetchCatalogSha(CATALOG_KNOWLEDGE_WORK); + + log( + ` claude-plugins-official @ ${officialSha}: ${official.plugins.length} upstream entries`, 
+ 'reset', + ); + log( + ` knowledge-work-plugins @ ${kwSha}: ${kw.plugins.length} upstream entries`, + 'reset', + ); + + // Sort by name within each catalog so the diff is stable when upstream + // re-orders entries (which they do frequently). + const officialEntries = official.plugins + .map(e => mapEntry(e, CATALOG_OFFICIAL)) + .sort((a, b) => a.name.localeCompare(b.name)); + const kwEntries = kw.plugins + .map(e => mapEntry(e, CATALOG_KNOWLEDGE_WORK)) + .sort((a, b) => a.name.localeCompare(b.name)); + + const { final, officialKept, kwKept, droppedNames } = combineAndDedupe( + officialEntries, + kwEntries, + ); + assertUniqueNames(final); + + const mungedCount = + countMunged(official.plugins, '') + + countMunged(kw.plugins, CATALOG_KNOWLEDGE_WORK.namePrefix); + + log('', 'reset'); + log('Mapping summary:', 'cyan'); + log(` Preserved entries: ${PRESERVED_ENTRIES.length}`, 'reset'); + log(` Imported (official, raw): ${officialEntries.length}`, 'reset'); + log(` Imported (knowledge-work, raw): ${kwEntries.length}`, 'reset'); + log( + ` Duplicate-source aliases dropped: ${droppedNames.length}`, + droppedNames.length > 0 ? 'yellow' : 'reset', + ); + if (droppedNames.length > 0) { + log(` [${droppedNames.join(', ')}]`, 'yellow'); + } + log(` Official kept: ${officialKept}`, 'reset'); + log(` Knowledge-work kept: ${kwKept}`, 'reset'); + log(` Total in seed.yaml: ${final.length}`, 'green'); + log( + ` Names that required munging: ${mungedCount}`, + mungedCount > 0 ? 
'yellow' : 'reset', + ); + + const output = + buildHeader(officialSha, kwSha, { + official: officialKept, + knowledgeWork: kwKept, + }) + + `plugins:\n` + + stringifyEntries(final); + + const seedPath = safePath.join(PROJECT_ROOT, 'corpus', 'seed.yaml'); + writeFileSync(seedPath, output, 'utf8'); + log('', 'reset'); + log(`✓ Wrote ${seedPath}`, 'green'); +} + +function countMunged(entries: UpstreamEntry[], prefix: string): number { + let count = 0; + for (const e of entries) { + const raw = `${prefix}${e.name}`; + if (mungeName(raw) !== raw) { + count++; + } + } + return count; +} + +// Only run when invoked directly (e.g. `bun run import-marketplace`). When +// imported by unit tests, this top-level branch is skipped so importing the +// module doesn't fetch from gh or rewrite seed.yaml. +const invokedDirectly = + process.argv[1] !== undefined && import.meta.url === pathToFileURL(process.argv[1]).href; + +if (invokedDirectly) { + try { + run(); + } catch (err) { + log(`✗ ${(err as Error).message}`, 'red'); + process.exit(1); + } +} diff --git a/packages/dev-tools/test/import-marketplace.test.ts b/packages/dev-tools/test/import-marketplace.test.ts new file mode 100644 index 00000000..bb1669eb --- /dev/null +++ b/packages/dev-tools/test/import-marketplace.test.ts @@ -0,0 +1,235 @@ +/** + * Unit tests for import-marketplace.ts mapping primitives. + * + * The end-to-end importer is exercised by running the script against the + * live upstream marketplaces during slice 1b development; these tests + * pin down the pure-function building blocks so future refactors stay safe. + */ +import { describe, expect, it } from 'vitest'; + +import { + CATALOG_KNOWLEDGE_WORK, + CATALOG_OFFICIAL, + combineAndDedupe, + composeSourceUrl, + deriveConfidence, + mapEntry, + mungeName, + type UpstreamEntry, +} from '../src/import-marketplace.js'; + +// Small factory to keep individual tests focused on the inputs that matter. 
+function upstream(name: string, source: UpstreamEntry['source']): UpstreamEntry { + return { name, source }; +} + +// Literals referenced by 3+ tests, pulled out to satisfy sonarjs/no-duplicate-string. +const SKILL_CREATOR = 'skill-creator'; +const GIT_SUBDIR = 'git-subdir'; +const FIRST_PARTY = 'first-party'; +const EXAMPLE_FOO_URL = 'https://github.com/example/foo.git'; + +describe('mungeName', () => { + it('leaves a valid name unchanged', () => { + expect(mungeName(SKILL_CREATOR)).toEqual(SKILL_CREATOR); + }); + + it('replaces a dot with a dash', () => { + expect(mungeName('wordpress.com')).toEqual('wordpress-com'); + }); + + it('collapses a run of invalid characters into a single dash', () => { + expect(mungeName('foo!!bar')).toEqual('foo-bar'); + }); +}); + +describe('composeSourceUrl', () => { + it('handles string source by combining with catalog clone URL and ref', () => { + const entry = upstream(SKILL_CREATOR, `./plugins/${SKILL_CREATOR}`); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + `https://github.com/anthropics/claude-plugins-official.git#main:plugins/${SKILL_CREATOR}`, + ); + }); + + it('handles git-subdir with ref + path', () => { + const entry = upstream('api-security-testing', { + source: GIT_SUBDIR, + url: 'https://github.com/42Crunch-AI/claude-plugins.git', + path: 'plugins/api-security-testing', + ref: 'v1.5.5', + sha: 'deadbeef', + }); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + 'https://github.com/42Crunch-AI/claude-plugins.git#v1.5.5:plugins/api-security-testing', + ); + }); + + it('handles git-subdir without ref (default-branch fallback)', () => { + const entry = upstream('semgrep', { + source: GIT_SUBDIR, + url: 'https://github.com/semgrep/mcp-marketplace.git', + path: 'plugin', + sha: 'deadbeef', + }); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + 'https://github.com/semgrep/mcp-marketplace.git#:plugin', + ); + }); + + it('handles url shape with path (omits ref)', () => { + const entry 
= upstream('atomic-agents', { + source: 'url', + url: 'https://github.com/BrainBlend-AI/atomic-agents.git', + path: 'claude-plugin/atomic-agents', + sha: 'deadbeef', + }); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + 'https://github.com/BrainBlend-AI/atomic-agents.git#:claude-plugin/atomic-agents', + ); + }); + + it('handles url shape without path (just the clone URL)', () => { + const entry = upstream('agentforce-adlc', { + source: 'url', + url: 'https://github.com/SalesforceAIResearch/agentforce-adlc.git', + sha: 'deadbeef', + }); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + 'https://github.com/SalesforceAIResearch/agentforce-adlc.git', + ); + }); + + it('handles github shape by composing https URL from repo field', () => { + const entry = upstream('fullstory', { + source: 'github', + repo: 'fullstorydev/fullstory-skills', + commit: 'abc123', + sha: 'deadbeef', + }); + expect(composeSourceUrl(entry, CATALOG_OFFICIAL)).toEqual( + 'https://github.com/fullstorydev/fullstory-skills.git', + ); + }); + + it('throws on an unknown source discriminator', () => { + const entry = upstream('weird', { + source: 'unknown-shape', + }); + expect(() => composeSourceUrl(entry, CATALOG_OFFICIAL)).toThrow(/unknown source discriminator/); + }); +}); + +describe('deriveConfidence', () => { + it('returns first-party for a string source not under partner-built', () => { + expect(deriveConfidence(upstream('data', './data'))).toEqual(FIRST_PARTY); + }); + + it('returns curated for a string source under partner-built', () => { + expect(deriveConfidence(upstream('zoom', './partner-built/zoom-plugin'))).toEqual('curated'); + }); + + it('returns first-party for an object source on a github.com/anthropics URL', () => { + const entry = upstream('something', { + source: GIT_SUBDIR, + url: 'https://github.com/anthropics/some-other-repo.git', + path: 'plugins/x', + ref: 'main', + }); + expect(deriveConfidence(entry)).toEqual(FIRST_PARTY); + }); + + it('returns 
curated for an object source on a non-anthropics URL', () => { + const entry = upstream('lusha', { + source: 'url', + url: 'https://github.com/lusha-oss/lusha-mcp-plugin.git', + sha: 'deadbeef', + }); + expect(deriveConfidence(entry)).toEqual('curated'); + }); + + it('returns curated for a github-shape source whose repo is not under anthropics', () => { + const entry = upstream('fullstory', { + source: 'github', + repo: 'fullstorydev/fullstory-skills', + commit: 'abc', + }); + expect(deriveConfidence(entry)).toEqual('curated'); + }); +}); + +describe('mapEntry', () => { + it('applies the knowledge-work prefix to entries from the knowledge-work catalog', () => { + const entry = upstream('data', './data'); + expect(mapEntry(entry, CATALOG_KNOWLEDGE_WORK)).toEqual({ + source: 'https://github.com/anthropics/knowledge-work-plugins.git#main:data', + name: 'knowledge-work-data', + bucket: 'official', + confidence: FIRST_PARTY, + maturity: 'production', + }); + }); + + it('munges a name with a dot through to the final PluginEntry', () => { + const entry = upstream('wordpress.com', { + source: 'url', + url: 'https://github.com/Automattic/claude-code-wordpress.com.git', + sha: 'deadbeef', + }); + expect(mapEntry(entry, CATALOG_OFFICIAL).name).toEqual('wordpress-com'); + }); +}); + +describe('combineAndDedupe', () => { + const officialA: ReturnType<typeof mapEntry> = { + source: EXAMPLE_FOO_URL, + name: 'a-foo', + bucket: 'official', + confidence: 'curated', + maturity: 'production', + }; + const officialB: ReturnType<typeof mapEntry> = { + source: EXAMPLE_FOO_URL, // same source as officialA + name: 'b-foo', + bucket: 'official', + confidence: 'curated', + maturity: 'production', + }; + const officialC: ReturnType<typeof mapEntry> = { + source: 'https://github.com/example/bar.git', + name: 'c-bar', + bucket: 'official', + confidence: 'curated', + maturity: 'production', + }; + const kwA: ReturnType<typeof mapEntry> = { + source: EXAMPLE_FOO_URL, // collides with official + name: 'knowledge-work-foo', + bucket: 'official', + confidence: 
'curated', + maturity: 'production', + }; + + it('keeps the first occurrence per source URL and reports dropped names', () => { + const result = combineAndDedupe([officialA, officialB, officialC], []); + expect(result.officialKept).toEqual(2); + expect(result.kwKept).toEqual(0); + expect(result.droppedNames).toEqual(['b-foo']); + expect(result.final.map(e => e.name)).toContain('a-foo'); + expect(result.final.map(e => e.name)).not.toContain('b-foo'); + }); + + it('prefers an official entry over a knowledge-work entry for the same source URL', () => { + const result = combineAndDedupe([officialA], [kwA]); + expect(result.officialKept).toEqual(1); + expect(result.kwKept).toEqual(0); + expect(result.droppedNames).toEqual(['knowledge-work-foo']); + }); + + it('always emits both preserved entries first', () => { + const result = combineAndDedupe([officialC], []); + expect(result.final.slice(0, 2).map(e => e.name)).toEqual([ + 'vibe-agent-toolkit', + 'vibe-validate', + ]); + }); +}); From 2a376569ed5f8661690892188b0c81acc7f4c859 Mon Sep 17 00:00:00 2001 From: Ethan Dutton <46871249+ejdutton@users.noreply.github.com> Date: Wed, 3 Jun 2026 18:09:11 -0400 Subject: [PATCH 2/3] refactor(corpus): derive preserved entries from existing seed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the hardcoded PRESERVED_ENTRIES list with a structural partition: read corpus/seed.yaml at the start of the import, treat as "preserved" any entry whose source URL isn't one the importer would generate this run. Fixes the self-review issue that the hardcoded list would silently erase a third hand-added entry or any validation override on a kept catalog entry. Throws explicitly if a preserved entry carries a validation block, since stringifyEntries doesn't yet serialize them. Slice 1b has none. combineAndDedupe now takes preserved as an explicit parameter rather than closing over module state — easier to test, no hidden dep. 
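A minimal sketch of the partition rule described above, assuming a simplified entry shape. `Entry`, `pickPreserved`, and the example URLs are hypothetical stand-ins for illustration, not the importer's own `partitionPreserved`:

```typescript
// Hypothetical, simplified entry shape; real seed entries carry more fields.
interface Entry {
  source: string;
  name: string;
}

// Keep only the existing entries whose `source` the importer would NOT
// regenerate this run. Everything kept is treated as hand-curated.
function pickPreserved(existing: Entry[], importedSources: Set<string>): Entry[] {
  return existing.filter(e => !importedSources.has(e.source));
}

// Placeholder URLs, not real catalog entries.
const sampleExisting: Entry[] = [
  { source: 'https://example.invalid/hand-added.git', name: 'hand-added' },
  { source: 'https://example.invalid/catalog.git#main:plugins/a', name: 'a' },
];
const sampleImported = new Set<string>([
  'https://example.invalid/catalog.git#main:plugins/a',
]);

// Only the entry that the importer would not regenerate is kept.
console.log(pickPreserved(sampleExisting, sampleImported).map(e => e.name));
```

Because the rule is structural, a third hand-added entry would survive a re-import without anyone editing an allowlist.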
Co-Authored-By: Claude Opus 4.7 (1M context) --- corpus/seed.yaml | 14 +- packages/dev-tools/src/import-marketplace.ts | 135 ++++++++++++++---- .../dev-tools/test/import-marketplace.test.ts | 82 ++++++++++- 3 files changed, 196 insertions(+), 35 deletions(-) diff --git a/corpus/seed.yaml b/corpus/seed.yaml index 65c5e8d5..1de3885a 100644 --- a/corpus/seed.yaml +++ b/corpus/seed.yaml @@ -6,12 +6,15 @@ # # Last imported from upstream marketplaces on 2026-06-03 by # packages/dev-tools/src/import-marketplace.ts +# (SHAs reflect upstream state at import time and drift fast; re-import freely.) # # Sources: -# anthropics/claude-plugins-official @ 4979da0 — 205 entries -# anthropics/knowledge-work-plugins @ f53ea6a — 30 entries +# anthropics/claude-plugins-official @ 6f90371 — 206 entries +# anthropics/knowledge-work-plugins @ 8785e40 — 30 entries # # Hand-curated entries (preserved on re-import): 2 at top. +# An existing entry is preserved iff its `source` URL isn't one the importer +# would generate this run (i.e. it doesn't live in either upstream catalog). 
# Re-import: bun run import-marketplace plugins: @@ -60,6 +63,11 @@ plugins: bucket: official confidence: curated maturity: production + - source: https://github.com/airwallex/airwallex-marketplace.git#master:plugins/airwallex + name: airwallex + bucket: official + confidence: curated + maturity: production - source: https://github.com/gemini-cli-extensions/alloydb.git name: alloydb bucket: official @@ -986,7 +994,7 @@ plugins: confidence: curated maturity: production - source: https://github.com/VantaInc/vanta-mcp-plugin.git - name: vanta-mcp-plugin + name: vanta bucket: official confidence: curated maturity: production diff --git a/packages/dev-tools/src/import-marketplace.ts b/packages/dev-tools/src/import-marketplace.ts index 42fdfd76..12f37cb7 100644 --- a/packages/dev-tools/src/import-marketplace.ts +++ b/packages/dev-tools/src/import-marketplace.ts @@ -19,10 +19,11 @@ /* eslint-disable security/detect-non-literal-fs-filename */ // File paths derived from PROJECT_ROOT (controlled, not user input) -import { writeFileSync } from 'node:fs'; +import { existsSync, readFileSync, writeFileSync } from 'node:fs'; import { pathToFileURL } from 'node:url'; import { safeExecSync, safePath } from '@vibe-agent-toolkit/utils'; +import * as yaml from 'yaml'; import { z } from 'zod'; import { PROJECT_ROOT, log } from './common.js'; @@ -111,25 +112,30 @@ const NAME_REGEX = /^[A-Za-z0-9_-]+$/; const FIRST_PARTY: PluginEntry['confidence'] = 'first-party'; const CURATED: PluginEntry['confidence'] = 'curated'; -// Hand-curated entries that pre-date the import — VAT-owned plugins that -// don't live in either upstream catalog. Re-emitted verbatim on every run. 
-const PRESERVED_ENTRIES: PluginEntry[] = [ - { - source: - 'https://github.com/jdutton/vibe-agent-toolkit.git#claude-marketplace:plugins/vibe-agent-toolkit', - name: 'vibe-agent-toolkit', - bucket: 'official', - confidence: FIRST_PARTY, - maturity: 'production', - }, - { - source: 'https://github.com/jdutton/vibe-validate.git#claude-marketplace', - name: 'vibe-validate', - bucket: 'official', - confidence: FIRST_PARTY, - maturity: 'production', - }, -]; +// Minimal schema for parsing the existing seed.yaml back in. Wider than the +// `PluginEntry` we emit (the canonical schema allows `community` bucket, +// `experimental` maturity, `listed` confidence, and a nested validation block) +// because the file on disk may have richer entries that we still need to +// preserve untouched. Stays in sync with `PluginEntrySchema` in +// `packages/cli/src/commands/corpus/seed.ts`. +const ExistingPluginEntrySchema = z + .object({ + source: z.string().min(1), + name: z.string().min(1), + bucket: z.enum(['official', 'community']), + confidence: z.enum(['first-party', 'curated', 'listed']), + maturity: z.enum(['production', 'experimental', 'example']), + validation: z.unknown().optional(), + }) + .strict(); + +const ExistingSeedSchema = z + .object({ + plugins: z.array(ExistingPluginEntrySchema), + }) + .strict(); + +type ExistingPluginEntry = z.infer<typeof ExistingPluginEntrySchema>; // --------------------------------------------------------------------------- // Fetch helpers — both use `gh api` via safeExecSync so this script inherits @@ -269,6 +275,66 @@ export function mapEntry(entry: UpstreamEntry, catalog: Catalog): PluginEntry { }; } +// --------------------------------------------------------------------------- +// Preservation — read existing seed.yaml, hold back any entry whose `source` +// isn't going to be re-produced by the importer. 
+// +// "Preserved" is defined structurally, not by a hardcoded allowlist: anything +// the importer wouldn't generate this run is treated as hand-curated and +// re-emitted verbatim. This covers the 2 VAT-owned entries today and any +// future hand-added entries without needing to update this file. +// --------------------------------------------------------------------------- + +/** + * Read and structurally validate the existing `corpus/seed.yaml`. Throws if + * the file is missing or malformed — re-import is not a bootstrap operation. + */ +export function loadExistingSeed(path: string): ExistingPluginEntry[] { + if (!existsSync(path)) { + throw new Error( + `Existing seed file not found: ${path}. Re-import requires an existing seed.yaml ` + + `(this script preserves entries that don't come from upstream).`, + ); + } + const raw = readFileSync(path, 'utf-8'); + const parsed = ExistingSeedSchema.parse(yaml.parse(raw)); + return parsed.plugins; +} + +/** + * Pick entries from the existing seed whose `source` is NOT one the importer + * is about to produce. These are hand-curated and re-emitted verbatim. + * + * Throws on a preserved entry that carries a `validation:` block — the + * verbatim stringify path doesn't currently serialize nested validation + * blocks, so a silent drop here would be a real bug. Slice 1b has no such + * entries; a later slice that introduces validation overrides needs to + * extend `stringifyEntries` first. + */ +export function partitionPreserved( + existing: ExistingPluginEntry[], + importedSources: Set<string>, +): PluginEntry[] { + const preserved: PluginEntry[] = []; + for (const e of existing) { + if (importedSources.has(e.source)) continue; + if (e.validation !== undefined) { + throw new Error( + `Preserved entry "${e.name}" carries a validation block; stringifyEntries doesn't ` + + `serialize validation blocks yet. 
Either remove the block or extend the importer.`, + ); + } + preserved.push({ + source: e.source, + name: e.name, + bucket: e.bucket as PluginEntry['bucket'], + confidence: e.confidence as PluginEntry['confidence'], + maturity: e.maturity as PluginEntry['maturity'], + }); + } + return preserved; +} + // --------------------------------------------------------------------------- // Deduplication & uniqueness checks // @@ -298,6 +364,7 @@ interface CombineResult { * alphabetical-first name in each duplicate cluster lands in the seed). */ export function combineAndDedupe( + preserved: PluginEntry[], official: PluginEntry[], kw: PluginEntry[], ): CombineResult { @@ -305,7 +372,7 @@ export function combineAndDedupe( const final: PluginEntry[] = []; const droppedNames: string[] = []; - for (const e of PRESERVED_ENTRIES) { + for (const e of preserved) { seen.add(e.source); final.push(e); } @@ -351,6 +418,7 @@ function assertUniqueNames(entries: PluginEntry[]): void { // --------------------------------------------------------------------------- interface ImportCounts { + preserved: number; official: number; knowledgeWork: number; } @@ -366,12 +434,15 @@ function buildHeader(officialSha: string, kwSha: string, counts: ImportCounts): `#`, `# Last imported from upstream marketplaces on ${date} by`, `# packages/dev-tools/src/import-marketplace.ts`, + `# (SHAs reflect upstream state at import time and drift fast; re-import freely.)`, `#`, `# Sources:`, `# anthropics/claude-plugins-official @ ${officialSha} — ${counts.official} entries`, `# anthropics/knowledge-work-plugins @ ${kwSha} — ${counts.knowledgeWork} entries`, `#`, - `# Hand-curated entries (preserved on re-import): ${PRESERVED_ENTRIES.length} at top.`, + `# Hand-curated entries (preserved on re-import): ${counts.preserved} at top.`, + `# An existing entry is preserved iff its \`source\` URL isn't one the importer`, + `# would generate this run (i.e. 
it doesn't live in either upstream catalog).`, `# Re-import: bun run import-marketplace`, ``, ``, @@ -396,8 +467,12 @@ function stringifyEntries(entries: PluginEntry[]): string { // --------------------------------------------------------------------------- function run(): void { - log('Fetching upstream manifests via gh CLI…', 'cyan'); + const seedPath = safePath.join(PROJECT_ROOT, 'corpus', 'seed.yaml'); + log('Reading existing seed.yaml for preserved entries…', 'cyan'); + const existing = loadExistingSeed(seedPath); + + log('Fetching upstream manifests via gh CLI…', 'cyan'); const official = fetchManifest(CATALOG_OFFICIAL); const kw = fetchManifest(CATALOG_KNOWLEDGE_WORK); const officialSha = fetchCatalogSha(CATALOG_OFFICIAL); @@ -421,7 +496,14 @@ function run(): void { .map(e => mapEntry(e, CATALOG_KNOWLEDGE_WORK)) .sort((a, b) => a.name.localeCompare(b.name)); + const importedSources = new Set([ + ...officialEntries.map(e => e.source), + ...kwEntries.map(e => e.source), + ]); + const preserved = partitionPreserved(existing, importedSources); + const { final, officialKept, kwKept, droppedNames } = combineAndDedupe( + preserved, officialEntries, kwEntries, ); @@ -433,7 +515,10 @@ function run(): void { log('', 'reset'); log('Mapping summary:', 'cyan'); - log(` Preserved entries: ${PRESERVED_ENTRIES.length}`, 'reset'); + log(` Preserved entries: ${preserved.length}`, 'reset'); + if (preserved.length > 0) { + log(` [${preserved.map(e => e.name).join(', ')}]`, 'reset'); + } log(` Imported (official, raw): ${officialEntries.length}`, 'reset'); log(` Imported (knowledge-work, raw): ${kwEntries.length}`, 'reset'); log( @@ -453,13 +538,13 @@ function run(): void { const output = buildHeader(officialSha, kwSha, { + preserved: preserved.length, official: officialKept, knowledgeWork: kwKept, }) + `plugins:\n` + stringifyEntries(final); - const seedPath = safePath.join(PROJECT_ROOT, 'corpus', 'seed.yaml'); writeFileSync(seedPath, output, 'utf8'); log('', 'reset'); log(`✓ 
Wrote ${seedPath}`, 'green'); diff --git a/packages/dev-tools/test/import-marketplace.test.ts b/packages/dev-tools/test/import-marketplace.test.ts index bb1669eb..47041f07 100644 --- a/packages/dev-tools/test/import-marketplace.test.ts +++ b/packages/dev-tools/test/import-marketplace.test.ts @@ -15,6 +15,7 @@ import { deriveConfidence, mapEntry, mungeName, + partitionPreserved, type UpstreamEntry, } from '../src/import-marketplace.js'; @@ -208,9 +209,16 @@ describe('combineAndDedupe', () => { confidence: 'curated', maturity: 'production', }; + const preservedX: ReturnType<typeof mapEntry> = { + source: 'https://github.com/example/preserved.git', + name: 'x-preserved', + bucket: 'official', + confidence: 'first-party', + maturity: 'production', + }; it('keeps the first occurrence per source URL and reports dropped names', () => { - const result = combineAndDedupe([officialA, officialB, officialC], []); + const result = combineAndDedupe([], [officialA, officialB, officialC], []); expect(result.officialKept).toEqual(2); expect(result.kwKept).toEqual(0); expect(result.droppedNames).toEqual(['b-foo']); @@ -219,17 +227,77 @@ describe('combineAndDedupe', () => { }); it('prefers an official entry over a knowledge-work entry for the same source URL', () => { - const result = combineAndDedupe([officialA], [kwA]); + const result = combineAndDedupe([], [officialA], [kwA]); expect(result.officialKept).toEqual(1); expect(result.kwKept).toEqual(0); expect(result.droppedNames).toEqual(['knowledge-work-foo']); }); - it('always emits both preserved entries first', () => { - const result = combineAndDedupe([officialC], []); - expect(result.final.slice(0, 2).map(e => e.name)).toEqual([ - 'vibe-agent-toolkit', - 'vibe-validate', + it('emits preserved entries first, in input order', () => { + const result = combineAndDedupe([preservedX], [officialC], []); + expect(result.final.map(e => e.name)).toEqual(['x-preserved', 'c-bar']); + }); + + it('preserved entries win over imports with the same source URL', 
() => { + const preservedCollidingWithA: ReturnType<typeof mapEntry> = { + source: EXAMPLE_FOO_URL, + name: 'preserved-foo', + bucket: 'official', + confidence: FIRST_PARTY, + maturity: 'production', + }; + const result = combineAndDedupe([preservedCollidingWithA], [officialA], []); + expect(result.officialKept).toEqual(0); + expect(result.droppedNames).toEqual(['a-foo']); + expect(result.final.map(e => e.name)).toEqual(['preserved-foo']); + }); + + it('handles empty inputs without error', () => { + const result = combineAndDedupe([], [], []); + expect(result.final).toEqual([]); + expect(result.droppedNames).toEqual([]); + expect(result.officialKept).toEqual(0); + expect(result.kwKept).toEqual(0); + }); +}); + +describe('partitionPreserved', () => { + const baseExisting = [ + { + source: 'https://github.com/jdutton/vibe-validate.git#claude-marketplace', + name: 'vibe-validate', + bucket: 'official' as const, + confidence: FIRST_PARTY as 'first-party', + maturity: 'production' as const, + }, + { + source: 'https://github.com/anthropics/claude-plugins-official.git#main:plugins/skill-creator', + name: SKILL_CREATOR, + bucket: 'official' as const, + confidence: FIRST_PARTY as 'first-party', + maturity: 'production' as const, + }, + ]; + + it('keeps entries whose source is not in the imported set', () => { + const imported = new Set([ + 'https://github.com/anthropics/claude-plugins-official.git#main:plugins/skill-creator', ]); + const preserved = partitionPreserved(baseExisting, imported); + expect(preserved.map(e => e.name)).toEqual(['vibe-validate']); + }); + + it('returns empty when every existing source is also imported', () => { + const imported = new Set(baseExisting.map(e => e.source)); + expect(partitionPreserved(baseExisting, imported)).toEqual([]); + }); + + it('throws if a preserved entry carries a validation block', () => { + const [firstEntry] = baseExisting; + if (firstEntry === undefined) throw new Error('test setup invariant: baseExisting must be non-empty'); + const 
withValidation = [ + { ...firstEntry, validation: { severity: { SOME_CODE: 'ignore' } } }, + ]; + expect(() => partitionPreserved(withValidation, new Set())).toThrow(/validation block/); }); }); From b02b058ae4684502f0660aa2107c0a8d48093c8e Mon Sep 17 00:00:00 2001 From: Ethan Dutton <46871249+ejdutton@users.noreply.github.com> Date: Fri, 5 Jun 2026 13:33:56 -0400 Subject: [PATCH 3/3] fix(corpus): harden marketplace importer per PR #117 review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit C1: refuse to overwrite seed.yaml when either upstream catalog returned 0 plugins, or when the new entry count would drop more than 20% vs. the existing seed. New --allow-shrink flag bypasses both for the rare case where shrinkage is real (mid-deploy push, etc.). C2: widen the in-memory PluginEntry to match canonical PluginEntrySchema (bucket includes community; confidence includes listed; maturity includes experimental/example). This removes three "as PluginEntry[...]" casts in partitionPreserved that silently coerced wider preserved entries into the narrow shape — a footgun at the public-shaming gate the moment any community entry got preserved. mapEntry still emits the narrow literals for freshly-mapped upstream entries. I1: narrow the generated seed.yaml header — drop the per-entry `validation:` claim (the importer throws on validation blocks today) and rewrite the SHA language to clarify that entry `source` URLs pin a fragment ref (typically default branch), not a per-entry commit SHA. The catalog SHAs in the header are this run's audit provenance, not per-entry pinning. Also narrow the module-level "byte-identical" claim, which holds only same-day with unchanged upstream HEADs. I2: validate fetchCatalogSha output as 40-char hex before slicing, so a `gh` deprecation notice or `--jq` miss can't silently write garbage provenance into the header. I3: preserve fetch/parse error context. 
fetchManifest wraps JSON.parse and Zod schema errors with catalog identity and (for JSON) a body preview. Top-level catch prints CommandExecutionError.stderr (the real "HTTP 403 rate-limit" / "not authenticated" reason) and ZodError issue details for errors raised outside fetchManifest. S-items deferred per reviewer TL;DR. Committed seed.yaml header patched in place to match the new buildHeader output (without re-running the import, since upstream drifted to a duplicate-name state since the PR opened — unrelated to this fix). Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- corpus/seed.yaml | 17 +- packages/dev-tools/src/import-marketplace.ts | 185 +++++++++++++++--- .../dev-tools/test/import-marketplace.test.ts | 72 +++++++ 4 files changed, 241 insertions(+), 35 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 4a2ff2e1..355f04a4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Internal -- **Corpus seed expanded from 9 → 237 entries via a new committed importer at `packages/dev-tools/src/import-marketplace.ts` (`bun run import-marketplace`).** The script fetches `.claude-plugin/marketplace.json` from `anthropics/claude-plugins-official` (205 of 209 raw entries kept) and `anthropics/knowledge-work-plugins` (30 of 60 — the knowledge-work catalog turns out to be ≈50% mirror entries of the official catalog) via `gh api`, maps each upstream entry to a `PluginEntry`, deduplicates by `source` URL (preserved VAT-owned entries always win; otherwise alphabetical-first-name wins within each duplicate cluster), and rewrites `corpus/seed.yaml`. 
Mapping rules: `bucket: official` uniformly (both catalogs are anthropics-curated marketplaces — `bucket` is the *reporting posture* per slice 1a, not code provenance); `confidence: first-party` for catalog-internal string sources and `github.com/anthropics/...` object sources, else `curated`; the `./partner-built/` knowledge-work convention overrides to `curated`; `maturity: production` for all entries. URL composition handles all five upstream source shapes (string, `git-subdir` ± `ref`, `url` ± `path`, `github`), throwing on unknown discriminators. The seven sample entries from slice 1a are regenerated from upstream manifests on every re-import. Issue #99 slice 1b — follows the schema change from PR #111 (slice 1a). +- **Corpus seed expanded from 9 → 237 entries via a new committed importer at `packages/dev-tools/src/import-marketplace.ts` (`bun run import-marketplace [--allow-shrink]`).** The script fetches `.claude-plugin/marketplace.json` from `anthropics/claude-plugins-official` (205 of 209 raw entries kept) and `anthropics/knowledge-work-plugins` (30 of 60 — the knowledge-work catalog turns out to be ≈50% mirror entries of the official catalog) via `gh api`, maps each upstream entry to a `PluginEntry`, deduplicates by `source` URL (preserved VAT-owned entries always win; otherwise alphabetical-first-name wins within each duplicate cluster), and rewrites `corpus/seed.yaml`. Mapping rules: `bucket: official` uniformly (both catalogs are anthropics-curated marketplaces — `bucket` is the *reporting posture* per slice 1a, not code provenance); `confidence: first-party` for catalog-internal string sources and `github.com/anthropics/...` object sources, else `curated`; the `./partner-built/` knowledge-work convention overrides to `curated`; `maturity: production` for all entries. URL composition handles all five upstream source shapes (string, `git-subdir` ± `ref`, `url` ± `path`, `github`), throwing on unknown discriminators. 
The seven sample entries from slice 1a are regenerated from upstream manifests on every re-import. Re-import safety: the importer refuses to overwrite `corpus/seed.yaml` if either upstream catalog returned 0 plugins or the new entry count would drop more than 20% vs. the existing seed; `--allow-shrink` bypasses both gates for the rare case where shrinkage is real. The generated `seed.yaml` header dropped its earlier per-entry `validation:` claim (the importer throws on validation blocks today) and now states explicitly that entry `source` URLs pin a fragment ref (typically the default branch), not a per-entry commit SHA — the catalog SHAs in the header are this run's audit provenance. Issue #99 slice 1b — follows the schema change from PR #111 (slice 1a). - **Empirical compatibility harness (`packages/dev-tools/src/compat-empirical/`).** Per-#100 research scaffold: a CLI (`predict`/`run`/`judge`/`report`/`all`) that runs candidate skills against claude-code, claude-cowork, and claude-chat, then joins VAT's static predictions with deterministic runtime observations and an LLM-judge semantic read into a reality-vs-prediction matrix. The output is an evidence artifact a follow-up PR will draw on to propose detector improvements, each citing specific (skill, runtime) cells. No detector code or `RUNTIME_PROFILES` changes here. Lives in the private `@vibe-agent-toolkit/dev-tools` package, with no adopter-facing surface. - **Empirical compat harness v2 (`packages/dev-tools/src/compat-empirical/`).** Foundations PR per [the v2 design](./docs/research/2026-05-23-compat-empirical-harness-v2-design.md). Probe coverage: multi-prompt + repeat-N with adaptive N=3→N=5 extension, mandatory positive+negative prompt pairing per corpus entry, and negative-prompt agreement inversion so false-positive triggers surface as `vat-optimistic`. 
Evidence quality: deterministic class widened from 6 to 9 values (splitting `error` into `install-failed`/`runtime-error`, `not-invoked` into `not-invoked-engaged`/`not-invoked-empty`, adding `refused`), judge prompt rewritten to v2 with a `refused` verdict. Report fidelity: coverage stats, per-bucket headline (own/official/community × ran/agree/optimistic/pessimistic/gray-zone), gray-zone (mixed-signal) and high-variance subsections, per-attempt variance rendered inline (`runtime-error (2/3) / failed (3/3)`). Judge replay: persisted `judge-calls/---.json` artifacts plus a new `re-judge` subcommand that re-executes them against an optionally different model or freshly-edited system prompt without re-spending operator hours on the runtime side. Two PR-#108 deferred bug fixes also landed: `git fetch --tags --force` before named-ref fetch (annotated tag refresh) and `setup()` teardown-first idempotency for the manual driver. Still private to `@vibe-agent-toolkit/dev-tools`; corpus authoring, first real run, and the docs deliverable are the downstream work. - **Cowork driver spike.** Added [`docs/contributing/cowork-driver-spike.md`](docs/contributing/cowork-driver-spike.md) — a time-boxed investigation (per §4a of the harness v2 design) of whether `claude-cowork` can be driven programmatically by the empirical compat harness today. Verdict: **not feasible**; cowork is a Claude Desktop app product with no public API/CLI surface. The `claude-cowork` runtime stays on `scripted-assisted` until Anthropic ships a Cowork CLI mode, Sessions API, or documented filesystem-import path. Adjacent finding (not a cowork replacement): the public-beta Skills API (`POST /v1/skills` + `container.skills[]` on `/v1/messages`) supports a fully-automatable *new* runtime — captured in the spike doc as a potential follow-up, gated on a separate design decision. 
diff --git a/corpus/seed.yaml b/corpus/seed.yaml index 1de3885a..c0feb6fa 100644 --- a/corpus/seed.yaml +++ b/corpus/seed.yaml @@ -1,12 +1,13 @@ -# Tracked plugins for `vat corpus scan`. -# Source is the unique key. Each entry can carry an optional `validation:` -# block with the same shape as `skills.defaults.validation` in -# vibe-agent-toolkit.config.yaml — used to silence findings on this -# plugin when we've decided the rule is wrong (or not yet right enough). +# Tracked plugins for `vat corpus scan`. Source is the unique key. # # Last imported from upstream marketplaces on 2026-06-03 by -# packages/dev-tools/src/import-marketplace.ts -# (SHAs reflect upstream state at import time and drift fast; re-import freely.) +# packages/dev-tools/src/import-marketplace.ts. +# +# Each entry's `source` URL points at an upstream repo and (where the +# upstream specifies one) a fragment ref — typically the default branch, +# NOT a per-entry commit SHA. The catalog SHAs below are the audit +# provenance of *this importer run* (which catalog HEADs were read); +# entries themselves are not pinned and drift with upstream branches. # # Sources: # anthropics/claude-plugins-official @ 6f90371 — 206 entries @@ -15,7 +16,7 @@ # Hand-curated entries (preserved on re-import): 2 at top. # An existing entry is preserved iff its `source` URL isn't one the importer # would generate this run (i.e. it doesn't live in either upstream catalog). -# Re-import: bun run import-marketplace +# Re-import: bun run import-marketplace [--allow-shrink] plugins: - source: https://github.com/jdutton/vibe-agent-toolkit.git#claude-marketplace:plugins/vibe-agent-toolkit diff --git a/packages/dev-tools/src/import-marketplace.ts b/packages/dev-tools/src/import-marketplace.ts index 12f37cb7..f73c6bcb 100644 --- a/packages/dev-tools/src/import-marketplace.ts +++ b/packages/dev-tools/src/import-marketplace.ts @@ -9,11 +9,19 @@ * `~/code/vat-issue-99-slice-1b-plan.md` (slice 1b of issue #99). 
* * Usage: - * bun run import-marketplace + * bun run import-marketplace [--allow-shrink] + * + * Flags: + * --allow-shrink Bypass the safety checks that refuse to overwrite the seed + * when (a) an upstream catalog returned 0 plugins or (b) the + * new entry count drops >20% vs. the existing seed. Use only + * when an upstream catalog is *genuinely* empty or shrinking. * * Exit codes: * 0 - success (seed.yaml written) - * 1 - failure (network, schema mismatch, name collision, unknown source shape) + * 1 - failure (network, schema mismatch, name collision, unknown source + * shape, empty catalog without --allow-shrink, catastrophic shrinkage + * without --allow-shrink) */ /* eslint-disable security/detect-non-literal-fs-filename */ @@ -22,7 +30,7 @@ import { existsSync, readFileSync, writeFileSync } from 'node:fs'; import { pathToFileURL } from 'node:url'; -import { safeExecSync, safePath } from '@vibe-agent-toolkit/utils'; +import { CommandExecutionError, safeExecSync, safePath } from '@vibe-agent-toolkit/utils'; import * as yaml from 'yaml'; import { z } from 'zod'; @@ -94,18 +102,25 @@ const ManifestSchema = z export type UpstreamEntry = z.infer; // --------------------------------------------------------------------------- -// Output entry shape — kept in sync with `PluginEntrySchema` in -// `packages/cli/src/commands/corpus/seed.ts`. We hand-write entries rather -// than importing the schema (avoids dev-tools → cli reverse dependency). -// The committed seed is validated by `loadSeedFile()` at downstream load time. +// Output entry shape — the canonical `PluginEntrySchema` from +// `packages/cli/src/commands/corpus/seed.ts`, hand-mirrored to avoid a +// dev-tools → cli reverse dependency. The committed seed is validated by +// `loadSeedFile()` at downstream load time. 
+// +// The full union shape is carried through the importer's pipeline so that +// preserved entries (which can be any valid `PluginEntry`) round-trip +// without `as`-casts that would lie about their narrow type at the +// public-shaming gate. `mapEntry` still emits the narrow `official` / +// `first-party|curated` / `production` literals for freshly-mapped upstream +// entries. // --------------------------------------------------------------------------- interface PluginEntry { source: string; name: string; - bucket: 'official'; - confidence: 'first-party' | 'curated'; - maturity: 'production'; + bucket: 'official' | 'community'; + confidence: 'first-party' | 'curated' | 'listed'; + maturity: 'production' | 'experimental' | 'example'; } const NAME_REGEX = /^[A-Za-z0-9_-]+$/; @@ -154,7 +169,29 @@ function fetchManifest(catalog: Catalog): z.infer { '-H', 'Accept: application/vnd.github.raw', ]); - return ManifestSchema.parse(JSON.parse(raw)); + const catalogId = `${catalog.owner}/${catalog.name}`; + let parsed: unknown; + try { + parsed = JSON.parse(raw); + } catch (err) { + throw new Error( + `Failed to parse JSON from ${catalogId} marketplace.json: ${(err as Error).message}. 
` + + `First 200 chars of body: ${JSON.stringify(raw.slice(0, 200))}`, + ); + } + try { + return ManifestSchema.parse(parsed); + } catch (err) { + if (err instanceof z.ZodError) { + const issues = err.issues + .map(i => ` ${i.path.join('.') || '(root)'}: ${i.message}`) + .join('\n'); + throw new Error( + `Manifest from ${catalogId} failed schema validation:\n${issues}`, + ); + } + throw err; + } } function fetchCatalogSha(catalog: Catalog): string { @@ -163,8 +200,17 @@ function fetchCatalogSha(catalog: Catalog): string { `repos/${catalog.owner}/${catalog.name}/commits/${catalog.ref}`, '--jq', '.sha', - ]); - return raw.trim().slice(0, 7); + ]).trim(); + if (!/^[0-9a-f]{40}$/.test(raw)) { + // `gh` deprecation/update notices and `--jq` misses both come back as a + // 200 with a stdout body that isn't a SHA — without this guard, a garbage + // or blank value would land in the header provenance and look real. + throw new Error( + `Unexpected response from gh api for ${catalog.owner}/${catalog.name} HEAD SHA: ` + + `expected 40-char hex, got ${JSON.stringify(raw.slice(0, 200))}`, + ); + } + return raw.slice(0, 7); } // --------------------------------------------------------------------------- @@ -327,9 +373,9 @@ export function partitionPreserved( preserved.push({ source: e.source, name: e.name, - bucket: e.bucket as PluginEntry['bucket'], - confidence: e.confidence as PluginEntry['confidence'], - maturity: e.maturity as PluginEntry['maturity'], + bucket: e.bucket, + confidence: e.confidence, + maturity: e.maturity, }); } return preserved; @@ -413,8 +459,72 @@ function assertUniqueNames(entries: PluginEntry[]): void { } // --------------------------------------------------------------------------- -// YAML output — written by hand (rather than via `yaml.stringify`) to keep -// the file format byte-identical across runs with no upstream changes. +// Shrinkage guards — protect against silently committing a collapsed seed. 
+// +// Both upstream catalogs are eventually-consistent GitHub repos served via +// `gh api`; a mid-deploy push, an empty `plugins: []` blob, or a transient +// 200-with-bad-body could all reduce a catalog to zero or near-zero entries +// without raising an exception. `writeFileSync` overwrites `seed.yaml` +// unconditionally, so a passing run could quietly turn 238 entries into ~32. +// These guards refuse to write under those conditions; `--allow-shrink` +// bypasses them for the rare case where shrinkage is real. +// --------------------------------------------------------------------------- + +/** Drop ratio above which `assertNoCatastrophicShrinkage` refuses. */ +const MAX_SHRINK_RATIO = 0.2; + +/** + * Refuse to write if a catalog returned 0 plugins (the most likely cause of + * a silently-collapsed seed). `--allow-shrink` bypasses for the rare case + * where an upstream catalog is genuinely empty. + */ +export function assertCatalogNonEmpty( + catalog: Catalog, + pluginCount: number, + allowShrink: boolean, +): void { + if (pluginCount > 0 || allowShrink) return; + throw new Error( + `Catalog ${catalog.owner}/${catalog.name} returned 0 plugins. ` + + `Refusing to overwrite seed.yaml with a likely-empty catalog. ` + + `Re-run when the catalog is populated, or pass --allow-shrink to override.`, + ); +} + +/** + * Refuse to overwrite the existing seed if the new entry count would drop by + * more than `MAX_SHRINK_RATIO` (default 20%). `--allow-shrink` bypasses. + * + * Bootstrap case: an existing count of 0 is treated as "no prior seed to + * shrink from" and always allowed. 
+ */ +export function assertNoCatastrophicShrinkage( + existingCount: number, + newCount: number, + allowShrink: boolean, +): void { + if (allowShrink || existingCount === 0) return; + const dropRatio = (existingCount - newCount) / existingCount; + if (dropRatio > MAX_SHRINK_RATIO) { + const dropPct = (dropRatio * 100).toFixed(1); + throw new Error( + `Refusing to shrink seed.yaml by ${dropPct}% (${existingCount} → ${newCount} entries; ` + + `threshold ${(MAX_SHRINK_RATIO * 100).toFixed(0)}%). Likely a transient upstream issue. ` + + `Investigate before re-running with --allow-shrink to override.`, + ); + } +} + +export function parseRunArgs(argv: readonly string[]): { allowShrink: boolean } { + return { allowShrink: argv.includes('--allow-shrink') }; +} + +// --------------------------------------------------------------------------- +// YAML output — written by hand (rather than via `yaml.stringify`) so the +// entry block stays diff-clean run-to-run unless upstream actually moved. +// Note: the header's date stamp and catalog HEAD SHAs *do* shift across days +// or upstream pushes; a future `--check` drift mode would have to ignore +// those header lines. // --------------------------------------------------------------------------- interface ImportCounts { @@ -426,15 +536,16 @@ interface ImportCounts { function buildHeader(officialSha: string, kwSha: string, counts: ImportCounts): string { const date = new Date().toISOString().slice(0, 10); return [ - `# Tracked plugins for \`vat corpus scan\`.`, - `# Source is the unique key. Each entry can carry an optional \`validation:\``, - `# block with the same shape as \`skills.defaults.validation\` in`, - `# vibe-agent-toolkit.config.yaml — used to silence findings on this`, - `# plugin when we've decided the rule is wrong (or not yet right enough).`, + `# Tracked plugins for \`vat corpus scan\`. 
Source is the unique key.`, `#`, `# Last imported from upstream marketplaces on ${date} by`, - `# packages/dev-tools/src/import-marketplace.ts`, - `# (SHAs reflect upstream state at import time and drift fast; re-import freely.)`, + `# packages/dev-tools/src/import-marketplace.ts.`, + `#`, + `# Each entry's \`source\` URL points at an upstream repo and (where the`, + `# upstream specifies one) a fragment ref — typically the default branch,`, + `# NOT a per-entry commit SHA. The catalog SHAs below are the audit`, + `# provenance of *this importer run* (which catalog HEADs were read);`, + `# entries themselves are not pinned and drift with upstream branches.`, `#`, `# Sources:`, `# anthropics/claude-plugins-official @ ${officialSha} — ${counts.official} entries`, @@ -443,7 +554,7 @@ function buildHeader(officialSha: string, kwSha: string, counts: ImportCounts): `# Hand-curated entries (preserved on re-import): ${counts.preserved} at top.`, `# An existing entry is preserved iff its \`source\` URL isn't one the importer`, `# would generate this run (i.e. 
it doesn't live in either upstream catalog).`, - `# Re-import: bun run import-marketplace`, + `# Re-import: bun run import-marketplace [--allow-shrink]`, ``, ``, ].join('\n'); @@ -467,6 +578,7 @@ function stringifyEntries(entries: PluginEntry[]): string { // --------------------------------------------------------------------------- function run(): void { + const { allowShrink } = parseRunArgs(process.argv.slice(2)); const seedPath = safePath.join(PROJECT_ROOT, 'corpus', 'seed.yaml'); log('Reading existing seed.yaml for preserved entries…', 'cyan'); @@ -475,6 +587,8 @@ function run(): void { log('Fetching upstream manifests via gh CLI…', 'cyan'); const official = fetchManifest(CATALOG_OFFICIAL); const kw = fetchManifest(CATALOG_KNOWLEDGE_WORK); + assertCatalogNonEmpty(CATALOG_OFFICIAL, official.plugins.length, allowShrink); + assertCatalogNonEmpty(CATALOG_KNOWLEDGE_WORK, kw.plugins.length, allowShrink); const officialSha = fetchCatalogSha(CATALOG_OFFICIAL); const kwSha = fetchCatalogSha(CATALOG_KNOWLEDGE_WORK); @@ -536,6 +650,8 @@ function run(): void { mungedCount > 0 ? 'yellow' : 'reset', ); + assertNoCatastrophicShrinkage(existing.length, final.length, allowShrink); + const output = buildHeader(officialSha, kwSha, { preserved: preserved.length, @@ -572,6 +688,23 @@ if (invokedDirectly) { run(); } catch (err) { log(`✗ ${(err as Error).message}`, 'red'); + // `CommandExecutionError.stderr` carries the real reason for `gh` failures + // (HTTP 403 rate-limit, "not authenticated", etc.) — the bare `.message` + // is usually just "Command failed". + if (err instanceof CommandExecutionError) { + const stderr = + typeof err.stderr === 'string' ? err.stderr : err.stderr.toString('utf8'); + if (stderr.trim().length > 0) { + log(stderr.trimEnd(), 'red'); + } + } + // Zod errors raised outside `fetchManifest` (e.g. from `loadExistingSeed`) + // surface here; print issues so the user knows which field failed. 
+ if (err instanceof z.ZodError) { + for (const issue of err.issues) { + log(` ${issue.path.join('.') || '(root)'}: ${issue.message}`, 'red'); + } + } process.exit(1); } } diff --git a/packages/dev-tools/test/import-marketplace.test.ts b/packages/dev-tools/test/import-marketplace.test.ts index 47041f07..0bfeab9f 100644 --- a/packages/dev-tools/test/import-marketplace.test.ts +++ b/packages/dev-tools/test/import-marketplace.test.ts @@ -10,11 +10,14 @@ import { describe, expect, it } from 'vitest'; import { CATALOG_KNOWLEDGE_WORK, CATALOG_OFFICIAL, + assertCatalogNonEmpty, + assertNoCatastrophicShrinkage, combineAndDedupe, composeSourceUrl, deriveConfidence, mapEntry, mungeName, + parseRunArgs, partitionPreserved, type UpstreamEntry, } from '../src/import-marketplace.js'; @@ -301,3 +304,72 @@ describe('partitionPreserved', () => { expect(() => partitionPreserved(withValidation, new Set())).toThrow(/validation block/); }); }); + +describe('assertCatalogNonEmpty', () => { + it('throws when a catalog returned 0 plugins and allowShrink is false', () => { + expect(() => assertCatalogNonEmpty(CATALOG_OFFICIAL, 0, false)).toThrow( + /returned 0 plugins/, + ); + }); + + it('does not throw when a catalog returned 0 plugins but allowShrink is true', () => { + expect(() => assertCatalogNonEmpty(CATALOG_OFFICIAL, 0, true)).not.toThrow(); + }); + + it('does not throw when a catalog returned any plugins', () => { + expect(() => assertCatalogNonEmpty(CATALOG_OFFICIAL, 1, false)).not.toThrow(); + expect(() => assertCatalogNonEmpty(CATALOG_OFFICIAL, 200, false)).not.toThrow(); + }); +}); + +describe('assertNoCatastrophicShrinkage', () => { + it('throws when the new count drops by more than 20% vs. existing', () => { + // 238 → 32 is ~87% drop, the silent-data-loss scenario. 
+ expect(() => assertNoCatastrophicShrinkage(238, 32, false)).toThrow( + /Refusing to shrink seed\.yaml/, + ); + }); + + it('allows a drop of exactly 20% (the threshold itself)', () => { + // 100 → 80 is exactly 20% — must not trip the >20% guard. + expect(() => assertNoCatastrophicShrinkage(100, 80, false)).not.toThrow(); + }); + + it('throws on a drop fractionally above 20%', () => { + // 100 → 79 is 21% — should trip. + expect(() => assertNoCatastrophicShrinkage(100, 79, false)).toThrow( + /Refusing to shrink seed\.yaml/, + ); + }); + + it('does not throw on growth or no change', () => { + expect(() => assertNoCatastrophicShrinkage(100, 100, false)).not.toThrow(); + expect(() => assertNoCatastrophicShrinkage(100, 250, false)).not.toThrow(); + }); + + it('bypasses with allowShrink=true even for catastrophic drops', () => { + expect(() => assertNoCatastrophicShrinkage(238, 0, true)).not.toThrow(); + }); + + it('treats an existing count of 0 as bootstrap (always allowed)', () => { + expect(() => assertNoCatastrophicShrinkage(0, 0, false)).not.toThrow(); + expect(() => assertNoCatastrophicShrinkage(0, 50, false)).not.toThrow(); + }); + + it('includes the actual counts in the error message for triage', () => { + expect(() => assertNoCatastrophicShrinkage(238, 32, false)).toThrow(/238/); + expect(() => assertNoCatastrophicShrinkage(238, 32, false)).toThrow(/32/); + }); +}); + +describe('parseRunArgs', () => { + it('defaults allowShrink to false when --allow-shrink is absent', () => { + expect(parseRunArgs([])).toEqual({ allowShrink: false }); + expect(parseRunArgs(['--verbose'])).toEqual({ allowShrink: false }); + }); + + it('sets allowShrink to true when --allow-shrink is present', () => { + expect(parseRunArgs(['--allow-shrink'])).toEqual({ allowShrink: true }); + expect(parseRunArgs(['--other', '--allow-shrink'])).toEqual({ allowShrink: true }); + }); +});
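Reviewer note on the shrinkage threshold: the tests above pin the boundary at "more than 20%", so a drop of exactly 20% passes and 21% is refused. A minimal standalone sketch of that arithmetic follows. It assumes only the formula visible in `assertNoCatastrophicShrinkage`; the helper names `shrinkRatio` and `wouldRefuse` are hypothetical and do not exist in the patch, and the sketch returns a boolean instead of throwing.

```typescript
// Hypothetical restatement of the guard's arithmetic for illustration only.
// `shrinkRatio` and `wouldRefuse` are NOT part of import-marketplace.ts.
const MAX_SHRINK_RATIO = 0.2;

/** Fraction of entries lost; 0 when there is no prior seed (bootstrap case). */
function shrinkRatio(existingCount: number, newCount: number): number {
  if (existingCount === 0) return 0;
  return (existingCount - newCount) / existingCount;
}

/** True when the guard would refuse to overwrite the seed. */
function wouldRefuse(
  existingCount: number,
  newCount: number,
  allowShrink = false,
): boolean {
  if (allowShrink) return false;
  // Strictly greater than: a drop of exactly 20% is still allowed.
  return shrinkRatio(existingCount, newCount) > MAX_SHRINK_RATIO;
}

console.log(wouldRefuse(238, 32)); // true  (about an 87% drop)
console.log(wouldRefuse(100, 80)); // false (exactly 20%)
console.log(wouldRefuse(100, 79)); // true  (21%)
console.log(wouldRefuse(100, 250)); // false (growth gives a negative ratio)
console.log(wouldRefuse(0, 50)); // false (bootstrap)
console.log(wouldRefuse(238, 0, true)); // false (allowShrink bypass)
```

The strict comparison is what keeps the 100 to 80 case on the allowed side; switching to `>=` would make the "exactly 20%" test fail.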