Add first-class HTML resource support to @vibe-agent-toolkit/resources (links, anchors, well-formedness, structure-preserving rewrite) #112

> **Contributions welcome.** This issue contains a complete, approved design spec for first-class HTML
> resource support in `@vibe-agent-toolkit/resources`. The full spec is inline below so you can drop it
> straight into Claude Code (or work it by hand) and implement against it. All decisions are locked; §12 has
> a file touch-list and §10 has the test plan. The spec has been independently reviewed twice against the
> live codebase (file:line citations verified). Please follow the repo's `CLAUDE.md` workflow
> (`bun run validate`, zero-duplication policy, test pyramid) and open a PR.

---

# Design Spec: First-Class HTML Resources in `@vibe-agent-toolkit/resources`

**Status:** Approved design — ready for implementation. Contributions welcome.
**Date:** 2026-06-01
**Package:** `packages/resources` (with touchpoints in `packages/agent-skills` and `packages/cli`)

> **Orientation for implementers — read this first.** The live validation/packaging pipeline is built on
> the **`ResourceMetadata`** model (`packages/resources/src/schemas/resource-metadata.ts`), produced by
> `parseMarkdown()` (`link-parser.ts:52`) and assembled in `ResourceRegistry.addResource()`
> (`resource-registry.ts:300`). **That is where HTML plugs in.**
>
> There is a *second, unrelated* type system in `packages/resources/src/types/resources.ts` — the
> `ResourceType` enum, the `Resource`/`MarkdownResource` discriminated union, and
> `detectResourceType`/`parse*Resource`. **Do not build this feature there.** Those symbols are consumed
> only by `packages/resource-compiler` (markdown→TypeScript compilation) and by the `types.ts` barrel — the
> registry, link validator, and skill packager never touch them. Adding an `HtmlResource` to that union
> would compile but would not make HTML files validate or bundle. (Teaching `resource-compiler` about HTML
> is a separate, out-of-scope enhancement.)

> **⚠️ Updated for the post-#114 validation framework (2026-06-02).** PR #114 (validation-code
> consolidation) landed *after* this spec was first written. It (a) replaced the resources package's old
> free-form `ValidationIssue.type` (`z.string()`) model with the unified, registry-backed `ValidationIssue`
> from `@vibe-agent-toolkit/agent-schema`, (b) promoted every resources code into the canonical
> `CODE_REGISTRY` (`LINK_BROKEN_FILE`, `LINK_BROKEN_ANCHOR`, `LINK_UNKNOWN`, `FRONTMATTER_*`,
> `EXTERNAL_URL_DEAD/TIMEOUT/ERROR`), and (c) made `vat resources validate` derive its exit code purely from
> the framework's severity-based `hasErrors`. **§6 and §12 below are rewritten for this model**; HTML now
> adds exactly one new registry code (`MALFORMED_HTML`) and writes **no** exit-code plumbing. All `:line`
> citations elsewhere in this spec predate #114 (which rewrote `link-validator.ts`, `resource-registry.ts`,
> `validate.ts`, and `validation-result.ts`) — treat them as *approximate* and re-locate symbols by name.

---

## 1. Summary

Today the resources package is markdown-first. `.html`/`.htm` files are not resources: they are not
discovered, parsed, validated, link-checked, or rewritten on bundle. This spec makes **local HTML files
first-class resources** — files that produce a `ResourceMetadata` just like markdown does — so they
participate fully in:

1. **Discovery** (collection `include` globs / path-arg scan)
2. **Link validation** (links inside HTML, validated through the *existing* link validator)
3. **Anchor/fragment integrity** (cross-format link graph: `md → html → md`)
4. **Well-formedness** reporting (parse errors)
5. **Link rewriting on bundle** (`vat skills build` / `linkFollowDepth`), byte-for-byte structure-preserving

HTML **metadata/schema validation is explicitly deferred**, but the design reserves a pluggable seam so it
can land later without rework.

### Non-goals (v1)

- **Remote HTML** (fetching `http(s)` HTML pages as resources). Documented as a future extension; the link
  *graph* still checks remote URLs that HTML links point at, via the existing `--check-external-urls`
  mechanism — that is unchanged.
- **HTML metadata/schema validation** (`<meta>`, JSON-LD, OpenGraph, microdata). Deferred; seam reserved.
- **Non-`<a>`/`<img>` URL-bearing elements** (`<link>`, `<script>`, `<iframe>`, `<source>`, media). Deferred.
- **HTML → text extraction for RAG.** Out of scope.
- **HTML in `resource-compiler`** (the `Resource` union / `ResourceType` enum subsystem). Out of scope.
- **On-demand parsing of out-of-scope link targets.** Anchor validation only runs against files that were
  discovered and parsed into the index (see §5).
- **HTML reformatting / prettifying.** We never re-serialize HTML.

---

## 2. Scope Decisions (locked)

| Decision | Choice | Rationale |
|---|---|---|
| Resource source | **Local files only** in v1; remote documented as future | Keeps v1 tractable; the primary use case is local HTML docs in a KB |
| Resource model | **`ResourceMetadata`** (the live pipeline model), via a `parseHtml` branch in `addResource` | Not the `resource-compiler` `Resource` union (see Orientation) |
| Metadata validation | **Deferred**, seam reserved | HTML has many metadata surfaces; premature to pick one. Must not preclude pluggable extractors |
| v1 validation | **Links + anchors/fragments + well-formedness** | Well-formedness falls out by necessity (we must parse to extract) |
| HTML parser | **`parse5`** | WHATWG-spec, pure-JS (no native deps), exposes per-attribute source locations + `onParseError` |
| Link elements extracted | **`<a href>` + `<img src>`** | Navigational graph + broken-image detection; matches a docs/KB use case |
| Link rewriting mechanism | **Offset splicing, never AST serialize** | parse5 serialize is lossy; offset splice preserves bytes |

---

## 3. Architecture

HTML files become a second input format that produces the **same `ResourceMetadata` shape** markdown does.
The integration point is a single extension branch in `ResourceRegistry.addResource`.

### 3.1 The model: `ResourceMetadata` (existing shape plus two optional fields)

`ResourceMetadata` (`schemas/resource-metadata.ts`) is what the registry indexes and what the validator and
packager consume. Its relevant fields:

```typescript
// existing (schemas/resource-metadata.ts)
type ResourceMetadata = {
  id: string;
  filePath: string;
  links: ResourceLink[];        // rich objects: { href, type, line, text, nodeType, ... }
  headings: HeadingNode[];      // markdown headings (slug-based)
  frontmatter?: ...;
  sizeBytes: number;
  estimatedTokenCount: number;
  modifiedAt: Date;
  checksum: ...;
  collections?: string[];
  // ...
}
```

**Two additive changes** (both optional, so markdown is unaffected):

```typescript
/** Fragment identifiers an HTML file exposes: element `id`s + <a name> values. Undefined for markdown. */
anchors?: string[];

/** Well-formedness errors from the HTML parser. Undefined for markdown. */
parseErrors?: HtmlParseError[];
```

> There is **no `type` discriminator** on `ResourceMetadata` and we do **not** add one. Format-dependent
> behavior (e.g. anchor case-sensitivity in §5) keys off the **file extension**, consistent with how the
> rest of the pipeline already works.
>
> A reserved future seam — `metadata?: Record<string, unknown>` for pluggable HTML metadata extractors — is
> described in §8 but **not added in v1**.

### 3.2 `html-parser.ts` (new module, parallels `parseMarkdown` in `link-parser.ts`)

`parseHtml` returns the **same field shape `addResource` already destructures from `parseMarkdown`**, so
`addResource` can build a `ResourceMetadata` uniformly regardless of format. Uses **parse5** with
`sourceCodeLocationInfo: true` and an `onParseError` hook.

```typescript
export interface HtmlParseError {
  code: string;          // parse5 error code, e.g. 'missing-end-tag'
  line: number;
  col: number;
}

export interface ParsedHtml {
  content: string;             // raw source, unmodified
  links: ResourceLink[];       // SAME type as parseMarkdown — { href, type, line, ... }
  headings: HeadingNode[];     // empty for HTML (no markdown headings); kept for shape parity
  anchors: string[];           // element ids + <a name>  → ResourceMetadata.anchors
  frontmatter: undefined;      // HTML has no frontmatter (v1)
  sizeBytes: number;
  estimatedTokenCount: number; // content.length / 4, matching markdown
  parseErrors: HtmlParseError[];
}

export function parseHtml(absolutePath: string): Promise<ParsedHtml>; // implemented as async
```

**Extraction rules:**

- **Links → `ResourceLink[]`:** walk the parse5 AST; collect `<a href>` and `<img src>`. For each, build a
  `ResourceLink` whose `type` is assigned by the **existing `classifyLink(href)`** (`link-parser.ts:190`)
  and whose `line` comes from parse5's location info — so an HTML `<a href="../foo.md">` produces the same
  kind of `ResourceLink` a markdown link does (`local_file`/`anchor`/`external`/`email`/`unknown`) and flows
  through `validateLink` and the external-URL collector with **no changes** to either.
- **Anchors → `string[]`:** every element with an `id` attribute, plus `<a name="...">`. Raw strings, case
  preserved (see §5).
- **Parse errors:** captured from parse5's `onParseError`, filtered to a meaningful set (see §6).
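
A minimal sketch of the extraction walk, assuming parse5's default tree-adapter shape: elements carry `nodeName`, `attrs` as `{ name, value }[]`, `childNodes`, `<template>` children under `content`, and a `sourceCodeLocation` with a per-attribute `attrs` map. `NodeLike`, `RawLink`, and `extractLinksAndAnchors` are hypothetical names. The real `parseHtml` would pass each `href` through the existing `classifyLink` to build a full `ResourceLink`. The walk is iterative and emits links and anchors in document order.

```typescript
// Structural view of the parse5 nodes this sketch reads (assumption: verify
// field names against the installed parse5 version).
interface AttrLike { name: string; value: string }
interface NodeLike {
  nodeName: string;
  attrs?: AttrLike[];
  childNodes?: NodeLike[];
  content?: NodeLike; // <template> contents live on a separate fragment
  sourceCodeLocation?: { startLine: number; attrs?: Record<string, { startLine: number }> } | null;
}

// Hypothetical raw result; classifyLink(href) would later assign the link type.
interface RawLink { href: string; line: number; element: 'a' | 'img' }

// Element name -> URL-bearing attribute extracted in v1.
const LINK_ATTRS = new Map<string, string>([['a', 'href'], ['img', 'src']]);

function getAttr(node: NodeLike, name: string): string | undefined {
  return node.attrs?.find((a) => a.name === name)?.value;
}

function extractLinksAndAnchors(root: NodeLike): { links: RawLink[]; anchors: string[] } {
  const links: RawLink[] = [];
  const anchors: string[] = [];
  const stack: NodeLike[] = [root];
  for (let node = stack.pop(); node !== undefined; node = stack.pop()) {
    const attrName = LINK_ATTRS.get(node.nodeName);
    const href = attrName === undefined ? undefined : getAttr(node, attrName);
    if (attrName !== undefined && href !== undefined) {
      // Prefer the attribute's own line; fall back to the element's start line.
      const loc = node.sourceCodeLocation;
      const line = loc?.attrs?.[attrName]?.startLine ?? loc?.startLine ?? 0;
      links.push({ href, line, element: node.nodeName === 'a' ? 'a' : 'img' });
    }
    const id = getAttr(node, 'id');
    if (id) anchors.push(id); // case preserved; empty ids are skipped
    const name = node.nodeName === 'a' ? getAttr(node, 'name') : undefined;
    if (name) anchors.push(name);
    // Push children reversed so popping visits them in document order.
    const children = [...(node.childNodes ?? []), ...(node.content ? [node.content] : [])];
    stack.push(...children.reverse());
  }
  return { links, anchors };
}
```

In the real module the root would come from parse5's `parse(source, { sourceCodeLocationInfo: true, onParseError })`.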

> **Note:** `ParsedHtml.links` carries no byte offsets. The rewriter (§7) does its **own** parse5 pass over
> the source to recover attribute locations — exactly as the markdown rewriter re-scans the body rather than
> threading offsets through `ResourceLink`. This keeps `ResourceLink`/`ResourceMetadata` unchanged.

### 3.3 Integration point: extension branch in `addResource`

`ResourceRegistry.addResource` (`resource-registry.ts:300`) currently calls `parseMarkdown(absolutePath)`
unconditionally. Change it to dispatch by extension:

```typescript
// `path` is already imported in resource-registry.ts (:12); `path.extname` is the
// established pattern here (:1333) — it is NOT one of the safePath-enforced fns.
const ext = path.extname(absolutePath).toLowerCase();
const parseResult = (ext === '.html' || ext === '.htm')
  ? await parseHtml(absolutePath)
  : await parseMarkdown(absolutePath);
```

The rest of `addResource` is shape-compatible: `parseResult.links`, `.headings`, `.frontmatter`,
`.sizeBytes`, `.estimatedTokenCount` already feed the `ResourceMetadata` literal (`resource-registry.ts:323-339`).
Add `anchors` and `parseErrors` to that literal when present. ID generation falls back to the path/filename
stem when `frontmatter` is undefined (existing behavior), which is correct for HTML.
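
One way to add the two fields to the literal only when present is a conditional spread, which leaves the keys absent (rather than `undefined`) on markdown resources and stays compatible with `exactOptionalPropertyTypes`. A minimal sketch with hypothetical trimmed types (`ParseResultLike`, `MetadataSlice`, `toMetadataSlice`); the real literal has many more fields.

```typescript
// Hypothetical, trimmed shapes for illustration only.
interface HtmlParseError { code: string; line: number; col: number }
interface ParseResultLike {
  sizeBytes: number;
  anchors?: string[];
  parseErrors?: HtmlParseError[];
}
interface MetadataSlice {
  sizeBytes: number;
  anchors?: string[];
  parseErrors?: HtmlParseError[];
}

function toMetadataSlice(parseResult: ParseResultLike): MetadataSlice {
  return {
    sizeBytes: parseResult.sizeBytes,
    // Keys are only added for HTML parse results; markdown output is unchanged.
    ...(parseResult.anchors !== undefined ? { anchors: parseResult.anchors } : {}),
    ...(parseResult.parseErrors !== undefined ? { parseErrors: parseResult.parseErrors } : {}),
  };
}
```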

---

## 4. Data Flow

```
crawl (include globs) / addResource(path)
  → extension branch: .html/.htm → parseHtml(path)   (else parseMarkdown)
  → ResourceMetadata { links: ResourceLink[], anchors, parseErrors, ... }
  → indexResource()  → fragment index gets this file's anchors   (§5)
  → validate(): each ResourceLink → existing validateLink:
        local_file → existence + git-ignore safety + anchor (via generalized index)
        anchor     → fragment check against THIS file's anchors
        external   → existing --check-external-urls pipeline (unchanged)
        email      → ok
        unknown    → LINK_UNKNOWN warning
  → parseErrors → MALFORMED_HTML issues          (§6)
```

`classifyLink` and `validateLink` require **no changes** — HTML links are `ResourceLink`s from a different
parser.

---

## 5. The Cross-Format Core: Generalized Fragment Index

Today `validateAnchor` (`link-validator.ts:344`) consults `headingsByFile: Map<string, HeadingNode[]>`
(defined by `buildHeadingsByFileMap` at `resource-registry.ts:1245`, called at `:672`, and threaded into
`validateLink` at `:458`) and matches a heading slug case-insensitively (`link-validator.ts:355, :377-381`).
To make `md → html → md` work, generalize the **fragment set** a file exposes:

```
fragmentsByFile: Map<string /* absolute file path */, Set<string> /* valid fragment ids */>
```

- **Markdown** contributes heading slugs (existing behavior via `github-slugger`).
- **HTML** contributes `ResourceMetadata.anchors` (element `id`s + `<a name>`).
- `validateAnchor(fragment, targetPath, index)` becomes a **set-membership check**, choosing case semantics
  by the **target file's extension**:
  - **`.md` target → case-insensitive** match (preserve current behavior; slugs are already lowercased).
  - **`.html`/`.htm` target → case-sensitive** match (the HTML standard resolves a fragment to the element
    whose `id`, or for `<a>` whose `name`, is identical to it, so matching is case-sensitive).

**Out-of-scope targets.** If a link's target file *exists* but was **not discovered/parsed** into the index
(e.g. an HTML file outside any collection, or a markdown file the scan didn't include), anchor validation is
**skipped** — it does **not** emit a false `LINK_BROKEN_ANCHOR`. (Today `validateAnchor` returns `false`
for a target absent from the map, which yields a false `LINK_BROKEN_ANCHOR`. Fixing this to *skip* also
smooths that latent markdown sharp-edge. #114 rewrote `link-validator.ts`, so re-locate the exact branch by
name rather than line.) File-existence and git-ignore safety still apply.
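
A minimal sketch of the generalized check, combining the per-extension case rules with the out-of-scope skip, assuming a `Map<absolutePath, Set<fragment>>` index. `checkFragment` and its three-state `AnchorCheck` result are hypothetical; since the source describes `validateAnchor` as returning `false` for an absent target, the plan would need some way (a third outcome or an early return in the caller) to express "skipped".

```typescript
type AnchorCheck = 'ok' | 'broken' | 'skipped';

function isHtmlPath(filePath: string): boolean {
  const lower = filePath.toLowerCase();
  return lower.endsWith('.html') || lower.endsWith('.htm');
}

function checkFragment(
  fragment: string,
  targetPath: string,
  fragmentsByFile: Map<string, Set<string>>,
): AnchorCheck {
  const fragments = fragmentsByFile.get(targetPath);
  // Not discovered/parsed into the index: skip rather than report a false break.
  if (fragments === undefined) return 'skipped';
  if (isHtmlPath(targetPath)) {
    // HTML ids: exact, case-sensitive set membership.
    return fragments.has(fragment) ? 'ok' : 'broken';
  }
  // Markdown slugs: case-insensitive match (current behavior).
  const wanted = fragment.toLowerCase();
  for (const f of fragments) {
    if (f.toLowerCase() === wanted) return 'ok';
  }
  return 'broken';
}
```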

> Implementation latitude: the plan may change the value type of the existing map to `Set<string>`, or
> derive a parallel `fragmentsByFile` from both markdown headings and HTML anchors. The **contract** is one
> format-neutral fragment-set index consulted by `validateAnchor`. `ResourceMetadata.headings` stays as-is
> (consumed elsewhere).
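
Whichever option the plan takes, one detail matters for the skip rule: every indexed file needs an entry, even one with zero fragments, so that "present but empty" (a genuine broken anchor) stays distinguishable from "absent" (skipped). A sketch with a hypothetical reduced input type; markdown heading slugs are assumed to be already flattened out of the `HeadingNode` tree.

```typescript
// Hypothetical minimal input; the real code would read ResourceMetadata.
interface IndexInput {
  filePath: string;        // absolute path
  headingSlugs?: string[]; // markdown: slugs produced via github-slugger
  anchors?: string[];      // HTML: ResourceMetadata.anchors
}

function buildFragmentsByFile(resources: IndexInput[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const r of resources) {
    // Always create the entry, even when the file exposes no fragments.
    const set = index.get(r.filePath) ?? new Set<string>();
    for (const slug of r.headingSlugs ?? []) set.add(slug);
    for (const anchor of r.anchors ?? []) set.add(anchor);
    index.set(r.filePath, set);
  }
  return index;
}
```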

---

## 6. Error Handling & Issue Types

> **Post-#114 model.** The resources package no longer has a free-form `ValidationIssue.type` (`z.string()`).
> Every validator now emits the unified `ValidationIssue` from `@vibe-agent-toolkit/agent-schema`, carrying a
> `code` from the canonical `CODE_REGISTRY` and a resolved `severity`. Adding a finding = **adding one registry
> code**, not documenting a string.

**New code:** `MALFORMED_HTML` — add it to `CODE_REGISTRY` in
`packages/agent-schema/src/validation-codes.ts` (alongside the already-present `LINK_BROKEN_*`,
`FRONTMATTER_*`, and `EXTERNAL_URL_*` entries).

- Emitted during `validate()` from each resource's `parseErrors` (curated set of meaningful parse5 codes —
  e.g. unexpected/missing tags, duplicate attributes — not every HTML5 recovery quirk). Carries `line`,
  the parse5 `code` in the message, and the file path.
- **Default severity:** `info` (advisory; `warning` is also acceptable). Stays user-overridable like every
  other registry code.
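
A sketch of turning a resource's `parseErrors` into `MALFORMED_HTML` findings. `IssueSketch` is a hypothetical stand-in for the agent-schema `ValidationIssue` (whose severity the framework resolves from `CODE_REGISTRY`), and the allow-list is illustrative only; confirm each code name against parse5's error codes for the installed version.

```typescript
interface HtmlParseError { code: string; line: number; col: number }

// Hypothetical simplified issue shape.
interface IssueSketch {
  code: 'MALFORMED_HTML';
  filePath: string;
  line: number;
  message: string;
}

// Illustrative curated subset of parse5 codes worth surfacing (assumption).
const REPORTED_PARSE5_CODES = new Set([
  'duplicate-attribute',
  'end-tag-without-matching-open-element',
  'eof-in-tag',
]);

function malformedHtmlIssues(filePath: string, parseErrors: HtmlParseError[]): IssueSketch[] {
  return parseErrors
    .filter((e) => REPORTED_PARSE5_CODES.has(e.code)) // drop benign recovery quirks
    .map((e) => ({
      code: 'MALFORMED_HTML' as const,
      filePath,
      line: e.line,
      message: `HTML parse error '${e.code}' at ${e.line}:${e.col}`,
    }));
}
```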

**Exit-code wiring — already fixed by #114, no work here.** The old brittle filter
(`issue.type !== 'external_url'`) is **gone**. `vat resources validate` now derives its exit code purely from
the framework's severity-based `hasErrors` (`process.exit(hasErrors ? 1 : 0)` in `validate.ts`): exit 1 iff
some emitted issue resolves to `error`. So `MALFORMED_HTML` at `info`/`warning` is automatically non-fatal —
there is **no predicate to write and no bug to fix**. The external-URL codes are likewise already registry
codes at `warning` default. The HTML implementer only needs to (a) add the `MALFORMED_HTML` registry entry
and (b) emit it from `parseErrors` during validation.

Reuses unchanged (already registry codes post-#114): `LINK_BROKEN_FILE`, `LINK_BROKEN_ANCHOR`, `LINK_UNKNOWN`,
and the `EXTERNAL_URL_*` family.

---

## 7. Link Rewriting & Round-Trip Fidelity (in-scope)

HTML resources must support the same **deterministic, structure-preserving link rewriting** markdown has, so
HTML files can be bundled into skills (`vat skills build`, `linkFollowDepth`) with their relative link
targets remapped.

### 7.1 The fidelity trap

The markdown rewriter (`content-transform.ts`, `transformContent` at `:398`, replace loop at `:430`) uses
`String.replaceAll(regex, callback)` and returns the original match for any link without a rewrite rule —
everything except the changed target is byte-identical. Frontmatter uses `FrontmatterEditor`, whose contract
is `openFrontmatter(x).toString() === x` byte-for-byte (`packages/resources/src/frontmatter-editor.ts:7`).

**The obvious HTML approach — parse5 → mutate AST → `serialize()` — violates this contract.** parse5's
serializer normalizes whitespace, attribute quoting, void elements, comments, and the doctype. A
parse→serialize round-trip is **lossy**. **We must never re-serialize the document.**

### 7.2 The mechanism: offset splicing

New module **`html-transform.ts`** (analog of `content-transform.ts`):

```typescript
export function rewriteHtmlLinks(
  source: string,
  rules: LinkRewriteRules,
  ctx: TransformContext,
): string;
```

It re-parses `source` with parse5 (`sourceCodeLocationInfo: true`) to obtain, for each `<a href>` / `<img src>`,
the attribute's source location.

**Load-bearing detail — value sub-range.** parse5's `element.sourceCodeLocation.attrs[name]` gives
`{ startOffset, endOffset, startLine, ... }` for the **entire attribute** (`href="value"` / `src='value'` /
`href=value`), **not** the value alone. The rewriter must compute the value sub-range itself:
- Slice the attribute span from `source`.
- Locate `=` after the attribute name, then the first non-whitespace char: if it's `"` or `'`, the value is
  between that quote and its match; otherwise the value is unquoted and runs to the attribute `endOffset`.
- Record `(valueStart, valueEnd)` (JS string indices, **not bytes**) and the quote char.
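
The sub-range steps above, sketched under the assumption that `attrStart`/`attrEnd` are the `startOffset`/`endOffset` parse5 reports for the whole attribute. `computeValueRange` and `ValueRange` are hypothetical names. A bare attribute such as `<a href>` has no `=` inside its span and yields `undefined`, so the caller can leave it untouched.

```typescript
type Quote = '"' | "'" | '';

interface ValueRange {
  valueStart: number; // JS string index (UTF-16 code unit), not a byte offset
  valueEnd: number;   // exclusive
  quote: Quote;       // '' for unquoted values
}

function computeValueRange(source: string, attrStart: number, attrEnd: number): ValueRange | undefined {
  const eq = source.indexOf('=', attrStart);
  if (eq === -1 || eq >= attrEnd) return undefined; // bare attribute, no value
  // Skip optional whitespace between '=' and the value.
  let i = eq + 1;
  while (i < attrEnd && /\s/.test(source.charAt(i))) i++;
  const ch = source.charAt(i);
  if (ch === '"' || ch === "'") {
    const close = source.indexOf(ch, i + 1);
    // Clamp to the attribute span if the closing quote is missing (malformed input).
    const valueEnd = close === -1 || close >= attrEnd ? attrEnd : close;
    return { valueStart: i + 1, valueEnd, quote: ch };
  }
  // Unquoted: the value runs to the end of the attribute span.
  return { valueStart: i, valueEnd: attrEnd, quote: '' };
}
```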

Algorithm:

1. For each link, compute the new target via the same rule/template machinery markdown uses
   (`linkRewriteRules`, `resourceRegistry`, relative-path computation). Links with no applicable rule are
   left untouched.
2. Collect `(valueStart, valueEnd, newValue)` edits.
3. Apply edits to the **original source string** in **descending `valueStart` order** (so earlier edits
   don't shift later offsets). Replace **only** the value characters, preserving the original quote char.
   HTML-escape the written value minimally (`&`, and the active quote char); computed relative paths
   normally need no escaping.
4. The document is **never re-serialized** — comments, whitespace, indentation, and every untouched
   attribute remain byte-identical.
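
Steps 2 and 3 reduce to a small pure function. A minimal sketch, assuming each edit's `newValue` has already been escaped for its quote context; `Edit`, `escapeAttrValue`, and `applyEdits` are hypothetical names. Applying edits from the highest `valueStart` down means each splice only shifts text after it, which no pending edit touches; an empty edit list returns the source unchanged, which is the identity contract in the next subsection.

```typescript
interface Edit {
  valueStart: number;
  valueEnd: number;
  newValue: string; // already escaped for its quote context
}

// Minimal escaping: `&`, plus the active quote character.
function escapeAttrValue(value: string, quote: '"' | "'" | ''): string {
  let out = value.replace(/&/g, '&amp;');
  if (quote === '"') out = out.replace(/"/g, '&quot;');
  if (quote === "'") out = out.replace(/'/g, '&#39;');
  return out;
}

// Splice edits into the ORIGINAL source, last offset first, never re-serializing.
function applyEdits(source: string, edits: Edit[]): string {
  const ordered = [...edits].sort((a, b) => b.valueStart - a.valueStart);
  let out = source;
  for (const e of ordered) {
    out = out.slice(0, e.valueStart) + e.newValue + out.slice(e.valueEnd);
  }
  return out;
}
```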

### 7.3 Round-trip identity contract (tested)

- With rules that match no link, `rewriteHtmlLinks(src, rules, ctx) === src`, **byte-for-byte**.
- Rewriting a single link changes **only** that target; surrounding comments, whitespace, and other
  attributes are unchanged (mirror `packager-frontmatter-rewrite.integration.test.ts`).

### 7.4 Packager integration

**The real gate is the early-return guard at `skill-packager.ts:1031`**, not the markdown rewrite block
below it:

```typescript
// skill-packager.ts:1030-1034 — today this binary-copies EVERY non-.md file,
// so .html never reaches the rewrite path (the openFrontmatter/transformContent
// block at :1053-1056 is markdown-only).
if (!sourcePath.endsWith('.md') || !ctx.rewriteLinks) {
  await copyFile(sourcePath, targetPath);
  return;
}
```

Make this guard **format-aware** so HTML reaches a rewrite path instead of being binary-copied:

- **`.html`/`.htm`** (when `ctx.rewriteLinks`): read the file, look up the resource in `ctx.fromRegistry`
  (as the markdown path does at `:1041`), then `writeFile(targetPath, rewriteHtmlLinks(content, rules, ctx))`.
  Do **not** call `openFrontmatter` — HTML has no frontmatter and must not go through the frontmatter split.
- **`.md`**: existing `openFrontmatter` + `transformContent` path (`:1053-1056`), unchanged.
- **everything else / `rewriteLinks` disabled**: binary copy (unchanged).
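
The format-aware guard is essentially a three-way decision made before any I/O. A sketch with a hypothetical helper `copyStrategyFor`; it keeps the existing case-sensitive `.endsWith('.md')` check for markdown and lowercases only for the HTML test, mirroring the extension handling in §3.3.

```typescript
type CopyStrategy = 'rewrite-html' | 'rewrite-markdown' | 'binary-copy';

function copyStrategyFor(sourcePath: string, rewriteLinks: boolean): CopyStrategy {
  if (!rewriteLinks) return 'binary-copy'; // rewriting disabled: unchanged behavior
  const lower = sourcePath.toLowerCase();
  if (lower.endsWith('.html') || lower.endsWith('.htm')) return 'rewrite-html'; // no openFrontmatter
  if (sourcePath.endsWith('.md')) return 'rewrite-markdown'; // existing path
  return 'binary-copy';
}
```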

This makes **HTML resources participate in `linkFollowDepth` bundling** exactly like markdown.

### 7.5 Known limitations (documented)

- Duplicate or malformed attributes use parse5's reported location; pathological hand-authored HTML is
  best-effort.
- Unquoted attribute values: write the new value with double quotes (deterministic), or preserve unquoted
  when the new value needs no quoting — the plan picks one rule and documents it.
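
One deterministic rule the plan could adopt (an assumption, not a locked decision): keep the replacement unquoted only if it is a valid HTML unquoted attribute value (non-empty, no ASCII whitespace, none of `"`, `'`, `=`, `<`, `>`, or a backtick) and, conservatively, contains no `&`; otherwise wrap it in double quotes with minimal escaping. `formatUnquotedReplacement` is a hypothetical name; its result replaces the unquoted value range, which contains no quotes of its own.

```typescript
// Valid unquoted attribute value, additionally excluding '&' to sidestep
// character-reference ambiguity.
const UNQUOTED_SAFE = /^[^\t\n\f\r "'=<>`&]+$/;

function formatUnquotedReplacement(newValue: string): string {
  if (UNQUOTED_SAFE.test(newValue)) return newValue;
  return `"${newValue.replace(/&/g, '&amp;').replace(/"/g, '&quot;')}"`;
}
```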

---

## 8. Future-Proofing Seam: Pluggable Metadata (documented, not built)

A future `ResourceMetadata.metadata?: Record<string, unknown>` field (do **not** add in v1) would be
populated by a **pluggable extractor registry**:

```typescript
interface HtmlMetadataExtractor {
  name: string;                          // e.g. 'meta-tags', 'json-ld', 'opengraph'
  extract(parsed: ParsedHtml, raw: string): Record<string, unknown>;
}
```

A future config surface would select/compose extractors per collection, populate `metadata`, and run the
**existing** collection `frontmatterSchema` validation against that object. v1 ships **zero** extractors.
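
A future-seam sketch only; v1 ships no extractors. It shows one possible composition: each extractor's output is stored under its own `name`, which avoids key collisions (for example two extractors both emitting `title`) at the cost of schemas having to address nested keys. `ParsedHtmlLike` and `runExtractors` are hypothetical.

```typescript
// Hypothetical stand-in for ParsedHtml, reduced to what the example reads.
interface ParsedHtmlLike { anchors: string[] }

interface HtmlMetadataExtractor {
  name: string;
  extract(parsed: ParsedHtmlLike, raw: string): Record<string, unknown>;
}

function runExtractors(
  extractors: HtmlMetadataExtractor[],
  parsed: ParsedHtmlLike,
  raw: string,
): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  // Namespace each extractor's output under its name.
  for (const ex of extractors) metadata[ex.name] = ex.extract(parsed, raw);
  return metadata;
}
```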

**v1 behavior to document explicitly:** a collection's `frontmatterSchema` does **not** apply to HTML files
in v1. HTML files in such a collection are link/anchor/well-formedness-checked but **not** schema-validated.
State this in user docs to avoid surprise.

---

## 9. Discovery & CLI

- Discovery stays **glob-driven**. A user opts HTML in via a collection `include` such as
  `["docs/**/*.{md,html}"]`, or via the path-arg recursive scan. **The default crawl `include` is
  `['**/*.md']`** (`resource-registry.ts:391`) — leave the default markdown-only; HTML is opt-in by glob.
- **The real dispatch fix is in `addResource`** (`resource-registry.ts:300`), which today calls
  `parseMarkdown` for every file (§3.3). Once it branches by extension, any `.html`/`.htm` that the crawl
  yields is parsed correctly. Audit `crawl`/`indexResource` for any *other* hardcoded `.md` assumption and
  ensure unsupported extensions are **skipped silently** (not errored).
- **No new CLI flags.** `vat resources validate` picks up HTML automatically; `--check-external-urls`
  (`cli/.../resources/index.ts:74`) already covers HTML external links because the external-URL collector
  iterates `resource.links` filtering `link.type === 'external'` across **all** resources
  (`resource-registry.ts:786-787`) — not markdown-gated.
- **No config schema change** required for v1.
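
The "skip silently" rule can be centralized in one format lookup that callers filter on. A sketch with a hypothetical `resourceFormatOf` helper built on `path.extname`, as §3.3 does; `undefined` means the crawl drops the file without emitting an issue.

```typescript
import * as path from 'node:path';

type ResourceFormat = 'markdown' | 'html';

function resourceFormatOf(filePath: string): ResourceFormat | undefined {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === '.md') return 'markdown';
  if (ext === '.html' || ext === '.htm') return 'html';
  return undefined; // unsupported: skip silently, no issue emitted
}

// Example: only supported files reach addResource.
const crawled = ['docs/a.md', 'docs/page.HTML', 'img/logo.png', 'README'];
const toIndex = crawled.filter((f) => resourceFormatOf(f) !== undefined);
```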

---

## 10. Testing

Follow the repo test pyramid (unit > integration > system) and the duplication policy. **All fixtures are
generic synthetic documents — no proprietary or organization-specific content.**

**Unit (`packages/resources/test/`):**
- `parseHtml`: link extraction (`<a href>`, `<img src>`) into `ResourceLink[]` with correct `type`/`line`,
  anchor extraction (`id`, `<a name>`), parse-error capture.
- Generalized fragment matching: case-insensitive for `.md` targets, case-sensitive for `.html` targets;
  skip (not fail) for out-of-scope targets.
- `addResource` extension branch: `.html` produces a `ResourceMetadata` with `anchors` populated and
  `frontmatter` undefined.
- `rewriteHtmlLinks`: value sub-range computation across quoted/single-quoted/unquoted attributes;
  round-trip identity (no-op === byte-identical); single-link rewrite preserves surroundings.
- Severity wiring: `MALFORMED_HTML` resolves to a non-`error` severity (so it never flips the exit code),
  while `LINK_BROKEN_FILE`/`LINK_BROKEN_ANCHOR` resolve to `error`; assert via the framework's `hasErrors`.

**Integration (`packages/resources/test/integration/`):**
- A synthetic fixture dir exercising `md → html → md` with: a valid cross-format link, a broken local-file
  link, a broken fragment (case-mismatched HTML id), and a malformed HTML file. Assert the exact issue set
  and that exit code reflects only the real errors.

**Integration (`packages/agent-skills/test/integration/`):**
- Bundle a skill whose resource graph includes an HTML file via `linkFollowDepth`; assert HTML links are
  rewritten to bundled-relative targets **and** comments/whitespace are preserved (mirror
  `packager-frontmatter-rewrite`).

---

## 11. Dependency

Add **`parse5`** to `packages/resources/package.json`. It is pure JS (no native deps), MIT-licensed, and
exposes both per-attribute source locations (required by §7) and an `onParseError` hook (required by §6).
Verify the exact `sourceCodeLocation.attrs[name]` field shape against the installed version during implementation.

---

## 12. File Touch List (for the plan)

| File | Change |
|---|---|
| `packages/resources/src/html-parser.ts` | **NEW** — `parseHtml` (parse5): `ResourceLink[]` links, `anchors`, `parseErrors`, shape-parity with `parseMarkdown` |
| `packages/resources/src/html-transform.ts` | **NEW** — `rewriteHtmlLinks` (re-parse for offsets, value sub-range computation, offset splice, never serialize) |
| `packages/resources/src/resource-registry.ts` | `addResource` (~`:300`) extension branch → `parseHtml`/`parseMarkdown`; add `anchors`/`parseErrors` to the `ResourceMetadata` literal; generalize the fragment index (`buildHeadingsByFileMap`, ~`:672`) to include HTML anchors |
| `packages/resources/src/schemas/resource-metadata.ts` | Add optional `anchors?: string[]` and `parseErrors?: HtmlParseError[]` to `ResourceMetadata`; export `HtmlParseError` type |
| `packages/resources/src/link-validator.ts` | `validateAnchor` (`:344`) → set-membership w/ per-format case rules; **skip** (don't fail) out-of-scope targets |
| `packages/agent-schema/src/validation-codes.ts` | **Add `MALFORMED_HTML` to `CODE_REGISTRY`** (default severity `info`). Post-#114 this is the home for all resources/link codes; the old free-form `type`-string model no longer exists. |
| `packages/resources/src/schemas/validation-result.ts` | **No code-list change** — it now re-exports `ValidationIssue`/`ValidationIssueSchema` from `@vibe-agent-toolkit/agent-schema`. |
| `packages/cli/src/commands/resources/validate.ts` | **Emit `MALFORMED_HTML` from each resource's `parseErrors`.** No exit-code change — #114 made the exit code purely severity-based (`hasErrors`); the `external_url` filter is gone and `info`/`warning` codes are already non-fatal. |
| `packages/agent-skills/src/skill-packager.ts` | Make the `.md`-only early-return guard (`:1031`) format-aware: `.html`/`.htm` → read + lookup + `rewriteHtmlLinks` (no `openFrontmatter`); `.md` unchanged (`:1053-1056`); else binary copy |
| `packages/resources/package.json` | Add `parse5` dependency |
| docs | Document HTML support + "frontmatterSchema does not apply to HTML in v1" caveat |
| `docs/validation-codes.md` | **Add a `#malformed_html` heading section.** A test (`packages/agent-schema/test/docs/validation-codes.test.ts`) asserts every `CODE_REGISTRY` entry has a matching doc anchor — a new registry code without its doc section fails CI. |

---

## 13. Open Questions

None blocking. Resolved during design:
- Resource model: `ResourceMetadata` via `addResource` branch — **not** the `resource-compiler` `Resource` union (locked).
- Parser: parse5 (locked).
- Link elements: `<a href>` + `<img src>` (locked).
- Rewriting: offset splice with computed value sub-range, never serialize (locked).
- Metadata: deferred with reserved seam (locked).




