Skip to content

process: Add docs config redesign plan spec#117

Draft
jlevy wants to merge 32 commits into
mainfrom
claude/review-config-format-2wxh8
Draft

process: Add docs config redesign plan spec#117
jlevy wants to merge 32 commits into
mainfrom
claude/review-config-format-2wxh8

Conversation

@jlevy

@jlevy jlevy commented May 7, 2026

Copy link
Copy Markdown
Owner

Summary

Adds docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md, a planning spec for a redesign of the docs/config system (the area PR #87 attempted to address but never merged).

The spec:

  • Reviews the current f03 system and the unmerged PR feat: external docs repos — multi-source doc architecture with review fixes #87 f04 design, calling out the architectural smells (three coexisting layers — sources + files + lookup_path — that produced 12 bug-fix commits, repo sync wired only in config not pipeline, local-source type spec-only, doc types still a closed enum).
  • Restates the 8 goals from the user request explicitly (G1–G8).
  • Proposes 5 additional goals for review (G9 reproducibility, G10 provenance, G11 clean schema break, G12 atomic sync, G13 auth-ready).
  • Outlines three candidate design approaches with pros/cons:
  • Lists 9 open questions to resolve before any code is written.
  • Sketches a two-phase implementation plan (schema + sync + migration; then override / promotion / roundtrip).

This is a planning spec at the design-options stage — no code changes.

Test plan

  • Spec read for clarity and completeness
  • User confirms which of G9–G13 to keep
  • User picks an approach (A/B/C) before follow-up implementation spec is written
  • Open questions resolved before beads are created

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP


Generated by Claude Code

Plan spec proposing a unified ordered source list (bundled/local/git/url)
to replace the current f03 files+lookup_path scheme and the unmerged
PR #87 f04 design. Outlines three candidate approaches (finish PR #87
as-is, clean f05 re-cut, or fully pluggable source-type providers) with
pros/cons before nailing details, restates the 8 user-requested goals
explicitly, and proposes 5 additional goals (reproducibility,
provenance, no-zombie schema, atomic sync, auth-ready) for review.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
@deepsource-io

deepsource-io Bot commented May 7, 2026

Copy link
Copy Markdown

DeepSource Code Review

We reviewed changes in 38dca9e...f113cef on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade   Security  

Reliability  

Complexity  

Hygiene  

Code Review Summary

Analyzer Status Updated (UTC) Details
Secrets May 8, 2026 6:08a.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

…fault

Make explicit that mirrored docs live in the gitignored .tbd/docs/ cache
and don't churn the repo on upstream changes. Promotion to git-tracked
is the G4 (override / shadcn-style fork) path.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

Coverage Report for packages/tbd

Status Category Percentage Covered / Total
🔵 Lines 32.23% 2168 / 6726
🔵 Statements 32.49% 2253 / 6934
🔵 Functions 40.09% 336 / 838
🔵 Branches 28.46% 970 / 3408
File Coverage
File Stmts Branches Functions Lines Uncovered Lines
Changed Files
packages/tbd/src/cli/commands/doctor.ts 0.5% 0% 0% 0.5% 58-1274, 1288-1289
packages/tbd/src/docmap/index.ts 100% 100% 100% 100%
packages/tbd/src/docmap/resolve.ts 94.64% 96.77% 86.66% 95.65% 136-138
packages/tbd/src/docmap/schemas.ts 95.83% 100% 100% 95.83% 83
packages/tbd/src/docref/index.ts 100% 100% 100% 100%
packages/tbd/src/docref/parser.ts 100% 100% 100% 100%
packages/tbd/src/utils/lockfile.ts 89.18% 87.5% 100% 88.88% 130-135, 157
Generated in workflow #805 for commit f113cef by the Vitest Coverage Report Action

claude added 20 commits May 7, 2026 05:41
Make explicit that the runtime sees only one valid schema (f05), but
format detection + one-shot migration handles f03/f04 configs reliably.
Spell out the migration contract: round-trip tests on representative
configs, surfaced warnings, strict schema validator that rejects
unknown fields to prevent zombies from creeping back.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
…al goals

G13 is now: tbd never handles credentials. Public URLs and public git
repos just work; private sources rely on the underlying tool's own auth
(git config, gh CLI auth, AWS_PROFILE, etc.) inherited from the env.
There will be no auth: field in the source schema, ever.

G9-G13 promoted from "proposed" to first-class goals alongside G1-G8.
Open question 7 updated to reflect the new auth contract (failure
messages name the underlying tool, never prompt for credentials).

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
- G14: Bundles are the first-class organizing unit. Every doc belongs
  to one bundle (a GitHub repo, a website, "local", etc.). Bundles drive
  listing, override semantics, provenance display, and status output.
  Schema field is `bundle:` (not `prefix:`); CLI surfaces "bundle" too.
- G15: Bundle names are auto-suggested at add time from the source
  URL/path, can be explicitly overridden, and are previewed before the
  config change is persisted.
- Drop Approach A (don't merge PR #87) — design is now a single
  proposal with a deferred future direction (pluggable source-type
  providers).
- Rename `type: bundled` → `type: builtin` to avoid term-collision
  with the bundle concept.
- Update open questions, implementation plan, migration notes, and
  provenance sidecar fields to use bundle terminology consistently.
- Add open questions for bundle-name auto-derivation rules and
  bundle:source cardinality.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Encode two new constraints surfaced by concrete bundle examples
(jlevy/coding-guidelines, jlevy/writing-guidelines):

- G16: External doc source repos require no tbd-specific format. No
  manifest, no required frontmatter, no required folder names. Upstream
  is a bag of files; the consumer's tbd config maps files to doc types.
  An optional opt-in upstream tbd.yml manifest is noted as a future
  direction but not part of the core design.
- G17: Bundles and doc types are orthogonal axes. One bundle (e.g.,
  coding-guidelines) typically contributes docs of multiple types
  (guidelines, references, shortcuts, etc.). Installing one bundle
  enables all of its docs across whatever types they fit into.

Schema design updated:
- `git`/`url` sources gain a `contents:` field — either `- auto` (walk
  upstream and match subdir names against the doc_types registry) or
  explicit `{ path, type, as? }` rules to map / filter / rename.
- Concrete worked examples for jlevy/coding-guidelines (auto mode)
  and jlevy/writing-guidelines (explicit mapping).
- The "landed canonical layout" guarantee is preserved, but the
  upstream layout is fully flexible.

Open questions added for `contents` syntax details, optional upstream
manifest, and rename semantics.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Pull the format specification (URI grammar, manifest schema, lockfile,
doc map, resolution algorithm) out of the plan-spec into its own
architecture document so it's reusable as a separable artifact.

New: docs/project/architecture/current/arch-docspec-format.md (~720 lines)
  - Umbrella name: docspec (version docspec/0.1)
  - Sec 1: docspec URI grammar (./, /, https:, github:, git: schemes;
    @ref + //path conventions; URL normalization; out-of-band auth)
  - Sec 2: manifest schema (sources, bundles, doc_types, contents
    mapping, auto-detection)
  - Sec 3: lockfile schema (revisions, content hashes, materialization
    kinds)
  - Sec 4: doc map schema (generated index with three-layer metadata
    resolution)
  - Sec 5: item addressing and resolution algorithm (canonical keys,
    docrefs, six-step resolution chain, collision handling)
  - Sec 6: sync semantics (sync / update / status / build)
  - Sec 7: directory layout
  - Sec 8: failure model
  - Sec 10: tbd-specific extensions called out as not part of core

Plan-spec updates:
- Add G18: format spec is an extractable, reusable artifact
- Replace ad-hoc type:/url:/ref: schema with docspec: URIs
- Migration description updated for the schema rewrite
- Implementation plan reorganized to call out format-level tasks
  (URI parser, scheme fetchers, lockfile, doc map) vs tbd-specific
  workflows
- References section points at the format spec as authoritative

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Relocate the format spec from docs/project/architecture/current/ to
packages/tbd/docs/, alongside tbd-design.md / tbd-docs.md / tbd-prime.md
which serve the same role (design-level reference docs for tbd).

- Renamed: arch-docspec-format.md → design-docspec-format.md
- Updated all 7 references in plan-2026-05-07-docs-config-redesign.md

Naming note: kept "docspec" as the format name despite the active npm
package "docspec" v0.14 (winton, "specification format and toolchain
for documentation maintained by agents") and the older python-docspec
on PyPI. The conflicts are real but adjacent rather than identical
domain; if/when we extract the format as a separate npm package we
can scope it (e.g. @tbd/docspec) or revisit the brand. Keeping the
internal name avoids churn for now.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
… overlap

User-confirmed name change after npm conflict check showed "docspec" is
actively published (winton, v0.14, "specification format and toolchain
for documentation maintained by agents") plus python-docspec on PyPI
in adjacent territory. The npm "docref" v0.0.6 (2018, jsDoc niche)
is stale and unambiguous.

- Renamed via repren: 82 occurrences across the format spec and the
  plan-spec.
- Renamed file: design-docspec-format.md → design-docref-format.md.

URI qualifier dropped per the question of whether "URI" accurately
describes a format that admits bare paths like ./foo and /foo/bar:
strictly no (RFC 3986 requires a scheme). The grammar follows npm's
"package specifier" tradition — URI-shaped forms (https:, github:,
git:) coexist with bare filesystem paths and short index forms
(canonical keys, basenames, aliases). The doc now says this
explicitly in the Overview, Terminology, and §1.

Terminology collision after the rename (the original spec used
"docspec" for source addresses and "docref" for lookup queries —
both became "docref") resolved by reframing §5: a docref has two
complementary uses sharing one grammar — **source docrefs** in the
manifest and **lookup docrefs** in CLI queries. Some forms are valid
in both contexts; most are unambiguous from grammar.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Brief reservation note added as §1.9 of the docref format. Mirrors URI
and Markdown convention: <docref>[#<fragment>] for addressing content
within a doc (section anchors, line ranges, named regions — to be
defined later). Fragment grammar deliberately left open.

In v0.1, docrefs containing an unescaped `#` are a parse error so
nothing accidentally relies on a future-incompatible meaning. Literal
`#` in paths/refs must be percent-encoded as %23 per URI convention.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Change v0.1 behavior for fragments: instead of erroring on `#`, parse
and discard the fragment portion. Matches URI-client convention when
a fragment grammar isn't recognized (return the whole resource), and
is forward-compatible — docrefs written today with fragments will
"upgrade" to honoring those fragments once a future version defines
them, without the docref string changing.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Three changes bundled:

1. Split the format spec into two layered design docs:
   - design-docref-format.md: just the URI-like single-string grammar
     for addressing a resource (docref/0.1). Small (~300 lines), could
     live as its own micro-library.
   - design-docmap-format.md: the manifest, lockfile, doc map,
     addressing/resolution algorithm, and sync semantics built on top
     of docref (docmap/0.1). The manifest top-level field is `docmap:`
     with `schema: docmap/0.1`.
   Plan-spec G18 updated to reference both layers; in-spec links and
   the YAML example updated to use the new docmap: wrapper.

2. Add GitLab as a first-class scheme in docref. New §1.5 alongside
   §1.4 (GitHub). Convention is identical:
   gitlab:owner/repo[@ref][//path], with support for nested group
   paths (group/subgroup/project). Input normalization for GitLab
   web URLs (/-/tree/, /-/blob/) added. §1.9 (Extensibility) makes
   the host-scheme pattern explicit so adding bitbucket:, codeberg:,
   etc. follows the same shape. Reserved-prefix list updated.

3. Install std-doc-guidelines.md as a tbd guideline:
   - File saved at packages/tbd/docs/guidelines/std-doc-guidelines.md
     (preserved name from the upstream gist).
   - Registered in .tbd/config.yml docs_cache.files so `tbd guidelines
     std-doc-guidelines` works.
   - Applied the guidelines to both new design docs:
     - American em-dash style (no spaces around em dash)
     - Drop "deliberately" qualifier
     - Required footer added at bottom

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
…rage

Implement the docref and docmap formats as standalone TypeScript
modules under packages/tbd/src/, with tests that mirror every example
in the design specs. Both modules are dependency-light and could be
extracted as separate npm packages without modification:

- docref module (depends only on the language):
  - parseDocref(input) — single-string parser for the docref grammar.
    Two structural forms: filesystem paths (./, /, ../) and
    scheme-prefixed (<scheme>:<body>). Permissive: unknown schemes
    are NOT parse errors; consumers gate on which schemes they
    support. Scheme is normalized lowercase; reserved fragment
    portion (#...) silently dropped per spec §1.10.
  - parseGitBody(body) — convention-only sub-parser for git-style
    bodies (<rest>[@<ref>][//<path>]). Disambiguates SSH-style git@
    user-host from @ref separator, and embedded scheme:// from the
    //path separator.
  - 31 tests covering every spec example, including edge cases
    (nested URLs, SSH-style remotes, fragment dropping, unknown
    schemes, GitLab nested groups).

- docmap module (depends on zod + docref):
  - Zod schemas for the manifest, lockfile, and doc map. Refines
    docref strings via parseDocref. Strict bundle-name and doc-type
    validation. Materialization is a discriminated union.
  - resolveLookupKey() — the §4.3 six-step resolution algorithm
    (repo-subpath → bundle scope → exact key → basename → alias →
    NotFound) with typed LookupNotFound / LookupAmbiguous errors.
  - parseLookupKey() — exposed for consumers who need the parsed
    shape directly.
  - 21 tests covering schema validation against the spec's example
    manifest/lockfile/doc map and the resolution algorithm against
    a representative index.

Spec ↔ implementation synchrony is enforced by the test suite:
spec changes require matching test updates and vice versa. Both
design docs now point at the modules as the canonical reference
implementations.

Plan-spec Phase 1 updated to mark these modules as already done;
the remaining Phase 1 work is wiring the docs: block in
.tbd/config.yml and URL→docref normalization helpers.

Bonus: minor auto-generated changes to AGENTS.md and the tbd skill
file from the prepare hook (refreshed shortcut directory).

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
…r present

The corrupted-data test scenario creates an issue file with a too-long
title bypassing `tbd create`. This triggers two diagnostics: "Issue
validity" (non-fixable, status=error) and "ID mapping coverage"
(fixable). Previously hasFixable=true caused the outer summary to say
"Run with --fix to repair", but --fix can't fix the validity error.

Now the outer summary prefers "manual intervention" whenever any
non-fixable error is present, matching user intent: --fix won't
make the issue go away, so don't suggest it.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
Major plan-spec extension covering the implementation strategy now
that docref and docmap modules are committed.

New sections:

- **Doc States and Transitions**: Three-state model (A. Cached, B.
  Tracked Override, C. Pure Local) with a transition matrix. States
  B and C are physically identical; the difference is computed at
  read time from the source list. Eject is A→B; unfork is B→A; bundle
  removal is B→C.

- **Workflow Catalog**: 16 user-visible workflows (W1–W16), each with
  a one-line scenario, two or three design options with tradeoffs, a
  tentative design, and open questions. Workflows are grouped by the
  phase in which they land.

Restructured implementation plan into the user's three phases:

- **Phase 1**: Basic capabilities and migration. Existing UX preserved;
  new schema and modules backing. Filesystem fetcher only. Migration
  f03/f04 → f05. (W1, W2, W3, W12 partial.)
- **Phase 2**: External bundles and override roundtrip. github:/
  gitlab:/git:/https: fetchers. RepoCache port from PR #87. tbd source
  add/list/remove, eject/diff/upstream/unfork. tbd doc status. (W4–W13.)
- **Phase 3**: Migrate bundled docs to an external repo
  (github:jlevy/tbd-docs). Shadow-then-cut release cycle. (W14–W16.)

Each phase is independently releaseable and useful.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
The withLockfile() helper uses mkdir as the lock-acquisition primitive
because POSIX guarantees mkdir is atomic and surfaces contention as
EEXIST. On Windows, the filesystem layer can return EPERM for the
same scenario (concurrent mkdir on the same path) — a known Windows
quirk distinct from a real permission denial. The previous code
treated EPERM as a fatal error, breaking the high-concurrency
Promise.all save test on windows-latest CI.

Now we treat EEXIST as contention everywhere, and EPERM as contention
only on Windows. Other error codes (ENOENT, ENOSPC, etc.) still fail
fast as before.

Failure on PR #117 was reproducible from this branch's CI run; the
lockfile flake is unrelated to the docref/docmap work but surfaced
because that PR triggered Windows runners.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
The doc was imported from a gist without frontmatter, so the
auto-generated guideline directory in AGENTS.md and the tbd skill
file showed an empty description column. Added title, description,
and author frontmatter consistent with other guidelines.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
The previous fix treated EPERM as contention unconditionally on Windows,
which caused a busy loop if mkdir returned EPERM for any non-contention
reason (real permission denial, missing parent, etc.) — the stat-fails
branch then continued without sleep, pinning CPU. The Windows CI
runner hung for 46 minutes before being killed.

Now we verify the lock dir actually exists before treating EPERM as
contention. If stat shows the dir doesn't exist, we propagate the
original EPERM as a real error. If stat shows it exists, we fall
through to the existing contention-handling path (stale check, sleep,
retry).

EEXIST behavior on POSIX is unchanged.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
@jlevy

jlevy commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Senior design review after reading the current PR, the existing f03/f04 context, and the related KDEX draft in jlevy/kdex.

Summary

The direction is mostly right: replacing files + lookup_path with a manifest/lockfile/generated-map model is the right architectural move, and splitting docref from docmap is a good start toward extraction. I would not implement the remaining phases directly from this design yet, though. The biggest gaps are around resolver semantics for overrides, source-vs-bundle modularity, and explicit provenance for the add/remove/update/eject workflows.

My suggested high-level direction is to make this a three-layer system:

  1. docref: pure address grammar and normalization. No fetching, no tbd policy.
  2. docgraph: manifest normalization plus generated inventory/provenance graph. It contains all sources, bundles, documents, blobs, aliases, lock state, and override/fork edges. It never hides anything.
  3. docmap: a resolved view over a DocGraph using a policy: source priority, type filters, alias rules, and shadowing behavior. tbd workflows (source add/remove/update, eject, diff, upstream, unfork) sit above this layer.

That distinction would make the design more reusable and gives tbd the workflow semantics it needs without forcing every future DocMap consumer to inherit tbd-specific override behavior.

Strong parts

  • The clean break from f03/f04 is correct. The old coexistence of sources, files, and lookup_path is exactly the kind of compatibility layer that tends to resurrect deprecated state.
  • sync vs update follows the npm/uv mental model and is the right contract: sync installs the locked state; update intentionally moves it forward.
  • docref as a very small parser with unknown schemes delegated to consumers is a good extraction boundary.
  • KDEX's better ideas are visible here: a manifest, generated map, lockfile, offline resolution, status, whole-repo aggregate entries, and a thin CLI over a library. Keep leaning into that.

High-impact issues to resolve

1. Override-by-priority currently contradicts the resolver implementation

The design says order is lookup order and first source wins for unqualified names (packages/tbd/docs/design-docmap-format.md:139, packages/tbd/docs/design-docmap-format.md:463, docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:490). That is also the core mechanism for local overrides (docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:520).

But the reference resolver and tests do the KDEX-style thing: basename collisions are ambiguous (packages/tbd/src/docmap/resolve.ts:132, packages/tbd/tests/docmap-resolve.test.ts:102). The spec itself also says multiple basename matches are ambiguous at packages/tbd/docs/design-docmap-format.md:440, so the doc contradicts itself.

This is not a small wording issue. If duplicate <type>/<name> entries are ambiguous, then the proposed shadcn-style override workflow does not work. If first match wins, then the current resolver and tests are wrong.

Recommended fix: make the raw graph and resolved view explicit.

  • The DocGraph should retain every item, including shadowed items.
  • The DocMap effective view should resolve by (type, name) plus source priority and return the winning item plus shadow metadata.
  • Resolver APIs should accept context such as { type?: 'guideline', bundle?: 'coding', mode?: 'effective' | 'all' | 'strict' }.
  • Existing typed commands like tbd guidelines typescript-rules should pass a type constraint; otherwise typescript could collide across guideline, shortcut, and reference.
  • Ambiguity should remain for cases priority cannot or should not resolve: duplicate aliases in the same effective scope, conflicting canonical keys, or explicit mode: 'strict' lookups.

That gives tbd override behavior without losing KDEX's useful “full map is lossless” property.

2. bundle = source is too tight for modularity

The current format defines bundle as attached to a source and says one source equals one bundle (packages/tbd/docs/design-docmap-format.md:48, packages/tbd/docs/design-docmap-format.md:147, docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:218). The plan even flags this as open (docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:642). I would resolve that open question the other way: split bundles from sources now.

Real use cases where a bundle should have multiple sources:

  • An org docs bundle composed from a repo plus a few canonical web URLs.
  • A tbd bundle with a small package-local core plus an external docs repo during migration.
  • A product bundle where reference docs, examples, and code live in different repositories.
  • Multiple single-file URL sources that should present as one user-visible bundle.

Suggested schema shape:

docmap:
  schema: docmap/0.1
  bundles:
    - name: coding
      priority: 20
    - name: proj
      priority: 10
      local_root: ./docs/agent/
  sources:
    - id: coding-main
      bundle: coding
      docref: github:jlevy/coding-guidelines@main
      contents: [auto]
    - id: coding-readme
      bundle: coding
      docref: https://example.com/coding-style.md
      type: guideline
      as: coding-style

You can still keep the common CLI path simple (tbd source add ... auto-creates a bundle), but the format should not bake in 1:1 cardinality if we want DocMap reusable outside tbd.

3. Lockfile identity is under-specified

The lockfile examples and schema record docref, hash, materialization, and timestamp, but not a stable source id, bundle, or source-config fingerprint (packages/tbd/docs/design-docmap-format.md:257, packages/tbd/src/docmap/schemas.ts:104). That is fragile once there are multiple sources with the same docref, a source moves bundles, filters change, or contents remaps the same upstream tree into different landed docs.

Recommended lock entry fields:

  • source_id: stable manifest source id.
  • docref: normalized source address.
  • source_config_hash: hash of the materialization-affecting source config (docref, contents, glob, ignore, depth, transforms).
  • revision: resolved immutable upstream revision where applicable.
  • content_hash: hash of the selected/materialized content, not just the repo checkout.
  • materialization: fetch/cache strategy.

Then sync, update, remove, and orphan cleanup operate on source ids rather than trying to infer identity from docref or bundle name.

4. The override/fork relationship needs explicit provenance

The state model says tracked override vs pure local is computed by checking whether another bundle contributes the same <type>/<name> (docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:671). W8 says eject may record the original revision in frontmatter or a sidecar (docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:926), but this needs to be a first-class part of the workflow design.

Without explicit provenance, these cases get muddy:

  • A pure local doc becomes a supposed override just because a later-added upstream bundle happens to have the same name.
  • Removing an upstream bundle turns an override into pure local and loses the data needed for diff or unfork.
  • Upstream renames or deletes a doc; the local file still exists but the relationship is no longer discoverable by name.
  • source upstream needs to know the exact upstream source/path/revision to patch.

Recommendation: record an override edge, not just infer one. For example, in a sidecar or tbd overlay:

overrides:
  - local_key: proj:guideline/typescript
    upstream_key: coding:guideline/typescript
    upstream_source_id: coding-main
    upstream_docref: github:jlevy/coding-guidelines@main//guidelines/typescript.md
    ejected_revision: <sha>
    ejected_content_hash: sha256:...

The effective DocMap can still compute shadowing by priority, but roundtrip workflows should use explicit override provenance.

5. The Zod schemas are not enforcing the clean-break contract

G11 says the f05 validator rejects unknown fields to prevent zombie schema fields (docs/project/specs/active/plan-2026-05-07-docs-config-redesign.md:131). The new schemas use plain z.object(...), which strips unknown keys by default in Zod (packages/tbd/src/docmap/schemas.ts:63, packages/tbd/src/docmap/schemas.ts:78). That means old fields can silently survive parse/load paths unless every caller has a separate unknown-field detector.

Also missing or under-specified cross-field validation:

  • Remote sources can omit bundle even though the spec says bundle is required for remote sources (packages/tbd/src/docmap/schemas.ts:65).
  • contents[].type and source.type are not checked against declared doc_types.
  • as does not require type, and the spec is unclear whether as means “aggregate source name”, “single-file rename”, or KDEX-style as: repo (packages/tbd/docs/design-docmap-format.md:163).
  • Bundle uniqueness, reserved names, duplicate source ids, and canonical-key collision checks are not represented at the schema/normalization boundary.

Use .strict() on the format schemas, then add a normalization/validation pass with superRefine or a separate validateManifest() for cross-object rules. If extension fields are needed for future consumers, make that explicit with an extensions: bag rather than accepting arbitrary top-level keys.

6. as should be made less overloaded before implementation

KDEX used as: repo for whole-repo aggregate treatment. This PR changes as into a string name, while also saying the aggregate key becomes the bundle name (packages/tbd/docs/design-docmap-format.md:163, packages/tbd/docs/design-docmap-format.md:364). That is confusing: if as is not the key, what does the value name?

I would split this into a discriminated mode:

mode: files        # default: index files selected by contents/glob
mode: file         # one source becomes one doc; requires type/name
mode: repo         # one source becomes aggregate repo entry

Or keep KDEX's as: repo and use contents: [{ path: README.md, type: reference, as: writing-overview }] for single-file renames. Either is clearer than as meaning both “aggregate mode” and “rename”.

Workflows

Add source

The proposed preview-first add flow is sensible. I would make the dry-run inventory step the core operation:

  1. Parse and normalize docref.
  2. Choose or create a bundle.
  3. Create a source id.
  4. Dry-run fetch/list enough to show landed docs and conflicts.
  5. Persist manifest only after preview confirmation.
  6. Run sync and build, or print the exact next command if non-interactive mode disables side effects.

For CI/non-interactive use, support --yes and --no-sync.

Remove source or bundle

Default-safe behavior is right: refuse removal if explicit override edges point at that source unless --force-orphan is passed. If bundles and sources split, expose both:

  • tbd source remove <source-id> removes one fetch unit.
  • tbd bundle remove <bundle> removes all sources in a bundle, with the same override safety checks.

Cache directory and lockfile entry deletion should be source-id based.

Update source

Keep sync and update sharply separated. sync must reproduce the lockfile. source update [source|bundle] advances revisions and rewrites lock entries. Status should distinguish:

  • missing cache
  • cache hash mismatch
  • locked and present
  • upstream has newer revision
  • local override diverges from current upstream
  • explicit override is orphaned

Eject/diff/upstream/unfork

The workflow is directionally good, but it needs explicit override provenance as above. Also, two-way diff is okay for v1 only if we record enough data to add three-way later without changing the sidecar format. Store eject-time revision/hash now even if v1 only displays local-vs-current-upstream.

KDEX comparison

Useful KDEX ideas to borrow:

  • A library API boundary: load, sync, build, resolve, read, status (attic/kdex/docs/project/design/kdex-spec.md:198). This PR has schemas and a resolver, but not yet the core API surface that future extraction would want.
  • ResolvedItem includes both source address and local path (attic/kdex/docs/project/design/kdex-spec.md:134). DocMap entries should carry enough provenance to be useful without re-reading the manifest.
  • Prompt-friendly map rendering with a budget (attic/kdex/docs/project/design/kdex-spec.md:802) is worth keeping in mind for tbd prime / agent context injection.
  • Whole-repo aggregate sources with subpath reads are valuable and should remain in scope, but the mode should be explicit.

Do not copy KDEX's ambiguity semantics wholesale. KDEX is a neutral knowledge index; tbd specifically needs override priority and typed command resolution. That difference is why I think DocGraph plus policy-driven DocMap is the better reusable abstraction.

Smaller code/package notes

  • If docref and docmap are intended to be externally consumable soon, consider exporting subpaths like get-tbd/docref and get-tbd/docmap or documenting that they are internal until extraction. Right now they are not exposed by packages/tbd/package.json.
  • The module-level index.ts barrel files are okay if these are future package roots, but the repository's TypeScript guideline discourages duplicate/internal index.ts barrels. If these remain internal to tbd for a while, consider more descriptive filenames or make the package-boundary intent explicit.
  • The tests are useful spec mirrors, but they currently encode the resolver contradiction. Add tests for the winning override case before implementing source walking.

Bottom line

I would keep the PR's general direction, but I would change the design before building Phase 1/2 around it:

  1. Introduce DocGraph as the lossless, provenance-rich inventory.
  2. Make DocMap an effective policy view over that graph.
  3. Split bundles from sources and add stable source ids.
  4. Strengthen lockfile identity and schema strictness.
  5. Make override provenance explicit.
  6. Resolve the priority-vs-ambiguity contradiction in the spec and tests.

Those changes should make add/remove/update/eject workflows much more predictable, and they also make future extraction into one or two reusable repos feel natural instead of retrofitted.

Small/clear fixes from the review now applied. Larger architectural
items recorded as Q15-Q19 open questions in the plan-spec for joint
design review.

Resolver semantics now match the spec's stated intent:
- resolveLookupKey applies priority-wins when basename matches share
  the same (type, name) — the local-override mechanism the design
  promised but the previous resolver contradicted.
- Genuine ambiguity is preserved when matches span multiple types
  (priority alone can't disambiguate a typed lookup).
- Same priority semantics now apply to alias matches.
- Documents array is documented as required to be in source priority
  order; helper resolveByPriorityOrAmbiguous() encapsulates the rule.

Tests updated to assert the new behavior plus a fresh test for the
cross-type ambiguity case (added a `coding:shortcut/typescript`
fixture entry to make this testable).

Spec §4.4 / §4.5 rewritten to remove the priority-vs-ambiguity
contradiction. The new §4.4 explicitly distinguishes:
- canonical-key collisions (fatal at build time)
- same (type, name) across bundles (priority resolves)
- same basename across different (type, name) buckets (ambiguous)
- a forward pointer to Q15 for the future mode= parameter.

Zod schemas now `.strict()` everywhere (G11 enforcement). Unknown
fields like a stray `lookup_path:` from f04 are rejected, not
silently dropped. Two new schema tests exercise this.

Cross-field validation added via superRefine: a non-local docref
(github:/gitlab:/git:/https:/...) must declare an explicit bundle.
Local docrefs (./, /, ../) may omit it. Two new tests cover both
sides.

Five architectural items deferred for joint review as Q15-Q19 in the
plan-spec, each with 2-4 design options and tradeoffs:
- Q15 Resolver semantics: keep current vs add `mode` parameter vs
  full DocGraph + DocMap policy-view split.
- Q16 Bundle ↔ source cardinality: 1:1 vs split bundles from
  sources vs optional grouping.
- Q17 Lockfile identity: docref-only vs add source_id vs full
  source_config_hash + content_hash.
- Q18 Override provenance: computed-by-name vs frontmatter pointer
  vs sidecar edge vs tbd-internal overlay.
- Q19 The overloaded `as` field: keep vs split into a discriminated
  `mode:` on sources vs KDEX-aligned literals.

Q15-Q19 are linked: Q17/Q18 depend on whether Q16 introduces stable
source ids.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP

jlevy commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the thorough design review — the framing of DocGraph as the lossless inventory and DocMap as a policy view is exactly the right architectural lens, and the call-out on the resolver/spec contradiction is correct. Pushed 87a3d9a with the small/clear fixes; the larger architectural items are now Q15–Q19 in the plan-spec for joint design review (linked below) so we can settle them together before Phase 2.

Fixes applied in this commit

1. Resolver semantics now match the spec's stated intent (matches your point 1 partially)

resolveLookupKey previously threw Ambiguous whenever a basename matched in two bundles, contradicting the override-by-priority design. It now distinguishes:

  • Same (type, name) across bundles → priority wins (first source listed beats later ones). This is the local-override mechanism the design promised.
  • Same basename across different (type, name) buckets → genuine ambiguity (priority alone can't disambiguate a typed lookup).

A new helper resolveByPriorityOrAmbiguous() encapsulates the rule. Same priority semantics now apply to alias matches. Documents array is documented as required to be in source priority order.

Tests updated:

  • "throws Ambiguous on a basename that matches multiple bundles" → split into a "priority wins when same (type, name)" test and a separate "throws Ambiguous when basename matches across different types" test (with a fresh coding:shortcut/typescript fixture entry to make the cross-type case testable).
  • Same split for alias resolution.
  • "resolves a bundle-scoped basename" → split into the unique-within-bundle case (clean resolution) and the cross-type case (ambiguous).

2. Spec §4.4 / §4.5 rewritten to remove the contradiction

The new §4.4 "Collisions and priority" spells out four explicit cases (canonical-key collisions, same (type, name) across bundles, same basename across types, status reporting) and forward-references Q15 for the mode= parameter. Old §4.5 folded into §4.4. Stale §4.5 reference in resolve.ts updated.

3. Zod schemas now .strict() everywhere (your point 5)

Every object schema is .strict(). A stray lookup_path: from f04 now throws unrecognized at parse time rather than being silently dropped. New tests exercise this for top-level and per-source cases.

4. Cross-field validation: bundle required for non-local docrefs (also your point 5)

SourceSchema.superRefine checks: if docref.kind !== 'path' (i.e. https:/github:/gitlab:/git:/etc.), bundle is required. Local docrefs may omit it (consumer applies the local default). New tests for both sides.

Deferred to joint review as Q15–Q19 in the plan-spec

The bigger structural items live in plan-2026-05-07-docs-config-redesign.md § "Architectural questions surfaced in design review". Each lists 2–4 design options with tradeoffs; nothing tentative yet, just the option space:

  • Q15. Resolver semantics (your point 1, full version) — keep current shape vs add mode parameter ('effective' | 'all' | 'strict') vs full DocGraph + DocMap policy-view split.
  • Q16. Bundle ↔ source cardinality (your point 2) — keep 1:1 vs split bundles from sources with stable source ids vs optional grouping. Lists the four real use cases you cited.
  • Q17. Lockfile identity (your point 3) — docref-only (current) vs add source_id only vs full source_config_hash + content_hash. Q17 depends on Q16's outcome (source ids).
  • Q18. Override provenance (your point 4) — computed-by-name (current) vs frontmatter pointer in the override doc vs sidecar edge vs tbd-internal .tbd/ overlay file. Q18 depends on Q16 if we want stable source ids in the edge.
  • Q19. The overloaded as field (your point 6) — keep + document vs split into a mode: discriminator on sources (files/file/repo) with as only on contents rules vs KDEX-aligned literals.

The spec notes that Q15–Q19 are linked: Q17/Q18 substantially benefit from whatever Q16 lands on; Q15's full DocGraph/DocMap shape is the natural home for Q18's override edges.

Smaller notes

  • Module barrels / package boundaries. Agreed they're not yet exposed via package.json exports. Will keep them internal-but-cleanly-separable until the architectural questions settle (Q15 in particular informs whether docmap stays one module or splits into DocGraph + DocMap layers). Once we pick a shape, exposing them as get-tbd/docref and get-tbd/docmap (or as their own packages) is a localized change.
  • Tests-as-spec-mirror discipline. The new tests for priority-wins, cross-type ambiguity, schema strictness, and the bundle/docref cross-field rule are intentionally written to fail if the spec changes — they're the synchrony enforcement.

Let me know which of Q15–Q19 you want to dig into first; the resolver shape (Q15) and the bundle/source cardinality (Q16) seem like the foundational ones since Q17 and Q18 depend on them.


Generated by Claude Code

claude added 3 commits May 7, 2026 16:02
…) and Design Principles

Q20 covers a related cluster of refinements raised after the initial
review:

- Q20a: Rename `type` → `category`. Plural form matches CLI commands
  (`tbd guidelines`), avoids singular/plural friction, and frees up
  "type" (overloaded across the codebase).
- Q20b: Drop auto-detection magic. Globs in `contents:` are the only
  matching primitive. Eliminates the "what does this source actually
  pick up?" mystery.
- Q20c: Rename `path:` → `glob:` on contents rules to be self-
  describing; one glob per rule, no per-source pre-filter.
- Q20d: Folders mirror upstream structure within a bundle (provider
  decides shape); category is assigned independently in config or
  frontmatter. Doc map records both `path` (on-disk) and `key`
  (canonical lookup address) — they don't have to align.
- Q20e: Existing typed CLI commands (`tbd guidelines`) become
  validated aliases over a single `tbd doc list --category=X` family.
  New categories surface their CLI alias automatically from the
  `categories:` config row. G7 ("extensible, not hardcoded")
  becomes real.
- Q20f: Frontmatter `category:` as an opt-in provider-side
  declaration. Three-layer precedence (per-file metadata override →
  frontmatter → contents rule), matching how title/description/when
  already resolve. Unclassified docs are surfaced in `tbd doc status`.

Design Principles section added between the Goals and Non-Goals to
codify the values the design serves:

- P1: Simple things simple, complex things possible (Larry Wall).
- P2: Upstream unconstrained; consumer owns the mapping. Provider
  cooperation is opt-in (frontmatter, manifest, conventional dirs)
  and gives zero-config consumer setup, but never required.
- P3: Explicit beats implicit, but conventions earn defaults.
- P4: Lossless inventory, policy-driven views (foreshadows Q15's
  DocGraph + DocMap split direction).
- P5: Reproducible from config (G9 restated as principle).
- P6: tbd never holds credentials (G13 restated).
- P7: The format is a separable artifact (G18 restated).
- P8: Hard cuts on format versions, reliable migration (G11 restated).
- P9: Tests are spec mirrors.

Q1–Q20 should be resolved consistent with P1–P9.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
claude added 6 commits May 7, 2026 17:11
The user pointed out the new Design Principles in the docs-config plan
should be cross-referenced and consolidated with the existing tbd-design
principles. Three changes:

1. tbd-design.md §1.5 (Design Principles) extended from 6 to 10:
   - #1 (Simplicity first) extended to spell out "simple things simple,
     complex things possible".
   - #3 (Git for sync) extended with the reproducible-from-config
     contract.
   - #7 added: Auth is always out-of-band — tbd never holds credentials.
   - #8 added: Hard cuts on format versions with reliable migration —
     already practiced for f02→f03; making it an explicit principle.
   - #9 added: Spec ↔ implementation synchrony via tests.
   - #10 added: Layered architecture, separable artifacts.

   These four new principles emerged from the docs-config redesign work
   but apply tbd-wide.

2. tbd-design.md §1.4 Design Goals: added goal #8 (extensible knowledge
   subsystem), which links forward to the plan-spec and the docref/
   docmap design docs as the authoritative location for that subsystem's
   design.

3. plan-spec Design Principles intro: now explicitly notes that P1, P5,
   P6, P7, P8, P9 are restatements/elaborations of tbd-design §1.5
   principles, while P2, P3, P4 are docs/config-specific and have no
   direct system-wide analog. Each restated principle gets an inline
   "(extends tbd-design §1.5 #N)" cross-reference. tbd-design.md is
   declared authoritative for system-wide values.

This consolidates principles in one foundational location (tbd-design.md)
while keeping the docs-config plan readable on its own.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
External capabilities (scripts, third-party CLIs, MCP servers, package
ecosystems) are integrated by importing docs that describe them — not
by adding parallel subsystems for plugins, executables, or
distribution. A repo with docs + scripts is just a docmap source;
the docs explain installation/execution; tbd doesn't need to know
about distribution mechanisms.

This rules out the "tbd plugin" direction explicitly:
- New principle in tbd-design.md §1.5 #11 (system-wide).
- New principle in plan-spec (P10, cross-referenced).
- New Non-Goals entry: no separate plugin/skill subsystem.

Result: the docs subsystem we're already building is the universal
extension point. Any capability expressible in prose can be added by
importing one or more docs. Same docs serve agents and humans.

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
The test exercises 10 simultaneous mkdir calls on the same lock path.
Windows' filesystem semantics around concurrent mkdir contention
differ from POSIX (EPERM under contention rather than EEXIST). We
have a Windows-aware EPERM-as-contention path in
`src/utils/lockfile.ts`, but Windows scheduling and AV-scan timing
still occasionally let one of the concurrent waiters drift into a
slow retry path that pushes the GitHub-Actions runner past its
communication-loss threshold (~46m) and hangs the job.

The test asserts POSIX-style concurrent-merge semantics that are
documented as different on Windows, so skipping there is
appropriate. Coverage is retained on macOS and Ubuntu runners.

Comment in the test explains the rationale and points at when to
revisit (e.g., if we ever ship Windows as a first-class production
platform, switch to a real OS-level lock primitive).

https://claude.ai/code/session_01PhbYdWX7DUBpUBVuUesVuP
jlevy pushed a commit that referenced this pull request Jun 11, 2026
…y flags

Records resolutions from design review: docs/tbd default eject dir
(setup-editable, persisted to config), README index with npx get-tbd
pointer, .tbd/eject-base placement, tbd docs manual alias with no compat
for the old bare-docs viewer, and eject/uneject verbs. Folds the
standalone rebase subcommand into mutually exclusive update strategy
flags (--merge combines with conflict markers; --rebase keeps local
content and advances the fork point). Adopts a proper f04->f05 format
bump with metadata-only migration, following the f04 precedent; notes
PR #117's draft format id shifts to f06+. Two questions remain open
(--relevant as setup default; packs in code vs frontmatter tags).

https://claude.ai/code/session_017S2YnirprCMG8ihkGdRBsS
jlevy pushed a commit that referenced this pull request Jun 11, 2026
Generalizes beyond bundled/ejected locations by adopting two conventions
from the PR #117 branch: docref becomes the universal source-address
grammar now (parser ported as-is; internal: and URLs already conform;
replaces ad-hoc blob-URL conversion), and docmap's generated-map schema
becomes the --json output contract for docs list/status (the read side),
while its manifest/lockfile/sync write side stays in the f06 framework.
Adds docs_cache.local_dirs for serving docs from other in-repo
directories and tbd docs add <docref> consolidating per-kind --add.
Both format docs ship as bundled reference-kind docs with status
banners.

https://claude.ai/code/session_017S2YnirprCMG8ihkGdRBsS
jlevy pushed a commit that referenced this pull request Jun 11, 2026
Per design review: docref-everywhere is recorded as a hard rule (every
document reference in config, manifests, CLI args, JSON output, and docs
is a docref string). docmap is redefined as just the essential concept -
a machine-readable inventory of a doc collection (identity, location,
provenance docref, metadata per entry) - producer-agnostic and
hand-authorable, with an inline v0.1 schema example. The shipped
docmap-format reference doc will be authored fresh rather than adapted,
citing the speculative PR #117 draft only as exploratory background;
manifests/lockfiles/sync are deferred as future operations over docmaps.
Drops the provisional bundle field from v1 JSON output.

https://claude.ai/code/session_017S2YnirprCMG8ihkGdRBsS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants