Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions openspec/changes/remove-link-validation/.openspec.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-19
52 changes: 52 additions & 0 deletions openspec/changes/remove-link-validation/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
## Context

Docforge processes markdown documentation from distributed repositories. During markdown processing, the link resolver identifies absolute URLs that don't point to known repository resources and forwards them to a link validator. The validator issues outbound HTTP HEAD/GET requests to check link reachability.

In practice, all known consumers run with `skip-link-validation: true`, meaning the feature is never exercised. The feature also introduces a blind SSRF vector since attacker-controlled URLs in markdown can trigger server-side HTTP requests.

## Goals / Non-Goals

**Goals:**
- Eliminate the SSRF attack surface by removing the link validation feature entirely
- Simplify the codebase by removing unused code paths
- Ensure zero change to documentation output (the feature only logs warnings, never modifies content)

**Non-Goals:**
- Replacing link validation with a safer alternative (not needed since it's unused)
- Changing link resolution behavior (relative links, repository links stay untouched)
- Modifying the downloader or any other HTTP-making component

## Decisions

### 1. Full removal vs. hardening

**Decision**: Remove entirely rather than adding IP filtering/allowlists.

**Rationale**: The feature is unused in all known deployments. Adding SSRF protections (private IP blocking, DNS rebinding protection, metadata endpoint blocking) adds complexity to dead code. Removal is simpler, smaller, and more secure by default.

**Alternative considered**: Add a comprehensive denylist (RFC 1918, link-local, cloud metadata IPs) — rejected because it still requires maintenance and the feature provides no value.

### 2. Breaking CLI flags vs. soft deprecation

**Decision**: Remove flags immediately with no deprecation period.

**Rationale**: The feature runs disabled. Any config that sets `skip-link-validation: true` will get an unknown-flag error, but this is easy to fix (delete the line). A deprecation period adds code for a feature nobody uses.

### 3. Remove SkipValidation from Node struct

**Decision**: Remove the `SkipValidation` field from the manifest Node type.

**Rationale**: With no validator, there's nothing to skip. The field serves no purpose. YAML manifests using this field will have it silently ignored by Go's YAML decoder (it uses struct tags, unknown fields are dropped).

## Risks / Trade-offs

- **[Unknown consumers]** → If any undiscovered deployment uses link validation actively, they lose the feature. Mitigation: the feature only logs warnings, never fails the build or modifies output. Impact is zero functional change.
- **[Config file errors]** → Configs with `--skip-link-validation` or `--validation-workers` will fail on CLI parse. Mitigation: easy one-line fix; document in release notes.
- **[Future need]** → If link validation is ever needed again, it must be reimplemented. Mitigation: git history preserves the code; reimplementation with proper SSRF protections would be a better starting point anyway.

## Verification

Build docforge after removal and run it against the gardener/documentation project:
- https://github.com/gardener/documentation

Delete `hugo/` if it exists, run docforge to produce the output, SHA-256 hash the `hugo/` directory, then delete it. Repeat with the pre-change binary. Compare hashes to confirm zero output difference.
31 changes: 31 additions & 0 deletions openspec/changes/remove-link-validation/proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
## Why

The link validation feature makes outbound HTTP HEAD/GET requests to validate absolute URLs found in markdown documents. This feature is not used in production (always run with `--skip-link-validation: true`) and exposes the host environment to blind SSRF — any absolute URL in processed markdown triggers server-side HTTP requests with insufficient filtering.

## What Changes

- **BREAKING**: Remove the `--validation-workers` CLI flag
- **BREAKING**: Remove the `--skip-link-validation` CLI flag
- **BREAKING**: Remove the `--hosts-to-report` CLI flag
- **BREAKING**: Remove the `skipValidation` field from manifest node definitions
- Remove the entire `pkg/nodeplugins/markdown/linkvalidator/` package
- Remove link validation wiring from the markdown node plugin
- Remove `SkipValidation` propagation from the markdown manifest plugin

## Capabilities

### New Capabilities

_None — this is a removal._

### Modified Capabilities

- `markdown-processing`: The markdown document worker will no longer attempt to validate absolute URLs via HTTP. Links are still resolved and rewritten, but external reachability is no longer checked.

## Impact

- **CLI**: Three flags removed (`--validation-workers`, `--skip-link-validation`, `--hosts-to-report`). Existing configs that set these will produce unknown-flag errors.
- **Manifest schema**: `skipValidation` field on nodes becomes invalid YAML (silently ignored by strict parsing or erroring depending on decoder settings).
- **Code**: ~6 files modified, 1 package directory deleted.
- **Downstream consumers**: https://github.com/gardener/documentation runs docforge with link validation disabled (`skip-link-validation: true`), so no behavioral change.
- **Security**: Eliminates the blind SSRF attack surface entirely.
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
## REMOVED Requirements

### Requirement: Link validation via HTTP requests
The system SHALL NOT validate absolute URL reachability by issuing outbound HTTP requests.

**Reason**: The feature is unused in all known deployments and introduces a blind SSRF vulnerability. All consumers run with link validation disabled.

**Migration**: Remove `skip-link-validation`, `validation-workers`, and `hosts-to-report` flags from configuration files. These flags are no longer recognized.

#### Scenario: Absolute URLs are not validated via HTTP
- **WHEN** a markdown document contains an absolute URL (e.g., `https://example.com/page`)
- **THEN** the system SHALL NOT issue any outbound HTTP request to that URL

#### Scenario: Removed CLI flags produce error
- **WHEN** docforge is invoked with `--skip-link-validation` or `--validation-workers` or `--hosts-to-report`
- **THEN** the CLI SHALL reject the command with an unknown flag error

### Requirement: SkipValidation manifest field removed
The system SHALL NOT recognize `skipValidation` as a node property in manifest YAML.

**Reason**: With no link validator, this field has no effect.

**Migration**: Remove `skipValidation` from manifest YAML files. The field is silently ignored by the Go YAML decoder if present.

#### Scenario: Manifest with skipValidation field
- **WHEN** a manifest YAML contains `skipValidation: true` on a node
- **THEN** the system SHALL silently ignore the field (no error, no behavioral effect)

## MODIFIED Requirements

### Requirement: Markdown link resolution
The markdown document worker SHALL resolve links in documents by rewriting relative paths and repository references. Absolute URLs that do not reference known repository resources SHALL be returned unmodified without any reachability check.

#### Scenario: Absolute URL not in repository
- **WHEN** a markdown document contains an absolute URL that is not a known repository resource
- **THEN** the system SHALL return the URL unchanged without issuing any HTTP request

#### Scenario: Repository resource URL
- **WHEN** a markdown document contains an absolute URL that matches a known repository resource
- **THEN** the system SHALL resolve it according to the manifest structure (existing behavior, unchanged)

#### Scenario: Relative link resolution
- **WHEN** a markdown document contains a relative link
- **THEN** the system SHALL resolve it relative to the node's position in the manifest tree (existing behavior, unchanged)
39 changes: 39 additions & 0 deletions openspec/changes/remove-link-validation/tasks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
## 1. Remove link validator package

- [ ] 1.1 Delete `pkg/nodeplugins/markdown/linkvalidator/` directory (validator.go, job.go, validator_test.go, linkvalidatorfakes/)

## 2. Remove validator from markdown node plugin

- [ ] 2.1 Remove validator parameters from `pkg/nodeplugins/markdown/plugin.go` NewPlugin() — drop `skipLinkValidation`, `validationWorkersCount`, `hostsToReport` params and validator creation
- [ ] 2.2 Remove validator task queue from plugin return values
- [ ] 2.3 Remove `linkvalidator` import from plugin.go

## 3. Remove validator from document worker

- [ ] 3.1 Remove `validator` field and `skipLinkValidation` field from Worker struct in `pkg/nodeplugins/markdown/document/document_worker.go`
- [ ] 3.2 Remove `ValidateLink()` call from `resolveLink()` — absolute URLs not in repository should just `return dest, nil` without validation
- [ ] 3.3 Remove validator parameter from `NewDocumentWorker()` constructor
- [ ] 3.4 Remove validator parameter from `New()` in `pkg/nodeplugins/markdown/document/job.go`
- [ ] 3.5 Update document worker tests to remove fake validator usage

## 4. Remove SkipValidation from manifest

- [ ] 4.1 Remove `SkipValidation` field from Node struct in `pkg/manifest/node.go`
- [ ] 4.2 Remove `propagateSkipValidation()` transformation from `pkg/manifestplugins/markdown/plugin.go`

## 5. Remove CLI flags and config

- [ ] 5.1 Remove `--validation-workers`, `--skip-link-validation`, `--hosts-to-report` flags from `cmd/app/flags.go`
- [ ] 5.2 Remove corresponding fields from Options struct in `cmd/app/types.go`
- [ ] 5.3 Remove validator-related parameters from `markdown.NewPlugin()` call in `cmd/app/exec.go`
- [ ] 5.4 Remove `skip-link-validation: true` from `test/e2e/docforge_config.yaml`
- [ ] 5.5 Update `docs/cmd-ref/docforge.md` to remove `--validation-workers` documentation

## 6. Build and verify

- [ ] 6.1 Run `go build ./...` to confirm clean compilation
- [ ] 6.2 Run `go test ./...` to confirm all tests pass
- [ ] 6.3 Clone https://github.com/gardener/documentation locally and delete `hugo/` if it exists
- [ ] 6.4 Build the docforge binary **before** this change and run it against the gardener/documentation repo to produce a baseline `hugo/` output; SHA-256 hash the directory contents, then delete `hugo/`
- [ ] 6.5 Build the docforge binary **after** this change and run it against the same repo; SHA-256 hash the resulting `hugo/` directory, then delete `hugo/`
- [ ] 6.6 Compare the two hashes — they MUST be identical to confirm zero output difference
20 changes: 20 additions & 0 deletions openspec/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
schema: spec-driven

# Project context (optional)
# This is shown to AI when creating artifacts.
# Add your tech stack, conventions, style guides, domain knowledge, etc.
# Example:
# context: |
# Tech stack: TypeScript, React, Node.js
# We use conventional commits
# Domain: e-commerce platform

# Per-artifact rules (optional)
# Add custom rules for specific artifacts.
# Example:
# rules:
# proposal:
# - Keep proposals under 500 words
# - Always include a "Non-goals" section
# tasks:
# - Break tasks into chunks of max 2 hours
Loading