documentation infrastructure for humans and agents. crawls, renders, normalizes, and caches live docs into structured, versioned docsets.

ShernanJ/docforge

docforge

structured documentation infrastructure for humans and agents.

docforge crawls, renders, versions, and caches live documentation — turning messy, hard-to-scrape docs into clean, searchable artifacts that both humans and agents can reason over.

agents get a stable http interface.
humans get a readable, inspectable UI.
docs stop being scraped repeatedly and start being infrastructure.


why this exists

live documentation is one of the worst inputs for agents:

  • js-heavy pages
  • inconsistent structure
  • high token cost to scrape repeatedly
  • no versioning or freshness guarantees

docforge fixes this by:

  • rendering docs once (properly)
  • extracting structure, not just text
  • storing versioned docsets with diffs
  • exposing a deterministic api agents can trust
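
a minimal sketch of what "extracting structure" and "versioned docsets with diffs" could look like, using only the python standard library. this is an illustration, not docforge's actual implementation: `extract_outline`, `version_id`, and `diff_versions` are hypothetical names, and a heading outline is just one simple stand-in for "structure".

```python
import difflib
import hashlib
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """collects (level, text) pairs for h1-h6 elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # heading level currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None


def extract_outline(html):
    """hypothetical helper: reduce a rendered page to a markdown-style outline."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return [f"{'#' * level} {text}" for level, text in parser.headings]


def version_id(outline):
    """content-addressed id: the same outline always yields the same id."""
    return hashlib.sha256("\n".join(outline).encode("utf-8")).hexdigest()[:12]


def diff_versions(old, new):
    """unified diff between two outlines, as a list of lines."""
    return list(difflib.unified_diff(old, new, "v1", "v2", lineterm=""))


old = extract_outline("<h1>Intro</h1><p>x</p><h2>Install</h2>")
new = extract_outline("<h1>Intro</h1><h2>Setup</h2>")
for line in diff_versions(old, new):
    print(line)
```

hashing the extracted outline rather than the raw html means cosmetic markup changes do not create a new version, which is one plausible way to keep diffs meaningful.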

what docforge is

  • an agent-native api for documentation
  • a docset store with versioning + freshness
  • a human-readable ui to inspect what agents actually see
  • infra, not a chatbot
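
to make "versioning + freshness" concrete, here is a small sketch of how a docset version record and a freshness check might be modeled. the `DocsetVersion` fields, `is_fresh`, and `latest` are assumptions made for illustration; the readme does not specify docforge's schema or freshness policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DocsetVersion:
    """hypothetical record: one stored version of one docset."""
    docset: str
    version: int
    fetched_at: datetime


def latest(versions):
    """pick the highest-numbered version from a non-empty list."""
    return max(versions, key=lambda v: v.version)


def is_fresh(v, max_age, now=None):
    """true if the version was fetched within max_age of now (utc by default)."""
    now = now or datetime.now(timezone.utc)
    return now - v.fetched_at <= max_age


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
versions = [
    DocsetVersion("example-docs", 1, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    DocsetVersion("example-docs", 2, datetime(2024, 1, 9, tzinfo=timezone.utc)),
]
current = latest(versions)
print(current.version, is_fresh(current, timedelta(days=7), now=now))
```

an agent-facing response could then carry the version number and fetch time alongside the content, so the caller can decide whether the data is recent enough.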

architecture (high level)

  • next.js
    human ui + api gateway (agent entrypoint)

  • fastapi workers + playwright
    render + crawl js-heavy documentation

  • redis + bullmq
    async ingestion and crawling jobs

  • postgres
    metadata, versions, chunks, search

  • s3-compatible storage (r2 / minio)
    raw snapshots + extracted artifacts
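
the flow between these components can be sketched in-process. everything below is a stand-in chosen for illustration: a `queue.Queue` plays the role of the redis + bullmq job queue, a dict plays s3-compatible storage, a list plays postgres rows, and `fake_render` is a hypothetical placeholder for a playwright render. none of these names come from docforge itself.

```python
import hashlib
import queue

jobs = queue.Queue()   # stand-in for the redis + bullmq job queue
object_store = {}      # stand-in for s3-compatible storage (key -> raw snapshot)
metadata_rows = []     # stand-in for postgres metadata rows


def fake_render(url):
    """hypothetical placeholder for a browser render; returns static html."""
    return f"<html><body><h1>{url}</h1></body></html>"


def enqueue_crawl(url):
    """gateway side: accept a crawl request and queue it for a worker."""
    jobs.put({"url": url})


def run_worker(render=fake_render):
    """worker side: drain the queue, store each snapshot, record its metadata."""
    processed = 0
    while not jobs.empty():
        job = jobs.get()
        html = render(job["url"])
        # content-addressed key, so identical snapshots are stored once
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        object_store[key] = html
        metadata_rows.append({"url": job["url"], "snapshot_key": key})
        jobs.task_done()
        processed += 1
    return processed


enqueue_crawl("https://example.com/docs/a")
enqueue_crawl("https://example.com/docs/b")
print(run_worker(), len(object_store), len(metadata_rows))
```

keeping large raw snapshots in object storage and only keys plus metadata in the database matches the split the list above describes.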
