LAMB

Local-first, safety-aware AI assistant for processing entire folders of documents.

Give LAMB a folder. It scans, parses, safety-checks, chunks, calls an LLM, exports results, and writes an audit manifest.

Why LAMB

Web AI tools are convenient for a single file, but they become fragile when the input is an entire folder.

| Pain | What usually happens | How LAMB helps |
| --- | --- | --- |
| Too many files | Files are uploaded one by one | Scan and process a local folder |
| Long documents | Context overflow and unstable answers | Chunking plus map-reduce style workflows |
| Messy outputs | Manual copy-paste into tables and reports | Export Markdown, CSV, JSON, and DOCX |
| No traceability | No reliable record of inputs or failures | Write a `manifest.json` audit trail for each run |
| Prompt injection | Documents may contain instructions such as "ignore previous instructions" | Treat documents as untrusted evidence and record risk findings |

LAMB is not another chat interface. It turns a local document folder into a reproducible, auditable, reusable LLM document workflow.

What LAMB Does

folder
  -> scan files
  -> parse text
  -> safety check
  -> chunk long documents
  -> LLM batch processing / multi-document QA / field extraction
  -> export results
  -> write audit manifest
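
As an illustration of the first step, here is a minimal sketch of a folder scan that applies the refusal rules listed under Safety Model. It is not LAMB's implementation: `scan_folder`, `REFUSED_NAMES`, and `SUPPORTED_SUFFIXES` are hypothetical names chosen for this example, and the real set of supported file types is likely broader.

```python
import os
from pathlib import Path

# Assumptions for this sketch only; LAMB's actual rules may differ.
REFUSED_NAMES = {".env"}
SUPPORTED_SUFFIXES = {".txt", ".md"}


def scan_folder(root, include_hidden=False):
    """Return a sorted list of files that may be passed to the parse step."""
    accepted = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Prune hidden directories in place so they are never visited.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # symbolic links are always refused
            if name in REFUSED_NAMES:
                continue  # refused even when hidden files are included
            if name.startswith(".") and not include_hidden:
                continue
            if path.suffix.lower() not in SUPPORTED_SUFFIXES:
                continue
            accepted.append(path)
    return sorted(accepted)
```

Calling `scan_folder("data/inputs")` would return only regular, non-hidden `.txt` and `.md` files under these assumptions; passing `include_hidden=True` adds hidden files but still refuses `.env`.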

Core Capabilities

Multi-document research QA: ask questions across papers, reports, meeting notes, or course materials, then generate a cited Markdown report.
Batch document processing: summarize, translate, polish, review, or grade every file in a folder.
Structured field extraction: extract fields from homework, resumes, meeting notes, and reports into CSV or JSON.
Safety-aware processing: detect prompt injection, redact sensitive values, and skip high-risk documents in strict mode.
Long-document handling: split long inputs into chunks to reduce context-overflow risk.
Installable, importable, executable: use it as a Python library or as the `lamb` command-line tool.
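
To make the long-document handling concrete, the following sketch shows one common way to chunk text and combine per-chunk results in a map-reduce style. The function names, the character-based chunk size, and the overlap are assumptions made for this example; LAMB's own chunker may split on tokens or paragraphs instead.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into overlapping character windows."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def map_reduce(text, map_fn, reduce_fn, max_chars=2000, overlap=200):
    """Apply map_fn to every chunk, then merge the partial results."""
    partials = [map_fn(chunk) for chunk in chunk_text(text, max_chars, overlap)]
    return reduce_fn(partials)


# Usage with stand-in functions; a real run would call an LLM in map_fn
# (summarize one chunk) and again in reduce_fn (merge the summaries).
word_count = map_reduce(
    "one two three four five six",
    map_fn=lambda chunk: len(chunk.split()),
    reduce_fn=sum,
    max_chars=1000,
    overlap=0,
)
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, at the cost of some repeated input.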

Daily Use Cases

| Scenario | Example command | Output |
| --- | --- | --- |
| Grade assignments | `lamb extract homework --fields "Name,Score,Comment,Main Issues"` | CSV grading table |
| Read papers | `lamb research papers --question "How do these papers differ in methods?"` | Cited research report |
| Organize meeting notes | `lamb extract meetings --fields "Topic,Action Item,Owner,Deadline"` | Action-item table |
| Screen resumes | `lamb extract resumes --fields "Name,Education,Skills,Projects"` | Candidate table |
| Synthesize reports | `lamb research reports --question "What trends do these reports share?"` | Cross-document summary |
| Safe preview | `lamb research docs --question "What do these documents discuss?" --dry-run` | Plan without LLM calls |
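
The `extract` scenarios above end in a table with one row per document and one column per requested field. As a rough sketch of that last step, assuming the LLM has already returned one dictionary per document, the rows could be serialized with the standard `csv` module. `rows_to_csv` is a hypothetical helper for illustration, not part of LAMB's API.

```python
import csv
import io


def rows_to_csv(rows, fields):
    """Serialize per-document field dictionaries into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        restval="",             # a field the model did not return stays empty
        extrasaction="ignore",  # drop keys that were not requested
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = rows_to_csv(
    [{"Name": "Ada", "Score": "95", "Unrequested": "x"}],
    fields=["Name", "Score", "Comment"],
)
```

Fixing the column list up front keeps the table shape stable even when individual documents yield missing or extra fields.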

Quick Demo

Install:

git clone https://github.com/xr997/LAMB.git
cd LAMB
pip install -e .

Configure an OpenAI-compatible LLM:

cp .env.example .env

Then set the following values in .env:

LLM_API_KEY=your_api_key_here
LLM_BASE_URL=https://api.deepseek.com

Scan a folder:

lamb scan data/inputs

Ask a question across documents:

lamb research data/inputs --question "What are the shared conclusions across these documents?"

Extract a grading table:

lamb extract data/inputs --fields "Name,Score,Comment,Main Issues" --redact

Summarize every file:

lamb batch data/inputs --instruction "Write a concise 200-word summary for this document." --format md

Preview a workflow without calling an LLM:

lamb research data/inputs --question "What do these documents discuss?" --dry-run

Python SDK

from lamb import answer_over_directory, extract_fields, process_directory, scan_documents

records = scan_documents("data/inputs")

qa = answer_over_directory(
    input_dir="data/inputs",
    question="What are the shared conclusions across these documents?",
    output_dir="data/outputs",
    redact=True,
)

extraction = extract_fields(
    input_dir="data/inputs",
    fields=["Name", "Score", "Comment"],
    output_dir="data/outputs",
)

batch = process_directory(
    input_dir="data/inputs",
    instruction="Generate a clear summary for each document.",
    output_format="md",
)

Safety Model

LAMB does not focus on transport-layer security. Its focus is LLM application-layer safety for real document workflows.

Document content is wrapped as `<UNTRUSTED_DOCUMENT>` and treated as evidence, not instructions.
Prompt builders explicitly forbid following commands embedded in document text.
Prompt-injection signals are detected and recorded.
`--strict-security` skips high-risk documents.
`--redact` masks emails, phone numbers, API keys, tokens, and JWT-like strings.
Hidden files are skipped by default and can be included explicitly with `--include-hidden`; `.env`, credential-like files, and symbolic links are still refused.
Every run writes an audit manifest with inputs, outputs, parameters, risk findings, failures, and elapsed time.
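
The rules above can be illustrated with a small sketch of the three text-level checks: detecting injection signals, redacting sensitive values, and wrapping the document as untrusted. Everything here is an assumption for illustration. The regular expressions are deliberately simplistic, the closing `</UNTRUSTED_DOCUMENT>` tag is a guess at the wrapper format, and the function names are not taken from LAMB's code.

```python
import re

# Illustrative patterns only; a real detector would need a much wider list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|above) (prompt|instructions)", re.IGNORECASE),
]

REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "JWT": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}


def find_injection_signals(text):
    """Return every substring that matches a known injection pattern."""
    return [m.group(0) for p in INJECTION_PATTERNS for m in p.finditer(text)]


def redact(text):
    """Replace sensitive-looking values with labelled placeholders."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def wrap_untrusted(text):
    """Wrap document text so the prompt can treat it as evidence only."""
    # Remove any embedded closing tag so the document cannot end the wrapper early.
    safe = text.replace("</UNTRUSTED_DOCUMENT>", "")
    return f"<UNTRUSTED_DOCUMENT>\n{safe}\n</UNTRUSTED_DOCUMENT>"


def should_skip(findings, strict):
    """In strict mode, any recorded finding causes the document to be skipped."""
    return strict and bool(findings)
```

In this sketch the findings are returned rather than acted on, so a caller can record them in the manifest whether or not strict mode is enabled.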

Outputs

Depending on the workflow, LAMB writes:

Markdown research reports
CSV or JSON extraction tables
Per-document Markdown, TXT, or DOCX outputs
*_manifest.json audit files
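
For orientation, an audit manifest of the kind described under Safety Model might be written as follows. This is a sketch under assumptions: the key names, the `write_manifest` helper, and the `run_name` prefix are invented for this example, and the actual manifest schema may differ.

```python
import json
import time
from pathlib import Path


def write_manifest(output_dir, run_name, inputs, outputs, parameters,
                   risk_findings, failures, started_at):
    """Write <run_name>_manifest.json and return its path."""
    manifest = {
        "inputs": inputs,
        "outputs": outputs,
        "parameters": parameters,
        "risk_findings": risk_findings,
        "failures": failures,
        # started_at is expected to come from time.monotonic() at run start.
        "elapsed_seconds": round(time.monotonic() - started_at, 3),
    }
    path = Path(output_dir) / f"{run_name}_manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return path
```

Recording failures and risk findings next to the inputs and parameters is what lets a later reader see which documents were skipped and why.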

System Overview

Documentation

Development

python -m compileall lamb tests
python -m unittest discover -s tests

Run real LLM integration tests manually:

RUN_REAL_LLM_TESTS=1 python -m unittest tests.test_real_llm_integration

License

MIT License.
