Local-first, safety-aware AI assistant for processing entire folders of documents.
Chinese version: README.zh.md
Give LAMB a folder. It scans, parses, safety-checks, chunks, calls an LLM, exports results, and writes an audit manifest.
Web AI tools are convenient for a single file, but they become fragile when the input is an entire folder.
| Pain | What usually happens | How LAMB helps |
|---|---|---|
| Too many files | Files are uploaded one by one | Scan and process a local folder |
| Long documents | Context overflow and unstable answers | Chunking plus map-reduce style workflows |
| Messy outputs | Manual copy-paste into tables and reports | Export Markdown, CSV, JSON, and DOCX |
| No traceability | No reliable record of inputs or failures | Write a manifest.json audit trail for each run |
| Prompt injection | Documents may contain instructions such as "ignore previous instructions" | Treat documents as untrusted evidence and record risk findings |
LAMB is not another chat interface. It turns a local document folder into a reproducible, auditable, reusable LLM document workflow.
folder
-> scan files
-> parse text
-> safety check
-> chunk long documents
-> LLM batch processing / multi-document QA / field extraction
-> export results
-> write audit manifest
- Multi-document research QA: ask questions across papers, reports, meeting notes, or course materials, then generate a cited Markdown report.
- Batch document processing: summarize, translate, polish, review, or grade every file in a folder.
- Structured field extraction: extract fields from homework, resumes, meeting notes, and reports into CSV or JSON.
- Safety-aware processing: detect prompt injection, redact sensitive values, and skip high-risk documents in strict mode.
- Long-document handling: split long inputs into chunks to reduce context-overflow risk.
- Installable, importable, executable: use it as a Python library or as the
lambcommand-line tool.
| Scenario | Example command | Output |
|---|---|---|
| Grade assignments | lamb extract homework --fields "Name,Score,Comment,Main Issues" |
CSV grading table |
| Read papers | lamb research papers --question "How do these papers differ in methods?" |
Cited research report |
| Organize meeting notes | lamb extract meetings --fields "Topic,Action Item,Owner,Deadline" |
Action-item table |
| Screen resumes | lamb extract resumes --fields "Name,Education,Skills,Projects" |
Candidate table |
| Synthesize reports | lamb research reports --question "What trends do these reports share?" |
Cross-document summary |
| Safe preview | lamb research docs --question "What do these documents discuss?" --dry-run |
Plan without LLM calls |
Install:
git clone https://github.com/xr997/LAMB.git
cd LAMB
pip install -e .Configure an OpenAI-compatible LLM:
cp .env.example .envLLM_API_KEY=your_api_key_here
LLM_BASE_URL=https://api.deepseek.comScan a folder:
lamb scan data/inputsAsk a question across documents:
lamb research data/inputs --question "What are the shared conclusions across these documents?"Extract a grading table:
lamb extract data/inputs --fields "Name,Score,Comment,Main Issues" --redactSummarize every file:
lamb batch data/inputs --instruction "Write a concise 200-word summary for this document." --format mdPreview a workflow without calling an LLM:
lamb research data/inputs --question "What do these documents discuss?" --dry-runfrom lamb import answer_over_directory, extract_fields, process_directory, scan_documents
records = scan_documents("data/inputs")
qa = answer_over_directory(
input_dir="data/inputs",
question="What are the shared conclusions across these documents?",
output_dir="data/outputs",
redact=True,
)
extraction = extract_fields(
input_dir="data/inputs",
fields=["Name", "Score", "Comment"],
output_dir="data/outputs",
)
batch = process_directory(
input_dir="data/inputs",
instruction="Generate a clear summary for each document.",
output_format="md",
)LAMB does not focus on transport-layer security. Its focus is LLM application-layer safety for real document workflows.
- Document content is wrapped as
<UNTRUSTED_DOCUMENT>and treated as evidence, not instructions. - Prompt builders explicitly forbid following commands embedded in document text.
- Prompt-injection signals are detected and recorded.
--strict-securityskips high-risk documents.--redactmasks emails, phone numbers, API keys, tokens, and JWT-like strings.- Hidden files are skipped by default and can be included explicitly with
--include-hidden;.env, credential-like files, and symbolic links are still refused. - Every run writes an audit manifest with inputs, outputs, parameters, risk findings, failures, and elapsed time.
Read more: Security Model
Depending on the workflow, LAMB writes:
- Markdown research reports
- CSV or JSON extraction tables
- Per-document Markdown, TXT, or DOCX outputs
*_manifest.jsonaudit files
python -m compileall lamb tests
python -m unittest discover -s testsRun real LLM integration tests manually:
RUN_REAL_LLM_TESTS=1 python -m unittest tests.test_real_llm_integrationMIT License.
