Pyragify turns a code repository into plain-text chunks that are easier to load into NotebookLM and other LLM tools. It extracts semantic units from source files, writes `.txt` output grouped by file type, and stores metadata for incremental re-runs.
- Chunks Python code into functions, classes, and comments
- Splits Markdown files by header sections
- Processes common repository files into LLM-friendly text output
- Respects `.gitignore` and `.dockerignore` patterns
- Tracks file hashes so unchanged files can be skipped on later runs
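The header-based Markdown splitting can be pictured roughly as follows. This is an illustrative sketch, not Pyragify's actual implementation; the function name and chunk format are assumptions:

```python
import re

def split_markdown_by_headers(text: str) -> list[str]:
    """Split Markdown into chunks, one per header section (illustrative sketch)."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # An ATX header (e.g. "# Title", "## Section") starts a new chunk
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Intro\nhello\n## Usage\nrun it"
print(split_markdown_by_headers(doc))  # ['# Intro\nhello', '## Usage\nrun it']
```

Each chunk keeps its header line, so the section context survives when the chunks are uploaded separately.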
Pyragify has dedicated handling for:
- Python: `.py`
- Markdown: `.md`, `.markdown`
- HTML: `.html`
- CSS: `.css`
- Other common repository files are included as plain text when they can be read as UTF-8
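The routing above can be modeled as a suffix lookup with a UTF-8 fallback. The handler registry and function below are hypothetical, meant only to show the dispatch idea:

```python
from pathlib import Path

# Hypothetical handler registry; Pyragify's real internals may differ.
HANDLERS = {
    ".py": "python",
    ".md": "markdown",
    ".markdown": "markdown",
    ".html": "html",
    ".css": "css",
}

def classify(path: Path) -> str:
    """Return which output group a file would land in."""
    handler = HANDLERS.get(path.suffix.lower())
    if handler is not None:
        return handler
    # Anything else is treated as plain text if it decodes as UTF-8
    try:
        path.read_bytes().decode("utf-8")
        return "other"
    except (UnicodeDecodeError, OSError):
        return "skipped"

print(classify(Path("README.md")))  # markdown
```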
Install from PyPI:

```bash
uv pip install pyragify
```

or

```bash
pip install pyragify
```

For development, clone the repository and sync dependencies:

```bash
git clone https://github.com/ThomasBury/pyragify.git
cd pyragify
uv sync --group dev
```

The default entrypoint is `pyragify`.
```bash
uv run pyragify --config-file config.yaml
```

You can also run it as a module:

```bash
python -m pyragify --config-file config.yaml
```

If you do not use `config.yaml`, pass every setting you want to rely on directly on the command line:
```bash
uv run pyragify \
  --repo-path /path/to/repository \
  --output-dir /path/to/output \
  --max-words 200000 \
  --max-file-size 10485760 \
  --skip-patterns "*.log" \
  --skip-patterns "*.tmp" \
  --skip-dirs "__pycache__" \
  --skip-dirs "node_modules" \
  --verbose
```

- Use `pyragify --help` for the full option list
- Command-line options override values loaded from `config.yaml`
- Repeat `--skip-patterns` once per pattern
- Repeat `--skip-dirs` once per directory name
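Skip patterns behave like shell globs. A rough model of the matching semantics, using `fnmatch` (this is an assumption about how Pyragify applies the patterns, not its actual code):

```python
from fnmatch import fnmatch

skip_patterns = ["*.log", "*.tmp"]
skip_dirs = {"__pycache__", "node_modules"}

def should_skip(relpath: str) -> bool:
    """Skip a file if any parent directory or its filename matches."""
    parts = relpath.split("/")
    # Directory names are matched exactly anywhere in the path
    if any(p in skip_dirs for p in parts[:-1]):
        return True
    # File patterns are matched against the filename only
    return any(fnmatch(parts[-1], pat) for pat in skip_patterns)

print(should_skip("build/debug.log"))  # True
print(should_skip("src/main.py"))      # False
```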
Example `config.yaml`:

```yaml
repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760  # 10 MB
skip_patterns:
  - "*.log"
  - "*.tmp"
skip_dirs:
  - "__pycache__"
  - "node_modules"
verbose: false
```

- Point `repo_path` at the repository you want to process.
- Choose an `output_dir` where generated chunks and metadata should be written.
- Run `uv run pyragify --config-file config.yaml` or pass the same settings on the command line.
- Open the generated files in `output/`, especially `output/remaining/chunk_0.txt`, in NotebookLM or another LLM workflow.
The generated output is grouped by content type:
- `python/`: Python functions, classes, and comment chunks
- `markdown/`: Markdown sections split by headers
- `html/`: HTML script and style chunks
- `css/`: CSS rule chunks
- `other/`: Readable files that do not have a dedicated parser
- `remaining/`: Overflow chunks once grouped outputs reach the word limit
- `metadata.json`: Summary of processed files
- `hashes.json`: MD5 hashes used for incremental processing
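The hash-based incremental skip can be sketched like this. The flat path-to-MD5 mapping is an assumption about the `hashes.json` schema, not Pyragify's documented format:

```python
import hashlib
import json
from pathlib import Path

def md5_of(path: Path) -> str:
    """MD5 of a file's bytes, the per-file fingerprint idea behind hashes.json."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def files_to_process(repo: Path, hashes_file: Path) -> list[Path]:
    """Return files whose content changed since the recorded hash."""
    old = json.loads(hashes_file.read_text()) if hashes_file.exists() else {}
    changed = []
    for path in sorted(repo.rglob("*")):
        if path.is_file() and old.get(str(path)) != md5_of(path):
            changed.append(path)
    return changed
```

On a second run, files whose recorded hash still matches are left out of `files_to_process`, which is what makes re-runs cheap.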
- Run Pyragify on the repository you care about.
- Upload one or more generated `.txt` chunks to a NotebookLM notebook.
- Ask questions about the codebase and use the generated citations to trace answers back to the source text.
Set up the local environment:
```bash
uv sync --group dev
```

Run the test suite:

```bash
uv run pytest
```

Run a focused test slice while iterating:

```bash
uv run pytest tests/test_processor.py -k markdown
```

Contributions are welcome. Open an issue for bugs or feature requests, then send a pull request with focused changes and matching tests.
This project is licensed under the MIT License. See LICENSE.
