A cross-platform, local-first OpenClaw skill for building and maintaining a semantic index over a Zotero library.
It creates two complementary vector stores:
- metadata_vectors.json: embeddings for Zotero item metadata (title, abstract, authors, tags, DOI, URL, etc.)
- fulltext_vectors.json: chunk embeddings extracted from PDF attachments in the Zotero storage directory
It also supports incremental updates with a safety-first workflow:
- detect missing items
- report the diff
- wait for user confirmation
- back up the store
- append the missing vectors
- retain only the latest and previous backup per file
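The detection and reporting steps above amount to a set difference between the item keys in the Zotero library and the keys already present in the store. The following is a minimal sketch, assuming each store record is a dict with a hypothetical `item_key` field; the record schema actually used by the skill's scripts may differ.

```python
from typing import Iterable


def find_missing_keys(library_keys: Iterable[str], store_records: Iterable[dict]) -> list[str]:
    """Return library keys with no vector in the store, preserving library order."""
    # "item_key" is a hypothetical field name, not confirmed by the skill.
    indexed = {record["item_key"] for record in store_records}
    missing: list[str] = []
    seen: set[str] = set()
    for key in library_keys:
        if key not in indexed and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


def format_diff_report(missing: list[str], library_size: int) -> str:
    """Build the human-readable report shown before asking for confirmation."""
    if not missing:
        return f"Store is up to date ({library_size} items checked)."
    return f"{len(missing)} of {library_size} items are missing: " + ", ".join(missing)
```

Keeping detection and reporting free of side effects means nothing is written until the user has seen the report and confirmed.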
zotero-vectorize/
├── README.md
├── LICENSE
├── .gitignore
├── .github/workflows/
├── dist/
├── tools/
│   ├── quick_validate.py
│   └── package_skill.py
└── skill/
    └── zotero-vectorize/
        ├── SKILL.md
        ├── scripts/
        └── references/
The actual skill lives under:
skill/zotero-vectorize/
This keeps the skill package itself clean and aligned with OpenClaw skill conventions, while the repository root can still contain GitHub-friendly files such as this README, LICENSE, CI workflow, and packaging helpers.
The skill provides:
- path detection for Zotero data directory, database, storage directory, and output directory
- SQLite snapshot creation before read-heavy operations
- full build for metadata vectors
- full build for PDF full-text vectors
- incremental diff checking
- incremental append after user confirmation
- store verification (counts, sizes, metadata)
- backup retention with only the latest and previous backup kept
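One way to take the SQLite snapshot mentioned above is Python's built-in online backup API. This is a sketch under assumptions, not the skill's actual script: the function name `snapshot_database` is hypothetical, and the source is opened read-only so the Zotero database is never modified.

```python
import sqlite3
from pathlib import Path


def snapshot_database(source: Path, destination: Path) -> Path:
    """Copy a SQLite database to `destination` without writing to the source."""
    # mode=ro opens the source read-only; as_uri() needs an absolute path.
    source_uri = source.resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(source_uri, uri=True)
    try:
        dst = sqlite3.connect(destination)
        try:
            src.backup(dst)  # page-by-page copy into the snapshot file
        finally:
            dst.close()
    finally:
        src.close()
    return destination
```

All later reads can then target the snapshot instead of the live database. If Zotero is running and holds a lock on its database, the copy may raise `sqlite3.OperationalError`; closing Zotero first is the simplest workaround.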
The store uses explicit filenames:
- metadata_vectors.json
- fulltext_vectors.json
- vector_store_metadata.json
- README.md
The skill is designed for:
- Windows
- macOS
- Linux
It supports:
- platform defaults for Zotero paths
- environment variable overrides
- explicit CLI flags for all critical paths
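The three mechanisms above suggest a precedence order of CLI flag, then environment variable, then platform default. A minimal sketch of that resolution follows. The variable name `ZOTERO_DATA_DIR` and both function names are assumptions for illustration; see the platform reference files for the names the skill actually uses. The fallback of `Zotero` in the user's home directory matches Zotero's default data directory on Windows, macOS, and Linux.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "ZOTERO_DATA_DIR"  # hypothetical variable name


def resolve_data_dir(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Path:
    """Pick the Zotero data directory: CLI flag, then env var, then default."""
    if cli_value:
        return Path(cli_value).expanduser()
    env_value = env.get(ENV_VAR)
    if env_value:
        return Path(env_value).expanduser()
    return (home or Path.home()) / "Zotero"


def zotero_paths(data_dir: Path) -> dict:
    """Derive the database and attachment storage paths from the data directory."""
    return {
        "database": data_dir / "zotero.sqlite",
        "storage": data_dir / "storage",
    }
```

Passing `env` and `home` as parameters keeps the function easy to test without touching the real environment.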
See:
- skill/zotero-vectorize/references/windows.md
- skill/zotero-vectorize/references/macos.md
- skill/zotero-vectorize/references/linux.md
Typical Python dependencies:
- sentence-transformers
- torch
- PyMuPDF
- numpy
git clone https://github.com/yckbz/zotero-vectorize
cd zotero-vectorize
python3 tools/package_skill.py skill/zotero-vectorize dist
This creates:
dist/zotero-vectorize.skill
Install the generated .skill file using your preferred OpenClaw / ClawHub workflow.
- detect paths
- snapshot Zotero DB
- build metadata vectors
- build full-text vectors
- check incremental updates
- apply incremental updates only after user confirmation
- verify counts and file sizes
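The final verification step can be approximated with a few lines of standard-library Python. This sketch assumes a store file holds a top-level JSON array of records, which is an assumption about the layout rather than a documented format; `summarize_store` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_store(path: Path) -> dict:
    """Report the record count and on-disk size of one vector store file."""
    # Assumes the file contains a top-level JSON array of records.
    records = json.loads(path.read_text(encoding="utf-8"))
    return {
        "file": path.name,
        "records": len(records),
        "bytes": path.stat().st_size,
    }
```

Comparing the record count with the number of items in the Zotero snapshot gives a quick check that nothing was dropped during a build or an append.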
This repo includes self-contained tooling so contributors do not need a local OpenClaw source checkout just to validate/package the skill.
python3 tools/quick_validate.py skill/zotero-vectorize
python3 tools/package_skill.py skill/zotero-vectorize dist
This skill is intentionally conservative:
- Zotero is treated as read-only input
- the skill snapshots the database before reads
- updates are never applied until the missing items have been reported and the user has confirmed
- store files are backed up before rewrite
- only the latest and previous backup are retained
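The backup and retention rules above can be sketched as a copy followed by pruning. The backup naming scheme used here (`<file>.<timestamp>.bak`, with a timestamp that sorts lexicographically) and the function name are assumptions for illustration, not the skill's documented convention.

```python
import shutil
from pathlib import Path


def backup_with_retention(store_file: Path, timestamp: str, keep: int = 2) -> Path:
    """Copy `store_file` to a timestamped backup, then prune older backups."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Hypothetical naming scheme: metadata_vectors.json.20240103T000000.bak
    backup = store_file.with_name(f"{store_file.name}.{timestamp}.bak")
    shutil.copy2(store_file, backup)
    # Sortable timestamps mean name order equals chronological order.
    backups = sorted(store_file.parent.glob(f"{store_file.name}.*.bak"))
    for old in backups[:-keep]:
        old.unlink()
    return backup
```

With the default `keep=2`, the directory holds at most the latest and the previous backup of each store file after every rewrite.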
This repository contains a tested first release candidate:
- skill structure validated
- full build tested in an isolated output directory
- incremental update workflow tested
- backup retention behavior tested
- packaging tested
Future improvements may include:
- OCR fallback for scanned PDFs
- semantic search CLI helpers
- optional vector-database backend (Faiss / LanceDB / Qdrant / Chroma)
- collection-scoped indexing
- date-window incremental modes
MIT