[Feature] reindex should support recursive directory traversal #1073
Description
Current Behavior
POST /api/v1/content/reindex with a URI indexes only the immediate directory. It reads .abstract.md and .overview.md for vector embedding, then scans the directory's direct files, but explicitly skips subdirectories:
# embedding_utils.py, index_resource()
if file_info.get("type") == "directory" or file_info.get("isDir"):
    # TODO: Recursive indexing? For now, skip subdirectories to match previous behavior
    continue
Problem
After importing data or clearing the vectordb (e.g. to fix #1072), calling reindex on a top-level URI like viking://resources only indexes that one directory's metadata. All subdirectory content (memories, logs, sessions, etc.) remains unindexed.
With ~77 directories containing .abstract.md files, users have to call reindex manually on each one, and even that doesn't work because sub-paths like viking://user/zhuren/memories return NOT_FOUND from the API.
Expected Behavior
reindex should support a recursive: true option (default false for backward compatibility) that walks the entire directory tree and indexes all levels.
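To make the request concrete, here is a minimal sketch of the traversal I have in mind. It is not the project's code: `reindex`, `list_dir`, `index_one`, and the `name` key on directory entries are hypothetical names for illustration; only the `type` / `isDir` check is taken from the existing snippet in index_resource().

```python
from typing import Callable, Dict, Iterable, List

Entry = Dict[str, object]


def is_directory(entry: Entry) -> bool:
    # Same test the current index_resource() uses to skip subdirectories.
    return entry.get("type") == "directory" or bool(entry.get("isDir"))


def join_uri(parent: str, name: str) -> str:
    # Avoid doubling the slash when the parent already ends with one.
    return parent + name if parent.endswith("/") else parent + "/" + name


def reindex(
    uri: str,
    list_dir: Callable[[str], Iterable[Entry]],  # hypothetical: lists entries under a URI
    index_one: Callable[[str], None],            # hypothetical: indexes a single directory
    recursive: bool = False,                     # default keeps today's behavior
) -> List[str]:
    """Index `uri`, and every directory below it when `recursive` is true.

    Returns the URIs that were indexed, in visit order.
    """
    indexed: List[str] = []
    seen = set()
    stack = [uri]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against a directory being listed twice
        seen.add(current)
        index_one(current)
        indexed.append(current)
        if not recursive:
            break  # backward-compatible: only the immediate directory
        for entry in list_dir(current):
            if is_directory(entry):
                # Assumes each entry carries its own name under "name".
                stack.append(join_uri(current, str(entry["name"])))
    return indexed
```

With `recursive=False` this touches only the URI it was given, so existing callers would see no change. An explicit stack is used instead of function recursion so that a deep tree cannot hit Python's recursion limit.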
Workaround
The current workaround is to write external scripts that enumerate directories and call the vectorize APIs individually, which is fragile and slow.
Related
- [Bug] RocksDB lock contention: multiple _SingleAccountBackend instances open same DB path (local backend) #1072 (RocksDB lock fix: once the lock bug is fixed, the vectordb needs a full reindex, which cannot currently be done recursively)