
[Feature] reindex should support recursive directory traversal #1073

@plhys


Current Behavior

POST /api/v1/content/reindex with a URI indexes only the immediate directory. It reads .abstract.md and .overview.md for vector embedding, then scans the files directly inside that directory, but explicitly skips subdirectories:

# embedding_utils.py, index_resource()
if file_info.get("type") == "directory" or file_info.get("isDir"):
    # TODO: Recursive indexing? For now, skip subdirectories to match previous behavior
    continue

Problem

After importing data or clearing the vectordb (e.g. to fix #1072), calling reindex on a top-level URI like viking://resources only indexes that one directory's metadata. All subdirectory content (memories, logs, sessions, etc.) remains unindexed.

With ~77 directories containing .abstract.md files, users would have to call reindex manually on each one. Even that doesn't work, because sub-paths like viking://user/zhuren/memories return NOT_FOUND from the API.

Expected Behavior

reindex should support a recursive: true option (default false for backward compatibility) that walks the entire directory tree and indexes all levels.
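To make the request concrete, here is a minimal sketch of the traversal I have in mind. It is not the actual `index_resource()` code: `iter_index_targets`, `join_uri`, the `list_dir` callable, and the `"name"` key on each entry are all hypothetical names used for illustration. Only the `"type"` / `"isDir"` check is taken from the snippet above.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Entry = Dict[str, object]  # one directory-listing entry, e.g. {"name": "a.md", "type": "file"}


def is_directory(file_info: Entry) -> bool:
    # Same check as the existing skip in index_resource().
    return file_info.get("type") == "directory" or bool(file_info.get("isDir"))


def join_uri(parent: str, name: str) -> str:
    # Hypothetical helper: avoid doubling the slash for roots like "viking://".
    return parent + name if parent.endswith("/") else parent + "/" + name


def iter_index_targets(
    uri: str,
    list_dir: Callable[[str], List[Entry]],  # assumed stand-in for the storage listing call
    recursive: bool = False,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (directory_uri, file_names) for `uri` and, if recursive, every descendant."""
    stack = [uri]
    seen = set()  # guards against visiting the same directory twice
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        files: List[str] = []
        for info in list_dir(current):
            name = str(info["name"])
            if is_directory(info):
                if recursive:
                    stack.append(join_uri(current, name))
                continue  # recursive=False keeps today's behavior: skip subdirectories
            files.append(name)
        yield current, files
```

With `recursive=False` this yields only the starting directory, which matches the current behavior. With `recursive=True` each yielded directory could be indexed the same way the top-level one is today. An explicit stack is used instead of function recursion so deep trees don't hit Python's recursion limit.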

Workaround

The current workaround is to write external scripts that enumerate directories and call the vectorize APIs one directory at a time, which is fragile and slow.
