docs(search): add note about re-indexing when enabling Tika #2285
michaelstingl wants to merge 1 commit into opencloud-eu:main
| opencloud search index --all-spaces |
| ``` |
| > **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction: |
@ScharfViktor Is that true? I think we need to verify this first.
Yes, that is true. I can reproduce it.
| > **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction: |
| > |
| > ```shell |
| > rm -rf $OC_BASE_DATA_PATH/search # default: /var/lib/opencloud/search |
@micbar maybe this is a bug? I would expect a re-index to work without deleting /search.
@aduffeck can you clarify? You know the implementation
That's the current behavior, yes. I consider that a bug.
A re-index should unconditionally rebuild the index for the space (or for all spaces), in my opinion. It might also be helpful to have a command or flag for just "syncing" the index, i.e. picking up changes that haven't been indexed yet (the current behavior), but that shouldn't be the default behavior of the index command.



Description
Add notes to the search service README clarifying that:
- The `opencloud search index --all-spaces` command skips files with unchanged modification time

Related Issue
Motivation and Context
When users enable Tika on an existing instance, they expect full-text search to work for all files. However, `opencloud search index --all-spaces` skips files already in the index (mtime-based check in `services/search/pkg/search/service.go`), so the Tika extractor is never called for previously indexed files. This is undocumented and confusing.

How Has This Been Tested?
- `services/search/pkg/search/service.go` (`IndexSpace` method, mtime skip logic at line ~495)
- No `--force` flag exists in the CLI or protobuf definition (`IndexSpaceRequest`)

Types of changes
Checklist: