Conversation
…y context When using git URL context with a subdirectory (:apps/semantic-search), the file path must be relative to that subdirectory, not the repo root. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The label should be app.kubernetes.io/name=semantic-search, not the full deployment name Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Pre-download BAAI/bge-base-en-v1.5 model during Docker build so the container doesn't need to download 420MB on every startup - Increase startupProbe to 10 minutes (from 5) for safety Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ables in Docker Compose
…erms and (2) save the index to disk so it is not deleted on every deployment
- Restore deleted semantic-search module files (client.ts, controller.ts, requirements.txt) - Re-add semantic search routes to express loader - Restore ClassBrowser AI search UI components - Update fuzzy-find imports to use @repo/common - Add semantic-search to typedef validation exclusions - Restore semantic search config in packages/common Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change import from @repo/common to @repo/common/models - Add explicit type annotation for termsWithClasses.map Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Re-queue failed index builds with exponential backoff (up to 10 rounds) - Retry entire startup cycle when backend isn't ready yet - Enable PVC for dev environments so indexes persist across pod restarts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
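The re-queue-with-exponential-backoff behavior described in that commit could be sketched roughly as follows; the delay schedule, the `buildIndex` signature, and the function name are illustrative assumptions, not the actual implementation:

```typescript
// Hedged sketch: retry a failed index build up to `maxRounds` times,
// doubling the wait between attempts (1s, 2s, 4s, ...).
// The base delay and round cap here are placeholders.
async function buildWithBackoff(
  buildIndex: () => Promise<boolean>,
  maxRounds = 10,
  baseDelayMs = 1000
): Promise<boolean> {
  for (let round = 0; round < maxRounds; round++) {
    if (await buildIndex()) return true;
    // Back off exponentially before re-queueing the failed build.
    const delay = baseDelayMs * 2 ** round;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  return false;
}
```

The same loop structure would also cover "retry the entire startup cycle when the backend isn't ready yet" by treating a readiness probe as the `buildIndex` callback.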
Datapuller needs this to call /refresh on the semantic search service after updating class data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No longer needed since we use hostPath instead of PVC. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nfra merge from main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5bae94653
```ts
const result = (await response.json()) as { size?: number };
log.info(
  `[Semantic Search] Refreshed ${term.name}: ${result.size ?? "unknown"} courses indexed`
);
return true;
```
Handle async refresh status before marking term refreshed
refreshSemanticSearchForTerm treats every 2xx response as a successful rebuild, but /refresh is asynchronous and can return {"status":"already_building"} while another term is still in progress (see SemanticSearchEngine.refresh_async). In a multi-term run, later requests can be acknowledged but never queued/built, yet this code still counts them as refreshed, leaving semantic indexes stale for those terms.
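One way to address this would be to inspect the response body before counting a term as refreshed. The response shape below (`status` alongside `size`) and the `"already_building"` value are assumptions taken from the review comment, not a confirmed service contract:

```typescript
// Sketch: treat an "already_building" acknowledgement as not-yet-refreshed,
// so the caller can re-queue the term instead of marking it complete.
interface RefreshResponse {
  status?: string;
  size?: number;
}

async function isRefreshQueued(response: Response): Promise<boolean> {
  const result = (await response.json()) as RefreshResponse;
  if (result.status === "already_building") {
    // Another term's build is in progress; this request was acknowledged
    // but not queued, so the caller should retry later.
    return false;
  }
  return true;
}
```

The caller would then only mark the term refreshed (and log the indexed course count) when this returns `true`.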
```ts
);
const filteredClasses = useMemo(() => {
  // If AI search is active and we have semantic results, filter by those
  if (aiSearchActive && semanticResults.length > 0) {
```
Keep AI search mode from falling back on empty semantic hits
When AI search is enabled, filteredClasses only uses semantic results if semanticResults.length > 0; an empty semantic response therefore falls through to normal fuzzy search. That means users can run “AI Search” and still see keyword matches even though semantic search found nothing, which produces misleading results for no-hit queries.
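A minimal sketch of the suggested behavior, extracted from the memoized filter into a plain function for clarity. The names (`aiSearchActive`, `semanticResultIds`, `fuzzySearch`) are placeholders modeled on the snippet above, not the component's actual identifiers:

```typescript
// Sketch: when AI search is active, always use the semantic results,
// even when empty, so a no-hit AI query shows "no results" instead of
// silently falling back to fuzzy/keyword matching.
function filterClasses<T extends { id: string }>(
  classes: T[],
  aiSearchActive: boolean,
  semanticResultIds: string[],
  fuzzySearch: (classes: T[]) => T[]
): T[] {
  if (aiSearchActive) {
    // No fallback: an empty semantic result set yields an empty list.
    const hits = new Set(semanticResultIds);
    return classes.filter((c) => hits.has(c.id));
  }
  return fuzzySearch(classes);
}
```

Inside the component, the same logic would live in the `useMemo` body, with the early-return branching on `aiSearchActive` alone rather than on `semanticResults.length > 0`.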
Linting failed. Note: the status check will always pass.
Overview and Problem statement
In the old version, we pre-downloaded the bge-base-en-v1.5 model at build time. This caused long build times and made quick bug fixes cumbersome.
This PR (1) removes the model pre-download from the Dockerfile to speed up deploys on k8s and locally, (2) installs the CPU-only build of torch to save disk space, and (3) fixes the semantic search bar.
Implementation
The model is downloaded the very first time the service starts and cached to the host volume. All subsequent deploys, image rebuilds, and pod restarts load it from the volume, following the same pattern as the existing FAISS index persistence via hostPath.
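The hostPath wiring for the model cache could look roughly like the following fragment; the volume name, host path, and mount path are illustrative assumptions, not the actual manifest:

```yaml
# Illustrative deployment fragment (not the real manifest).
volumes:
  - name: model-cache
    hostPath:
      path: /data/semantic-search/models   # assumed host location
      type: DirectoryOrCreate
containers:
  - name: semantic-search
    volumeMounts:
      - name: model-cache
        mountPath: /root/.cache/huggingface  # default Hugging Face cache dir
```

With this in place, the first pod to start populates the cache and every later pod on the same node finds the model already present, mirroring how the FAISS index survives restarts.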
For the detailed solution, see commit 192ad51.
Result
The semantic-search image now builds even faster than the other images. Total build time in GitHub Actions drops by ~86.5% (from 20 minutes to 2m 42s).

search bar:
