
Fix/semantic search image build time#1081

Open
vaclisinc wants to merge 35 commits into main from
fix/semantic-search-infra

Conversation

@vaclisinc
Contributor

@vaclisinc vaclisinc commented Feb 26, 2026

Overview and Problem statement

In the old version, we pre-downloaded the bge-base-en-v1.5 model at build time. This caused long build times and made quick bug fixes painful.

This PR (1) removes the model pre-download from the Dockerfile to speed up deploys on k8s and locally, (2) installs the CPU-only build of torch to save disk space, and (3) fixes the semantic search bar.

Implementation

  1. Dockerfile — Removed the RUN step that pre-downloaded the BAAI/bge-base-en-v1.5 model (~400MB) at build time. This step was the main bottleneck slowing down every image build.
  2. docker-compose.yml / K8s semantic-search.yaml — Added a volume mount that persists the HuggingFace model cache (/root/.cache/huggingface) to the host.
  3. requirements — sentence-transformers pulls in the GPU build of torch (~4.35GB) as a dependency by default; the install now uses the CPU-only build of torch (~1GB) instead.
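The cache mount in item 2 might look roughly like the following in docker-compose.yml. This is a minimal sketch; the service name, build path, and host directory are assumptions, and the real wiring is in the commit referenced below.

```yaml
# Sketch of the HuggingFace cache mount (assumed service and paths).
services:
  semantic-search:
    build: ./apps/semantic-search
    volumes:
      # Persist the model cache on the host so the ~400MB model is
      # downloaded once and reused across rebuilds and pod restarts.
      - ./data/semantic-search/model-cache:/root/.cache/huggingface
```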

The model is downloaded the first time the service starts up and is cached to the host volume. All subsequent deploys, image rebuilds, and pod restarts load it from the volume — the same pattern as the existing FAISS index persistence via hostPath.

The same applies to local dev: the model is downloaded only the first time you run docker compose up --build -d. After that, it is read directly from ./data/semantic-search/model-cache/.

The detailed solution is in this commit: 192ad51
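For item 3, one common way to force the CPU-only torch wheels is an extra package index in requirements.txt. This is a sketch of the technique, not necessarily the exact change in the commit above.

```
# requirements.txt (sketch): resolve torch from the CPU-only wheel index
--extra-index-url https://download.pytorch.org/whl/cpu
torch
sentence-transformers
```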

Result

The semantic-search image now builds faster than the other images. Total build time in GitHub Actions drops by ~86.5% (from 20 minutes to 2m 42s).
image

search bar:
image

vaclisinc and others added 30 commits January 15, 2026 17:20
…y context

When using git URL context with subdirectory (:apps/semantic-search),
the file path must be relative to that subdirectory, not repo root.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The label should be app.kubernetes.io/name=semantic-search, not the full deployment name

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Pre-download BAAI/bge-base-en-v1.5 model during Docker build
  so container doesn't need to download 420MB on every startup
- Increase startupProbe to 10 minutes (from 5) for safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…erms and (2) make index save in disk -> not deleted by every deployment
- Restore deleted semantic-search module files (client.ts, controller.ts, requirements.txt)
- Re-add semantic search routes to express loader
- Restore ClassBrowser AI search UI components
- Update fuzzy-find imports to use @repo/common
- Add semantic-search to typedef validation exclusions
- Restore semantic search config in packages/common

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change import from @repo/common to @repo/common/models
- Add explicit type annotation for termsWithClasses.map

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Re-queue failed index builds with exponential backoff (up to 10 rounds)
- Retry entire startup cycle when backend isn't ready yet
- Enable PVC for dev environments so indexes persist across pod restarts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Datapuller needs this to call /refresh on the semantic search service
after updating class data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No longer needed since we use hostPath instead of PVC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vaclisinc vaclisinc changed the title Fix/semantic search infra Fix/semantic search image build long time Feb 26, 2026
@vaclisinc vaclisinc changed the title Fix/semantic search image build long time Fix/semantic search image build time Feb 26, 2026
@vaclisinc vaclisinc requested a review from ARtheboss February 26, 2026 01:10

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5bae94653


Comment on lines +49 to +53
const result = (await response.json()) as { size?: number };
log.info(
  `[Semantic Search] Refreshed ${term.name}: ${result.size ?? "unknown"} courses indexed`
);
return true;


P1: Handle async refresh status before marking term refreshed

refreshSemanticSearchForTerm treats every 2xx response as a successful rebuild, but /refresh is asynchronous and can return {"status":"already_building"} while another term is still in progress (see SemanticSearchEngine.refresh_async). In a multi-term run, later requests can be acknowledged but never queued/built, yet this code still counts them as refreshed, leaving semantic indexes stale for those terms.

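One way to act on this suggestion is to inspect the response body instead of only the HTTP status, and retry when another build is in flight. This is a sketch in TypeScript; the response shape and the helper names are assumptions based on the review comment, not the actual service contract.

```typescript
// Assumed response shape of POST /refresh (per the review comment).
type RefreshResponse = { status?: string; size?: number };

// Only count a 2xx response as a successful rebuild when the service
// did not report that another term's build is already in progress.
function wasRefreshQueued(body: RefreshResponse): boolean {
  return body.status !== "already_building";
}

// Re-issue the refresh with exponential backoff until it is queued,
// instead of silently leaving the term's index stale.
async function refreshUntilQueued(
  doRefresh: () => Promise<RefreshResponse>,
  maxAttempts = 5,
  baseDelayMs = 500
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (wasRefreshQueued(await doRefresh())) return true;
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  return false;
}
```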

);
const filteredClasses = useMemo(() => {
  // If AI search is active and we have semantic results, filter by those
  if (aiSearchActive && semanticResults.length > 0) {


P2: Keep AI search mode from falling back on empty semantic hits

When AI search is enabled, filteredClasses only uses semantic results if semanticResults.length > 0; an empty semantic response therefore falls through to normal fuzzy search. That means users can run “AI Search” and still see keyword matches even though semantic search found nothing, which produces misleading results for no-hit queries.

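A sketch of the fix the reviewer is describing: when AI search is active, filter strictly by the semantic hits, even when there are none. The names here (filterClasses, fuzzyFilter, Course) are illustrative, not the component's real identifiers.

```typescript
type Course = { id: string };

// When AI search is on, always use the semantic result set, so an
// empty semantic response shows "no matches" rather than silently
// falling back to keyword/fuzzy matches.
function filterClasses(
  aiSearchActive: boolean,
  semanticIds: string[],
  all: Course[],
  fuzzyFilter: (all: Course[]) => Course[]
): Course[] {
  if (aiSearchActive) {
    const hits = new Set(semanticIds);
    return all.filter((c) => hits.has(c.id));
  }
  return fuzzyFilter(all);
}
```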

@github-actions

github-actions bot commented Mar 5, 2026

Linting Failed

Note: The status check will always pass. Run npm run lint -- --continue to see the full output locally.

Click to expand lint output

> lint
> turbo run lint --continue --output-logs=errors-only



• Packages in scope: @repo/common, @repo/eslint-config, @repo/gql-typedefs, @repo/shared, @repo/sis-api, @repo/storybook, @repo/theme, @repo/typescript-config, ag-frontend, api-sandbox, backend, datapuller, frontend, staff-frontend
• Running lint in 14 packages
• Remote caching disabled
frontend:lint
cache miss, executing 3dc199213b2eaa24

> lint
> eslint src/


/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/BubbleCard/index.tsx
  106:10  warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components

/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/Capacity/index.tsx
  9:14  warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components

/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/Chart/ChartContext.tsx
  7:17  warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components

/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/Chart/index.tsx
   5:10  warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components
   8:3   warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components
   9:3   warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components
  10:3   warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components
  11:3   warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components

/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/ClassBrowser/List/index.tsx
  323:3  error  Parsing error: '}' expected

/home/runner/work/berkeleytime/berkeleytime/apps/frontend/src/components/ScheduleSummary/index.tsx
  11:14  warning  Fast refresh only works when a file only exports components. Use a new file to share constants or functions between components  react-refresh/only-export-components

✖ 10 problems (1 error, 9 warnings)

npm error Lifecycle script `lint` failed with error:
npm error code 1
npm error path /home/runner/work/berkeleytime/berkeleytime/apps/frontend
npm error workspace frontend
npm error location /home/runner/work/berkeleytime/berkeleytime/apps/frontend
npm error command failed
npm error command sh -c eslint src/
[WARN] command finished with error, but continuing...
::error::frontend#lint: command (/home/runner/work/berkeleytime/berkeleytime/apps/frontend) /opt/hostedtoolcache/node/22.12.0/x64/bin/npm run lint exited (1)

 Tasks:    6 successful, 7 total
Cached:    0 cached, 7 total
  Time:    11.475s 
Failed:    frontend#lint

 ERROR  run failed: command  exited (1)
