
Helm chart: add persistent volume for local model cache (reranker/embeddings) #860

@isac322

Description


Problem

When using local reranker models (e.g., BAAI/bge-reranker-v2-m3) or local embedding models, the models are re-downloaded into /home/hindsight/.cache on every pod restart, because that directory lives on the container's ephemeral filesystem. This causes:

  • Slow startup: a ~1 GB+ model download on every pod restart
  • Unnecessary bandwidth: the same models are downloaded repeatedly
  • Unreliable in air-gapped environments: with no internet access, the models cannot be downloaded at all

Current behavior

The api-deployment.yaml and worker-statefulset.yaml templates have no volume or volumeMount definitions. There is no way to persist the model cache between pod restarts via Helm values.

Proposed solution

Add optional persistent volume support for the model cache directory:

values.yaml:

api:
  persistence:
    modelCache:
      enabled: false
      size: 5Gi
      storageClass: ""
      accessModes:
        - ReadWriteOnce

worker:
  persistence:
    modelCache:
      enabled: false
      size: 5Gi
      storageClass: ""
      accessModes:
        - ReadWriteOnce
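For illustration, a user enabling the cache would then only need a small override file. This is a sketch against the proposed keys above; the file name, the 10Gi sizes, and the storage class are assumptions (standard-rwo is GKE's Persistent Disk CSI class, so substitute whatever class your cluster provides).

```yaml
# Hypothetical override file (e.g. my-values.yaml) for the proposed keys.
# Sizes are illustrative; a ~1 GB reranker plus embedding models fits easily.
api:
  persistence:
    modelCache:
      enabled: true
      size: 10Gi
      storageClass: standard-rwo   # assumed GKE class; leave "" for the cluster default

worker:
  persistence:
    modelCache:
      enabled: true
      size: 10Gi
      storageClass: standard-rwo
```

It would be applied like any other values file, e.g. `helm upgrade --install <release> <chart> -f my-values.yaml`.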

api-deployment.yaml (when api.persistence.modelCache.enabled):

volumeMounts:
  - name: model-cache
    mountPath: /home/hindsight/.cache
volumes:
  - name: model-cache
    persistentVolumeClaim:
      # assumes the chart's standard "hindsight.fullname" helper
      claimName: {{ include "hindsight.fullname" . }}-api-model-cache
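Since a Deployment has no volumeClaimTemplates, the chart would also need to create the PVC that claimName points to. A minimal sketch of such a template, assuming a new file like templates/api-model-cache-pvc.yaml and the conventional "hindsight.fullname" helper (both hypothetical names):

```yaml
{{- if .Values.api.persistence.modelCache.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Must match the claimName referenced in api-deployment.yaml
  name: {{ include "hindsight.fullname" . }}-api-model-cache
spec:
  accessModes:
    {{- toYaml .Values.api.persistence.modelCache.accessModes | nindent 4 }}
  # An empty storageClass skips this field, so the cluster default class is used
  {{- with .Values.api.persistence.modelCache.storageClass }}
  storageClassName: {{ . | quote }}
  {{- end }}
  resources:
    requests:
      storage: {{ .Values.api.persistence.modelCache.size }}
{{- end }}
```

One caveat with the ReadWriteOnce default: the volume can be attached to only one node at a time, so with several API replicas (or a rolling update that schedules the new pod on another node) pods may fail to mount it. ReadWriteMany storage or a Recreate deployment strategy avoids that.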

worker-statefulset.yaml (when worker.persistence.modelCache.enabled):
Add a model-cache entry to volumeClaimTemplates, since the worker is a StatefulSet; each worker replica then gets its own PVC.
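A sketch of what that could look like in worker-statefulset.yaml, mirroring the proposed values keys. The placement comments are assumptions about the template's layout, not its actual contents:

```yaml
# Under the StatefulSet's spec:
{{- if .Values.worker.persistence.modelCache.enabled }}
volumeClaimTemplates:
  - metadata:
      name: model-cache
    spec:
      accessModes:
        {{- toYaml .Values.worker.persistence.modelCache.accessModes | nindent 8 }}
      {{- with .Values.worker.persistence.modelCache.storageClass }}
      storageClassName: {{ . | quote }}
      {{- end }}
      resources:
        requests:
          storage: {{ .Values.worker.persistence.modelCache.size }}
{{- end }}

# ...and in the worker container spec (same conditional):
volumeMounts:
  - name: model-cache
    mountPath: /home/hindsight/.cache
```

No separate volumes entry is needed: the StatefulSet controller creates one PVC per pod, named model-cache-<statefulset name>-<ordinal>, and binds it to the matching volumeMount. Note that volumeClaimTemplates cannot be changed on an existing StatefulSet, so enabling this on a running release would require recreating the StatefulSet (for example, `kubectl delete statefulset <name> --cascade=orphan` and letting Helm re-create it).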

Workaround

I am currently using ArgoCD ServerSideApply to patch a PVC-backed volume into the Deployment, but native Helm support would be cleaner.

Environment

  • Hindsight: 0.4.22
  • Kubernetes: GKE
  • Local reranker: BAAI/bge-reranker-v2-m3

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests