Problem
When using local reranker models (e.g., BAAI/bge-reranker-v2-m3) or local embedding models, the models are downloaded to /home/hindsight/.cache on every pod restart. This causes:
- Slow startup: ~1GB+ model download on each pod restart
- Unnecessary bandwidth: repeated downloads of the same models
- Unreliable in air-gapped environments: no internet access to download models
Current behavior
The api-deployment.yaml and worker-statefulset.yaml templates have no volume or volumeMount definitions. There is no way to persist the model cache between pod restarts via Helm values.
Proposed solution
Add optional persistent volume support for the model cache directory:
values.yaml:
  api:
    persistence:
      modelCache:
        enabled: false
        size: 5Gi
        storageClass: ""
        accessModes:
          - ReadWriteOnce
  worker:
    persistence:
      modelCache:
        enabled: false
        size: 5Gi
        storageClass: ""
        accessModes:
          - ReadWriteOnce
api-deployment.yaml (when api.persistence.modelCache.enabled):
  volumeMounts:
    - name: model-cache
      mountPath: /home/hindsight/.cache
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: {{ fullname }}-api-model-cache
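The claim referenced by claimName would also need a template of its own. A minimal sketch, assuming the values keys proposed above; the file name and the "hindsight.fullname" helper are assumptions and should be swapped for whatever the chart actually defines:

```yaml
{{- if .Values.api.persistence.modelCache.enabled }}
# templates/api-model-cache-pvc.yaml (hypothetical file name)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # "hindsight.fullname" is an assumed helper name, not confirmed from the chart
  name: {{ include "hindsight.fullname" . }}-api-model-cache
spec:
  accessModes:
    {{- toYaml .Values.api.persistence.modelCache.accessModes | nindent 4 }}
  {{- if .Values.api.persistence.modelCache.storageClass }}
  # Only set when non-empty, so an empty value falls back to the cluster default class
  storageClassName: {{ .Values.api.persistence.modelCache.storageClass | quote }}
  {{- end }}
  resources:
    requests:
      storage: {{ .Values.api.persistence.modelCache.size }}
{{- end }}
```

Omitting storageClassName when the value is empty is deliberate: an explicit empty string on a PVC disables dynamic provisioning rather than selecting the default class. Note also that a ReadWriteOnce claim shared by a Deployment with more than one replica only works if all replicas are scheduled on the same node.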
worker-statefulset.yaml (when worker.persistence.modelCache.enabled):
Add to volumeClaimTemplates since worker is a StatefulSet.
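A rough sketch of what the StatefulSet side could look like, again assuming the proposed worker.persistence.modelCache.* values; the container name is hypothetical and the surrounding fields of the real template are omitted:

```yaml
# Fragment of worker-statefulset.yaml; not the chart's actual template
spec:
  template:
    spec:
      containers:
        - name: worker  # hypothetical container name
          {{- if .Values.worker.persistence.modelCache.enabled }}
          volumeMounts:
            - name: model-cache
              mountPath: /home/hindsight/.cache
          {{- end }}
  {{- if .Values.worker.persistence.modelCache.enabled }}
  volumeClaimTemplates:
    - metadata:
        name: model-cache  # must match the volumeMount name above
      spec:
        accessModes:
          {{- toYaml .Values.worker.persistence.modelCache.accessModes | nindent 10 }}
        {{- if .Values.worker.persistence.modelCache.storageClass }}
        storageClassName: {{ .Values.worker.persistence.modelCache.storageClass | quote }}
        {{- end }}
        resources:
          requests:
            storage: {{ .Values.worker.persistence.modelCache.size }}
  {{- end }}
```

With volumeClaimTemplates each worker replica gets its own claim, so ReadWriteOnce is sufficient here; the trade-off is that every replica downloads the models once into its own volume.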
Workaround
Currently using ArgoCD ServerSideApply to patch the Deployment with a PVC, but native Helm support would be cleaner.
Environment
- Hindsight: 0.4.22
- Kubernetes: GKE
- Local reranker: BAAI/bge-reranker-v2-m3