Feature: LLM Result Caching
Phase: 1 - Infrastructure & Core Pipeline
Module: catalog.store.llm_cache
Estimated Effort: Small (2-3 hours)
Business Value
Eliminates redundant LLM calls for repeated queries, reducing latency by 10-100x for cached results and cutting API costs. Essential for interactive search where users refine queries.
Technical Specification
New SQLAlchemy model and cache class:
```python
from datetime import datetime

from sqlalchemy import Column, DateTime, String, Text
from sqlalchemy.orm import Session


class LLMCacheEntry(Base):
    __tablename__ = "llm_cache_v2"

    cache_key = Column(String, primary_key=True)
    cache_type = Column(String, nullable=False)  # 'expansion' | 'rerank'
    result_json = Column(Text, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)


class LLMCache:
    def __init__(self, session: Session, ttl_hours: int = 168): ...

    def get_expansion(self, query: str, model: str) -> dict | None: ...
    def set_expansion(self, query: str, model: str, result: dict) -> None: ...
    def get_rerank(self, query: str, doc_hash: str, model: str) -> float | None: ...
    def set_rerank(self, query: str, doc_hash: str, model: str, score: float) -> None: ...
```
Acceptance Criteria
Dependencies
- catalog.store.database (existing)
Reference
See docs/features/rag-v2/phase-1-infrastructure.md for full specification.