
Dev #26

Closed
DuesselbergAdrian wants to merge 60 commits into main from dev


Conversation

@DuesselbergAdrian
Collaborator

No description provided.

SmartGridsML and others added 30 commits December 15, 2025 15:54
Add CV document parser + /applications/parse upload endpoint
feat: config.py + llm_service.py | test for llm_service.py
Add /applications/generate orchestration with Redis caching, request
Code still referenced settings.timeout_seconds; changed as per the fix
Person A Day 3: JD analysis + grounded cover letter endpoints
…tem that verifies cover letters against extracted CV facts
feat(day4A): implemented the Auditor
   ## What Was Built

   ### 1. Golden Test Dataset (test_cases.json)
   - 8 comprehensive test cases: 1 happy path + 7 edge cases
   - Covers sparse CVs, skill mismatches, quantitative verification, international characters
   - Each test case includes expected hallucination rates and metrics

   ### 2. Automated Evaluation Pipeline (evaluation_suite.py)
   - End-to-end evaluation of cover letter generation pipeline
   - Tracks quality metrics: hallucination rate, confidence, support ratio
   - Tracks performance metrics: P50/P95/P99 latency
   - Tracks cost metrics: token usage, API costs
   - MLflow integration for experiment tracking
   - CLI with filtering options
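
The P50/P95/P99 latency tracking listed above could be computed along these lines. This is a minimal sketch, not the code in evaluation_suite.py: the function names `percentile` and `latency_summary`, the nearest-rank method, and the sample latencies are all assumptions made for illustration.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Summarise per-request latencies as P50/P95/P99 (hypothetical helper)."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


# Hypothetical latencies in milliseconds, one per golden test case.
example_latencies = [120, 95, 310, 150, 101, 99, 250, 180]
summary = latency_summary(example_latencies)
```

With only eight samples, P95 and P99 both collapse to the slowest request, so tail percentiles from a test set this small are best read as a rough upper bound.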

   ### 3. Prometheus Production Metrics (prometheus_metrics.py)
   - 15+ production-ready metrics
   - Four Golden Signals: Latency, Traffic, Errors, Saturation
   - Auto-instrumentation via decorators
   - LLM-specific metrics: hallucination rate, token usage, API costs
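
Auto-instrumentation via decorators, as listed above, might look roughly like the following. This sketch deliberately uses plain in-memory dictionaries as stand-ins for Prometheus counters and histograms so that it runs without dependencies; the names `track_request`, `REQUEST_COUNT`, `REQUEST_LATENCY`, and `generate_cover_letter` are hypothetical and are not taken from prometheus_metrics.py.

```python
import functools
import time
from collections import defaultdict

# In-memory stand-ins for metric objects (assumption: the real module
# would register Prometheus counters and histograms instead).
REQUEST_COUNT = defaultdict(int)     # (endpoint, status) -> number of calls
REQUEST_LATENCY = defaultdict(list)  # endpoint -> observed durations in seconds


def track_request(endpoint):
    """Decorator that records traffic, errors, and latency for one endpoint."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so callers still see the original failure
            finally:
                # Runs on both paths, so every call is counted exactly once.
                REQUEST_LATENCY[endpoint].append(time.perf_counter() - start)
                REQUEST_COUNT[(endpoint, status)] += 1
        return wrapper
    return decorator


@track_request("generate")
def generate_cover_letter(cv_text):
    """Hypothetical handler used only to demonstrate the decorator."""
    if not cv_text:
        raise ValueError("empty CV")
    return f"Cover letter based on {len(cv_text)} characters of CV text"
```

Recording in the `finally` clause means latency and the success or error count are captured even when the wrapped function raises, which covers three of the four golden signals (latency, traffic, errors) with one decorator.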

   ### 4. Comprehensive Test Suite (test_evaluation_suite.py)
   - 14 unit tests covering all functionality
   - Tests initialization, execution, metrics, reporting, MLflow logging
   - All tests passing ✅

   ### 5. Documentation
   - EVALUATION_SUITE_USAGE.md: Quick start guide
   - demo_evaluation.py: Demo script (no API keys needed)
   - Pedagogical guides: evaluation strategies & production metrics

   ## Import Fixes
   - Updated imports from `app.*` to `backend.app.*` for proper module execution
   - Fixed in: auditor.py, fact_extractor.py, prompts.py, llm_service.py

   ## Testing
   - All 14 evaluation suite tests passing
   - Evaluation suite CLI functional
   - Demo script verified
SmartGridsML and others added 28 commits January 20, 2026 15:18
   ## Implementation

   Refactored LLM service to support automatic fallback from OpenAI to Google
   Gemini using LangChain, providing increased reliability and cost optimization.
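
The fallback behaviour described above can be illustrated without any API keys. The sketch below is dependency-free and uses fake provider functions; `generate_with_fallback`, `fake_openai`, and `fake_gemini` are hypothetical names, and the actual llm_service.py may be structured differently (LangChain runnables also offer a `with_fallbacks` method, though this page does not say whether the service uses it).

```python
import logging
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("llm_fallback_sketch")

# A provider is a (name, callable) pair; the callable maps a prompt to text.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, text) from the first success."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            logger.warning("provider %s failed: %s", name, exc)
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def fake_openai(prompt: str) -> str:
    """Stand-in for the primary provider; always fails to simulate an outage."""
    raise TimeoutError("simulated primary outage")


def fake_gemini(prompt: str) -> str:
    """Stand-in for the fallback provider."""
    return f"[gemini] {prompt}"


provider_used, text = generate_with_fallback(
    "Summarise this CV", [("openai", fake_openai), ("gemini", fake_gemini)]
)
```

Returning the provider name alongside the text is one simple way to support the provider tracking the commit mentions, since the caller can log or label metrics with it.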

   ### Changes

   1. **LLM Service (llm_service.py)**
      - Added LangChain integration (ChatOpenAI, ChatGoogleGenerativeAI)
      - Implemented automatic OpenAI → Gemini fallback on errors
      - Enhanced observability with provider tracking
      - Cost: roughly 90% lower per request when the Gemini fallback is used ($0.01 vs. $0.10 on OpenAI)

   2. **Configuration (config.py)**
      - Added gemini_api_key field
      - Added gemini_model configuration (gemini-2.5-flash)

   3. **Tests (test_llm_service.py)**
      - Updated for LangChain mocking
      - Added fallback mechanism test
      - All 56 tests passing

   4. **Markdown Stripping**
      - Already present in auditor.py and fact_extractor.py
      - Handles Gemini responses that wrap JSON in Markdown code fences
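
Stripping a Markdown code fence from a model response before JSON parsing, as item 4 describes, could be sketched as follows. The helper names `strip_code_fence` and `parse_llm_json` are hypothetical, and the logic in auditor.py and fact_extractor.py may differ; the fence marker is built programmatically only so the example stays valid inside a fenced block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled here to avoid a literal fence

# Matches an optional language tag after the opening fence, captures the
# payload, and requires a closing fence at the end of the text.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text):
    """Return the payload of a fenced response, or the trimmed text if unfenced."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


def parse_llm_json(text):
    """Parse JSON from a model response that may be wrapped in a code fence."""
    return json.loads(strip_code_fence(text))


# Hypothetical responses: one fenced with a language tag, one plain.
wrapped = FENCE + 'json\n{"claims": 2}\n' + FENCE
plain = '{"claims": 2}'
```

Both `parse_llm_json(wrapped)` and `parse_llm_json(plain)` go through the same path, so callers do not need to know which provider produced the response.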

   ### Testing

   Evaluation suite tested with real API calls:
   - ✅ Success Rate: 100%
   - ✅ Hallucination Rate: 0%
   - ✅ Cost per request: $0.01 (Gemini) vs $0.10 (OpenAI)
   - ✅ All fallback transitions logged to MLflow

   ### Dependencies Added

   - langchain
   - langchain-openai
   - langchain-google-genai
Person B Day 4: CV enhancement, results endpoint, structured logging
Implement Day 5 Person A: Evaluation Suite & Production Metrics
- Configures pytest to add the project root to the Python path (pythonpath = .)
- pytest now works directly, without needing python -m pytest
- Sets default test paths and options
Document Generation + Download Endpoints + CI Pipeline (DON'T MERGE)
Day6_PersonA:
- Terraform for AWS ECS
- Load balancer configuration
- Environment variables management
- S3 for document storage
- CloudWatch logs configuration
- Prometheus + Grafana dashboards
- Alert rules
Frontend foundation: Vite + React + TypeScript + Tailwind
Results UI scaffold + backend contract alignment (WIP)


2 participants