Dev#26
Closed
DuesselbergAdrian wants to merge 60 commits into
Add CV document parser + /applications/parse upload endpoint
feat: config.py + llm_service.py | test for llm_service.py
Add /applications/generate orchestration with Redis caching, request
Day2 person a
code still references settings.timeout_seconds, changed as per fix
A Day 3: JD analysis + grounded cover letter endpoints
…tem that verifies cover letters against extracted CV facts
feat(day4A): implemented the Auditor
## What Was Built
### 1. Golden Test Dataset (test_cases.json)
- 8 comprehensive test cases: 1 happy path + 7 edge cases
- Covers sparse CVs, skill mismatches, quantitative verification, international characters
- Each test case includes expected hallucination rates and metrics
### 2. Automated Evaluation Pipeline (evaluation_suite.py)
- End-to-end evaluation of cover letter generation pipeline
- Tracks quality metrics: hallucination rate, confidence, support ratio
- Tracks performance metrics: P50/P95/P99 latency
- Tracks cost metrics: token usage, API costs
- MLflow integration for experiment tracking
- CLI with filtering options
### 3. Prometheus Production Metrics (prometheus_metrics.py)
- 15+ production-ready metrics
- Four Golden Signals: Latency, Traffic, Errors, Saturation
- Auto-instrumentation via decorators
- LLM-specific metrics: hallucination rate, token usage, API costs
### 4. Comprehensive Test Suite (test_evaluation_suite.py)
- 14 unit tests covering all functionality
- Tests initialization, execution, metrics, reporting, MLflow logging
- All tests passing ✅
### 5. Documentation
- EVALUATION_SUITE_USAGE.md: Quick start guide
- demo_evaluation.py: Demo script (no API keys needed)
- Pedagogical guides: evaluation strategies & production metrics
## Import Fixes
- Updated imports from `app.*` to `backend.app.*` for proper module execution
- Fixed in: auditor.py, fact_extractor.py, prompts.py, llm_service.py
## Testing
- All 14 evaluation suite tests passing
- Evaluation suite CLI functional
- Demo script verified
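For reviewers unfamiliar with the P50/P95/P99 latency figures mentioned above, here is a minimal sketch of how such a summary could be computed with the standard library. The function name `summarize_run`, the output keys, and the idea of averaging per-case hallucination rates are assumptions for illustration; the actual logic in evaluation_suite.py is not shown in this PR conversation and may differ.

```python
import statistics
from typing import Dict, Sequence


def summarize_run(latencies_ms: Sequence[float],
                  hallucination_rates: Sequence[float]) -> Dict[str, float]:
    """Summarize one evaluation run (hypothetical helper, not from the PR).

    latencies_ms: per-request latencies in milliseconds (at least two values).
    hallucination_rates: per-test-case rates in the range [0, 1].
    """
    # quantiles(n=100) returns 99 cut points; index i holds percentile i + 1.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "mean_hallucination_rate": statistics.fmean(hallucination_rates),
    }
```

The `inclusive` method treats the sample as the full population, which suits a fixed golden dataset; a production dashboard would more likely derive percentiles from histogram buckets.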
## Implementation
Refactored LLM service to support automatic fallback from OpenAI to Google
Gemini using LangChain, providing increased reliability and cost optimization.
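The fallback behaviour described above can be sketched in plain Python as an ordered list of providers tried in turn. This is an illustration of the pattern only, not the code in llm_service.py: `generate_with_fallback`, the `Provider` alias, and the returned `(provider_name, text)` tuple are hypothetical names. LangChain runnables also offer a `with_fallbacks` method for this purpose; whether this PR uses it is not visible from the commit message.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable maps a prompt to text.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, text).

    Returning the provider name makes it possible to log which backend
    actually served the request (the "provider tracking" idea).
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A caller would pass the primary provider first and the fallback second, for example `[("openai", call_openai), ("gemini", call_gemini)]`, where both callables are stand-ins for the real client wrappers.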
### Changes
1. **LLM Service (llm_service.py)**
- Added LangChain integration (ChatOpenAI, ChatGoogleGenerativeAI)
- Implemented automatic OpenAI → Gemini fallback on errors
- Enhanced observability with provider tracking
- Cost: ~90% lower per request when served by the Gemini fallback ($0.01 vs $0.10 on OpenAI)
2. **Configuration (config.py)**
- Added gemini_api_key field
- Added gemini_model configuration (gemini-2.5-flash)
3. **Tests (test_llm_service.py)**
- Updated for LangChain mocking
- Added fallback mechanism test
- All 56 tests passing
4. **Markdown Stripping**
- Already in auditor.py and fact_extractor.py
- Handles Gemini's code-wrapped JSON responses
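The markdown stripping mentioned in item 4 could look roughly like the sketch below: if the model wraps its JSON in a fenced code block, unwrap it before parsing. The helper name `parse_llm_json` and the exact regular expression are assumptions; the versions in auditor.py and fact_extractor.py are not shown here.

```python
import json
import re
from typing import Any

_FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches an optional language tag after the opening fence, then the payload.
_FENCED_BLOCK = re.compile(
    r"^\s*" + _FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*" + _FENCE + r"\s*$",
    re.DOTALL,
)


def parse_llm_json(raw: str) -> Any:
    """Parse JSON from a model response, unwrapping a code fence if present."""
    match = _FENCED_BLOCK.match(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload.strip())
```

Responses that are already bare JSON pass through unchanged, so the same helper can be applied to both OpenAI and Gemini output.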
### Testing
Evaluation suite tested with real API calls:
- ✅ Success Rate: 100%
- ✅ Hallucination Rate: 0%
- ✅ Cost per request: $0.01 (Gemini) vs $0.10 (OpenAI)
- ✅ All fallback transitions logged to MLflow
### Dependencies Added
- langchain
- langchain-openai
- langchain-google-genai
Person B Day 4: CV enhancement, results endpoint, structured logging
Implement Day 5 Person A: Evaluation Suite & Production Metrics
- Configures pytest to add project root to Python path (pythonpath = .)
- Now pytest works directly without needing python -m pytest
- Sets default test paths and options
Fix/imports
Document Generation + Download Endpoints + CI Pipeline DONT MERGE
- Load balancer configuration
- Environment variables management
- S3 for document storage
- CloudWatch logs configuration
- Prometheus + Grafana dashboards
- Alert rules
Day6_PersonA: - Terraform for AWS ECS
Frontend foundation: Vite + React + TypeScript + Tailwind
Results UI scaffold + backend contract alignment (WIP)
…olicies, ALB routing
1b310c3 to
479a54b
No description provided.