feat: Add CV/Resume upload functionality with context integration #144

Open
mcnugets wants to merge 7 commits into sohzm:master from mcnugets:main

mcnugets commented Oct 5, 2025

Changes Made

New Features Added

  1. Dual Mode System

    • Interview Mode: Uses Gemini Live 2.0 with audio recording enabled
    • Coding/OA Mode: Uses Gemini 2.5 Flash/Pro with audio disabled, screenshot capture enabled
    • Mode selection UI in Advanced Settings
  2. CV/Resume Upload System

    • PDF upload and text extraction using pdf-parse library
    • CV context integration into AI prompts for both modes
    • CV management UI with upload, preview, and clear functionality
  3. Rate Limit Monitoring

    • Real-time display of API usage limits for different models
    • Rate limit info shown in Advanced Settings

Files Added

  • src/components/views/CVUploadView.js - CV upload interface
  • src/utils/pdfProcessor.js - PDF parsing and context generation

Files Modified

  • src/utils/gemini.js - Added CV context to prompts, dual-mode session handling
  • src/components/views/AdvancedView.js - Added mode selection and rate limit display
  • src/components/app/CheatingDaddyApp.js - Added CV status management
  • src/components/views/MainView.js - Added CV upload button
  • src/utils/renderer.js - Added mode-specific audio handling
  • package.json - Added pdf-parse and multer dependencies

Technical Implementation

  • CV context is appended to custom prompts before being passed to getSystemPrompt()
  • Audio capture is conditionally disabled in coding mode
  • IPC handlers added for CV upload, status, and management
  • Debug logging added for CV context verification
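
A minimal sketch of how CV text could be appended to a custom prompt before it is handed to the system-prompt builder. The PR does not show its implementation, so `buildPromptWithCv`, the section label, and the `MAX_CV_CHARS` cap are all assumptions made for illustration.

```javascript
// Hypothetical helper; names and the size cap are assumptions, not taken from the PR.
const MAX_CV_CHARS = 4000; // assumed cap so a long CV does not dominate the prompt

function buildPromptWithCv(customPrompt, cvText) {
    const base = (customPrompt || '').trim();
    // Collapse the irregular whitespace that PDF text extraction tends to produce.
    const cv = (cvText || '').replace(/\s+/g, ' ').trim();
    if (!cv) return base; // no CV uploaded: the prompt is left unchanged

    const clipped = cv.length > MAX_CV_CHARS ? cv.slice(0, MAX_CV_CHARS) : cv;
    const cvSection = `Candidate CV (extracted from PDF):\n${clipped}`;
    return base ? `${base}\n\n${cvSection}` : cvSection;
}

console.log(buildPromptWithCv('Answer concisely.', 'Jane Doe\n  Backend engineer, 5 years'));
```

Returning the prompt unchanged when no CV is present keeps both modes working the same way before and after an upload.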

Dependencies Added

  • pdf-parse@1.1.1 - PDF text extraction
  • multer@1.4.5-lts.1 - File upload handling

Testing

  • CV upload and parsing works
  • Mode switching functions correctly
  • CV context appears in AI responses
  • Rate limit monitoring displays properly

- Add PDF processing with pdf-parse library
- Create CVUploadView component for file management
- Integrate CV context into interview mode prompts
- Add CV status display in MainView
- Support CV context in both interview and coding modes
- Add IPC handlers for CV upload, status, and management
- Include debug logging for CV context verification
- Update package.json with new dependencies

Kanishk1420 commented Oct 7, 2025

@mcnugets Bro, I have tested your PR on coding OA questions. Honestly, the work is good, but the responses coming from Gemini 2.5 Pro are not impressive: it can't solve hard contest problems correctly. Can you do more on it if possible?

mcnugets (Author) commented Oct 7, 2025

I've tested the models themselves. Unfortunately, this is as good as it gets: even models like Sonnet 4.5 and GPT-5 struggle with hard OA questions.

Do you have the OA questions you were testing with?

@Kanishk1420

@mcnugets I don't have them, but we could try connecting this software to a database such as Firebase, where we store the screenshots. The reason the models fall short is that most good companies' OA coding questions are not from the internet, so the models can't find the correct answer there; instead they try to write their own code through pattern recognition from what they were trained on. I have not tested GPT-5 or Sonnet on OA questions.

@Kanishk1420

@mcnugets If you want to test, you can use Codeforces problems, since they are already tough, or LeetCode contest questions.

mcnugets (Author) commented Oct 7, 2025

I tested those and didn't see any problems with Gemini 2.5 solving them.

- Add Jest configuration with test environment setup
- Implement unit tests for core functions (cleanExtractedText, validation, etc.)
- Add integration tests for LLM chaining and component interactions
- Create E2E tests for complete user workflows
- Add performance tests for system performance monitoring
- Implement smoke tests for critical functionality verification
- Add test setup with Electron IPC mocking and helper functions
- Update package.json with Jest dependencies and test scripts
- Create ipcUtils.js to resolve circular dependency issues

Test coverage includes:
- Unit tests: 28 tests across core functions
- Integration tests: LLM chaining, context-aware OCR, error handling
- E2E tests: Complete workflows, provider switching, error recovery
- Performance tests: Response times, memory usage, concurrent handling
- Smoke tests: Critical functionality verification

- Add OpenRouter client with support for multiple AI models
- Support DeepSeekR1, GPT-4o, Claude 3.5 Sonnet, and other models
- Implement streaming chat functionality with real-time responses
- Add model selection UI with provider-specific API key inputs
- Support both direct vision models and text-only models
- Add model information and capabilities detection
- Implement proper error handling and response parsing
- Add session management for OpenRouter connections
- Update MainView to show dynamic API key fields based on selected model
- Add model mapping for different providers and capabilities
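
A minimal sketch of how streamed chat output could be collected, assuming the stream arrives as OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`). The helper name `createSseTextCollector` is hypothetical, and the PR's actual client in `src/utils/openrouter.js` may be structured differently.

```javascript
// Sketch: accumulate text deltas from OpenAI-style SSE chunks.
// createSseTextCollector is a hypothetical name, not the PR's API.
function createSseTextCollector(onText) {
    let buffer = '';
    let done = false;

    return {
        push(chunk) {
            buffer += chunk;
            const lines = buffer.split('\n');
            buffer = lines.pop(); // keep a trailing partial line for the next chunk
            for (const raw of lines) {
                const line = raw.trim();
                if (!line.startsWith('data:')) continue; // skips blank lines and SSE comments
                const payload = line.slice(5).trim();
                if (payload === '[DONE]') {
                    done = true;
                    continue;
                }
                try {
                    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
                    if (delta) onText(delta);
                } catch {
                    // Malformed event: ignored in this sketch.
                }
            }
        },
        isDone: () => done,
    };
}

let text = '';
const collector = createSseTextCollector((t) => { text += t; });
// The second event is deliberately split across two network chunks.
collector.push('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi');
collector.push('ces":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(text, collector.isDone());
```

Buffering the trailing partial line matters because a network chunk can end in the middle of an event.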

Features:
- Multi-model support with different rate limits
- Streaming responses for real-time interaction
- Provider-specific API key management
- Model capability detection (vision vs text-only)
- Comprehensive error handling and logging

- Implement LLM chaining: Gemini Live extracts text from screenshots, DeepSeekR1 processes text
- Add context-aware OCR system that detects different screen regions
- Support terminal, video, code editor, web browser, and mixed content detection
- Implement region-specific text extraction strategies
- Add JSON parsing with markdown code block handling
- Create fallback to basic text extraction if context analysis fails
- Add text cleaning to remove JSON formatting and ensure plain text output
- Implement screenshot capture specifically for OpenRouter integration
- Add intelligent routing: direct vision for GPT-4o/Claude, LLM chaining for DeepSeekR1
- Update renderer to combine user messages with screenshot data for DeepSeekR1
- Add proper error handling and retry logic for OCR failures
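
The routing rule in this list (direct vision for models that accept images, a two-step chain for text-only models) can be sketched as a small planning function. The capability table, the model ids, and the `text-extractor` placeholder are illustrative assumptions, not the PR's real identifiers.

```javascript
// Hypothetical capability table; ids and flags are for illustration only.
const MODEL_CAPABILITIES = {
    'gpt-4o': { vision: true },
    'claude-3.5-sonnet': { vision: true },
    'deepseek-r1': { vision: false },
};

// Decide how a request should be executed for the selected model.
function planRequest(modelId, hasScreenshot) {
    const caps = MODEL_CAPABILITIES[modelId];
    if (!caps) throw new Error(`Unknown model: ${modelId}`);

    if (!hasScreenshot) {
        return { strategy: 'text-only', steps: [{ model: modelId, input: 'text' }] };
    }
    if (caps.vision) {
        return { strategy: 'direct-vision', steps: [{ model: modelId, input: 'text+image' }] };
    }
    // Text-only model: extract text from the screenshot first, then forward it.
    return {
        strategy: 'llm-chain',
        steps: [
            { model: 'text-extractor', input: 'image' }, // placeholder for the extraction model
            { model: modelId, input: 'text+extracted-text' },
        ],
    };
}

console.log(planRequest('deepseek-r1', true).strategy);
console.log(planRequest('gpt-4o', true).strategy);
```

Keeping the decision in one pure function makes it easy to unit test without mocking any network calls.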

Technical improvements:
- Context analysis using Gemini 2.0 Flash Experimental
- Region detection with confidence scoring
- Context-specific extraction instructions
- Robust JSON parsing with fallback handling
- Memory-efficient text processing
- Error recovery with multiple fallback strategies

- Add comprehensive RateLimiter class with model-specific limits
- Implement exponential backoff retry logic for 429 errors
- Add pre-request rate limit checking to prevent hitting limits
- Support different rate limits per model (DeepSeekR1: 60/min, GPT-4o: 10/min, Claude: 5/min)
- Add real-time rate limit monitoring in AdvancedView UI
- Implement automatic retry with increasing delays (1s, 2s, 4s, max 30s)
- Add rate limit status tracking and display
- Update UI to show current usage, remaining requests, and reset times
- Add IPC handler for real-time rate limit status updates
- Implement smart waiting to prevent rate limit violations
- Add comprehensive error handling for rate limit scenarios
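
A minimal sketch of pre-request rate limit checking with per-model limits, using a sliding one-minute window. The class shape, method names, and the injectable clock are assumptions; only the per-minute figures come from the list above.

```javascript
// Sketch of a per-model sliding-window limiter (not the PR's actual class).
class RateLimiter {
    constructor(limitsPerMinute, now = () => Date.now()) {
        this.limits = limitsPerMinute;
        this.now = now; // injectable clock, which keeps the class testable
        this.history = new Map(); // model -> timestamps (ms) of recent requests
    }

    // Drop timestamps older than one minute and return what is left.
    _recent(model) {
        const cutoff = this.now() - 60_000;
        const recent = (this.history.get(model) || []).filter((t) => t > cutoff);
        this.history.set(model, recent);
        return recent;
    }

    status(model) {
        const limit = this.limits[model];
        if (limit === undefined) throw new Error(`No limit configured for ${model}`);
        const recent = this._recent(model);
        return {
            used: recent.length,
            remaining: Math.max(0, limit - recent.length),
            // Time until the oldest request leaves the window and frees a slot.
            resetInMs: recent.length ? recent[0] + 60_000 - this.now() : 0,
        };
    }

    // Record a request if allowed; return false when the caller should wait.
    tryAcquire(model) {
        if (this.status(model).remaining === 0) return false;
        this.history.get(model).push(this.now());
        return true;
    }
}

const limiter = new RateLimiter({ DeepSeekR1: 60, 'GPT-4o': 10, Claude: 5 });
console.log(limiter.tryAcquire('Claude'), limiter.status('Claude'));
```

The `status()` result carries the same three values the UI is described as showing: current usage, remaining requests, and time until reset.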

Rate limiting features:
- Model-specific request tracking per minute
- Automatic retry with exponential backoff
- Real-time UI monitoring every 10 seconds
- Pre-request validation to prevent 429 errors
- Smart delay calculation based on usage patterns
- Comprehensive logging and error reporting

- Implement comprehensive ChatGPT-style response formatting
- Add syntax highlighting for code blocks using highlight.js
- Support language detection and proper code block rendering
- Add copy-to-clipboard functionality for code blocks
- Implement collapsible sections for long responses
- Add proper typography and spacing for better readability
- Support markdown rendering with marked library
- Add loading indicators with typing animation
- Implement proper response animation timing
- Add language labels for code blocks
- Support multiple programming languages and formats

UI improvements:
- Modern chat interface styling
- Syntax highlighting for 20+ programming languages
- Responsive design for different screen sizes
- Smooth animations and transitions
- Professional code block presentation
- Enhanced readability and user experience
- Consistent styling across all views

- Add multi-OS testing (Ubuntu, Windows, macOS)
- Support multiple Node.js versions (18, 20)
- Implement test categorization (smoke, unit, integration, performance, e2e)
- Add coverage reporting and upload to Codecov
- Optimize test execution with parallel jobs
- Add build verification step
- Implement fail-fast strategy for faster feedback
- Add proper test isolation and environment setup
- Support both existing Vitest tests and new Jest tests
- Add comprehensive error reporting and logging

CI improvements:
- Cross-platform compatibility testing
- Multiple Node.js version support
- Comprehensive test coverage reporting
- Optimized test execution times
- Better error reporting and debugging
- Automated build verification

mcnugets (Author) commented Oct 10, 2025

🚀 Major Feature Enhancement: Multi-Model AI Support with Advanced OCR and Rate Limiting

📋 Overview

This PR significantly enhances the cheating-daddy application with comprehensive multi-model AI support, advanced screenshot analysis, intelligent rate limiting, and professional UI improvements. The changes transform the app from a single-model solution to a robust, production-ready AI assistant.

✨ Key Features Added

🤖 Multi-Model AI Integration

  • OpenRouter API Support: Integration with multiple AI providers
  • Model Variety: DeepSeekR1, GPT-4o, Claude 3.5 Sonnet, and more
  • Smart Model Selection: Automatic routing based on capabilities
  • Provider-Specific API Keys: Dynamic UI for different authentication methods

🔍 Advanced Screenshot Analysis (LLM Chaining)

  • Context-Aware OCR: Detects terminal, video, code editor, web browser regions
  • Intelligent Text Extraction: Region-specific strategies for optimal results
  • LLM Chaining: Gemini Live extracts text → DeepSeekR1 processes content
  • Fallback Systems: Multiple extraction strategies for reliability
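
A minimal sketch of the "parse JSON, else fall back to plain text" step for the context-analysis reply, assuming the model may wrap its JSON in a markdown code fence. `parseModelJson` and the `regions` payload are hypothetical; the fence marker is built with `repeat` only so that this example does not contain a literal fence.

```javascript
// Sketch: extract JSON from a model reply that may be wrapped in a markdown fence.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

function parseModelJson(responseText) {
    const fenced = responseText.match(FENCED_BLOCK);
    const candidate = (fenced ? fenced[1] : responseText).trim();
    try {
        return { ok: true, value: JSON.parse(candidate) };
    } catch {
        // Fallback: hand the raw text back so the caller can use basic extraction.
        return { ok: false, text: responseText.trim() };
    }
}

const reply = FENCE + 'json\n{"regions":[{"type":"terminal","confidence":0.9}]}\n' + FENCE;
console.log(parseModelJson(reply));
console.log(parseModelJson('The screen shows a terminal window.'));
```

Returning a tagged result instead of throwing lets the caller choose the fallback strategy explicitly.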

⚡ Intelligent Rate Limiting

  • Model-Specific Limits: DeepSeekR1 (60/min), GPT-4o (10/min), Claude (5/min)
  • Automatic Retry: Exponential backoff for 429 errors (1s → 2s → 4s → 30s max)
  • Pre-Request Validation: Prevents hitting rate limits proactively
  • Real-Time Monitoring: Live usage tracking in AdvancedView UI
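
A minimal sketch of the retry schedule described above (1s, 2s, 4s, capped at 30s). It assumes a rate-limited request rejects with an error carrying `status === 429`; that property name, the `withRetry` helper, and the `maxRetries` default are assumptions, not the PR's code.

```javascript
// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30_000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry only on 429 responses; any other error is rethrown immediately.
async function withRetry(requestFn, options = {}) {
    const {
        maxRetries = 5,
        sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    } = options;

    for (let attempt = 0; ; attempt++) {
        try {
            return await requestFn();
        } catch (err) {
            // `err.status` is an assumed property set by the HTTP client wrapper.
            if (err.status !== 429 || attempt >= maxRetries) throw err;
            await sleep(backoffDelayMs(attempt));
        }
    }
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n)));
```

Passing `sleep` as an option allows tests to replace the real timer with a stub, so the schedule can be checked without waiting.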

🎨 Professional UI Enhancements

  • ChatGPT-Style Formatting: Modern chat interface with proper typography
  • Syntax Highlighting: Support for 20+ programming languages
  • Code Block Features: Copy-to-clipboard, language detection, collapsible sections
  • Responsive Design: Optimized for different screen sizes

🧪 Comprehensive Testing Framework

  • 28 Test Cases: Unit, Integration, E2E, Performance, and Smoke tests
  • Jest Integration: Modern testing framework with proper mocking
  • CI/CD Pipeline: Multi-OS testing (Ubuntu, Windows, macOS)
  • Coverage Reporting: Automated test coverage tracking

🔧 Technical Improvements

Architecture Enhancements

  • Modular Design: Separated concerns with dedicated utility files
  • Error Handling: Comprehensive error recovery and user feedback
  • Memory Management: Efficient resource usage and cleanup
  • Performance Optimization: Faster response times and reduced latency

📊 Impact Metrics

Testing Coverage

  • Unit Tests: 28 tests covering core functions
  • Integration Tests: Component interaction verification
  • E2E Tests: Complete user workflow validation
  • Performance Tests: System performance monitoring
  • Smoke Tests: Critical functionality verification

🛠️ Files Changed

New Files

  • src/utils/openrouter.js - OpenRouter API client
  • src/utils/ipcUtils.js - IPC utility functions
  • jest.config.js - Jest testing configuration
  • src/__tests/ - Comprehensive test suite (5 test files)

Modified Files

  • src/utils/gemini.js - Enhanced with LLM chaining and rate limiting
  • src/utils/renderer.js - Updated for multi-model support
  • src/components/views/AdvancedView.js - Added rate limit monitoring
  • src/components/views/AssistantView.js - ChatGPT-style formatting
  • src/components/views/HistoryView.js - Enhanced UI components
  • src/components/views/MainView.js - Dynamic API key inputs
  • src/components/app/CheatingDaddyApp.js - Model selection logic
  • .github/workflows/test.yml - Enhanced CI/CD pipeline

🎯 Use Cases

For Developers

  • Coding Assistance: DeepSeekR1 for complex programming problems
  • Code Review: GPT-4o for detailed code analysis
  • Documentation: Claude 3.5 for comprehensive explanations
  • Debugging: Context-aware screenshot analysis

For Students

  • Interview Preparation: CV-based personalized responses
  • Online Assessments: Screenshot analysis for coding challenges
  • Learning Support: Multi-model explanations and examples
  • Real-Time Help: Live assistance during coding sessions

🔒 Security & Privacy

  • Local Processing: OCR and analysis happen locally when possible
  • API Key Management: Secure storage and validation
  • Rate Limiting: Prevents abuse and excessive API usage
  • Error Handling: No sensitive data in error messages

@Sarbojit357

@mcnugets Hey, your PR is just perfect, and it gives faster responses than the original one I tested. Bro, please release it as an exe for Windows so that it can be used easily.

mcnugets (Author) commented Nov 6, 2025

Sure, I'll try to add/fix some features. I noticed some OAs nowadays detect the app running, so I thought about maybe adding voice input.

@Sarbojit357

@mcnugets Okay bro, release the app as early as possible once it's done. And add this feature so that any proctored exam held in the browser can't detect the app running in the background.

Sarbojit357 commented Nov 6, 2025

And bro, one more thing: I just wanted to ask which AI you are using in this modified version?

mcnugets (Author) commented Nov 6, 2025

What do you mean?

@Sarbojit357

I'm asking which AI model you are using to generate the answers.
Gemini 2.5 Flash or 2.5 Pro?
Or some other AI model?

@Sarbojit357

Hey @mcnugets, can you share your Discord ID? I want to talk with you.

3 participants