fix: migrate mentor topic extraction from gpt-4 to gpt-4.1-mini (#4671) #4675
Conversation
Code Review
This pull request is a great initiative to reduce costs by migrating the topic extraction from gpt-4 to the much cheaper gpt-4.1-mini model. The refactoring to use the shared llm_mini client is a clean approach that also brings the benefit of standardized usage tracking. The new unit tests are comprehensive and cover the changes well. I have one high-severity suggestion to ensure the reliability of the topic extraction does not regress due to a change in model parameters.
```diff
  # Parse the response text as JSON
- response_text = response.choices[0].message.content.strip()
+ response_text = llm_mini.invoke(prompt).content.strip()
```
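For context, a minimal sketch of what the refactored helper might look like after this change, assuming `llm_mini` is the shared langchain client and that the function parses a JSON `topics` array; the import path, prompt wording, and return shape below are illustrative assumptions, not the actual code:

```python
# Illustrative sketch only; import path, prompt, and return shape are assumptions.
import json
from typing import List

from utils.llm import llm_mini  # assumed location of the shared gpt-4.1-mini client


def extract_topics(conversation_text: str) -> List[str]:
    prompt = (
        "Extract the main topics from the conversation below. "
        'Respond with JSON only, e.g. {"topics": ["topic1", "topic2"]}.\n\n'
        + conversation_text
    )
    # invoke() returns an AIMessage; .content holds the model's raw text.
    response_text = llm_mini.invoke(prompt).content.strip()
    try:
        return json.loads(response_text).get("topics", [])
    except json.JSONDecodeError:
        # Malformed JSON from the model: fail soft with no topics.
        return []
```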
The previous implementation explicitly set `temperature=0.3` and `max_tokens=150`. The new implementation using the shared `llm_mini` client will use its default temperature (likely 0.7) and no token limit. For a structured data extraction task like this, a lower temperature is important for ensuring reliable and consistently parseable JSON output. The higher default temperature increases randomness, which could lead to malformed JSON and cause `json.loads()` to fail. To prevent this potential regression and maintain the deterministic nature of the output, I recommend explicitly passing the original parameters to the `invoke` call.
```diff
- response_text = llm_mini.invoke(prompt).content.strip()
+ response_text = llm_mini.invoke(prompt, temperature=0.3, max_tokens=150).content.strip()
```
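If the shared client's langchain version does not accept sampling parameters directly in `invoke()`, one alternative (a sketch assuming `llm_mini` is a langchain-openai `ChatOpenAI` instance whose invocation kwargs are forwarded to the request payload) is to bind the parameters once and reuse the bound runnable, which keeps the same underlying client and its usage callbacks:

```python
# Sketch, assuming llm_mini is a ChatOpenAI instance; bind() returns a
# RunnableBinding that forwards these kwargs on every invocation.
llm_mini_extraction = llm_mini.bind(temperature=0.3, max_tokens=150)

response_text = llm_mini_extraction.invoke(prompt).content.strip()
```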
lgtm
Summary
- Replace the raw `OpenAI(model="gpt-4")` client in `extract_topics()` with the shared `llm_mini` (gpt-4.1-mini) langchain client
- Remove the now-unused `os` import and `client` guard, aligning with codebase conventions
- `gpt-4.1-mini` is ~75x cheaper on input tokens ($0.40/M vs $30/M) for a trivial JSON topic extraction task

Impact
- Token usage is now tracked through `_usage_callback` on `llm_mini`

Quality Eval: gpt-5.1 Judge (10 samples)
gpt-4.1-mini wins or ties 7/10 head-to-head matchups and scores higher on all three quality dimensions. No quality regression.
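For reference, a minimal sketch of how a pairwise judge eval like this might be wired up (the judge prompt, dimension names, and helper below are illustrative assumptions; the actual eval lives in `backend/tests/integration/test_mentor_topics_eval.py`):

```python
# Illustrative sketch only; not the actual eval code from this PR.
import json

JUDGE_PROMPT = """You are comparing two topic-extraction outputs for the same conversation.
Score each output 1-5 on relevance, coverage, and conciseness, then pick a winner.
Respond with JSON: {{"scores_a": {{...}}, "scores_b": {{...}}, "winner": "A" | "B" | "tie"}}

Conversation:
{conversation}

Output A (gpt-4): {topics_a}

Output B (gpt-4.1-mini): {topics_b}
"""


def judge_sample(judge_llm, conversation, topics_a, topics_b):
    # judge_llm is assumed to be a langchain chat model wrapping the gpt-5.1 judge.
    prompt = JUDGE_PROMPT.format(
        conversation=conversation, topics_a=topics_a, topics_b=topics_b
    )
    return json.loads(judge_llm.invoke(prompt).content)
```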
Files Changed
- `backend/utils/mentor_notifications.py`: swap the gpt-4 raw client for `llm_mini.invoke()`
- `backend/tests/unit/test_mentor_notifications.py`: 7 unit tests (source-level + functional); a sketch of one such functional test follows the list
- `backend/tests/integration/test_mentor_topics_eval.py`: gpt-5.1 judge eval (10 samples + 3 edge cases)
- `backend/test.sh`: register the new test file
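As noted above, a rough sketch of what one of the functional unit tests with a mocked `llm_mini` might look like (the patch target, fixture text, and assertions are assumptions, not the actual tests from this PR):

```python
# Illustrative sketch; the real tests live in backend/tests/unit/test_mentor_notifications.py.
import json
from unittest.mock import MagicMock, patch


@patch("utils.mentor_notifications.llm_mini")  # assumed patch target
def test_extract_topics_parses_json_response(mock_llm_mini):
    # Simulate the langchain client returning a well-formed JSON payload.
    mock_llm_mini.invoke.return_value = MagicMock(
        content=json.dumps({"topics": ["sleep", "exercise"]})
    )

    from utils.mentor_notifications import extract_topics  # assumed import path

    topics = extract_topics("User discussed sleep habits and exercise routines.")

    assert topics == ["sleep", "exercise"]
    mock_llm_mini.invoke.assert_called_once()
```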
Test plan

- Unit tests pass (`backend/test.sh`)

Closes #4671
🤖 Generated with Claude Code