feat(eval): expose user_simulator_config in generate_responses#5733
feat(eval): expose user_simulator_config in generate_responses#5733primenko-v wants to merge 15 commits into
Conversation
|
Hi @primenko-v , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh. |
Thank you for the review @rohityan ! I installed and ran the pre-commit hook, as the autoformat.sh doesn't seem to exist anymore. |
|
@rohityan let me know if there's anything else needed! |
|
Hi @ankursharmas , can you please review this. |
| class TestGenerateResponses: | ||
| """Test cases for EvaluationGenerator.generate_responses method.""" |
There was a problem hiding this comment.
We don't need this new class. The newly added test case can be a part of the existing class TestGenerateInferencesFromRootAgent
There was a problem hiding this comment.
Thank you for the feedback @ankursharmas !
The newly added test exercises generate_responses rather than _generate_inferences_from_root_agent. It seems to me that test_evaluation_generator.py mostly follows a one-class-per-method convention, where each test class targets a single method. For example:
| Test class | Method under test |
|---|---|
TestConvertEventsToEvalInvocation |
convert_events_to_eval_invocations |
TestGetAppDetailsByInvocationId |
_get_app_details_by_invocation_id |
TestGenerateInferencesForSingleUserInvocation |
_generate_inferences_for_single_user_invocation |
TestGenerateInferencesForSingleUserInvocationLive |
_generate_inferences_for_single_user_invocation_live |
Following that, wouldn't a separate TestGenerateResponses class for generate_responses fit better here?
While looking at this I also noticed that test_generates_inferences_with_user_simulator_live lives in TestGenerateInferencesFromRootAgent even though it tests _generate_inferences_from_root_agent_live, so the convention is already a bit mixed there.
On a separate note: there was a careless merge on my side which I have now fixed.
There was a problem hiding this comment.
Ah makes sense! Thank you for pointing that out.
There was a problem hiding this comment.
@ankursharmas do you consider the conversation resolved?
There was a problem hiding this comment.
@ankursharmas this PR is currently blocked by this open thread. Let me know if you need any further changes from my side; otherwise, could you please resolve the conversation?
|
I can see that the I have tried syncing my branch with the latest It looks like the reformatted files were brought by this commit that was pushed to I have opened #5962 to fix the formatting introduced by that commit, and #5963 to try to prevent these formatting issues. |
|
The formatting issues were fixed on |
Merge #5733 ### Link to Issue or Description of Change **1. Link to an existing issue (if applicable):** N/A **2. Or, if no issue exists, describe the change:** **Problem:** `EvaluationGenerator.generate_responses` constructs a `UserSimulatorProvider()` with no arguments, so the LLM-backed path always runs with the default `BaseUserSimulatorConfig`. There is no way for a caller to override the user-simulation model, max-allowed invocations, or custom instructions when driving multi-turn conversations through `LlmBackedUserSimulator`. **Solution:** Add an optional `user_simulator_config` parameter to `generate_responses` and forward it to `UserSimulatorProvider(...)`. Callers can now pass an `LlmBackedUserSimulatorConfig` to customize the LLM-backed simulator. The behavior is backward compatible: - When the argument is omitted, `UserSimulatorProvider` falls back to `BaseUserSimulatorConfig()` exactly as before. - Static eval cases are unaffected: the config is ignored by `StaticUserSimulator`. ### Testing Plan **Unit Tests:** - [x] I have added or updated unit tests for my change. - [x] All unit tests pass locally. A unit test for the proposed change was added to `tests/unittests/evaluation/test_evaluation_generator.py`: `TestGenerateResponses::test_generate_responses_forwards_llm_backed_user_simulator_config` All tests pass: ``` > uv run pytest tests/unittests/ -rs ... ================================== short test summary info ================================== SKIPPED [1] tests/unittests/integrations/crewai/test_crewai_tool.py:20: Requires Python 3.10+ ================ 5770 passed, 1 skipped, 2358 warnings in 129.40s (0:02:09) ================ ``` The skipped test is not related to this change — it skips on `main` as well. **Manual End-to-End (E2E) Tests:** A reference setup lives at https://github.com/primenko-v/adk-x-mlflow (tag `pr-demo/user-simulator-config`). It loads an `LlmBackedUserSimulatorConfig` from YAML and forwards it to `EvaluationGenerator.generate_responses` via the new `user_simulator_config` parameter — see [`src/mlflow_adk/simulate.py`](https://github.com/primenko-v/adk-x-mlflow/blob/pr-demo/user-simulator-config/src/mlflow_adk/simulate.py#L74-L79). To reproduce (requires GOOGLE_CLOUD_PROJECT and ADC via `gcloud auth application-default login`): ```bash git clone --recurse-submodules --branch pr-demo/user-simulator-config \ https://github.com/primenko-v/adk-x-mlflow.git cd adk-x-mlflow cp .env.example .env # fill in GOOGLE_CLOUD_PROJECT uv sync uv run python -m mlflow_adk.simulate --no-mlflow --output-traces traces.jsonl ``` ### Checklist - [x] I have read the [CONTRIBUTING.md](https://github.com/google/adk-python/blob/main/CONTRIBUTING.md) document. - [x] I have performed a self-review of my own code. - [x] I have commented my code, particularly in hard-to-understand areas. - [x] I have added tests that prove my fix is effective or that my feature works. - [x] New and existing unit tests pass locally with my changes. - [x] I have manually tested my changes end-to-end. - [x] Any dependent changes have been merged and published in downstream modules. Co-authored-by: Ankur Sharma <ankusharma@google.com> COPYBARA_INTEGRATE_REVIEW=#5733 from primenko-v:propagate-user-simulator-config 24209b6 PiperOrigin-RevId: 933503403
|
Thank you @primenko-v for your contribution! 🎉 Your changes have been successfully imported and merged via Copybara in commit e7a673c. Closing this PR as the changes are now in the main branch. |
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
N/A
2. Or, if no issue exists, describe the change:
Problem:
EvaluationGenerator.generate_responsesconstructs aUserSimulatorProvider()with no arguments, so the LLM-backed path always runs with the defaultBaseUserSimulatorConfig. There is no way for a caller to override the user-simulation model, max-allowed invocations, or custom instructions when driving multi-turn conversations throughLlmBackedUserSimulator.Solution:
Add an optional
user_simulator_configparameter togenerate_responsesand forward it toUserSimulatorProvider(...). Callers can now pass anLlmBackedUserSimulatorConfigto customize the LLM-backed simulator.The behavior is backward compatible:
UserSimulatorProviderfalls back toBaseUserSimulatorConfig()exactly as before.StaticUserSimulator.Testing Plan
Unit Tests:
A unit test for the proposed change was added to
tests/unittests/evaluation/test_evaluation_generator.py:TestGenerateResponses::test_generate_responses_forwards_llm_backed_user_simulator_configAll tests pass:
The skipped test is not related to this change — it skips on
mainas well.Manual End-to-End (E2E) Tests:
A reference setup lives at https://github.com/primenko-v/adk-x-mlflow (tag
pr-demo/user-simulator-config).It loads an
LlmBackedUserSimulatorConfigfrom YAML and forwards it toEvaluationGenerator.generate_responsesvia the newuser_simulator_configparameter — seesrc/mlflow_adk/simulate.py.To reproduce (requires GOOGLE_CLOUD_PROJECT and ADC via
gcloud auth application-default login):git clone --recurse-submodules --branch pr-demo/user-simulator-config \ https://github.com/primenko-v/adk-x-mlflow.git cd adk-x-mlflow cp .env.example .env # fill in GOOGLE_CLOUD_PROJECT uv sync uv run python -m mlflow_adk.simulate --no-mlflow --output-traces traces.jsonlChecklist