Multi-modal Optimizer + Context for Optimization #50
allenanie wants to merge 71 commits into experimental from features/multimodal_opt
Conversation
…h`'s mock test to expect different kind of input
…into features/multimodal_opt
Pull Request Overview
This PR implements multi-modal support for optimizers and introduces a context section to provide additional information during optimization. The changes enable image input handling, context passing, and improved structure for optimization prompts.
Key changes include:
- Multi-modal payload support for handling images alongside text queries (see the sketch after this list)
- Context section implementation for passing additional optimization context
- Optimizer API enhancements to support image and context inputs
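To make the payload idea concrete, here is a minimal sketch of a multi-modal user message in the OpenAI-style content-part format (the file name and prompt text are invented, and the PR's actual payload types in `opto/features/flows/types.py` may differ):

```python
# Illustrative only: a text query plus an inline base64 image, normalized
# into chat-completion content parts. Shapes follow the OpenAI-style format.
import base64

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What trend does this chart show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}
```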
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| tests/unit_tests/test_priority_search.py | Added multi-modal message handling for test compatibility |
| opto/optimizers/utils.py | Added image encoding utility for base64 conversion (see the sketch after this table) |
| opto/optimizers/optoprime_v2.py | Main multi-modal and context implementation with API changes |
| opto/optimizers/opro_v2.py | Extended OPRO optimizer with context support |
| opto/features/flows/types.py | Added multi-modal payload types and query normalization |
| opto/features/flows/compose.py | Updated TracedLLM to handle multi-modal payloads |
| docs/tutorials/minibatch.ipynb | Updated escape sequences in notebook output |
| .github/workflows/ci.yml | Commented out optimizer test suite |
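For reference, an image-to-base64 helper of the kind the `opto/optimizers/utils.py` row describes typically looks like this (a sketch; the actual name and signature in the PR may differ):

```python
import base64

def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```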
TODO: …
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot open a new pull request to apply changes based on the comments in this thread
@allenanie I've opened a new pull request, #54, to work on those changes. Once the pull request is ready, I'll request review from you.
[WIP] Add multi-modal optimizer and context support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (1)
opto/optimizers/optoprime_v2.py:236
- Call to method OptoPrime.extract_llm_suggestion with too few arguments; should be no fewer than 2.
`return OptoPrime.extract_llm_suggestion(response)`
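To make the arity complaint concrete, here is a toy reproduction (class bodies are invented; only the names come from the review comment):

```python
# Calling the method through the class with one argument binds `response`
# to `self` and drops the real argument, which is what the "too few
# arguments" warning points at.
class OptoPrime:
    def extract_llm_suggestion(self, response):
        # Stand-in for the base class's actual parsing logic.
        return {"suggestion": response.strip()}

class OptoPrimeV2(OptoPrime):
    def extract_llm_suggestion(self, response):
        # return OptoPrime.extract_llm_suggestion(response)    # too few arguments
        return OptoPrime.extract_llm_suggestion(self, response)  # passes both

print(OptoPrimeV2().extract_llm_suggestion("  use fewer tokens  "))
```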
To increase backward compatibility, the … When …
When …
For any Google models (starts with …), … Even with this small change, a lot of details were handled: …
In addition to … This is not strictly necessary, but it helps us simplify the Optimizer's design, since the optimizer no longer needs to interact with the raw LLM API response object.
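A hedged sketch of that design point, with invented names and the OpenAI client shown as one concrete provider: the calling layer unwraps the raw response once, so the optimizer only ever sees plain text.

```python
from openai import OpenAI

def call_llm(messages, model="gpt-4o"):
    client = OpenAI()  # needs an API key in the environment to actually run
    response = client.chat.completions.create(model=model, messages=messages)
    # Unwrap the provider's raw response object once, at the boundary.
    return response.choices[0].message.content

class Optimizer:
    def step(self, llm_output: str):
        # Receives plain text; never touches any provider's response schema.
        print("optimizing against:", llm_output)
```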
… Gemini-compatible history.
Multi-turn conversation is tested; see test … We store conversation history as structured data in …
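A minimal sketch of the idea, with assumed field names (the PR's actual schema in `Chat` may differ):

```python
# History is kept as plain role/content records rather than provider objects,
# so it can be replayed to any chat API or converted to a Gemini-compatible
# history, as the commit message above mentions.
history = []

def append_turn(role, content):
    history.append({"role": role, "content": content})

append_turn("user", "Summarize the failure.")
append_turn("assistant", "The test expected a multi-modal payload.")
```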
So far, all supporting functions for multi-modal capabilities are finished: … Tests are finished: … Remaining todos: …
Force-pushed from a62c202 to c171201
Force-pushed from ef542aa to c0a0282
@chinganc I think this is ready for the first round of code review... Can you see if this notebook runs for you? My plan for the 2nd round: …
…ion management tool. Expanded some functionality on `Chat`.
…ault because the `backbone.py` and other code already migrated. Can set `mm_beta=False` to go back to completion API.
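A hypothetical illustration of how such a flag gate could work; everything here except the `mm_beta` name is invented:

```python
def chat_completion_call(prompt: str) -> str:
    return f"[chat] {prompt}"        # stand-in for the multi-modal chat path

def legacy_completion_call(prompt: str) -> str:
    return f"[completion] {prompt}"  # stand-in for the old completion API

def generate(prompt: str, mm_beta: bool = True) -> str:
    # mm_beta defaults to True; mm_beta=False reverts to the completion API.
    return chat_completion_call(prompt) if mm_beta else legacy_completion_call(prompt)

print(generate("hello"))                 # chat path
print(generate("hello", mm_beta=False))  # fallback path
```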
Adding multi-modal support. Also introducing a context section.
For the context section, the design intention is: if the user provides context, it appears in the user message; if no context is provided, the section is omitted entirely.
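A minimal sketch of that design (template headers and function name are invented):

```python
def build_user_message(query: str, context: str | None = None) -> str:
    sections = []
    if context:
        # The Context section exists only when the user supplies context.
        sections.append(f"# Context\n{context}")
    sections.append(f"# Query\n{query}")
    return "\n\n".join(sections)

print(build_user_message("Improve the code.", context="It times out on large inputs."))
print(build_user_message("Improve the code."))  # no Context section at all
```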