Multi-modal Optimizer + Context for Optimization #50
allenanie wants to merge 71 commits into experimental from features/multimodal_opt
Conversation
…h`'s mock test to expect different kind of input
…into features/multimodal_opt
Pull Request Overview
This PR implements multi-modal support for optimizers and introduces a context section to provide additional information during optimization. The changes enable image input handling, context passing, and improved structure for optimization prompts.
Key changes include:
- Multi-modal payload support for handling images alongside text queries (see the sketch after this list)
- Context section implementation for passing additional optimization context
- Optimizer API enhancements to support image and context inputs
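To make the payload idea concrete, here is a minimal sketch of a multi-modal user message in the OpenAI-style content-part format (the file name and prompt text are invented, and the PR's actual payload types in `opto/features/flows/types.py` may differ):

```python
# Illustrative only: a text query plus an inline base64 image, normalized
# into chat-completion content parts. Shapes follow the OpenAI-style format.
import base64

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What trend does this chart show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}
```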
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| tests/unit_tests/test_priority_search.py | Added multi-modal message handling for test compatibility |
| opto/optimizers/utils.py | Added image encoding utility for base64 conversion (see the sketch after this table) |
| opto/optimizers/optoprime_v2.py | Main multi-modal and context implementation with API changes |
| opto/optimizers/opro_v2.py | Extended OPRO optimizer with context support |
| opto/features/flows/types.py | Added multi-modal payload types and query normalization |
| opto/features/flows/compose.py | Updated TracedLLM to handle multi-modal payloads |
| docs/tutorials/minibatch.ipynb | Updated escape sequences in notebook output |
| .github/workflows/ci.yml | Commented out optimizer test suite |
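For reference, an image-to-base64 helper of the kind the `opto/optimizers/utils.py` row describes typically looks like this (a sketch; the actual name and signature in the PR may differ):

```python
import base64

def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```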
TODO: …
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot open a new pull request to apply changes based on the comments in this thread
@allenanie I've opened a new pull request, #54, to work on those changes. Once the pull request is ready, I'll request review from you.
[WIP] Add multi-modal optimizer and context support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (1)
opto/optimizers/optoprime_v2.py:236
- Call to method OptoPrime.extract_llm_suggestion with too few arguments; should be no fewer than 2.
`return OptoPrime.extract_llm_suggestion(response)`
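To make the arity complaint concrete, here is a toy reproduction (class bodies are invented; only the names come from the review comment):

```python
# Calling the method through the class with one argument binds `response`
# to `self` and drops the real argument, which is what the "too few
# arguments" warning points at.
class OptoPrime:
    def extract_llm_suggestion(self, response):
        # Stand-in for the base class's actual parsing logic.
        return {"suggestion": response.strip()}

class OptoPrimeV2(OptoPrime):
    def extract_llm_suggestion(self, response):
        # return OptoPrime.extract_llm_suggestion(response)    # too few arguments
        return OptoPrime.extract_llm_suggestion(self, response)  # passes both

print(OptoPrimeV2().extract_llm_suggestion("  use fewer tokens  "))
```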
To increase backward compatibility, the … When …
When …
For any Google models (starts with …), … Even with this small change, a lot of details were handled: …
In addition to … This is not strictly necessary, but it helps us simplify the Optimizer's design, since the optimizer no longer needs to interact with the raw LLM API response object.
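A hedged sketch of that design point, with invented names and the OpenAI client shown as one concrete provider: the calling layer unwraps the raw response once, so the optimizer only ever sees plain text.

```python
from openai import OpenAI

def call_llm(messages, model="gpt-4o"):
    client = OpenAI()  # needs an API key in the environment to actually run
    response = client.chat.completions.create(model=model, messages=messages)
    # Unwrap the provider's raw response object once, at the boundary.
    return response.choices[0].message.content

class Optimizer:
    def step(self, llm_output: str):
        # Receives plain text; never touches any provider's response schema.
        print("optimizing against:", llm_output)
```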
… Gemini-compatible history.
Multi-turn conversation is tested; see test … We store conversation history as structured data in …
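A minimal sketch of the idea, with assumed field names (the PR's actual schema in `Chat` may differ):

```python
# History is kept as plain role/content records rather than provider objects,
# so it can be replayed to any chat API or converted to a Gemini-compatible
# history, as the commit message above mentions.
history = []

def append_turn(role, content):
    history.append({"role": role, "content": content})

append_turn("user", "Summarize the failure.")
append_turn("assistant", "The test expected a multi-modal payload.")
```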
So far, all supporting functions for multi-modal capabilities are finished: … Tests are finished: … Remaining todos: …
Force-pushed from a62c202 to c171201
Force-pushed from ef542aa to c0a0282
@chinganc I think this is ready for the first round of code review... Can you see if this notebook runs for you? My plan for the 2nd round: …
…ion management tool. Expanded some functionality on `Chat`.
…ault because the `backbone.py` and other code already migrated. Can set `mm_beta=False` to go back to completion API.
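A hypothetical illustration of how such a flag gate could work; everything here except the `mm_beta` name is invented:

```python
def chat_completion_call(prompt: str) -> str:
    return f"[chat] {prompt}"        # stand-in for the multi-modal chat path

def legacy_completion_call(prompt: str) -> str:
    return f"[completion] {prompt}"  # stand-in for the old completion API

def generate(prompt: str, mm_beta: bool = True) -> str:
    # mm_beta defaults to True; mm_beta=False reverts to the completion API.
    return chat_completion_call(prompt) if mm_beta else legacy_completion_call(prompt)

print(generate("hello"))                 # chat path
print(generate("hello", mm_beta=False))  # fallback path
```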
Adding multi-modal support. Also introducing a context section.
For the context section, the design intention is: if the user provides context, it appears in the user message; if no context is provided, the section is omitted entirely.
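A minimal sketch of that design (template headers and function name are invented):

```python
def build_user_message(query: str, context: str | None = None) -> str:
    sections = []
    if context:
        # The Context section exists only when the user supplies context.
        sections.append(f"# Context\n{context}")
    sections.append(f"# Query\n{query}")
    return "\n\n".join(sections)

print(build_user_message("Improve the code.", context="It times out on large inputs."))
print(build_user_message("Improve the code."))  # no Context section at all
```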