Allow sampler to take in images#1103

Merged
copybara-service[bot] merged 1 commit into main from test_870890814 on Mar 8, 2026

copybara-service[bot] commented on Feb 16, 2026

Allow sampler to take in images

Verification: https://colab.research.google.com/gist/abheesht17/e3e31d7ff5bb302928494dcf48b77e5c/tunix-vlm-text-generation.ipynb

To let the text sampler accept images, only the pre-fill phase needs to change: the sampler takes text and images as input but emits only text, so decoding proceeds as before.

We do two things:

  • Call the image processor inside the sampler's call method to process the images.
  • Add a get_attention_mask method to the Gemma 3 model class. If the model defines this method, the sampler uses it during the pre-fill phase. Vision models need to define this method whenever they require custom token processing.
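
The two steps above can be sketched roughly as follows. This is a minimal, pure-Python illustration and not the code from this change: the signature of get_attention_mask, the placeholder IMAGE_TOKEN_ID, the helper names (causal_mask, prefill_attention_mask, prepare_prefill), and the choice to let image tokens attend to each other bidirectionally are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical placeholder id marking image positions in the prompt.
IMAGE_TOKEN_ID = -1

Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Default mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


class TextOnlyModel:
    """Stand-in for a model that defines no custom mask hook."""


class VisionModel:
    """Stand-in for a vision model that defines get_attention_mask."""

    def get_attention_mask(self, tokens: Sequence[int]) -> Mask:
        # Assumed behavior: start from a causal mask, then let each
        # contiguous run of image tokens attend to itself in both directions.
        mask = causal_mask(len(tokens))
        start = None
        # The trailing None closes a run that ends at the last token.
        for idx, tok in enumerate(list(tokens) + [None]):
            is_image = tok == IMAGE_TOKEN_ID
            if is_image and start is None:
                start = idx
            elif not is_image and start is not None:
                for i in range(start, idx):
                    for j in range(start, idx):
                        mask[i][j] = True
                start = None
        return mask


def prefill_attention_mask(model: object, tokens: Sequence[int]) -> Mask:
    """Use the model's hook if it exists, else fall back to a causal mask."""
    hook = getattr(model, "get_attention_mask", None)
    if callable(hook):
        return hook(tokens)
    return causal_mask(len(tokens))


def prepare_prefill(
    model: object,
    tokens: Sequence[int],
    images: Optional[list] = None,
    image_processor: Optional[Callable[[list], list]] = None,
) -> Tuple[Optional[list], Mask]:
    """Process images (if any) and build the pre-fill attention mask."""
    processed = None
    if images is not None:
        if image_processor is None:
            raise ValueError("images were given but no image_processor")
        processed = image_processor(images)
    return processed, prefill_attention_mask(model, tokens)
```

Looking the hook up with getattr keeps text-only models unchanged: they never define get_attention_mask, so they always receive the default causal mask.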

PiperOrigin-RevId: 880429201