
feat: Adding support for gemma4 e2b models #1162

Merged
mkopcins merged 15 commits into main from @mkopcins/gemma4 on Jun 11, 2026

Conversation

mkopcins commented May 21, 2026 (Collaborator)

Description

Introduces a breaking change?

  • [ ] Yes
  • [x] No

Type of change

  • [ ] Bug fix (change which fixes an issue)
  • [x] New feature (change which adds functionality)
  • [x] Documentation update (improves or adds clarity to existing documentation)
  • [ ] Other (chores, tests, code style improvements etc.)

Tested on

  • [x] iOS
  • [x] Android

Testing instructions

Test by running the apps/llm app: use the LLM screen for the text-only model and the multimodal screen for the audio-vision-text model. The text model should work like any other LLM. The multimodal model can process audio chunks of up to 30 seconds as well as image inputs, and should be able to transcribe audio, describe pictures, and handle similar tasks.
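Since the multimodal model takes audio in chunks of at most 30 seconds, a longer clip would presumably need to be split (or truncated) before it is sent. The sketch below shows one way to split it; the 16 kHz mono `Float32Array` format and the name `splitAudioIntoChunks` are assumptions for illustration, not part of the react-native-executorch API.

```typescript
// Assumed input format: mono PCM at 16 kHz. The PR states the 30 s limit,
// not the sample rate, so SAMPLE_RATE_HZ is an assumption.
const SAMPLE_RATE_HZ = 16000;
const MAX_CHUNK_SECONDS = 30;

/**
 * Splits mono PCM samples into consecutive chunks of at most `maxSeconds`.
 * subarray() returns views into the same buffer, so no samples are copied.
 */
function splitAudioIntoChunks(
  samples: Float32Array,
  sampleRate = SAMPLE_RATE_HZ,
  maxSeconds = MAX_CHUNK_SECONDS
): Float32Array[] {
  const maxSamples = Math.floor(sampleRate * maxSeconds);
  if (maxSamples <= 0) {
    throw new RangeError('sampleRate * maxSeconds must be at least one sample');
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += maxSamples) {
    // subarray clamps `end` to the array length, so the last chunk may be shorter.
    chunks.push(samples.subarray(start, start + maxSamples));
  }
  return chunks;
}

// Usage: a 75 s clip becomes chunks of 30 s, 30 s and 15 s.
const chunks = splitAudioIntoChunks(new Float32Array(SAMPLE_RATE_HZ * 75));
```

Views keep memory flat; if the chunks must outlive the source buffer, `slice()` would copy them instead.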

Screenshots

Related issues

#1062

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have updated the documentation accordingly
  • [ ] My changes generate no new warnings

Additional notes

mkopcins force-pushed the @mkopcins/gemma4 branch 5 times, most recently from bf62c0b to 938bc11, May 25, 2026 09:41
mkopcins force-pushed the @mkopcins/gemma4 branch from ec22b0d to 66b3d24, June 1, 2026 09:30
msluszniak marked this pull request as ready for review June 1, 2026 15:32
msluszniak added the feature label (PRs that implement a new feature) Jun 1, 2026
msluszniak linked an issue Jun 1, 2026 that may be closed by this pull request
msluszniak self-requested a review June 1, 2026 15:33
mkopcins force-pushed the @mkopcins/gemma4 branch from 12f147e to a606c79, June 2, 2026 08:16
Comment thread packages/react-native-executorch/common/rnexecutorch/models/llm/LLM.cpp Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/llm/LLM.h Outdated
Comment thread packages/react-native-executorch/common/runner/encoders/audio_encoder.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/encoders/audio_encoder.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
msluszniak

This comment was marked as resolved.

msluszniak

This comment was marked as resolved.

mkopcins force-pushed the @mkopcins/gemma4 branch from 047315d to 25fb705, June 9, 2026 10:53
Comment thread docs/docs/04-typescript-api/01-natural-language-processing/LLMModule.md Outdated
mkopcins force-pushed the @mkopcins/gemma4 branch from c41ec78 to 4b27994, June 9, 2026 11:17
mkopcins force-pushed the @mkopcins/gemma4 branch from 4b27994 to 8038ffe, June 9, 2026 11:22
Comment thread packages/react-native-executorch/src/hooks/natural_language_processing/useLLM.ts Outdated
Comment thread docs/plans/2026-06-01-multimodal-prefiller-refactor-design.md Outdated
Comment thread docs/plans/2026-06-01-multimodal-prefiller-refactor.md Outdated

Collaborator


Not going to block the PR over this, but this is getting pretty similar to the vision encoder; maybe we can lift the shared logic up into a common base.

Collaborator


I believe we should sanity-check the user-provided inputs in the setters here.
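A minimal sketch of what setter-level sanity checks could look like. The thread does not say which setters are meant, so the class name, the `temperature` and `topP` fields, their defaults, and the valid ranges below are all hypothetical.

```typescript
// Hypothetical settings object: field names, defaults and valid ranges are
// assumptions for illustration, not taken from react-native-executorch.
class SamplerSettings {
  private temperatureValue = 0.8;
  private topPValue = 0.9;

  get temperature(): number {
    return this.temperatureValue;
  }

  set temperature(value: number) {
    // Reject NaN, Infinity and negative values before they reach the sampler.
    if (!Number.isFinite(value) || value < 0) {
      throw new RangeError(`temperature must be a finite number >= 0, got ${value}`);
    }
    this.temperatureValue = value;
  }

  get topP(): number {
    return this.topPValue;
  }

  set topP(value: number) {
    // Nucleus sampling needs a probability mass in (0, 1].
    if (!Number.isFinite(value) || value <= 0 || value > 1) {
      throw new RangeError(`topP must be in (0, 1], got ${value}`);
    }
    this.topPValue = value;
  }
}
```

Because the check runs before assignment, a rejected value leaves the previous setting untouched, so the sampler never observes an invalid state.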

Comment thread packages/react-native-executorch/common/runner/constants.h
Comment thread packages/react-native-executorch/common/runner/text_prefiller.cpp Outdated
Comment thread packages/react-native-executorch/src/hooks/natural_language_processing/useLLM.ts Outdated
msluszniak self-requested a review June 9, 2026 13:32
@msluszniak

This comment was marked as resolved.

msluszniak

This comment was marked as resolved.

msluszniak

This comment was marked as resolved.

msluszniak left a comment (Member)

We need to add some kind of indicator to messages that have audio attached; right now this is not clear to the user. Also, how large is this model's context? It seems really small, which makes it hard to keep a conversation going with multimodal inputs. Another concern is that the time to first token (TTFT) grows rapidly as we keep sending messages.

@msluszniak

Member

Clicking the send button does not block the UI, so it is not obvious to the user that something is already happening.
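One way to address this is a small in-flight guard whose busy flag drives the send button's disabled state and a loading indicator. `SendGuard` is a hypothetical helper sketched here, not an existing API.

```typescript
// Hypothetical helper: tracks whether a send is in flight so the UI can
// disable the send button and show a spinner. Not an existing library API.
class SendGuard {
  private pending = false;

  /** True while a send is running; bind it to the button's disabled state. */
  get isBusy(): boolean {
    return this.pending;
  }

  /**
   * Starts `send` unless another one is still running.
   * Returns the send's promise, or null if the call was ignored.
   */
  trySend(send: () => Promise<void>): Promise<void> | null {
    if (this.pending) {
      return null;
    }
    this.pending = true;
    const release = () => {
      this.pending = false;
    };
    let task: Promise<void>;
    try {
      task = send();
    } catch (err) {
      // A synchronous throw must not leave the guard stuck in the busy state.
      task = Promise.reject(err);
    }
    // Clear the busy flag whether the send resolves or rejects.
    task.then(release, release);
    return task;
  }
}
```

In a React component the busy flag would also need to live in (or be mirrored into) component state so that the button actually re-renders when it changes.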

mkopcins commented Jun 9, 2026 (Collaborator, Author)

The context is 2048 tokens.
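Given a 2048-token window and a TTFT that grows as the conversation gets longer, one common mitigation is to drop the oldest turns so the prompt stays within budget. This is a sketch only: `ChatMessage`, its precomputed `tokenCount`, and the 256-token reply reserve are assumptions; a real implementation would count tokens with the model's tokenizer, including whatever audio and image inputs contribute.

```typescript
// Hypothetical message shape. `tokenCount` is assumed to be precomputed with
// the model's tokenizer, including any tokens contributed by audio or images.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  tokenCount: number;
}

const CONTEXT_LENGTH = 2048; // from the thread: "context is 2048 tokens"

/**
 * Keeps the newest turns that fit in the context window, leaving
 * `reserveForReply` tokens free for generation. System messages are always
 * kept and placed first; the 256-token default reserve is an assumption.
 */
function trimToContext(
  history: ChatMessage[],
  reserveForReply = 256,
  contextLength = CONTEXT_LENGTH
): ChatMessage[] {
  const system = history.filter((m) => m.role === 'system');
  let budget =
    contextLength -
    reserveForReply -
    system.reduce((sum, m) => sum + m.tokenCount, 0);

  const kept: ChatMessage[] = [];
  // Walk from newest to oldest so the most recent turns are kept first.
  for (const msg of history.slice().reverse()) {
    if (msg.role === 'system') continue;
    // Stop at the first turn that does not fit, so no gaps appear in the history.
    if (msg.tokenCount > budget) break;
    budget -= msg.tokenCount;
    kept.unshift(msg);
  }
  return system.concat(kept);
}
```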

chmjkb left a comment (Collaborator)

I'll test the app on iOS.

Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
chmjkb commented Jun 10, 2026 (Collaborator)

Something feels off about the demo app 😅

[Screenshot: IMG_0157]

mkopcins changed the title from "feat: @mkopcins/gemma4" to "feat: Adding support for gemma4 e2b models" Jun 10, 2026

msluszniak left a comment (Member)

Review of the latest state (ccb5526c1) across the native runner, the TS layer, and the example app. The inline comments fall into three groups. First, confirmed correctness bugs: the prefiller return handling (processMultimodalInput has no Error::Ok path, and prefill() discards every helper's return value; these two are coupled), the PLE element-size shadowing that zeroes the per-chunk offset on long prompts, and the LLMModule imperative API dropping audioConfig. Second, robustness and API gaps. Third, nits. A few items are phrased as questions where I couldn't confirm the intent (e.g. the audio-encoder scalar). Everything was verified against the code; happy to discuss any of them.
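The coupled prefiller bug (a helper with no success path, plus a caller that drops return values) is an error-propagation problem. The TypeScript sketch below illustrates the pattern the review asks for, assuming each helper reports a status code: every path returns a code, including an explicit Ok, and the loop returns the first failure instead of discarding it. `ErrorCode`, `Input`, `processInput`, and `prefill` are illustrative stand-ins, not the runner's real C++ types.

```typescript
// Illustrative TypeScript stand-ins for the C++ runner's types; names and
// codes are hypothetical and only demonstrate the propagation pattern.
const ErrorCode = {
  Ok: 0,
  InvalidArgument: 1,
  EncoderFailed: 2,
} as const;
type ErrorCode = (typeof ErrorCode)[keyof typeof ErrorCode];

type Input =
  | { kind: 'text'; tokens: number[] }
  | { kind: 'image'; pixels: Float32Array }
  | { kind: 'audio'; samples: Float32Array };

// Stand-in for an encoder call that reports success or failure.
type Encoder = (input: Input) => ErrorCode;

// Every path returns a status, including an explicit Ok on success.
function processInput(input: Input, encode: Encoder): ErrorCode {
  if (input.kind === 'text') {
    return input.tokens.length > 0 ? ErrorCode.Ok : ErrorCode.InvalidArgument;
  }
  return encode(input);
}

// Returns the first failure instead of discarding helper results.
function prefill(inputs: Input[], encode: Encoder): ErrorCode {
  for (const input of inputs) {
    const status = processInput(input, encode);
    if (status !== ErrorCode.Ok) {
      return status;
    }
  }
  return ErrorCode.Ok;
}
```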

Comment thread packages/react-native-executorch/common/runner/multimodal_prefiller.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/multimodal_prefiller.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/multimodal_prefiller.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/multimodal_prefiller.h Outdated
Comment thread packages/react-native-executorch/src/controllers/LLMController.ts Outdated
Comment thread apps/llm/app/multimodal_llm/index.tsx
setError(`Recording problems: ${result.message}`);
return;
}
setIsRecording(true);

Member


Two recording-robustness issues: (1) This result.status === 'error' branch is effectively dead. With no file output, AudioRecorder.start() always returns { status: 'success' }, so a real native start failure still flips the UI to 'recording' and yields empty audio; register recorder.current.onError(...) instead. (2) onAudioReady pushes a Float32Array roughly every 0.1 s with no cap, so a long recording grows memory without bound and hands a huge buffer to sendMessage, even though the model's window is only about 30 s. Enforce a maximum-duration stop, and reject oversized decoded buffers in loadAudioFromUrl.
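A sketch of the maximum-duration cap from point (2): a buffer that truncates at a fixed number of samples and fires a callback exactly once, where the app could stop the recorder. `BoundedAudioBuffer`, its callback, and the sample rate passed in are assumptions; wiring it to the recorder's `onAudioReady` and `onError` callbacks is left to the app.

```typescript
// Hypothetical bounded buffer for recorded PCM chunks. The app would call
// push() from its onAudioReady callback and stop the recorder in onLimit.
class BoundedAudioBuffer {
  private readonly chunks: Float32Array[] = [];
  private totalSamples = 0;
  private readonly maxSamples: number;
  private readonly onLimit: () => void;

  constructor(sampleRate: number, maxSeconds: number, onLimit: () => void) {
    this.maxSamples = Math.floor(sampleRate * maxSeconds);
    this.onLimit = onLimit;
  }

  /** Stores a chunk, truncated at the cap. Returns false once the buffer is full. */
  push(chunk: Float32Array): boolean {
    const remaining = this.maxSamples - this.totalSamples;
    if (remaining <= 0) {
      return false;
    }
    // slice() copies, in case the recorder reuses its chunk buffer.
    const accepted = chunk.slice(0, Math.min(chunk.length, remaining));
    this.chunks.push(accepted);
    this.totalSamples += accepted.length;
    if (this.totalSamples >= this.maxSamples) {
      this.onLimit(); // reached only once, because later pushes return early
      return false;
    }
    return true;
  }

  /** Concatenates everything recorded so far into one contiguous buffer. */
  toFloat32Array(): Float32Array {
    const out = new Float32Array(this.totalSamples);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

For a 30 s cap at an assumed 16 kHz, the buffer would be created as `new BoundedAudioBuffer(16000, 30, stopRecording)`, where `stopRecording` stands for the app's own stop handler.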

};

const startRecording = async () => {
if (!hasMicPermission) {

Member


Mic permission is requested only once, on mount, and a denial is a dead end: the button isn't disabled when !hasMicPermission, so tapping it only shows 'enable it in Settings' with no way to act on it, and there's no re-check or Linking.openSettings(). Re-request inside startRecording (await requestRecordingPermissions() and update the state), offer Linking.openSettings() when the status is Denied, and/or disable the button when permission is known to be denied.
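The suggested flow reduces to a small decision function plus an async handler that re-requests once per tap and otherwise falls back to Settings. All names here (`nextMicAction`, `handleRecordPress`, the lowercase status strings, and the injected `request`/`offerOpenSettings`/`start`/`setPermission` functions) are hypothetical; in the app, `offerOpenSettings` might show an alert whose button calls React Native's `Linking.openSettings()`.

```typescript
type MicPermission = 'granted' | 'denied' | 'undetermined';
type MicAction = 'startRecording' | 'requestPermission' | 'offerOpenSettings';

// Pure decision step: request at most once per tap, then fall back to Settings.
function nextMicAction(status: MicPermission, requestedThisTap: boolean): MicAction {
  if (status === 'granted') {
    return 'startRecording';
  }
  return requestedThisTap ? 'offerOpenSettings' : 'requestPermission';
}

// Hypothetical dependencies, injected so the flow does not depend on a
// particular recorder library.
interface MicDeps {
  request: () => Promise<MicPermission>;
  offerOpenSettings: () => void; // e.g. an alert whose button calls Linking.openSettings()
  start: () => void;
  setPermission: (status: MicPermission) => void; // keep UI state in sync
}

async function handleRecordPress(current: MicPermission, deps: MicDeps): Promise<void> {
  let status = current;
  let requested = false;
  // Runs at most twice: once with the cached status, once after re-requesting.
  for (;;) {
    const action = nextMicAction(status, requested);
    if (action === 'startRecording') {
      deps.start();
      return;
    }
    if (action === 'offerOpenSettings') {
      deps.offerOpenSettings();
      return;
    }
    status = await deps.request();
    requested = true;
    deps.setPermission(status);
  }
}
```

Disabling the button while the status is known to be denied, the review's third option, is a separate UI concern and is not shown.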

Comment thread apps/llm/app/multimodal_llm/index.tsx

chmjkb left a comment (Collaborator)

Besides what Mateusz requested, I think everything is fine on my end.

Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated
Comment thread packages/react-native-executorch/common/runner/sampler.cpp Outdated

msluszniak left a comment (Member)

I think all the issues I raised are now addressed. Thank you for working on this one, Mateusz. 🚀

mkopcins merged commit 7d145b5 into main Jun 11, 2026
5 checks passed
mkopcins deleted the @mkopcins/gemma4 branch June 11, 2026 08:51
mkopcins added a commit that referenced this pull request Jun 11, 2026

- [ ] Yes
- [x] No

- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

- [x] iOS
- [x] Android

Test by running the apps/llm app: use the LLM screen for the text-only model
and the multimodal screen for the audio-vision-text model. The text model
should work like any other LLM. The multimodal model can process audio
chunks of up to 30 seconds as well as image inputs, and should be able to
transcribe audio, describe pictures, and handle similar tasks.


- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

Labels

feature PRs that implement a new feature


Development

Successfully merging this pull request may close these issues.

Gemma4 support

4 participants