feat: ExecuTorch 1.3 with MLX (iOS) and Vulkan (Android) backends + Gemma 4 E2B by NorbertKlockiewicz · Pull Request #1223 · software-mansion/react-native-executorch

NorbertKlockiewicz · 2026-06-09T10:56:36Z

Description

Bumps ExecuTorch to 1.3 and adds two GPU backends with Gemma 4 E2B support:

- MLX (iOS / Apple GPU): new backend, with metadata-driven chunked prefill. The MLX `forward` is exported with a sliding-window cap on the sequence dimension, and a one-shot prefill spikes Metal memory, so MLX models are prefilled in steps of the forward's declared max input length (read from the method metadata). Non-MLX backends keep the original one-shot path.
- Vulkan (Android GPU): Gemma 4 E2B now runs on Vulkan. The prebuilt `libexecutorch.so` (arm64-v8a, x86_64) is rebuilt from the labs 1.3 fork with the Gemma4 Vulkan support: the `aten.rms_norm` lowering and the Gemma SDPA shaders, ported onto 1.3's tile-load helper API with the DHSB Q/K/V layout the Gemma4 export uses.

models.llm.gemma4_e2b is registered with mlx / xnnpack / vulkan variants and defaults to MLX on iOS and Vulkan on Android.
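The platform-default selection described above can be sketched roughly as follows. This is an illustration only: the `ModelVariants` shape, the `resolveVariant` helper, and the placeholder URLs are assumptions made for this example, not the registry code added by this PR.

```typescript
// Hypothetical types; the real registry in react-native-executorch may differ.
type Backend = 'mlx' | 'xnnpack' | 'vulkan';
type Platform = 'ios' | 'android';

interface ModelVariants {
  // backend -> model file URL (placeholder URLs in this sketch)
  variants: Record<Backend, string>;
  // backend used when the caller does not request one explicitly
  platformDefaults: Record<Platform, Backend>;
}

// Placeholder entry mirroring the defaults stated in the description:
// MLX on iOS, Vulkan on Android, XNNPACK available as an explicit choice.
const gemma4E2b: ModelVariants = {
  variants: {
    mlx: 'https://example.invalid/gemma4_e2b_mlx.pte',
    xnnpack: 'https://example.invalid/gemma4_e2b_xnnpack.pte',
    vulkan: 'https://example.invalid/gemma4_e2b_vulkan.pte',
  },
  platformDefaults: { ios: 'mlx', android: 'vulkan' },
};

// Use the explicitly requested backend if given, else the platform default.
function resolveVariant(
  model: ModelVariants,
  platform: Platform,
  requested?: Backend,
): { backend: Backend; url: string } {
  const backend = requested ?? model.platformDefaults[platform];
  return { backend, url: model.variants[backend] };
}
```

With this shape, `resolveVariant(gemma4E2b, 'android')` selects the Vulkan variant, while passing `'xnnpack'` as the third argument overrides the default on either platform.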

Introduces a breaking change?

- [ ] Yes
- [x] No

Type of change

- [x] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [x] Other (chores, tests, code style improvements etc.)

Tested on

- [x] iOS
- [x] Android

Testing instructions

1. Build and run the LLM example app (`apps/llm`) on a physical device (Vulkan/MLX need a real GPU, not the simulator/emulator).
2. In the model picker, select Gemma 4 E2B.
3. Send a prompt and confirm coherent generation:
   - iOS → runs on the MLX backend.
   - Android → runs on the Vulkan backend.
4. Confirm generation does not stop immediately after prefill and produces multiple tokens.

Screenshots

Related issues

Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [x] My changes generate no new warnings

Additional notes

The Vulkan variant of Gemma won't work until @mkopcins's PR is merged.

Adds 'mlx' to the Backend union and backend-resolution order, and routes MLX models through a chunked prefill path. The MLX backend's forward is exported with a sliding-window cap on the sequence dimension and a one-shot prefill spikes Metal memory, so prefill is done in steps of the forward's declared max input length, read from the method metadata (input_tensor_meta sizes) rather than a fixed constant. Non-MLX backends pass a chunk size of 0 and keep the original one-shot path unchanged. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
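A minimal sketch of the chunked prefill this commit message describes, assuming a hypothetical `Forward` callback in place of the real runner call. The chunk size would come from the method metadata in the actual code; here it is simply a parameter, and 0 selects the one-shot path as stated above.

```typescript
// Hypothetical stand-in for one forward call: consumes `tokens` starting at
// position `startPos` in the sequence and returns the next predicted token id.
type Forward = (tokens: number[], startPos: number) => number;

// Prefill the prompt either in one call (chunkSize <= 0) or in steps of
// at most `chunkSize` tokens. Returns the token predicted after the prompt.
function prefill(
  promptTokens: number[],
  forward: Forward,
  chunkSize: number,
): number {
  if (promptTokens.length === 0) {
    throw new Error('prompt must not be empty');
  }
  // One-shot path: chunking disabled, or the prompt already fits in one chunk.
  if (chunkSize <= 0 || promptTokens.length <= chunkSize) {
    return forward(promptTokens, 0);
  }
  // Chunked path: feed consecutive slices, advancing the start position.
  let next = -1;
  for (let pos = 0; pos < promptTokens.length; pos += chunkSize) {
    const chunk = promptTokens.slice(pos, pos + chunkSize);
    next = forward(chunk, pos);
  }
  // Only the prediction from the final chunk is needed to start decoding.
  return next;
}
```

For a 10-token prompt and a chunk size of 4, this issues three forward calls covering 4, 4, and 2 tokens at start positions 0, 4, and 8, which bounds the per-call sequence length regardless of prompt size.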

Rebuilt the iOS xcframework and static libs and the Android libexecutorch.so (arm64-v8a, x86_64) from the labs ExecuTorch fork: ExecuTorch 1.3 with the MLX backend enabled for iOS and the Vulkan backend enabled for Android. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Registers gemma4_e2b under models.llm with mlx/xnnpack/vulkan variants served from the react-native-executorch-gemma-4 HF repo. Platform defaults select MLX (Apple GPU) on iOS and XNNPACK on Android, where MLX is unavailable. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…budget Registers gemma4_e2b with mlx/xnnpack/vulkan variants; Android now defaults to Vulkan (verified coherent end-to-end). For dynamic-shape PTEs the text runner now derives the generation budget from get_max_context_len rather than get_max_seq_len (the per-call decoder chunk size), which previously resolved max_new_tokens to ~0 and ended generation immediately after prefill. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
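The budget fix in this commit message comes down to which limit the prompt length is subtracted from. The sketch below is an assumption about the arithmetic, not the runner's actual code: `resolveMaxNewTokens` is a hypothetical helper, and the numbers in the usage note are made up for illustration.

```typescript
// Hypothetical helper: how many tokens may still be generated after prefill.
// `limit` is whichever length the runner treats as the total capacity.
function resolveMaxNewTokens(
  limit: number,
  numPromptTokens: number,
  requested?: number,
): number {
  // Room left once the prompt is accounted for; never negative.
  const remaining = Math.max(0, limit - numPromptTokens);
  // Honor an explicit cap from the caller, but never exceed the remaining room.
  return requested === undefined ? remaining : Math.min(requested, remaining);
}
```

Suppose a dynamic-shape PTE reports a per-call chunk size of 128 and a context length of 2048, and the prompt is 200 tokens. Using the chunk size as the limit gives `resolveMaxNewTokens(128, 200)`, which is 0, so generation ends right after prefill. Using the context length gives `resolveMaxNewTokens(2048, 200)`, which is 1848.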

Rebuilt libexecutorch.so (arm64-v8a, x86_64) from the labs ExecuTorch 1.3 fork with the Gemma4 Vulkan support: aten.rms_norm lowering and the SDPA shaders, ported onto 1.3's tile-load helper API with the DHSB Q/K/V layout the Gemma4 export uses. Verified coherent generation on device. Authored with Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… Multimodal category Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@mkopcins

…emma 4 E2B (#1223)

Bumps ExecuTorch to 1.3 and adds two GPU backends with Gemma 4 E2B support:

- **MLX (iOS / Apple GPU)**: new backend, with metadata-driven chunked prefill. The MLX `forward` is exported with a sliding-window cap on the sequence dimension and a one-shot prefill spikes Metal memory, so MLX models are prefilled in steps of the forward's declared max input length (read from the method metadata). Non-MLX backends keep the original one-shot path.
- **Vulkan (Android GPU)**: Gemma 4 E2B now runs on Vulkan. The prebuilt `libexecutorch.so` (arm64-v8a, x86_64) is rebuilt from the labs 1.3 fork with the Gemma4 Vulkan support: the `aten.rms_norm` lowering and the Gemma SDPA shaders, ported onto 1.3's tile-load helper API with the DHSB Q/K/V layout the Gemma4 export uses.

`models.llm.gemma4_e2b` is registered with `mlx` / `xnnpack` / `vulkan` variants and defaults to **MLX on iOS** and **Vulkan on Android**.

- [ ] Yes
- [x] No

- [x] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [x] Other (chores, tests, code style improvements etc.)

- [x] iOS
- [x] Android

1. Build and run the LLM example app (`apps/llm`) on a physical device (Vulkan/MLX need a real GPU, not the simulator/emulator).
2. In the model picker, select **Gemma 4 E2B**.
3. Send a prompt and confirm coherent generation:
   - iOS → runs on the MLX backend.
   - Android → runs on the Vulkan backend.
4. Confirm generation does not stop immediately after prefill and produces multiple tokens.

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [x] My changes generate no new warnings

The vulkan gemma won't work until @mkopcins PR is merged.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NorbertKlockiewicz changed the title ~~@nk/mlx support~~ feat: ExecuTorch 1.3 with MLX (iOS) and Vulkan (Android) backends + Gemma 4 E2B Jun 9, 2026

NorbertKlockiewicz marked this pull request as ready for review June 9, 2026 16:36

NorbertKlockiewicz requested a review from chmjkb June 10, 2026 09:48

NorbertKlockiewicz marked this pull request as draft June 10, 2026 12:09

NorbertKlockiewicz and others added 7 commits June 11, 2026 14:11

deps: bump executorch to 1.3 and add mlx libraries

d48c0da

Consolidate Gemma 4 E2B tokenizers and pin MLX multimodal to HF URLs

2cbd555

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NorbertKlockiewicz force-pushed the @nk/mlx-support branch from 81c6113 to 2cbd555 Compare June 11, 2026 12:55

NorbertKlockiewicz marked this pull request as ready for review June 11, 2026 12:55

msluszniak reviewed Jun 11, 2026

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated

Address review: single Gemma 4 E2B export with platform defaults, LLM…

5e6cd00

… Multimodal category Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkopcins approved these changes Jun 11, 2026

msluszniak approved these changes Jun 11, 2026

mkopcins merged commit 140bf84 into main Jun 11, 2026
4 checks passed

mkopcins deleted the @nk/mlx-support branch June 11, 2026 14:30

mkopcins merged 8 commits into main from @nk/mlx-support

3 participants
