feat: Gemma 4 on-device inference via LiteRT-LM (Android) by sparkleMing · Pull Request #4 · memex-lab/memex

sparkleMing · 2026-04-08T04:31:08Z

Summary

Add support for running Gemma 4 models fully on-device on Android using the official LiteRT-LM Kotlin API.

What's new

Android native layer

LiteRtLmPlugin.kt: Flutter platform channel plugin wrapping LiteRT-LM. Supports queue-based inference, streaming tokens, tool calls, thinking mode, model download via OkHttp, and M4A→PCM WAV audio conversion via MediaCodec.
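The plugin's audio path ends in a PCM WAV file. The MediaCodec decoding step is Android-specific and is not reproduced here. The sketch below covers only the final step: wrapping already-decoded PCM in a WAV container with Python's standard `wave` module. The 16-bit sample width, the 16 kHz mono defaults, and the name `pcm16_to_wav` are assumptions for illustration. They are not taken from LiteRtLmPlugin.kt.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw signed 16-bit little-endian PCM in a WAV (RIFF) container.

    Assumption: `pcm` holds interleaved 16-bit samples, like the output
    of a decoder that has already turned an M4A/AAC track into PCM.
    """
    frame_size = 2 * channels  # bytes per frame: 2 bytes per sample per channel
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(sample_rate)
        out.writeframes(pcm)  # wave writes the RIFF/fmt/data header for us
    return buf.getvalue()


if __name__ == "__main__":
    silence = b"\x00\x00" * 16000  # one second of mono silence at 16 kHz
    wav = pcm16_to_wav(silence)
    print(wav[:4], wav[8:12], len(wav))
```

Because `wave` writes a plain PCM header, the resulting bytes can be handed to anything that accepts a standard WAV file.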

Dart layer

GemmaLocalClient: LLMClient implementation using platform channels. Acquires a global inference lock before each request.
GemmaModelManager: Engine lifecycle manager. Vision/audio backends are enabled strictly on demand, only when the request contains image/audio content. The engine is fully torn down and rebuilt when the backend config changes, and the rebuild always happens after acquiring the lock, so no engine is torn down during active inference.
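The locking and rebuild rules above can be sketched as follows. This is a minimal, language-agnostic illustration. `FakeEngine`, `BackendConfig`, and `ModelManager` are hypothetical stand-ins, not the Dart classes or the LiteRT-LM API. The key point is that both the config comparison and the teardown/rebuild happen while the global inference lock is held.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendConfig:
    vision: bool = False
    audio: bool = False


class FakeEngine:
    """Hypothetical stand-in for a native engine handle (not the LiteRT-LM API)."""

    def __init__(self, config: BackendConfig):
        self.config = config
        self.closed = False

    def generate(self, prompt: str) -> str:
        if self.closed:
            raise RuntimeError("engine used after teardown")
        return f"{prompt} (vision={self.config.vision}, audio={self.config.audio})"

    def close(self) -> None:
        self.closed = True


class ModelManager:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # global inference lock
        self._engine = None
        self.rebuilds = 0

    def infer(self, prompt: str, has_image: bool = False, has_audio: bool = False) -> str:
        # Enable vision/audio backends only if this request actually needs them.
        wanted = BackendConfig(vision=has_image, audio=has_audio)
        with self._lock:
            # The check and the rebuild both run under the lock, so an engine is
            # never closed while another request is still generating with it.
            if self._engine is None or self._engine.config != wanted:
                if self._engine is not None:
                    self._engine.close()  # full teardown before rebuilding
                self._engine = FakeEngine(wanted)
                self.rebuilds += 1
            return self._engine.generate(prompt)
```

A frozen dataclass turns "config changed" into a simple equality check. One trade-off of strictly on-demand backends is that alternating text-only and image requests trigger a rebuild each time.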

Provider integration

New typeGemmaLocal provider type with model list (gemma-4-e2b, gemma-4-e4b)
Model download UI in setup and settings pages (Android only)
No API key or base URL required; LLM data sharing consent skipped for on-device models
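The provider rules above can be expressed as a small validation sketch. Every name here is hypothetical: `ProviderType`, `REMOTE_API`, and `ProviderConfig` do not come from the PR. Only the gemma_local model ids and the rules themselves are taken from the text. On-device models need no key or endpoint and skip the data-sharing consent step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProviderType(Enum):
    REMOTE_API = "remote_api"  # hypothetical placeholder for any hosted provider
    GEMMA_LOCAL = "gemma_local"


GEMMA_LOCAL_MODELS = ("gemma-4-e2b", "gemma-4-e4b")


@dataclass
class ProviderConfig:
    kind: ProviderType
    model: str
    api_key: Optional[str] = None
    base_url: Optional[str] = None

    @property
    def on_device(self) -> bool:
        return self.kind is ProviderType.GEMMA_LOCAL

    def validate(self) -> None:
        if self.on_device:
            # On-device models need only a known model id; no key or endpoint.
            if self.model not in GEMMA_LOCAL_MODELS:
                raise ValueError(f"unknown on-device model: {self.model}")
            return
        if not self.api_key or not self.base_url:
            raise ValueError("remote providers need an API key and base URL")

    def needs_data_sharing_consent(self) -> bool:
        # Prompts never leave the device, so the consent step is skipped.
        return not self.on_device
```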

Other fixes

asset_analysis_tool: Gemma 4 uses JPEG + 896px max side to avoid LiteRT-LM patch count overflow. Non-Gemma path unchanged.
pkm_skill / timeline_card_skill: Use state.metadata factId as fallback when model-provided fact_id is unreliable.
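The asset_analysis_tool fix caps Gemma 4 input images at 896 px on the longest side before JPEG encoding. The helper below sketches only the dimension math. It assumes the aspect ratio is preserved and that images already within the limit are left unchanged. `fit_max_side` is a hypothetical name, and the PR's actual code may round differently.

```python
from typing import Tuple


def fit_max_side(width: int, height: int, max_side: int = 896) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Aspect ratio is preserved; images already within the limit are
    returned unchanged (never upscaled).
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if max(width, height) <= max_side:
        return width, height
    if width >= height:
        # Landscape or square: pin the width, scale the height proportionally.
        return max_side, max(1, round(height * max_side / width))
    # Portrait: pin the height, scale the width proportionally.
    return max(1, round(width * max_side / height)), max_side
```

For example, a 4032×3024 photo maps to 896×672. The resized bitmap would then be JPEG-encoded by the platform's image codec, a step that is platform-specific and omitted here.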

Dependency upgrades

drift 2.30 to 2.32.1, sqlite3_flutter_libs 0.5 to 0.6, drift_flutter 0.2 to 0.3

- Add LiteRtLmPlugin (Kotlin) wrapping official LiteRT-LM API with queue-based inference, download support, and audio PCM conversion
- Add GemmaLocalClient (Dart) with per-request engine init/teardown
- Add GemmaModelManager with on-demand backend init: vision/audio backends only enabled when request contains image/audio content, engine rebuilt (with full teardown) when config changes
- Engine rebuild happens after acquiring inference lock to prevent teardown while another inference is in progress
- Add gemma_local provider type with model list (gemma-4-e2b/e4b)
- Add download UI in model config pages (Android only)
- Skip LLM data sharing consent for on-device models
- asset_analysis_tool: use JPEG + 896px cap for Gemma 4 to avoid LiteRT-LM patch count overflow; non-Gemma path unchanged
- pkm_skill/timeline_card_skill: use state factId as fallback when model-provided fact_id is unreliable
- Upgrade drift 2.30→2.32.1, sqlite3_flutter_libs 0.5→0.6, drift_flutter 0.2→0.3

matthewchan-g · 2026-04-16T00:44:14Z

Consider setting the experimental flag enableConversationConstrainedDecoding to true to see whether it helps with the function calls Gemma 4 generates.

sparkleMing · 2026-05-11T01:48:57Z

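The main reason this PR wasn't merged into main is that, based on our testing at the time, it was hard to guarantee a reliable end-to-end agent experience.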

N3tn1la · 2026-06-11T15:05:37Z

Hi! Just wanted to check in — is this PR still being actively worked on? Any plans to merge it into main? Would love to see on-device Gemma inference land in the app!

sparkleMing added 2 commits April 8, 2026 12:29

docs: add Gemma on-device provider to README tables

e5a19f8

sparkleMing wants to merge 2 commits into main from feat/gemma4-litert-lm

3 participants