
feat: Gemma 4 on-device inference via LiteRT-LM (Android) #4

Open: sparkleMing wants to merge 2 commits into main from feat/gemma4-litert-lm


Conversation

@sparkleMing

Collaborator

Summary

Add support for running Gemma 4 models fully on-device on Android using the official LiteRT-LM Kotlin API.

What's new

Android native layer

  • LiteRtLmPlugin.kt: Flutter platform channel plugin wrapping LiteRT-LM. Supports queue-based inference, streaming tokens, tool calls, thinking mode, model download via OkHttp, and M4A→PCM WAV audio conversion via MediaCodec.
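As a rough illustration of the last step in the audio path, the sketch below wraps already-decoded PCM bytes in a standard 44-byte RIFF/WAVE header. It is not taken from the PR: the function names `wavHeader` and `pcmToWav` are hypothetical, the 16 kHz mono 16-bit defaults are assumptions, and the MediaCodec decoding that would produce the PCM bytes on Android is left out.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
// The default sample rate, channel count, and bit depth are assumptions
// chosen for illustration, not values confirmed by the PR.
fun wavHeader(
    pcmByteCount: Int,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bitsPerSample: Int = 16
): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    return ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
        put("RIFF".toByteArray(Charsets.US_ASCII))
        putInt(36 + pcmByteCount)                   // size of everything after this field
        put("WAVE".toByteArray(Charsets.US_ASCII))
        put("fmt ".toByteArray(Charsets.US_ASCII))
        putInt(16)                                  // fmt chunk size for PCM
        putShort(1.toShort())                       // audio format 1 = linear PCM
        putShort(channels.toShort())
        putInt(sampleRate)
        putInt(byteRate)
        putShort(blockAlign.toShort())
        putShort(bitsPerSample.toShort())
        put("data".toByteArray(Charsets.US_ASCII))
        putInt(pcmByteCount)                        // size of the PCM payload
    }.array()
}

// Prepends the header to raw PCM samples (hypothetical helper).
fun pcmToWav(pcm: ByteArray, sampleRate: Int = 16_000, channels: Int = 1): ByteArray =
    wavHeader(pcm.size, sampleRate, channels) + pcm
```

In a real plugin the sample rate and channel count would presumably come from the decoder's output format rather than from defaults.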

Dart layer

  • GemmaLocalClient: LLMClient implementation using platform channels. Acquires a global inference lock before each request.
  • GemmaModelManager: Engine lifecycle manager. Vision/audio backends enabled strictly on demand — only when the request contains image/audio content. Engine fully torn down and rebuilt when backend config changes. Rebuild always happens after acquiring the lock to prevent teardown during active inference.
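The lifecycle rules above (lock first, derive backends from the request, rebuild only when the config changes) can be sketched roughly as follows. The PR's manager is written in Dart; Kotlin is used here only to illustrate the logic. `BackendConfig`, `FakeEngine`, and `EngineManager` are hypothetical names, and `FakeEngine` is a stand-in that records lifecycle events instead of loading a model.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Which optional backends an engine instance was built with.
data class BackendConfig(val vision: Boolean, val audio: Boolean)

// Hypothetical stand-in for the real engine; it only records lifecycle events.
class FakeEngine(val config: BackendConfig, private val log: MutableList<String>) {
    init { log += "init $config" }
    fun infer(prompt: String): String = "reply to: $prompt"
    fun close() { log += "close $config" }
}

class EngineManager {
    val log = mutableListOf<String>()
    private val inferenceLock = ReentrantLock()   // allows one inference at a time
    private var engine: FakeEngine? = null

    fun infer(prompt: String, hasImage: Boolean, hasAudio: Boolean): String =
        inferenceLock.withLock {
            // Backends are derived from the request content, never enabled eagerly.
            val wanted = BackendConfig(vision = hasImage, audio = hasAudio)
            val existing = engine
            // Teardown and rebuild happen only while the lock is held,
            // so no other inference can still be using the old engine.
            val current = if (existing != null && existing.config == wanted) {
                existing
            } else {
                existing?.close()
                FakeEngine(wanted, log).also { engine = it }
            }
            current.infer(prompt)
        }
}
```

Two text-only requests in a row reuse the same engine, while a following request with an image closes it and builds a new one with the vision backend enabled.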

Provider integration

  • New typeGemmaLocal provider type with model list (gemma-4-e2b, gemma-4-e4b)
  • Model download UI in setup and settings pages (Android only)
  • No API key or base URL required; LLM data sharing consent skipped for on-device models

Other fixes

  • asset_analysis_tool: Gemma 4 uses JPEG + 896px max side to avoid LiteRT-LM patch count overflow. Non-Gemma path unchanged.
  • pkm_skill / timeline_card_skill: Use state.metadata factId as fallback when model-provided fact_id is unreliable.
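For the 896px cap in the asset_analysis_tool fix, the resize arithmetic might look like the sketch below. The function name `fitWithin` is hypothetical, and the rounding and no-upscaling behavior are assumptions; the JPEG encoding step and the actual bitmap scaling are not shown.

```kotlin
import kotlin.math.roundToInt

// Scales (width, height) so the longer side is at most maxSide, keeping the
// aspect ratio. Images already within the limit are returned unchanged.
fun fitWithin(width: Int, height: Int, maxSide: Int = 896): Pair<Int, Int> {
    require(width > 0 && height > 0 && maxSide > 0) { "dimensions must be positive" }
    val longSide = maxOf(width, height)
    if (longSide <= maxSide) return width to height   // no upscaling
    val scale = maxSide.toDouble() / longSide
    // Round to the nearest pixel, but never let a very thin image collapse to 0 px.
    val w = maxOf(1, (width * scale).roundToInt())
    val h = maxOf(1, (height * scale).roundToInt())
    return w to h
}
```

For example, a 4032x3024 photo would be reduced to 896x672 before being handed to the model, while an 800x600 image would pass through at its original size.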

Dependency upgrades

  • drift 2.30 to 2.32.1, sqlite3_flutter_libs 0.5 to 0.6, drift_flutter 0.2 to 0.3

- Add LiteRtLmPlugin (Kotlin) wrapping official LiteRT-LM API with
  queue-based inference, download support, and audio PCM conversion
- Add GemmaLocalClient (Dart) with per-request engine init/teardown
- Add GemmaModelManager with on-demand backend init: vision/audio
  backends only enabled when request contains image/audio content,
  engine rebuilt (with full teardown) when config changes
- Engine rebuild happens after acquiring inference lock to prevent
  teardown while another inference is in progress
- Add gemma_local provider type with model list (gemma-4-e2b/e4b)
- Add download UI in model config pages (Android only)
- Skip LLM data sharing consent for on-device models
- asset_analysis_tool: use JPEG + 896px cap for Gemma 4 to avoid
  LiteRT-LM patch count overflow; non-Gemma path unchanged
- pkm_skill/timeline_card_skill: use state factId as fallback when
  model-provided fact_id is unreliable
- Upgrade drift 2.30→2.32.1, sqlite3_flutter_libs 0.5→0.6,
  drift_flutter 0.2→0.3

@matthewchan-g


Consider setting experimental flag enableConversationConstrainedDecoding to true to see if that helps with function calls generated by Gemma 4.

@sparkleMing

Collaborator Author

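This PR was not merged into the main branch mainly because our testing at the time showed it was hard to guarantee a good end-to-end Agent experience.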

@N3tn1la

N3tn1la commented Jun 11, 2026


Hi! Just wanted to check in — is this PR still being actively worked on? Any plans to merge it into main? Would love to see on-device Gemma inference land in the app!
