Add reset(to:), processedTokenCount, and implicit prefix caching by stikves · Pull Request #51 · apple/coreai-models

stikves · 2026-06-16T21:13:13Z

Extends the InferenceEngine protocol with partial KV cache reset and automatic prefix detection:

- reset(to: tokenIndex): truncate the KV cache, keeping the first N positions
- processedTokenCount: read-only, tracks the engine's current cache position
- TokenHistory: shared utility for memcmp-based prefix detection
- Pipelined engine: implicit prefix caching; the engine auto-detects the common prefix between consecutive generate() calls and processes only the new tokens
- Sequential/Static: implicit rewind when the input is shorter than the cache (full divergence triggers a zero-fill, prefix extension rewinds the counter)
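The changes listed above describe the API only in prose. The following is a minimal Swift sketch of how the pieces could fit together, assuming `Int32` token IDs and a toy engine that runs no model. Only `reset(to:)`, `processedTokenCount`, and `TokenHistory` come from the description; `PrefixResettableEngine`, `ToyPrefixCachingEngine`, `commonPrefixLength(with:)`, `truncate(to:)`, and `generate(_:)` are hypothetical names, not the repository's API.

```swift
import Foundation

// Hypothetical sketch: only reset(to:), processedTokenCount, and TokenHistory
// come from the PR description; every other name here is an assumption.
protocol PrefixResettableEngine: AnyObject {
    /// Number of positions currently held in the KV cache.
    var processedTokenCount: Int { get }
    /// Truncates the KV cache so that only the first `tokenIndex` positions remain.
    func reset(to tokenIndex: Int)
}

/// Records the tokens whose keys/values are cached and finds the longest
/// common prefix with a new input, using memcmp as a fast path.
struct TokenHistory {
    private(set) var tokens: [Int32] = []

    func commonPrefixLength(with input: [Int32]) -> Int {
        let n = min(tokens.count, input.count)
        if n == 0 { return 0 }
        let byteCount = n * MemoryLayout<Int32>.stride
        // Fast path: a single memcmp over the overlapping region.
        let fullMatch: Bool = tokens.withUnsafeBytes { a in
            input.withUnsafeBytes { b in
                memcmp(a.baseAddress!, b.baseAddress!, byteCount) == 0
            }
        }
        if fullMatch { return n }
        // Slow path: locate the first differing token (one exists before n).
        var i = 0
        while i < n && tokens[i] == input[i] { i += 1 }
        return i
    }

    mutating func truncate(to count: Int) {
        precondition(count >= 0 && count <= tokens.count, "truncate index out of range")
        tokens.removeLast(tokens.count - count)
    }

    mutating func append<S: Sequence>(contentsOf newTokens: S) where S.Element == Int32 {
        tokens.append(contentsOf: newTokens)
    }
}

/// Toy stand-in for a pipelined engine: it runs no model and only records
/// how many tokens each generate() call actually had to process.
final class ToyPrefixCachingEngine: PrefixResettableEngine {
    private var history = TokenHistory()
    private(set) var lastProcessedCount = 0

    var processedTokenCount: Int { history.tokens.count }

    func reset(to tokenIndex: Int) {
        history.truncate(to: tokenIndex)
    }

    /// Implicit prefix caching: keep the common prefix, drop any diverging
    /// tail, and process only the new suffix.
    func generate(_ input: [Int32]) {
        let keep = history.commonPrefixLength(with: input)
        reset(to: keep)
        let newTokens = input[keep...]
        // A real engine would run prefill over `newTokens` here.
        history.append(contentsOf: newTokens)
        lastProcessedCount = newTokens.count
    }
}

let demo = ToyPrefixCachingEngine()
demo.generate([1, 2, 3, 4])        // fresh: processes 4 tokens
demo.generate([1, 2, 3, 4, 5, 6])  // extend: processes 2 tokens
demo.generate([1, 2, 9])           // diverge at index 2: processes 1 token
print(demo.processedTokenCount, demo.lastProcessedCount)  // prints "3 1"
```

In this sketch `reset(to:)` physically drops cache entries. Per the description, the Sequential/Static engines instead rewind implicitly (zero-fill on full divergence, counter rewind on prefix extension); with a fixed-size cache that most likely means moving a position counter rather than freeing storage. The sketch also glosses over the case where the new input equals, or is a prefix of, the history: there are then no new tokens to feed, so a real engine presumably has to re-run at least the last position to obtain fresh logits.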

All engines pass the explicit reset parity test. Pipelined passes the implicit prefix caching test (extend + diverge). Sequential/Static pass it by wiring in explicit reset.
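The parity tests themselves are not shown here. A minimal sketch of what an explicit reset parity check can look like follows, assuming a toy engine whose "KV cache" is a chain of per-position states: rewinding to `k` and then continuing should yield the same cache length and the same prediction as a fresh engine fed the first `k` prompt tokens plus the continuation. `ToyKVEngine`, `process(_:)`, and `explicitResetParity(prompt:k:continuation:)` are hypothetical names.

```swift
/// Toy engine: the "KV cache" is an array of per-position states, and the
/// "prediction" is a deterministic function of the last cached state.
final class ToyKVEngine {
    private var cache: [UInt64] = []

    var processedTokenCount: Int { cache.count }

    /// Keeps only the first `tokenIndex` cache positions.
    func reset(to tokenIndex: Int) {
        precondition((0...cache.count).contains(tokenIndex), "reset index out of range")
        cache.removeLast(cache.count - tokenIndex)
    }

    /// Appends `tokens` to the cache and returns a toy next-token prediction.
    func process(_ tokens: [Int32]) -> UInt64 {
        for t in tokens {
            // Each state depends on its predecessor, so the result depends on the whole prefix.
            let prev = cache.last ?? 0
            cache.append(prev &* 31 &+ UInt64(bitPattern: Int64(t)))
        }
        return cache.last ?? 0
    }
}

/// Rewinding to `k` and continuing must match a fresh engine fed the same tokens.
func explicitResetParity(prompt: [Int32], k: Int, continuation: [Int32]) -> Bool {
    let reused = ToyKVEngine()
    _ = reused.process(prompt)
    reused.reset(to: k)
    let a = reused.process(continuation)

    let fresh = ToyKVEngine()
    let b = fresh.process(Array(prompt[..<k]) + continuation)
    return a == b && reused.processedTokenCount == fresh.processedTokenCount
}

let prompt: [Int32] = [5, 8, 13, 21, 34]
let allPass = (0...prompt.count).allSatisfy {
    explicitResetParity(prompt: prompt, k: $0, continuation: [7, 7])
}
print(allPass)  // prints "true"
```

Because each toy state depends on its predecessor, a reset that kept or dropped the wrong number of positions would change both the cache length and the prediction, and the check would fail.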

All engines pass 20/20 explicit reset parity test. Pipelined passes 11/11 implicit prefix caching test (extend + diverge). Sequential/Static pass 10/11 (divergence auto-reset works, extend works via counter rewind).

carinapeng

Thank you!

stikves force-pushed the sukru/engine-partial-reset branch 2 times, most recently from c3f7d0a to a5c51ce on June 17, 2026 02:45

stikves marked this pull request as ready for review June 17, 2026 03:42

stikves requested review from alejandro-isaza, carinapeng, kevchengcodes and tjia1818 June 17, 2026 03:42

carinapeng reviewed Jun 17, 2026


Comment thread swift/Sources/CoreAILanguageModels/InferenceEngines/CoreAIPipelinedEngine.swift

stikves force-pushed the sukru/engine-partial-reset branch 2 times, most recently from 433d83b to 5482d42 on June 17, 2026 18:17

stikves mentioned this pull request Jun 17, 2026

Fix pipeline race condition: rotate all buffers by pipeline depth #53

Merged

stikves force-pushed the sukru/engine-partial-reset branch from 8623ada to 8ddd9ba on June 17, 2026 19:50

stikves force-pushed the sukru/engine-partial-reset branch from 8ddd9ba to 9bc7aa3 on June 18, 2026 00:02

Merge branch 'main' into sukru/engine-partial-reset

beaaf86

alejandro-isaza approved these changes Jun 18, 2026


carinapeng self-requested a review June 18, 2026 18:20

carinapeng approved these changes Jun 18, 2026


stikves merged commit 18cd896 into apple:main Jun 18, 2026
3 checks passed

stikves deleted the sukru/engine-partial-reset branch June 18, 2026 18:29

stikves merged 2 commits into apple:main from stikves:sukru/engine-partial-reset

3 participants

Conversation

stikves commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

carinapeng left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stikves commented Jun 16, 2026 •

edited

Loading