AIChatKitMLX

Adds on-device Apple MLX inference to any app already using AIChatKit. Models are downloaded from Hugging Face Hub on first use and cached locally. Supports both text-only LLMs and vision-language models (VLMs). Runs on Metal GPU and Apple Neural Engine — no network calls during inference.

Platforms: macOS 14+ · iOS 17+
Language: Swift 5.10+
Requires: Apple Silicon (M-series Mac or A-series iPhone/iPad)

Requires AIChatKit. Add both packages to your target.
Do not add this target to builds that must run on Intel Macs or Simulator.

Installation

// Package.swift
.package(url: "https://github.com/NerdSnipe-Inc/AIChatKit",    from: "0.1.0"),
.package(url: "https://github.com/NerdSnipe-Inc/AIChatKitMLX", from: "0.1.0"),

// Target dependencies
.product(name: "AIChatCore", package: "AIChatKit"),
.product(name: "AIChatUI",   package: "AIChatKit"),    // if using ChatSession / ChatView
.product(name: "AIChatMLX",  package: "AIChatKitMLX"),

Quick start

import AIChatMLX
import AIChatUI

// Automatically picks the best model for the current device (see Model selection below)
let provider = MLXProvider()

@StateObject private var session = ChatSession(
    provider: provider,
    model: "",  // MLXProvider ignores the model string; pass anything
    options: ChatRequestOptions(systemPrompt: "You are a helpful assistant.")
)

MLXProvider is an actor. The model downloads and loads on the first stream() call, or you can pre-warm it explicitly with loadModel(progressHandler:).

Model selection

MLXProvider automatically selects a model based on the device's available RAM:

Device	Model	Type	Download size
macOS ≥ 16 GB RAM	`mlx-community/diffusiongemma-26B-A4B-it-4bit`	VLM (text + images)	~8–10 GB
macOS < 16 GB / iOS	`mlx-community/gemma-4-e4b-it-4bit`	LLM (text only)	~2–3 GB

The 26B model is a Mixture-of-Experts architecture with ~4B active parameters per forward pass — faster and leaner than a dense 26B model while retaining broad capability. The factory is selected automatically: VLMs load via MLXVLM, text models via MLXLLM.

// Check which model will be used on the current device
let modelId = MLXProvider.recommendedModelId()

// Named constants
MLXProvider.smallModelId  // gemma-4-e4b-it-4bit
MLXProvider.largeModelId  // diffusiongemma-26B-A4B-it-4bit

Showing download progress

Call loadModel(progressHandler:) before starting a conversation to surface progress in your UI. The download only happens once — subsequent calls return immediately if the model is already in memory.

try await provider.loadModel { progress in
    Task { @MainActor in
        self.downloadProgress = progress.fractionCompleted  // 0.0–1.0
    }
}
// Model is now resident; session.send() responds immediately

Custom model

// Any mlx-community model by Hub ID — factory is selected automatically
let provider = MLXProvider(modelId: "mlx-community/Qwen3-4B-4bit")

// Pre-downloaded local directory
let provider = MLXProvider(modelPath: URL(fileURLWithPath: "/path/to/model-dir"))

The correct factory (VLM or LLM) is chosen at load time based on the model's config.json. No manual factory selection is needed.

Vision input (VLMs)

When the large model is selected, the underlying ModelContainer supports image input via UserInput. This is available directly through the perform { context in } API on the container. VLM-specific features (image understanding, document analysis) are accessible when using MLXProvider as part of a tool-call flow or by working directly with the container.

Sampling options

MLXProvider(
    modelId:           MLXProvider.recommendedModelId(),
    maxTokens:         nil,      // nil = unlimited
    temperature:       0.6,
    topP:              1.0,
    repetitionPenalty: nil       // nil = disabled
)

Model cache location

Models are cached by the Hugging Face Swift library.

App state	Cache path
Sandboxed (App Store / entitlements)	`~/Library/Containers/<bundle-id>/Data/Library/Caches/huggingface/hub/`
Not sandboxed	`~/.cache/huggingface/hub/`

The cache is shared with the Python huggingface_hub library — models already downloaded via Python tools are found without re-downloading.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.githooks		.githooks
Sources/AIChatMLX		Sources/AIChatMLX
Tests/AIChatMLXTests		Tests/AIChatMLXTests
.gitignore		.gitignore
.spi.yml		.spi.yml
LICENSE		LICENSE
Package.resolved		Package.resolved
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AIChatKitMLX

Installation

Quick start

Model selection

Showing download progress

Custom model

Vision input (VLMs)

Sampling options

Model cache location

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

AIChatKitMLX

Installation

Quick start

Model selection

Showing download progress

Custom model

Vision input (VLMs)

Sampling options

Model cache location

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages