Run AI models too large for your Mac's memory — at near-full speed. Intelligent expert caching, speculative execution, and 15+ research techniques for MoE inference on Apple Silicon.
Topics: python, macos, rust, machine-learning, inference, moe, quantization, mlx, speculative-execution, mixture-of-experts, memory-optimization, apple-silicon, llm, metal-gpu, ssd-streaming, expert-caching
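One of the techniques named above, expert caching, can be sketched in a few lines: in a mixture-of-experts model, only a few experts fire per token, so recently used expert weights can be kept in RAM while the rest stay on SSD. The sketch below is a minimal, illustrative LRU cache; `load_expert_from_ssd` and the class name are assumptions for illustration, not this repository's actual API.

```python
from collections import OrderedDict

# Hypothetical loader: a real system would read quantized expert
# weights from SSD. Name and return value are illustrative only.
def load_expert_from_ssd(expert_id):
    return f"weights-for-expert-{expert_id}"

class ExpertCache:
    """Tiny LRU cache keeping the most recently used experts in memory."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._cache = OrderedDict()  # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as recently used
            self.hits += 1
            return self._cache[expert_id]
        self.misses += 1
        weights = load_expert_from_ssd(expert_id)
        self._cache[expert_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return weights

cache = ExpertCache(capacity=2)
cache.get(0)  # miss: loaded from SSD
cache.get(1)  # miss
cache.get(0)  # hit: expert 0 served from memory
cache.get(2)  # miss: evicts expert 1 (least recently used)
print(cache.hits, cache.misses)  # → 1 3
```

Because token-level expert selection is highly skewed in practice, even a small in-memory cache like this can serve most lookups without touching the SSD.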
Updated Apr 1, 2026 - Python