Staging to Main#69
Merged
Merged
Conversation
…old start
User-facing
- HTML Challenge prompt library: 32 curated single-page prompts across 4
categories (Games / Simulations / Tech Demos / Creative Tools) behind an
Option-C tabbed picker (category tabs + search + card grid). New
challengePromptLibrary.ts + ChallengePromptLibraryModal.tsx, wired into
HtmlChallengeTab via a "Prompt library" button.
- GGUF MTP speculative-decoding toggle in the launch modal (FU-074) — was
backend-only (FU-047) with no UI; now shown for MTP-GGUF + llama.cpp.
- Qwen3.5/3.6 re-tagged multimodal across the catalog (FU-072) — upstream
unified them onto Qwen3_5ForConditionalGeneration with vision_config;
FU-040's text-only assumption was stale. Runtime supportsVision stays
per-engine gated, so badges never produce a broken Attach-image button.
Version + upstream deps
- Bump to 0.9.3 (package.json, pyproject, Cargo, tauri.conf).
- turboquant-mlx-full floor >=0.5.0 (FU-069, parallel expert prefetch).
- mlx-vlm floor >=0.5.0 (FU-063).
- ggml-org Qwen3.6-{27B,35B-A3B}-GGUF non-MTP catalog rows (FU-064).
MLX speculative decoding — genuine engagement (were silently falling back)
- FU-075: mlx_worker_lifecycle imported the removed top-level
configure_full_attention_split → ImportError disabled DFlash/DDTree/MTPLX
for everyone. Use the dflash-mlx 0.1.5 target_ops adapter; split only on
hybrid_gdn families.
- FU-076: MTP tensor probe missed top-level mtp.* keys (Qwen3.5/3.6) → MTPLX
never selected. Match a bare "mtp." prefix.
- FU-077: harden install-mtplx.sh verify to import the server module +
auto-retry; a truncated venv (missing numpy/fastapi/...) passed before.
- FU-078: MtplxEngine handed MTPLX a bare repo id; resolve the local HF
snapshot dir when the candidate isn't an on-disk path.
- FU-071: DDTree availability probe checked a pre-0.1.5 symbol name.
Startup load time (FU-080)
- Backend cold import 2.6s -> ~0.85s: cache-strategy is_available() probes
imported diffusers.hooks (pulling torch) at startup. New _diffusers_probe
gates on importlib.metadata version (no import); real import stays lazy.
torch/diffusers/mlx no longer in sys.modules after import backend_service.app.
Test infrastructure
- E2E suite: hardened DFlash + MTPLX checks (asserted structured engagement,
not note substrings that the fallback note also matched); net-new DDTree,
GGUF-MTP, and catalog-vision checks. CLI load surfaces treeBudget /
dflashDraftModel / visionEnabled.
- cache-strategy matrix: classify missing-download as SKIP not FAIL (FU-070);
fix tok/s capture + dflashAcceptanceRate; MTPLX cell targets canonical
Qwen/Qwen3.5-4B (FU-073).
- Startup-import-purity guards + version-probe + all the above unit tests.
Validation: pytest green, vitest 453, tsc clean, cache-matrix 11/11,
E2E 39/39 (every spec-dec lane genuinely engaged).
…n-any-HF, connect presets)
Five local-AI-app parity features to close gaps vs Ollama / LM Studio,
each reusing existing infra rather than adding new heavy subsystems.
1. Out-of-box RAG — one-click nomic-embed-text-v1.5 install
(/api/setup/install-embedding-model) + /api/rag/status. Chat doc
panel shows vector vs lexical mode and offers the upgrade
(RagStatusBadge). Retrieval was silently lexical-only without a model.
2. Server "Connect your app" presets — base_url + Python/JS snippets +
Open WebUI / Continue.dev / Ollama presets in ServerTab.
3. Ollama-compatible API — /api/{chat,generate,tags,show,version,
embeddings,embed} layered over the existing OpenAI generation path,
translating SSE to NDJSON. Inherits auth + format->json_schema.
Unlocks Ollama-preset tools (Open WebUI, Continue, Raycast, n8n).
4. Import Ollama / LM Studio models by reference — scans the Ollama blob
store (manifest -> blob) and LM Studio cache, symlinks into a managed
imported-models dir (no re-download), auto-registers for library scan.
5. Run any Hugging Face repo — /api/models/resolve-hf classifies backend,
picks the GGUF file, and infers context + capabilities from the repo's
own metadata; loads with canonicalRepo set to bypass the FU-041
catalog fuzzy-match that mis-tagged off-catalog models (RunFromHuggingFace).
Tests: +42 backend (test_embedding_setup, test_hf_resolve, test_model_import,
+ Ollama shim cases in test_backend_service); vitest 453 green; tsc clean;
i18n 100%; full E2E suite 8/8 phases pass incl. new phase-0 checks.
Known follow-ups: stage llama-embedding binary for packaged builds (#1);
Windows symlink privilege (#4); raw-safetensors repos flagged vLLM/CUDA (#5).
Feature/competitor parity wins
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.