
feat: add calibrate() to OutlierTurboQuant for data-driven channel split#64

Open
brosequist wants to merge 1 commit into TheTom:main from brosequist:feat/outlier-calibrate

Conversation

@brosequist

Summary

  • Adds OutlierTurboQuant.calibrate(calibration_vectors) for data-driven outlier channel detection.
  • Computes per-channel RMS across the calibration set and marks channels whose RMS exceeds 3× the median as outlier channels, updating the compressor's split in place.
  • Follows the dynamic-threshold approach from the LLM.int8() / SmoothQuant literature.

Test plan

  • pytest tests/test_outlier.py — includes tests for calibrate() updating the channel mask and handling all-inlier / all-outlier edge cases.

🤖 Generated with Claude Code

Previously the outlier/inlier channel split was set at construction time and
never adjusted. calibrate(calibration_vectors) now computes per-channel RMS,
flags channels whose RMS exceeds 3× the median as outliers, and updates the
split on the compressor — matching the dynamic-threshold approach described
in the LLM.int8() and SmoothQuant literature.
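The RMS-over-median rule described above can be sketched as follows. This is a minimal illustration of the approach, not the PR's actual implementation; the function name, the 3× default, and the boolean-mask return type are assumptions based on the summary.

```python
import numpy as np

def detect_outlier_channels(calibration_vectors, threshold=3.0):
    """Sketch of data-driven outlier channel detection.

    calibration_vectors: array of shape (n_samples, n_channels).
    Returns a boolean mask where True marks an outlier channel.
    """
    x = np.asarray(calibration_vectors, dtype=np.float64)
    # Per-channel RMS across the whole calibration set.
    rms = np.sqrt(np.mean(x * x, axis=0))
    # Flag channels whose RMS exceeds threshold x the median RMS.
    return rms > threshold * np.median(rms)
```

Note that with a purely uniform calibration set every channel's RMS equals the median, so no channel is flagged; this is the all-inlier edge case the test plan mentions.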

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
clonekang pushed a commit to clonekang/turboquant_llama that referenced this pull request May 3, 2026
Origin's April upstream-sync rebase interleaved two changes that left the
Vulkan turbo3 KV path broken:

  * ggml-org/llama.cpp upstream PR #21572 (1f30ac0) moved fp16 RTE
    rounding to a runtime SPIR-V patch and dropped the _rte shader
    variants plus rte.glsl itself.
  * TheTom/llama-cpp-turboquant PR TheTom#62 (ff8bb73) added turbo3 KV
    support against a base that still had those variants.

After the rebase, the tree had dangling cpy_f32_*_rte_len / _data
references, a two-arg SET_ROWS macro called with one arg, a
#include "rte.glsl" in a shader whose header no longer exists, and
MMQ shader variants generated for turbo3_0 even though the flash_attn
MMQ path has no turbo3 code. The result was that ggml-vulkan.cpp
failed to compile on a clean checkout (spirv-headers + all of the
above) and the shader-gen emitted garbage variants.

Separately, turbo3 flash-attn pipelines were only wired up for
FA_SCALAR. On a coopmat-capable device (e.g. RADV on a 7900 XTX) the
tuning heuristic picks FA_COOPMAT1 for most shapes, which landed in
ggml_vk_flash_attn with an uninitialized pipeline (wg_denoms={0,0,0})
and tripped the Br == wg_denoms[0] assertion as soon as a prefill
ubatch was dispatched. End-to-end llama-cli on Vulkan + -ctk turbo3
aborted on the first real forward pass.
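The failure mode above can be illustrated with a toy registry. This is a hypothetical Python sketch of the mechanism only; the names (FA_SCALAR, FA_COOPMAT1, wg_denoms, Br) are taken from the commit message, and the code does not reflect the actual ggml-vulkan implementation, where the unregistered pipeline comes back zero-initialized from the C++ struct.

```python
# Toy model: pipelines registered per (quant type, flash-attn path).
FA_SCALAR, FA_COOPMAT1 = "scalar", "coopmat1"
pipelines = {}

def register_fa(quant_type, path, wg_denoms):
    pipelines[(quant_type, path)] = {"wg_denoms": wg_denoms}

# Before the fix: turbo3 was only wired up for the scalar path.
register_fa("turbo3_0", FA_SCALAR, (32, 1, 1))

def flash_attn(quant_type, path, Br):
    # A missing registration behaves like a zero-initialized pipeline,
    # so the workgroup-shape check below cannot hold.
    pipe = pipelines.get((quant_type, path), {"wg_denoms": (0, 0, 0)})
    assert Br == pipe["wg_denoms"][0], "Br == wg_denoms[0]"
    return pipe
```

When the tuning heuristic picks FA_COOPMAT1 for a turbo3 KV cache, the lookup misses and the assertion fires on the first dispatch, which is exactly the abort described above; registering the coopmat variants closes the gap.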

Changes:

  * Drop the if (float_controls_rte_fp16) / else branches around
    cpy_f32_quant pipeline creation and collapse SET_ROWS to a single
    variant, matching upstream post-1f30ac0ce.
  * Remove the #include "rte.glsl" from copy_to_quant.comp.
  * Skip the MMQ flash_attn shader variant for turbo3_0 in the shader
    generator (no MMQ code path for it).
  * Register CREATE_FA(GGML_TYPE_TURBO3_0, turbo3_0, FA_COOPMAT1, _cm1)
    and the _cm2 counterpart alongside the other quant types.

Verified on AMD 7900 XTX (gfx1100 / RADV NAVI31, ROCm 7.2.1 + Vulkan
1.4.341, spirv-headers 1.4.341.0):

  * Full HIP+Vulkan build is clean with no shader compile errors.
  * test-backend-ops -o SET_ROWS -b Vulkan0 : 147/147
  * test-backend-ops -o FLASH_ATTN_EXT -b Vulkan0 -p type_KV=turbo3 :
    530 cases pass (previously aborted on case 3).
  * test-backend-ops -o FLASH_ATTN_EXT -b ROCm0 -p type_KV=turbo3 :
    still green (no HIP regression).
  * llama-cli on Qwen3-8B Q4_K_M with -ngl 99 -fa on -ctk turbo3
    -ctv turbo3 on Vulkan0 no longer aborts. The remaining head_dim=128
    correctness issue on the Vulkan turbo3 decode path is pre-existing
    and orthogonal to this change.

llama-bench on Qwen3.5-27B Q4_K_M, 7900 XTX OC, HIP backend:

  F16     tg128=20.98   turbo3 tg128=20.13   turbo4 tg128=20.17

Refs: TheTom/llama-cpp-turboquant issues TheTom#50, TheTom#64, TheTom#81