[ROCm] Avoid unsupported BF16 llm_int8 registration #55

Open
austin1997 wants to merge 1 commit into ROCm:paddle_hackthon from austin1997:rocm-bf16-llm-int8-registration

@austin1997

PR Category

Custom Device

PR Types

Bug fixes

Description

This PR avoids advertising BF16 support for the CUDA-only llm_int8_linear GPU kernel on ROCm.

The existing kernel implementation only provides a CUDA path and the non-CUDA path throws Unimplemented at runtime. On ROCm, the BF16 dtype registration allowed BF16 inputs to dispatch into that unsupported path. This change keeps the kernel touch symbol available for the generated code, but only registers phi::bfloat16 for CUDA builds. ROCm now reports BF16 llm_int8_linear as an unregistered kernel instead of entering the CUDA-only implementation.
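The behavioral difference can be illustrated with a toy model of a dtype-keyed kernel registry. This is only a sketch in Python, not Paddle's registration machinery: `build_registry`, `dispatch`, the `gate_bf16` flag, and the two exception classes are hypothetical names chosen for illustration, and only the BF16 entry is modeled.

```python
# Toy model of dtype-keyed kernel dispatch. All names here are hypothetical
# and do not correspond to Paddle's real registration API.


class NotFound(LookupError):
    """No kernel is registered for the requested (op, dtype) pair."""


class Unimplemented(RuntimeError):
    """A kernel was found, but its body has no path for this backend."""


def build_registry(backend, gate_bf16):
    """Return the set of registered (op, dtype) keys for a backend.

    gate_bf16=False models the old behavior (BF16 registered everywhere);
    gate_bf16=True models the new behavior (BF16 registered only on CUDA).
    Other dtypes are left out of this sketch.
    """
    registry = set()
    if not gate_bf16 or backend == "cuda":
        registry.add(("llm_int8_linear", "bfloat16"))
    return registry


def dispatch(registry, backend, op, dtype):
    """Look up the kernel first, then enter its (CUDA-only) body."""
    if (op, dtype) not in registry:
        raise NotFound(f"{op} has no registered kernel for {dtype}")
    if backend != "cuda":
        # Mirrors the non-CUDA branch that throws at runtime.
        raise Unimplemented(f"{op} only provides a CUDA path")
    return f"{op}<{dtype}> on {backend}"
```

In this model, a ROCm BF16 call against the ungated registry reaches the kernel body and raises `Unimplemented`, while the gated registry rejects the same call at lookup time with `NotFound`. The CUDA path is unchanged in both cases.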

The Python test skip condition is also centralized and updated so llm_int8_linear tests are skipped on ROCm.
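A centralized skip condition of this kind might look like the sketch below. It is not the code from this PR: `llm_int8_skip_reason` and `skip_unless_llm_int8_supported` are hypothetical helpers, and the build flags are passed in as plain booleans so the example runs without Paddle installed.

```python
import unittest


def llm_int8_skip_reason(has_gpu_build, is_rocm_build):
    """Return a skip reason string, or None if the tests should run.

    Both arguments are plain booleans here (an assumption made for this
    sketch); a real test would derive them from the framework's build info.
    """
    if not has_gpu_build:
        return "llm_int8_linear requires a GPU build"
    if is_rocm_build:
        return "llm_int8_linear is CUDA-only and unsupported on ROCm"
    return None


def skip_unless_llm_int8_supported(has_gpu_build, is_rocm_build):
    """Build one decorator so every test class shares the same condition."""
    reason = llm_int8_skip_reason(has_gpu_build, is_rocm_build)
    return unittest.skipIf(reason is not None, reason or "")


# Example: a ROCm build, so the whole class is skipped.
@skip_unless_llm_int8_supported(has_gpu_build=True, is_rocm_build=True)
class DemoLLMInt8Test(unittest.TestCase):
    def test_would_fail_if_run(self):
        self.fail("this test should have been skipped on ROCm")
```

Keeping the condition in one helper means each test class applies the same decorator instead of repeating the check. In an actual Paddle test the flags would presumably come from build queries such as `paddle.is_compiled_with_rocm()`.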

Validation:

  • env TARGET=SKYLAKEX ninja -j 160 paddle_python
  • ROCm BF16 llm_int8_linear repro now fails with NotFound for the BF16 kernel instead of Unimplemented from the CUDA-only implementation
  • python3.12 -m unittest -v test_llm_int8_linear.py (9 skipped on ROCm)
  • prek run --files paddle/phi/kernels/gpu/llm_int8_linear_kernel.cu test/quantization/test_llm_int8_linear.py
  • git diff --check

Does this cause precision changes?
