Conversation
There was a problem hiding this comment.
Pull request overview
Addresses a reported heap buffer overflow in the Phi-4 vision preprocessing pipeline by preventing non-RGB (notably CMYK JPEG) inputs from propagating into transforms that assume 3-channel data.
Changes:
- Add channel-count validation in
Phi4VisionDynamicPreprocessto reject non-3-channel tensors. - Add JPEG decoder validation to reject unsupported JPEG color spaces based on
output_components. - Add a Python regression test that constructs a CMYK JPEG and asserts the pipeline rejects it.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
test/test_pp_api.py |
Adds a security regression test to ensure CMYK JPEG inputs are rejected. |
shared/api/image_transforms_phi_4.hpp |
Rejects non-3-channel tensors at the Phi-4 dynamic preprocess entry point. |
operators/vision/image_decoder.hpp |
Rejects JPEGs that decode to an unexpected channel count (e.g., CMYK=4). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Start decompression | ||
| jpeg_start_decompress(&cinfo); | ||
|
|
||
| // Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels). | ||
| // Allowing other channel counts propagates attacker-controlled tensor shapes downstream, | ||
| // causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122). | ||
| if (cinfo.output_components != 3 && cinfo.output_components != 1) { | ||
| jpeg_destroy_decompress(&cinfo); | ||
| return {kOrtxErrorInvalidArgument, | ||
| "[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."}; |
There was a problem hiding this comment.
The JPEG path allows output_components == 1 (grayscale), but DecodePNG always returns 3 channels and downstream code (e.g. DecodeImage BGR swap logic and many vision transforms) assumes 3-channel data. Returning a 1-channel tensor here can lead to out-of-bounds access/corruption when later code indexes as if channels==3. Consider forcing JPEG decode output to RGB (3 channels) or rejecting grayscale JPEGs to keep the output shape consistent and safe.
| // Start decompression | |
| jpeg_start_decompress(&cinfo); | |
| // Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels). | |
| // Allowing other channel counts propagates attacker-controlled tensor shapes downstream, | |
| // causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122). | |
| if (cinfo.output_components != 3 && cinfo.output_components != 1) { | |
| jpeg_destroy_decompress(&cinfo); | |
| return {kOrtxErrorInvalidArgument, | |
| "[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."}; | |
| // Force libjpeg to output 3-channel RGB regardless of the original color space | |
| // (e.g., grayscale and YCbCr will be expanded/converted to RGB). This keeps the | |
| // output tensor shape consistent with PNG decoding and downstream 3-channel | |
| // assumptions (H x W x 3). | |
| cinfo.out_color_space = JCS_RGB; | |
| // Start decompression | |
| jpeg_start_decompress(&cinfo); | |
| // Security: reject non-3-channel output. Allowing other channel counts propagates | |
| // attacker-controlled tensor shapes downstream, causing heap overflows in | |
| // consumers that assume 3 channels (CVE-class: CWE-122). | |
| if (cinfo.output_components != 3) { | |
| jpeg_destroy_decompress(&cinfo); | |
| return {kOrtxErrorInvalidArgument, | |
| "[ImageDecoder]: Unsupported JPEG color space. Decoder must produce RGB (3 channels)."}; |
| // Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels). | ||
| // Allowing other channel counts propagates attacker-controlled tensor shapes downstream, | ||
| // causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122). | ||
| if (cinfo.output_components != 3 && cinfo.output_components != 1) { | ||
| jpeg_destroy_decompress(&cinfo); | ||
| return {kOrtxErrorInvalidArgument, | ||
| "[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."}; | ||
| } | ||
|
|
There was a problem hiding this comment.
There is an early-return error path later in this function (if (srcManager.extError != kOrtxOK) return ...;) that exits without calling jpeg_finish_decompress/jpeg_destroy_decompress, which can leak libjpeg resources. Since this change adds a new early return with cleanup, it would be good to apply the same cleanup pattern to the existing extError return as well (e.g., ensure jpeg_destroy_decompress(&cinfo) runs on all return paths after jpeg_create_decompress).
| // Security: reject non-RGB inputs (e.g. CMYK JPEGs decoded as 4-channel tensors). | ||
| // Downstream allocation is hardcoded to 3 channels, so a 4-channel input would cause | ||
| // memcpy to write past the buffer end, enabling heap corruption / RCE (CWE-122). | ||
| int64_t c = ts_image.Shape()[2]; | ||
| if (c != 3) { | ||
| return {kOrtxErrorInvalidArgument, | ||
| "[Phi4VisionProcessor]: Expected 3-channel RGB input, got " + std::to_string(c) + " channels"}; | ||
| } |
There was a problem hiding this comment.
This check protects Phi4VisionDynamicPreprocess, but Phi4VisionProcessor::Compute in the same header still hardcodes 3 channels when allocating/copying tensors. For defense-in-depth (and to match the PR’s security goal), consider adding the same channels == 3 validation in Phi4VisionProcessor::Compute so it remains safe even if a non-RGB tensor reaches it through another path.
| f.write(buf.getvalue()) | ||
| tmp_path = f.name | ||
| try: | ||
| with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"): |
There was a problem hiding this comment.
The test uses assertRaises(Exception), which is very broad and can mask unrelated failures (e.g., file/JSON issues) while still passing. It would be more reliable to assert a narrower exception type (e.g., the RuntimeError raised by the pybind layer) and/or validate the error message (e.g., via assertRaisesRegex) to ensure the failure is specifically due to CMYK rejection.
| with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"): | |
| with self.assertRaises(RuntimeError, msg="CMYK JPEG must be rejected, not silently overflow the heap"): |
| proc = pp_api.create_processor(phi4_proc_path) | ||
| buf = io.BytesIO() | ||
| Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG") | ||
| with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f: | ||
| f.write(buf.getvalue()) | ||
| tmp_path = f.name | ||
| try: | ||
| with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"): | ||
| images = pp_api.load_images([tmp_path]) | ||
| pp_api.image_pre_process(proc, images) | ||
| finally: | ||
| os.unlink(tmp_path) |
There was a problem hiding this comment.
This test creates a native processor via pp_api.create_processor(...) but never calls pp_api.delete_object(proc), so the underlying C++ object may leak for the duration of the test run. Consider using pp_api.ImageProcessor(phi4_proc_path) (which deletes in __del__) or explicitly deleting the processor handle in the finally block.
| proc = pp_api.create_processor(phi4_proc_path) | |
| buf = io.BytesIO() | |
| Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG") | |
| with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f: | |
| f.write(buf.getvalue()) | |
| tmp_path = f.name | |
| try: | |
| with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"): | |
| images = pp_api.load_images([tmp_path]) | |
| pp_api.image_pre_process(proc, images) | |
| finally: | |
| os.unlink(tmp_path) | |
| proc = None | |
| buf = io.BytesIO() | |
| Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG") | |
| with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f: | |
| f.write(buf.getvalue()) | |
| tmp_path = f.name | |
| try: | |
| proc = pp_api.create_processor(phi4_proc_path) | |
| with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"): | |
| images = pp_api.load_images([tmp_path]) | |
| pp_api.image_pre_process(proc, images) | |
| finally: | |
| os.unlink(tmp_path) | |
| if proc is not None: | |
| pp_api.delete_object(proc) |
|
Thanks for the fix! Have we tested that E2E still works fine with ORT GenAI? If not, the simplest way is to create a dev branch, change the Extensions commit in GenAI's |
A critical heap buffer overflow vulnerability exists in the ONNX Runtime Extensions library, specifically in the
Phi4VisionProcessor::Computemethod withinimage_transforms_phi_4.hpp. The issue arises from a parser confusion attack where a CMYK JPEG image bypasses RGB sanitization, leading to an out-of-bounds memory copy. The root cause includes a sanitization bypass inDecodeImageand a buffer miscalculation inPhi4VisionProcessor. Exploitation of this vulnerability can result in Remote Code Execution (RCE) due to heap corruption.