Skip to content

security bugfix for Phi4VisionProcessor#1034

Open
prathikr wants to merge 2 commits intomainfrom
prathikrao/cmyk-rcb-security-fix
Open

security bugfix for Phi4VisionProcessor#1034
prathikr wants to merge 2 commits intomainfrom
prathikrao/cmyk-rcb-security-fix

Conversation

@prathikr
Copy link
Contributor

@prathikr prathikr commented Mar 5, 2026

A critical heap buffer overflow vulnerability exists in the ONNX Runtime Extensions library, specifically in the Phi4VisionProcessor::Compute method within image_transforms_phi_4.hpp. The issue arises from a parser confusion attack where a CMYK JPEG image bypasses RGB sanitization, leading to an out-of-bounds memory copy. The root cause includes a sanitization bypass in DecodeImage and a buffer miscalculation in Phi4VisionProcessor. Exploitation of this vulnerability can result in Remote Code Execution (RCE) due to heap corruption.

@prathikr prathikr requested a review from a team as a code owner March 5, 2026 22:21
@vraspar vraspar requested a review from Copilot March 6, 2026 23:03
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses a reported heap buffer overflow in the Phi-4 vision preprocessing pipeline by preventing non-RGB (notably CMYK JPEG) inputs from propagating into transforms that assume 3-channel data.

Changes:

  • Add channel-count validation in Phi4VisionDynamicPreprocess to reject non-3-channel tensors.
  • Add JPEG decoder validation to reject unsupported JPEG color spaces based on output_components.
  • Add a Python regression test that constructs a CMYK JPEG and asserts the pipeline rejects it.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
test/test_pp_api.py Adds a security regression test to ensure CMYK JPEG inputs are rejected.
shared/api/image_transforms_phi_4.hpp Rejects non-3-channel tensors at the Phi-4 dynamic preprocess entry point.
operators/vision/image_decoder.hpp Rejects JPEGs that decode to an unexpected channel count (e.g., CMYK=4).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 134 to +143
// Start decompression
jpeg_start_decompress(&cinfo);

// Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels).
// Allowing other channel counts propagates attacker-controlled tensor shapes downstream,
// causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122).
if (cinfo.output_components != 3 && cinfo.output_components != 1) {
jpeg_destroy_decompress(&cinfo);
return {kOrtxErrorInvalidArgument,
"[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."};
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JPEG path allows output_components == 1 (grayscale), but DecodePNG always returns 3 channels and downstream code (e.g. DecodeImage BGR swap logic and many vision transforms) assumes 3-channel data. Returning a 1-channel tensor here can lead to out-of-bounds access/corruption when later code indexes as if channels==3. Consider forcing JPEG decode output to RGB (3 channels) or rejecting grayscale JPEGs to keep the output shape consistent and safe.

Suggested change
// Start decompression
jpeg_start_decompress(&cinfo);
// Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels).
// Allowing other channel counts propagates attacker-controlled tensor shapes downstream,
// causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122).
if (cinfo.output_components != 3 && cinfo.output_components != 1) {
jpeg_destroy_decompress(&cinfo);
return {kOrtxErrorInvalidArgument,
"[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."};
// Force libjpeg to output 3-channel RGB regardless of the original color space
// (e.g., grayscale and YCbCr will be expanded/converted to RGB). This keeps the
// output tensor shape consistent with PNG decoding and downstream 3-channel
// assumptions (H x W x 3).
cinfo.out_color_space = JCS_RGB;
// Start decompression
jpeg_start_decompress(&cinfo);
// Security: reject non-3-channel output. Allowing other channel counts propagates
// attacker-controlled tensor shapes downstream, causing heap overflows in
// consumers that assume 3 channels (CVE-class: CWE-122).
if (cinfo.output_components != 3) {
jpeg_destroy_decompress(&cinfo);
return {kOrtxErrorInvalidArgument,
"[ImageDecoder]: Unsupported JPEG color space. Decoder must produce RGB (3 channels)."};

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +145
// Security: reject non-RGB/grayscale color spaces (e.g. CMYK has 4 channels).
// Allowing other channel counts propagates attacker-controlled tensor shapes downstream,
// causing heap overflows in consumers that assume 3 channels (CVE-class: CWE-122).
if (cinfo.output_components != 3 && cinfo.output_components != 1) {
jpeg_destroy_decompress(&cinfo);
return {kOrtxErrorInvalidArgument,
"[ImageDecoder]: Unsupported JPEG color space. Only RGB and grayscale are supported."};
}

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an early-return error path later in this function (if (srcManager.extError != kOrtxOK) return ...;) that exits without calling jpeg_finish_decompress/jpeg_destroy_decompress, which can leak libjpeg resources. Since this change adds a new early return with cleanup, it would be good to apply the same cleanup pattern to the existing extError return as well (e.g., ensure jpeg_destroy_decompress(&cinfo) runs on all return paths after jpeg_create_decompress).

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +34
// Security: reject non-RGB inputs (e.g. CMYK JPEGs decoded as 4-channel tensors).
// Downstream allocation is hardcoded to 3 channels, so a 4-channel input would cause
// memcpy to write past the buffer end, enabling heap corruption / RCE (CWE-122).
int64_t c = ts_image.Shape()[2];
if (c != 3) {
return {kOrtxErrorInvalidArgument,
"[Phi4VisionProcessor]: Expected 3-channel RGB input, got " + std::to_string(c) + " channels"};
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check protects Phi4VisionDynamicPreprocess, but Phi4VisionProcessor::Compute in the same header still hardcodes 3 channels when allocating/copying tensors. For defense-in-depth (and to match the PR’s security goal), consider adding the same channels == 3 validation in Phi4VisionProcessor::Compute so it remains safe even if a non-RGB tensor reaches it through another path.

Copilot uses AI. Check for mistakes.
f.write(buf.getvalue())
tmp_path = f.name
try:
with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"):
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test uses assertRaises(Exception), which is very broad and can mask unrelated failures (e.g., file/JSON issues) while still passing. It would be more reliable to assert a narrower exception type (e.g., the RuntimeError raised by the pybind layer) and/or validate the error message (e.g., via assertRaisesRegex) to ensure the failure is specifically due to CMYK rejection.

Suggested change
with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"):
with self.assertRaises(RuntimeError, msg="CMYK JPEG must be rejected, not silently overflow the heap"):

Copilot uses AI. Check for mistakes.
Comment on lines +812 to +823
proc = pp_api.create_processor(phi4_proc_path)
buf = io.BytesIO()
Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG")
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
f.write(buf.getvalue())
tmp_path = f.name
try:
with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"):
images = pp_api.load_images([tmp_path])
pp_api.image_pre_process(proc, images)
finally:
os.unlink(tmp_path)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test creates a native processor via pp_api.create_processor(...) but never calls pp_api.delete_object(proc), so the underlying C++ object may leak for the duration of the test run. Consider using pp_api.ImageProcessor(phi4_proc_path) (which deletes in __del__) or explicitly deleting the processor handle in the finally block.

Suggested change
proc = pp_api.create_processor(phi4_proc_path)
buf = io.BytesIO()
Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG")
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
f.write(buf.getvalue())
tmp_path = f.name
try:
with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"):
images = pp_api.load_images([tmp_path])
pp_api.image_pre_process(proc, images)
finally:
os.unlink(tmp_path)
proc = None
buf = io.BytesIO()
Image.new("CMYK", (64, 64), color=(128, 64, 32, 255)).save(buf, format="JPEG")
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
f.write(buf.getvalue())
tmp_path = f.name
try:
proc = pp_api.create_processor(phi4_proc_path)
with self.assertRaises(Exception, msg="CMYK JPEG must be rejected, not silently overflow the heap"):
images = pp_api.load_images([tmp_path])
pp_api.image_pre_process(proc, images)
finally:
os.unlink(tmp_path)
if proc is not None:
pp_api.delete_object(proc)

Copilot uses AI. Check for mistakes.
@kunal-vaishnavi
Copy link
Contributor

kunal-vaishnavi commented Mar 6, 2026

Thanks for the fix! Have we tested that E2E still works fine with ORT GenAI? If not, the simplest way is to create a dev branch, change the Extensions commit in GenAI's cmake/deps.txt file, and build GenAI from source before running any of the E2E examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants