Add VK_EXT_ycbcr_3plane_16bit_lsb_formats extension by rmader · Pull Request #2709 · KhronosGroup/Vulkan-Docs

rmader · 2026-04-01T21:31:42Z

This extension adds support for 10/12bit YCbCr formats used by software decoders like ffmpeg, dav1d and libvpx.

See also:

This branch, as well as the related Mesa and GTK4 MRs, has been in draft status for a while and has already received a lot of testing. With the recent GStreamer 1.28 release (shipped in distros like Fedora 44 and Ubuntu 26.04) they can be tested and used - e.g. the default GNOME video player will make use of the new formats when playing HDR videos (for various codecs, including AV1, HEVC, H.266, ProRes and DNxHR).

@gfxstrand I'd be super happy if you could have a look at this and, if you find time, help me get this over the line 😅 Most importantly, point me to where I should elaborate more.

oddhack · 2026-04-29T12:41:35Z

@rmader please rebase on github main branch, which should fix the CI CTS framework issue, and address the other issues raised by CI around copyright dates and naming.

rmader · 2026-04-30T09:18:24Z

> @rmader please rebase on github main branch, which should fix the CI CTS framework issue, and address the other issues raised by CI around copyright dates and naming.

Thanks for the review! I think I addressed all issues - and hope the change to append _EXT to all formats is correct. The goal of course being that they'll get promoted at some point going forward.

rmader · 2026-04-30T13:34:19Z

I added one more change for completeness: 14-bit formats. While those are very uncommon AFAIK, they are supported by ffmpeg. So let's include them while we're at it, so we don't have to revisit this in a couple of years.

gfxstrand

Sorry this took me nearly a month to get to. Between a Khronos meeting and vacation, it kind of got lost in my e-mail.

gfxstrand · 2026-05-01T14:03:43Z

+Add the formats in question. For any drivers already supporting YCbCr formats
+via shaders this should be straight forward. Values in the 0.0 - 1.0 range just
+need to get multiplied by 65535.0 / 1023.0 (for 10 bit), 65535.0 / 4095.0 (for
+12 bit) or 65535.0 / 16383.0 (for 14 bit).


This is a tricky assertion to make. Right now, we assume that the X bits are garbage and (mostly) ignored by sampling. I say "mostly" because it's kind of okay if they're not since they're the low bits and any garbage in those bits will show up as noise which likely isn't perceptible to the human eye. With these formats, however, your assertion that they're easy for shaders by doing a bit of multiplication assumes that the top bits are always zero. While this is fairly easy for software to guarantee when outputting such an image, it's very hard for the driver to defend against. We could make it invalid usage or unknown results if you have non-zero bits in the high bits when sampling but we'd need to be very explicit about that in the API.
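
To make the concern concrete, here is a minimal sketch in Python of the scaling described in the proposal text and of what happens when a high bit is set. The helper names are hypothetical and are not taken from the PR or from any driver; the sketch only assumes that the sample is fetched as a plain 16-bit UNORM value and then rescaled.

```python
UNORM16_MAX = 65535.0


def sample_unorm16(word: int) -> float:
    """Value a plain 16-bit UNORM fetch would return for a raw 16-bit word."""
    return (word & 0xFFFF) / UNORM16_MAX


def rescale_lsb(word: int, bits: int) -> float:
    """Rescale a UNORM16 fetch so an LSB-aligned `bits`-wide sample spans 0.0 - 1.0.

    Assumption: the upper (16 - bits) bits of `word` are zero. Nothing here
    enforces that, which is exactly the issue raised in the comment above.
    """
    max_code = (1 << bits) - 1  # 1023, 4095 or 16383
    return sample_unorm16(word) * (UNORM16_MAX / max_code)


# High 6 bits are zero: the largest 10-bit code maps to (approximately) 1.0.
full_scale_10bit = rescale_lsb(0x03FF, 10)

# Bit 10 is set: the rescaled value lands far outside the 0.0 - 1.0 range.
garbage_high_bit = rescale_lsb(0x07FF, 10)
```

With MSB-aligned formats, stray low bits only perturb the result slightly; here a single stray high bit roughly doubles it, which is why the behaviour for non-zero high bits would have to be spelled out.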

Indeed. So on the DRM side we solved that by using z instead of x: https://github.com/torvalds/linux/blob/master/include/uapi/drm/drm_fourcc.h#L403-L405

We could do the same here by turning

VK_FORMAT_X6G10_X6B10_X6R10_3PLANE_420_UNORM_3PACK16_EXT

into

VK_FORMAT_Z6G10_Z6B10_Z6R10_3PLANE_420_UNORM_3PACK16_EXT

which would hopefully make this clear. WDYT?

We could do that. Since no one will ever be writing to these formats, it's not like we would need hardware to support that. I'd feel a little better about that than saying it's ignored. Again, I think this is where @fluppeteer will have good opinions. As might @spencer-lunarg since he did the format table in the first place.

> Since no one will ever be writing to these formats

For completeness: these formats are also the native/optimal input formats for common software encoders, meaning there is a chance that at some point we'd want native driver support for writing into shared buffers that can directly be fed into those sw-encoders. Not super likely, just to keep in mind.

Did this for now (X -> Z).

Sorry for the delay - I tried to write this a couple of weeks ago and GitHub ate it.

I don't love Z as a choice for the same depth-related reasons as @gfxstrand (and arguably should we decide to use WXYZ in a format at any point), but it may be the least bad option - N for "null" may be an alternative, but that may conflict with the common writing of "nnnn" to mean a series of bits. As you say, this feels like a 16-bit format with a different decode path for scaling, since we're saying the "must be zero" bits are actually significant and just expected to be zero; I'm not 100% clear that our hardware, at least, would be able to support this, since our hardware matrix transform has limited inputs. At least we avoided using Z for depth. (Fortunately a recent KDFS update can express mandatory 0/1 bits, I hope as is needed here.)

For reference it would be nice to include these formats in the 56.1 descriptions. C.f.:

VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16 specifies an unsigned normalized multi-planar format that has a 10-bit G component in the top 10 bits of each 16-bit word of plane 0, and a two-component, 32-bit BR plane 1 consisting of a 10-bit B component in the top 10 bits of the word in bytes 0..1, and a 10-bit R component in the top 10 bits of the word in bytes 2..3, with the bottom 6 bits of each word unused. The horizontal and vertical dimensions of the BR plane are halved relative to the image dimensions, and each R and B value is shared with the G components for which ⌊i_G × 0.5⌋ = i_B = i_R and ⌊j_G × 0.5⌋ = j_B = j_R. The location of each plane when this image is in linear layout can be determined via vkGetImageSubresourceLayout, using VK_IMAGE_ASPECT_PLANE_0_BIT for the G plane, and VK_IMAGE_ASPECT_PLANE_1_BIT for the BR plane. This format only supports images with a width and height that is a multiple of two.

I don't think I see that in this PR? (Unless they're autogenerated these days.)
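
For readers comparing the two layouts, the following is a rough Python sketch of how a 10-bit component would be pulled out of an existing MSB-aligned word (as in the description quoted above) versus an LSB-aligned word as discussed in this thread, plus the 4:2:0 index sharing rule. The function names are hypothetical, and the LSB layout (component in the bottom 10 bits, top 6 bits expected to be zero) is an assumption inferred from this discussion rather than quoted from the final spec text.

```python
def g10x6_component(word: int) -> int:
    """MSB-aligned: the 10-bit value sits in the top 10 bits of the 16-bit word."""
    return (word & 0xFFFF) >> 6


def z6g10_component(word: int) -> int:
    """LSB-aligned (assumed layout): the value sits in the bottom 10 bits.

    The top 6 bits are expected to be zero; this sketch rejects anything else
    instead of guessing what a driver would do.
    """
    if (word & 0xFFFF) >> 10:
        raise ValueError("non-zero bits above the 10-bit component")
    return word & 0x03FF


def chroma_index(i_g: int, j_g: int) -> tuple:
    """4:2:0 sharing rule: floor(i_G * 0.5) = i_B = i_R, and likewise for j."""
    return (i_g // 2, j_g // 2)
```

For example, `g10x6_component(0xFFC0)` and `z6g10_component(0x03FF)` both yield the code 1023, and the luma sample at (5, 2) shares the chroma sample at (2, 1).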

> I don't think I see that in this PR? (Unless they're autogenerated these days.)

Ouch, I'm sorry - this was indeed missing and arguably a very important part 😅

Added that now.

> I don't love Z as a choice for the same depth-related reasons as @gfxstrand (and arguably should we decide to use WXYZ in a format at any point), but it may be the least bad option

Thanks! IIUC that means unless somebody else voices resistance we can go forward with that.

> I'm not 100% clear that our hardware, at least, would be able to support this

For the record: even if the formats can only be supported via a shader fallback path, it would still be useful for multimedia apps, freeing them from having to carry custom shaders themselves.

gfxstrand · 2026-05-01T14:08:07Z

@fluppeteer should probably also give this a read.

oddhack · 2026-05-06T14:05:34Z

I got tired of adding explanatory comments, but there are what appear to be a bunch of minor markup errors in the new VUs which I have suggested fixes for.

oddhack · 2026-05-06T14:05:58Z

(Sorry, accidentally clicked the comment & close button).

oddhack · 2026-05-06T15:12:59Z

Github is borked ATM (presumably because it is a day ending in 'y') and won't let me comment on the diff for some reason, but features.adoc line 1845 introduced some leading white space before the ' ifdef::' which should be removed.

rmader · 2026-05-06T15:56:01Z

> Github is borked ATM (presumably because it is a day ending in 'y') and won't let me comment on the diff for some reason, but features.adoc line 1845 introduced some leading white space before the ' ifdef::' which should be removed.

Whoops, fixed. I also should have addressed all other issues from the last CI run (and ran the corresponding scripts locally to validate that).

rmader · 2026-05-06T15:59:58Z

Sorry for the noise, just some more minor fixes.

fluppeteer

Feedback from two weeks ago...

fluppeteer · 2026-05-19T22:01:17Z

rmader · 2026-06-08T08:52:22Z

I took the freedom to re-request reviews, hope that's ok 🙈

To quickly reiterate why I'd love to see this land and would therefore be very thankful for further reviews/acks: at least on Linux + Mesa the corresponding GL and DRM formats are already used by default by e.g. GStreamer + GTK4 based video apps, including the default GNOME video player and various other apps in the ecosystem.

Having proper format representations and driver support allows better performance for software decoded video on most devices with unified memory. Most importantly: if buffers used by the software decoder are allocated as dmabuf (via udmabuf or dma heaps), the buffers can be imported and used by the GPU directly (via driver-internal shaders).

Not having these Vulkan formats currently forces GTK4 - which uses Vulkan by default - to fall back to a slower and overall suboptimal GLES path - and I expect other multimedia apps planning to switch to VK to run into similar issues going forward.

tl;dr: landing this extension would help close a feature gap between VK and GLES for multimedia apps (at least in the FDO / Linux world) :)

fluppeteer · 2026-06-10T11:05:26Z

Sorry (again) for the delay. The updated descriptions look fine - thank you.

If I'm hesitant it's only because we kind of have a Vulkan policy not to introduce features that developers couldn't implement just as well themselves. In this case, that means someone can actually support this efficiently in hardware, especially since the fall-back path (especially with sampling) is quite painful code, and developers might want to short-circuit it (and may also want to be precise about chroma resampling choices). Unless the IHV's software emulation is guaranteed to be as fast as anything the developer could do (and it's added just to make the ecosystem more consistent), we tend not to like adding software fallbacks; it means developers get unexpected performance impacts because they're expecting all format support to be fairly similar in performance. I've not thought through the interactions where some paths are hardware accelerated and others aren't (when it comes to things like dynamic state), but I suspect there may be dragons.

I say that partly because I've confirmed that we (Imagination) would have a problem: we have specific hardware for pulling out the 10- and 12-bit subsets, and our colour transform matrices don't use a conventional float encoding - we can't just scale the values up in the matrix. For us, if we supported this, it would be a software fallback, at least for the foreseeable future (and, as an IP vendor, our design sales have a very long tail). That probably means developers who care about our hardware would need to implement the fallback themselves anyway (or someone could implement a layer to catch it). I'm not sure how much that helps - if it's a GTK thing, the workaround might be possible (and necessary) in Vulkan orthogonally to the existence of this extension.

I'd welcome comments from other IHVs, though. Obviously if everyone else can do this efficiently in hardware and we're an outlier, that's different from it being a special code path for everyone.

oddhack · 2026-06-10T13:43:35Z

FYI the "CTS framework tests" CI failure is spurious - there's an upstream problem in the repo that script is pulled from which is getting dealt with separately.

rmader · 2026-06-10T16:05:39Z

> If I'm hesitant it's only because we kind of have a Vulkan policy not to introduce features that developers couldn't implement just as well themselves. In this case, that means someone can actually support this efficiently in hardware, especially since the fall-back path (especially with sampling) is quite painful code, and developers might want to short-circuit it (and may also want to be precise about chroma resampling choices). Unless the IHV's software emulation is guaranteed to be as fast as anything the developer could do (and it's added just to make the ecosystem more consistent), we tend not to like adding software fallbacks; it means developers get unexpected performance impacts because they're expecting all format support to be fairly similar in performance. I've not thought through the interactions where some paths are hardware accelerated and others aren't (when it comes to things like dynamic state), but I suspect there may be dragons.

Thanks, this is a very good point. A couple of comments:

- I have not yet confirmed that any existing GPU hardware can accelerate any of these formats further than what is already possible with shaders within an app or in a fallback path. There are no hardware implementations for the corresponding GL formats yet, even though they also only got added relatively recently and there might be yet-unknown potential. In that regard your hesitation is definitely justified.
- What is known is that certain display engines support some of these formats. Notably the format used for usual HDR10 videos (VK_FORMAT_Z6G10_Z6B10_Z6R10_3PLANE_420_UNORM_3PACK16_EXT which maps to DRM_FORMAT_S010 on Linux) is supported on e.g. the Raspberry Pi 5. That means that at least in the Wayland ecosystem having the format supported by the driver via a fallback path has a big practical advantage for both apps and compositors: they can easily advertise support for the format (in the best case achieving a zero-copy path from CPU to display engine without using the GPU at all, otherwise using the driver fallback) without having to duplicate the corresponding shaders between every app/toolkit (GTK4, Qt, Chromium etc.) and Wayland compositor.
- Even without display engine support it is often preferable to offload video composition from apps to system/Wayland compositors for performance reasons. Doing so reduces the worst-case number of composition steps from two - one in the app, one in the compositor - to just the latter. Therefore widespread support for these formats in compositors is desirable on most/all hardware - and outside of the control of individual apps. Many Wayland compositors (e.g. Mutter, Sway, Weston) already support the formats using GL(ES). Without this new extension each of them will need custom shaders equivalent to what would otherwise live in the driver - which in the case of Mesa is shared between all platforms and, as of recently, even between Vulkan and GL.
- To my knowledge existing software fallbacks - say the one for NV12 in Mesa - are considered quite fast / close to what's possible. I might be missing something, but I'm not aware of any existing apps preferring custom NV12 shaders for performance reasons. For example the shaders in GStreamer and Firefox are essentially copies of what is found in Mesa and mostly exist for compatibility/historical reasons. Some apps like mpv apparently prefer their own, slower shaders in order to have tighter control over quality, however.

tl;dr: even without hardware support this extension can provide non-trivial quality-of-life improvements for multimedia ecosystems - for the increasingly common use-case of HDR video playback. IMHO that justifies the extension. Nonetheless I agree that it falls into a grey zone until at least one of the formats can be further accelerated on at least one existing GPU.

rmader · 2026-06-10T16:19:22Z

P.S. regarding

> That probably means developers who care about our hardware would need to implement the fallback themselves anyway (or someone could implement a layer to catch it). I'm not sure how much that helps - if it's a GTK thing, the workaround might be possible (and necessary) in Vulkan orthogonally to the existence of this extension.

In the case of GTK the plan is to rely on either having the driver handle the formats (which would be the case for all Mesa Vulkan drivers going forward) or fall back to either GL or software conversion, resulting in lower performance. I.e. the GTK devs would strongly prefer not to have to introduce YCbCr->RGB shaders in their Vulkan backend. I expect the same to apply to Chromium and similar apps.
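
As a rough illustration of the per-pixel arithmetic such a YCbCr->RGB fallback has to carry, here is a minimal Python sketch for 10-bit content. It assumes BT.2020 non-constant-luminance coefficients and narrow ("video") range, which is typical for HDR10 but is an assumption on my side, not something specified in this PR; the function name is hypothetical. A real shader would additionally handle chroma reconstruction and the transfer function.

```python
# BT.2020 luma coefficients (assumption: non-constant luminance, HDR10-style content).
KR, KB = 0.2627, 0.0593
KG = 1.0 - KR - KB


def ycbcr10_to_rgb(y_code: int, cb_code: int, cr_code: int) -> tuple:
    """Convert narrow-range 10-bit Y'CbCr codes to non-linear R'G'B' in 0.0 - 1.0.

    Narrow range for 10 bit: luma codes 64..940, chroma codes 64..960
    centred on 512.
    """
    y = (y_code - 64) / 876.0
    cb = (cb_code - 512) / 896.0
    cr = (cr_code - 512) / 896.0

    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return (r, g, b)


white = ycbcr10_to_rgb(940, 512, 512)
black = ycbcr10_to_rgb(64, 512, 512)
```

Here `white` and `black` come out as (approximately) (1, 1, 1) and (0, 0, 0). The result is still transfer-function encoded (PQ for HDR10) and is not linear light.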

oddhack · 2026-06-11T01:42:22Z

Heads-up: we have made a formatting change to vk.xml to crunch out whitespace in XML attribute lists, which will probably affect you the next time you sync this branch with current main branch. When you do this, try following the process in https://github.com/KhronosGroup/Vulkan-Docs/wiki/Merge-XML-Whitespace, which should prevent the need for any tedious conflict resolution consequent to the whitespace change.

This extension adds support for 10/12bit YCbCr formats used by software decoders like ffmpeg, dav1d and libvpx. See https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303

rmader · 2026-06-11T09:19:02Z

Rebased and ran normalize_xml.py - no other changes.

rmader commented Apr 1, 2026

Comment thread chapters/formats.adoc Outdated

r-potter reviewed Apr 29, 2026

Comment thread proposals/VK_EXT_ycbcr_3plane_16bit_lsb_formats.adoc Outdated

r-potter reviewed Apr 29, 2026

Comment thread proposals/VK_EXT_ycbcr_3plane_16bit_lsb_formats.adoc Outdated

r-potter reviewed Apr 29, 2026

Comment thread appendices/VK_EXT_ycbcr_3plane_16bit_lsb_formats.adoc Outdated

rmader force-pushed the ycbcr-16bit-lsb-formats branch 4 times, most recently from c04f6cb to 8d96571 Compare April 30, 2026 09:15

r-potter reviewed Apr 30, 2026

Comment thread chapters/features.adoc Outdated

rmader force-pushed the ycbcr-16bit-lsb-formats branch 2 times, most recently from 15b2a7c to b73a6f0 Compare April 30, 2026 13:32

rmader force-pushed the ycbcr-16bit-lsb-formats branch from b73a6f0 to 21b0612 Compare April 30, 2026 15:02

gfxstrand reviewed May 1, 2026
