How to make HDR work in shaders
- rj200
- Topic Author
I've just released my Glamarye shader with HDR support. HDR wasn't as easy as I thought. Here's what I learned.
You can detect whether the game is in HDR using BUFFER_COLOR_BIT_DEPTH, which ReShade gives you. It might be 8 (SDR), 10 (HDR10) or 16 (16 bit HDR).
The first thing you'll notice when testing shaders on HDR games and monitors is that you get errors about SRGBTexture and SRGBWriteEnable on any shader that uses them. They only work in 8 bit mode.
We can fix this by only enabling them in 8 bit mode. Example:
sampler2D samplerColor
{
// The texture to be used for sampling.
Texture = ReShade::BackBufferTex;
#if BUFFER_COLOR_BIT_DEPTH > 8
SRGBTexture = false;
#else
SRGBTexture = true;
#endif
};
technique Glamarye_Fast_Effects_without_Fake_GI
{
pass Glamayre
{
VertexShader = PostProcessVS;
PixelShader = Glamarye_Fast_Effects_without_Fake_GI_PS;
// SDR or HDR mode?
#if BUFFER_COLOR_BIT_DEPTH > 8
SRGBWriteEnable = false;
#else
SRGBWriteEnable = true;
#endif
}
}
16 bit mode: if BUFFER_COLOR_BIT_DEPTH==16 then it's quite simple - it's already in linear colour so no conversion is necessary. However, you need to be aware that the maximum colour is no longer 1 - it goes higher! The range 0-1 matches linear 0-1 in SDR, but bright lights go above 1. If you're doing things like saturate() in your shader you'll clip all the bright areas of the image. There was some suggestion in #code-chat that Epic Games use a nits scale instead (which means numbers will be bigger, with 80 being equivalent to SDR maximum).
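To illustrate the saturate() problem, here is a minimal sketch of a clamp that keeps highlights intact in HDR modes. The helper name clampForOutput is made up for this example and is not taken from Glamarye.

```hlsl
// Hypothetical helper (name invented for this sketch): clamp a linear colour
// without clipping HDR highlights.
float3 clampForOutput(float3 c)
{
#if BUFFER_COLOR_BIT_DEPTH > 8
    // HDR: values above 1 are valid bright highlights, so only remove negatives.
    // Note: scRGB also uses negative values for colours outside the sRGB gamut,
    // so drop even this clamp if you need to preserve those.
    return max(c, 0.0);
#else
    // SDR: the displayable range really is 0-1.
    return saturate(c);
#endif
}
```

Replacing saturate(c) with a call like this in an existing shader leaves SDR behaviour unchanged, because the 8 bit branch is still a plain saturate().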
10 bit mode: Like SDR, HDR10 uses a non-linear curve to match what humans can see. However, it uses a different curve. SDR on PCs uses the sRGB curve (which is similar to gamma 2.2), while HDR10 uses the PQ curve: en.wikipedia.org/wiki/Perceptual_quantizer . PQ is steeper and further from linear than sRGB, so you'll probably have visible problems if you ignore it. The PQ-encoded output the game gives you is in the range 0-1; after applying the EOTF to get back to linear it will be 0-10,000, in nits. ReShade doesn't handle this for you (yet). Here are functions to convert:
float3 toLinear(float3 c) {
float3 r = c;
#if BUFFER_COLOR_BIT_DEPTH == 10
//HDR10 we need to convert between PQ and linear. en.wikipedia.org/wiki/Perceptual_quantizer
const float m1 = 1305.0/8192.0;
const float m2 = 2523.0/32.0;
const float c1 = 107.0/128.0;
const float c2 = 2413.0/128.0;
const float c3 = 2392.0/128.0;
//Unnecessary max commands are to prevent compiler warnings, which might scare users.
float3 powc = pow(max(c,0),1.0/m2);
r = 10000 * pow(max( max(powc-c1, 0) / ( c2 - c3*powc ), 0) , 1.0/m1);
#endif
return r;
}
float4 getBackBufferLinear(float2 texcoord) {
float4 c = tex2D( samplerColor, texcoord);
c.rgb = toLinear(c.rgb);
return c;
}
float3 toOutputFormat(float3 c) {
float3 r = c;
#if BUFFER_COLOR_BIT_DEPTH == 10
//HDR10 we need to convert between PQ and linear. en.wikipedia.org/wiki/Perceptual_quantizer
const float m1 = 1305.0/8192.0;
const float m2 = 2523.0/32.0;
const float c1 = 107.0/128.0;
const float c2 = 2413.0/128.0;
const float c3 = 2392.0/128.0;
r = c*0.0001;
//Unnecessary max commands are to prevent compiler warnings, which might scare users.
float3 powc = pow(max(r,0),m1);
r = pow(max( ( c1 + c2*powc ) / ( 1 + c3*powc ), 0 ), m2);
#endif
return r;
}
Then every time you read or write to the backbuffer use these functions. If not in HDR10 mode they will be optimized away into nothing by the compiler.
//When reading BackBuffer
c = getBackBufferLinear(texcoord);
//At end of shader before returning the result
c.rgb = toOutputFormat(c.rgb);
Note: those examples aren't exactly the same as what I have in Glamarye. I changed the 10000 multiplier because I wanted the output in the range 0-20 instead of 0-10000, so that 0-1 roughly corresponds to 0-1 in SDR mode. This is only approximate - the actual optimal range depends on the game's tonemapping (which, if it's done right, will be based on the screen's advertised maximum brightness). Neither the screen spec nor the game's tonemap algorithm is made available to ReShade shaders.
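A rough sketch of what such a rescaled conversion could look like, using the same PQ constants as the functions above. The macro HDR10_LINEAR_MAX and both function names are assumptions made for this example; the exact constant and code in Glamarye may differ.

```hlsl
// Assumption for illustration: the linear value that PQ's 10,000 nit maximum
// maps to. The post describes a 0-20 range; this is not Glamarye's actual code.
#define HDR10_LINEAR_MAX 20.0

// PQ (0-1) to scaled linear (0-HDR10_LINEAR_MAX).
float3 pqToScaledLinear(float3 c)
{
    const float m1 = 1305.0/8192.0;
    const float m2 = 2523.0/32.0;
    const float c1 = 107.0/128.0;
    const float c2 = 2413.0/128.0;
    const float c3 = 2392.0/128.0;
    float3 powc = pow(max(c, 0), 1.0/m2);
    return HDR10_LINEAR_MAX * pow(max(max(powc - c1, 0) / (c2 - c3*powc), 0), 1.0/m1);
}

// Scaled linear (0-HDR10_LINEAR_MAX) back to PQ (0-1).
float3 scaledLinearToPq(float3 c)
{
    const float m1 = 1305.0/8192.0;
    const float m2 = 2523.0/32.0;
    const float c1 = 107.0/128.0;
    const float c2 = 2413.0/128.0;
    const float c3 = 2392.0/128.0;
    float3 powc = pow(max(c / HDR10_LINEAR_MAX, 0), m1);
    return pow(max((c1 + c2*powc) / (1 + c3*powc), 0), m2);
}
```

Only the scale factor changes compared with the 0-10000 versions, so the two functions remain inverses of each other.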
Tonemapping
Actually, the "linear" output SRGBTexture=true gives you isn't linear. Neither is what you get back from inverting HDR10's PQ. Before applying those output curves, games have already applied tonemapping - a process which compresses the huge dynamic range of light in the world (real or virtual) into the limited range screens can produce. They squish the huge range of brightness in light areas into a small range of values at the top of the output scale. This is worst in SDR; in HDR modes you have more headroom so the game doesn't need to squish it as much. Most of the time you can ignore this, but if you want to do lighting effects like SSGI or Bloom then it might matter.
Unfortunately, reversing it perfectly isn't possible, as there are many algorithms and you don't know which one each game is using. Old games might use Reinhard (the original). Modern games might use ACES, which is based on how they do it in films.
In Glamarye, for Fake GI, I bring back some of the dynamic range at the top end using an equation based on the inverse of extended Reinhard. The default setting only expands the range up to 3. You have to be conservative, as every game is different: it is better to do too little, because overdoing it looks bad. The effect is subtle but I believe it helps. For now I'm only doing it in SDR mode. Code:
#if BUFFER_COLOR_BIT_DEPTH == 8
uniform float tone_map = 3;
#endif
//Tone map compensation to make top end closer to linear.
float3 undoTonemap(float3 c) {
#if BUFFER_COLOR_BIT_DEPTH > 8
return c;
#else
return c/(1.0-(1.0-rcp(tone_map))*c);
#endif
}
float3 reapplyTonemap(float3 c) {
#if BUFFER_COLOR_BIT_DEPTH > 8
return c;
#else
return c/((1-rcp(tone_map))*c+1.0);
#endif
}
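As a usage sketch, the two functions above would typically wrap whatever lighting maths you do in between. The names placeholderEffect and effectWithTonemapCompensation are invented for this example, and the effect itself is just a stand-in.

```hlsl
// Stand-in for your own lighting effect (hypothetical, for illustration only).
float3 placeholderEffect(float3 c)
{
    return c * 1.2;
}

// Relies on undoTonemap() and reapplyTonemap() as defined above.
float3 effectWithTonemapCompensation(float3 c)
{
    c = undoTonemap(c);        // expand the compressed highlights
    c = placeholderEffect(c);  // work on the expanded values
    return reapplyTonemap(c);  // compress back into the output range
}
```

With tone_map = 3, undoTonemap maps 0.5 to 0.75 and 1.0 to 3.0, and reapplyTonemap maps 3.0 back to 1.0, so mid-tones move only a little while the top end gets most of the extra range.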
The Future
What could ReShade do to make it easier for shader writers?
One obvious thing would be to take care of PQ for you - replace or complement the SRGBWriteEnable & SRGBTexture options to also convert to and from PQ for 10bit textures.
For HDR unaware developers, we might reduce saturate() problems by scaling PQ output and scRGB values down to be between 0 and 1 for them by default. ReShade could check the output device's max brightness to do that optimally.
Modern games all render in 16 bit then convert down to 8 or 10 near the end. For shaders wanting linear data, ReShade could look for that 16 to 8 or 10 bit conversion and insert itself just before that step. Even in SDR mode shaders could then receive and output 16 bit data, which would reduce rounding imprecision from working in 8 bits.
- lordbean
Edit: regardless, I think it's excellent that you took the time to document all this. I struggled a fair bit even getting DisplayHDR formats to work with HQAA, and to my knowledge HQAA is still incompatible with scRGB-based modes, as I haven't yet figured out how to mathematically convert between linear and scRGB.
What we need is for ReShade to give us the color space as well as the bit depth.
Useful docs:
docs.microsoft.com/en-us/windows/win32/d...s/high-dynamic-range
For apps that consume HDR10-encoded content, such as media players, or apps that are expected to be used mainly in fullscreen scenarios such as games, when creating your swap chain you should consider specifying DXGI_FORMAT_R10G10B10A2_UNORM in DXGI_SWAP_CHAIN_DESC1 . By default, this is treated as using the sRGB color space; therefore, you must explicitly call IDXGISwapChain3::SetColorSpace1 , and set as your color space DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020 , also known as HDR10/BT.2100.
This combination has more stringent restrictions than FP16. You can use this only with Direct3D 11 or Direct3D 12.
- lordbean
Glamarye v5.1 has the new option and is out now: github.com/rj200/Glamarye_Fast_Effects_for_ReShade
If you select sRGB, Glamarye has to calculate the sRGB curve itself; ReShade won't do it for us for a 10 bit backbuffer.
The PQ curve (and to a lesser extent sRGB) has a quite noticeable performance impact (more than 10% in Glamarye). Therefore I've implemented optional fast approximations of both that don't use the slow pow() function. You can read the functions here: github.com/rj200/Glamarye_Fast_Effects_f...Fast_Effects.fx#L712
A fast approximation is okay because we apply the conversion one way, apply our effects, then apply the opposite conversion - so inaccuracies in fast mode mostly cancel out. Glamarye doesn't need perfect linear colour, just something pretty close.
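To show the general idea of a pow()-free approximation, here is a minimal sketch that treats sRGB as plain gamma 2.0. This is a common cheap stand-in and is not the code Glamarye uses (see the link above for that); both function names are invented for this example.

```hlsl
// Not Glamarye's actual approximation: gamma 2.0 used as a cheap stand-in
// for the sRGB curve, avoiding pow() entirely.
float3 fastSrgbToLinear(float3 c)
{
    c = max(c, 0.0);
    return c * c;
}

float3 fastLinearToSrgb(float3 c)
{
    return sqrt(max(c, 0.0));
}
```

Because sqrt(c * c) returns c for non-negative input, the pair cancels out when no effect is applied in between, which is the property the paragraph above relies on.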
What I really want is an option in ReShade 5 that inserts shaders earlier in the pipeline so they just get 16 bit linear data. That shouldn't be too hard with the new plugin system. Inserting before fog is added would be awesome for all AO plugins too.
- lordbean
- crosire
BUFFER_COLOR_SPACE
1 = srgb_nonlinear (aka sRGB)
2 = extended_srgb_linear (aka scRGB)
3 = hdr10_st2084 (aka HDR10)
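A minimal sketch of how a shader might branch on these values instead of guessing from bit depth alone. The function name backBufferToLinear and the fallback define are assumptions for this example; the PQ maths is the same as in the toLinear function earlier in the thread.

```hlsl
// Sketch only. The fallback define is an assumption for ReShade versions
// that do not provide BUFFER_COLOR_SPACE.
#ifndef BUFFER_COLOR_SPACE
    #define BUFFER_COLOR_SPACE 0
#endif

float3 backBufferToLinear(float3 c)
{
#if BUFFER_COLOR_SPACE == 3
    // hdr10_st2084: invert the PQ curve, result in nits (0-10000).
    const float m1 = 1305.0/8192.0;
    const float m2 = 2523.0/32.0;
    const float c1 = 107.0/128.0;
    const float c2 = 2413.0/128.0;
    const float c3 = 2392.0/128.0;
    float3 powc = pow(max(c, 0), 1.0/m2);
    return 10000 * pow(max(max(powc - c1, 0) / (c2 - c3*powc), 0), 1.0/m1);
#elif BUFFER_COLOR_SPACE == 2
    // extended_srgb_linear (scRGB): already linear, nothing to do.
    return c;
#elif BUFFER_COLOR_SPACE == 1 && BUFFER_COLOR_BIT_DEPTH > 8
    // sRGB curve in a higher bit depth buffer: SRGBTexture is unavailable,
    // so apply the standard sRGB decode manually.
    float3 lo = c / 12.92;
    float3 hi = pow(max((c + 0.055) / 1.055, 0), 2.4);
    return lerp(hi, lo, step(c, float3(0.04045, 0.04045, 0.04045)));
#else
    // 8 bit sRGB (already decoded by SRGBTexture = true) or unknown: pass through.
    return c;
#endif
}
```

Note that the branches return different scales (nits for HDR10, relative values elsewhere), so a real shader would probably normalise them before applying effects.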
- lordbean
- lordbean
#define HQAA_Tex2D(tex, coord) ConditionalDecode(tex2Dlod(tex, coord.xyxy))
#define HQAA_Tex2DOffset(tex, coord, offset) ConditionalDecode(tex2Dlodoffset(tex, coord.xyxy, offset))
#if HQAA_TARGET_COLOR_SPACE == 2
float encodePQ(float x)
{
/* float nits = 10000.0;
float m2rcp = 0.012683; // 1 / (2523/32)
float m1rcp = 6.277395; // 1 / (1305/8192)
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float xpm2rcp = pow(clamp(x, 0.0, 1.0), 0.012683);
float numerator = max(xpm2rcp - 0.8359375, 0.0);
float denominator = 18.8515625 - (18.6875 * xpm2rcp);
return 10000.0 * pow(abs(numerator / denominator), 6.277395);
}
float2 encodePQ(float2 x)
{
/* float nits = 10000.0;
float m2rcp = 0.012683; // 1 / (2523/32)
float m1rcp = 6.277395; // 1 / (1305/8192)
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float2 xpm2rcp = pow(clamp(x, 0.0, 1.0), 0.012683);
float2 numerator = max(xpm2rcp - 0.8359375, 0.0);
float2 denominator = 18.8515625 - (18.6875 * xpm2rcp);
return 10000.0 * pow(abs(numerator / denominator), 6.277395);
}
float3 encodePQ(float3 x)
{
/* float nits = 10000.0;
float m2rcp = 0.012683; // 1 / (2523/32)
float m1rcp = 6.277395; // 1 / (1305/8192)
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float3 xpm2rcp = pow(clamp(x, 0.0, 1.0), 0.012683);
float3 numerator = max(xpm2rcp - 0.8359375, 0.0);
float3 denominator = 18.8515625 - (18.6875 * xpm2rcp);
return 10000.0 * pow(abs(numerator / denominator), 6.277395);
}
float4 encodePQ(float4 x)
{
return float4(encodePQ(x.rgb), x.a);
}
float decodePQ(float x)
{
/* float nits = 10000.0;
float m2 = 78.84375 // 2523 / 32
float m1 = 0.159302; // 1305 / 8192
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float xpm1 = pow(clamp(x / 10000.0, 0.0, 1.0), 0.159302);
float numerator = 0.8359375 + (18.8515625 * xpm1);
float denominator = 1.0 + (18.6875 * xpm1);
return pow(abs(numerator / denominator), 78.84375);
}
float2 decodePQ(float2 x)
{
/* float nits = 10000.0;
float m2 = 78.84375 // 2523 / 32
float m1 = 0.159302; // 1305 / 8192
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float2 xpm1 = pow(clamp(x / 10000.0, 0.0, 1.0), 0.159302);
float2 numerator = 0.8359375 + (18.8515625 * xpm1);
float2 denominator = 1.0 + (18.6875 * xpm1);
return pow(abs(numerator / denominator), 78.84375);
}
float3 decodePQ(float3 x)
{
/* float nits = 10000.0;
float m2 = 78.84375 // 2523 / 32
float m1 = 0.159302; // 1305 / 8192
float c1 = 0.8359375; // 107 / 128
float c2 = 18.8515625; // 2413 / 128
float c3 = 18.6875; // 2392 / 128
*/
float3 xpm1 = pow(clamp(x / 10000.0, 0.0, 1.0), 0.159302);
float3 numerator = 0.8359375 + (18.8515625 * xpm1);
float3 denominator = 1.0 + (18.6875 * xpm1);
return pow(abs(numerator / denominator), 78.84375);
}
float4 decodePQ(float4 x)
{
return float4(decodePQ(x.rgb), x.a);
}
#endif //HQAA_TARGET_COLOR_SPACE
#if HQAA_TARGET_COLOR_SPACE == 1
float encodeHDR(float x)
{
return x * HqaaHdrNits;
}
float2 encodeHDR(float2 x)
{
return x * HqaaHdrNits;
}
float3 encodeHDR(float3 x)
{
return x * HqaaHdrNits;
}
float4 encodeHDR(float4 x)
{
return x * HqaaHdrNits;
}
float decodeHDR(float x)
{
return clamp(x, 0.0, 497.0) / 497.0;
}
float2 decodeHDR(float2 x)
{
return clamp(x, 0.0, 497.0) / 497.0;
}
float3 decodeHDR(float3 x)
{
return clamp(x, 0.0, 497.0) / 497.0;
}
float4 decodeHDR(float4 x)
{
return clamp(x, 0.0, 497.0) / 497.0;
}
#endif //HQAA_TARGET_COLOR_SPACE
float ConditionalEncode(float x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return encodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return encodePQ(x);
#else
return x;
#endif
}
float2 ConditionalEncode(float2 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return encodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return encodePQ(x);
#else
return x;
#endif
}
float3 ConditionalEncode(float3 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return encodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return encodePQ(x);
#else
return x;
#endif
}
float4 ConditionalEncode(float4 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return encodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return encodePQ(x);
#else
return x;
#endif
}
float ConditionalDecode(float x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return decodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return decodePQ(x);
#else
return x;
#endif
}
float2 ConditionalDecode(float2 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return decodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return decodePQ(x);
#else
return x;
#endif
}
float3 ConditionalDecode(float3 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return decodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return decodePQ(x);
#else
return x;
#endif
}
float4 ConditionalDecode(float4 x)
{
#if HQAA_TARGET_COLOR_SPACE == 1
return decodeHDR(x);
#elif HQAA_TARGET_COLOR_SPACE == 2
return decodePQ(x);
#else
return x;
#endif
}
- lordbean
- aaronth07
- Daemonjax
Setup GUI:
github.com/rj200/Glamarye_Fast_Effects_f...Fast_Effects.fx#L442
Functions for Color space conversion:
github.com/rj200/Glamarye_Fast_Effects_f...Fast_Effects.fx#L768
By splitting into multiple functions I hope it's easier for others to re-use my code.
Minor complication - if bit depth is 8 we want to use SRGBWriteEnable and SRGBTexture in passes and samplers - see the #if in those parts.
Note: there's a small bug in 5.1's BUFFER_COLOR_SPACE - it doesn't change if the game changes HDR mode. crosire already accepted my patch so it should work in the next release.
In a future version I might be tempted to simply require ReShade 5.1.1+ for HDR, which would simplify the code.
- It's a standalone version of the new sharpening algorithm in Glamarye.
- The HDR bits are simplified compared to Glamarye
- No manual curve selection, only auto-selection via ReShade 5.1's BUFFER_COLOR_SPACE
- Hopefully it's easier to understand and copy from.