New shader: cinematic depth of field
- OtisInf
- Topic Author
crosire wrote: Fixed the bug: github.com/crosire/reshade/commit/5d56c8...dc62093efa0006ee475d . Workaround should no longer be necessary with that.
While visiting the AST, ReShade builds a string with the sampler definitions for the current pass. Before the fix it accidentally combined the definitions of the vertex and pixel shader, so if you used the same texture in both, that would cause a redefinition. Now they are separate, as they should be.
Excellent! I indeed lacked the necessary info to fix it properly. Shall I leave the workaround in the shader though? I don't think it does much harm (it's a preprocessor directive after all), and the shader then still works for people whose ReShade DLLs don't contain the fix (yet).
- OtisInf
- Topic Author
Also added a fix for the focusing helper plane not rendering correctly when the hyperfocal far plane became negative due to a long focal length.
- Daodan
I think it's very useful, maybe you want to add something like that to the shader.
Added uniform:
uniform float4 FocusCrosshairColor <
    ui_category = "Focusing, overlay";
    ui_label = "Focus crosshair color";
    ui_type = "color";
    ui_tooltip = "Specifies the color of the crosshair for the manual auto-focus.\nAuto-focus must be enabled\nMouse-driven auto-focus must be disabled";
> = float4(1.0, 0.0, 1.0, 1.0);
PS_FocusHelper:
(The last couple of lines)
void PS_FocusHelper(in VSFOCUSINFO focusInfo, out float4 fragment : SV_Target0)
{
    #if __RENDERER__ == 0x9300 // d3d9 gives a compile error due to a glitch in reshade if we read from the depth buffer in a vertex shader so we'll work around that
    FillFocusInfoData(focusInfo);
    #endif
    fragment = tex2D(SamplerCDBuffer3, focusInfo.texcoord);
    if(ShowOutOfFocusPlaneOnMouseDown && LeftMouseDown)
    {
        float depthPixelInMM = ReShade::GetLinearizedDepth(focusInfo.texcoord) * 1000.0 * 1000.0;
        float coc = tex2D(SamplerCDFocus, focusInfo.texcoord).x;
        float4 colorToBlend = fragment;
        if(depthPixelInMM < focusInfo.nearPlaneInMM || depthPixelInMM > focusInfo.farPlaneInMM)
        {
            colorToBlend = float4(OutOfFocusPlaneColor, 1.0);
        }
        else
        {
            if(abs(coc) < focusInfo.pixelSizeLength)
            {
                colorToBlend = float4(FocusPlaneColor, 1.0);
            }
        }
        fragment = lerp(fragment, colorToBlend, OutOfFocusPlaneColorTransparency);
        if(UseAutoFocus && !UseMouseDrivenAutoFocus)
        {
            // vertical crosshair line: distance to AutoFocusPoint.x only
            fragment = lerp(fragment, FocusCrosshairColor, FocusCrosshairColor.w * saturate(exp(-BUFFER_WIDTH * length(focusInfo.texcoord - float2(AutoFocusPoint.x, focusInfo.texcoord.y)))));
            // horizontal crosshair line: distance to AutoFocusPoint.y only
            fragment = lerp(fragment, FocusCrosshairColor, FocusCrosshairColor.w * saturate(exp(-BUFFER_HEIGHT * length(focusInfo.texcoord - float2(focusInfo.texcoord.x, AutoFocusPoint.y)))));
        }
    }
}
One additional suggestion regarding the uniform "AutoFocusPoint": the default step size is very coarse; setting it to 0.001 makes it more controllable.
- OtisInf
- Topic Author
If you don't mind me asking, do you have a reference for the math you're using? My math is rather rusty after doing high speed database code for the past 20 years, so I didn't understand the exp() usage. I'd like to learn more about that approach, as it will likely help me with future shader coding. A link to an explanation or even a term is enough. Thanks!
- Daodan
The idea behind using the exponential function is to prevent sharp edges when drawing something onto the screen (lines, arbitrary curves, etc.).
Check out the plot below. The y-axis is the value exp() spits out, and the x-axis is basically the distance between the current texcoord and the point to draw.
One property of the exponential function is that it always equals 1 when the argument is 0. So when the texcoord and the point to draw overlap, you get exactly 1 (because the distance is 0), and when they are merely close you get a value just below 1. With increasing distance exp()'s result creeps towards 0. The negative factor by which the distance gets multiplied dictates how fast exp()'s result goes to 0.
WolframAlpha Plot Example
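The falloff described above can be tried out numerically. The following is a minimal Python sketch, not the shader code itself: `line_weight` and `lerp` are hypothetical helper names that mimic what `saturate(exp(...))` and `lerp` do in the shader, and the 1920 pixel buffer width is an assumed example value.

```python
import math

def line_weight(distance, sharpness):
    """Blend weight for a soft line: 1.0 on the line, falling towards 0.0 away from it."""
    # Clamp to [0, 1] like HLSL saturate(); exp(-x) already lies in (0, 1] for x >= 0.
    return max(0.0, min(1.0, math.exp(-sharpness * distance)))

def lerp(a, b, t):
    """Linear interpolation between a and b, like HLSL lerp(a, b, t)."""
    return a + (b - a) * t

# Assumed example: a 1920 pixel wide buffer, distances measured in texcoord units (0..1).
BUFFER_WIDTH = 1920
for pixels_away in (0, 1, 2, 5):
    distance = pixels_away / BUFFER_WIDTH
    weight = line_weight(distance, BUFFER_WIDTH)
    # Blend a black background (0.0) towards a white line (1.0) by the weight.
    print(pixels_away, round(weight, 3), round(lerp(0.0, 1.0, weight), 3))
```

With the sharpness factor equal to the buffer width, a pixel that is p pixels away from the line gets a weight of exp(-p), so the line fades out smoothly over a few pixels instead of ending in a hard edge. A larger factor gives a thinner line.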
[Edit] The repo was not quite up to date. It should be fine now.
You can also check out one of my shaders that uses that: AdaptiveTint.fx (the AdaptiveTintDebug technique)
For that to work you also need Stats.fx(h) and Tools.fxh
Shader repository
- NesQEdits
drive.google.com/open?id=1DT7KJlxEe7VtEqt12WRT5W_W8WBWxbCr
- Daodan
Just read the comment on the PR and I'm curious about the attached image. Do you use a custom version of ReShade or the official one? It's just that accessing the depth buffer in Remember Me has never worked for me.
- OtisInf
- Topic Author
Daodan wrote: @OtisInf
Just read the comment on the PR and I'm curious about the attached image. Do you use a custom version of ReShade or the official one? It's just that accessing the depth buffer in Remember Me has never worked for me.
It's the official one (3.4.0). It's tricky indeed. I got it working when I enabled supersampling in-game; with that disabled it doesn't work. I haven't yet tried the depth buffer options now available in ReShade with supersampling disabled, so I don't know whether they can work around that. Did you try it with supersampling on? (Btw, if you use GeDoSaTo to get a high resolution, it will kill the depth buffer if you downsample through GeDoSaTo.)
- Daodan
OtisInf wrote: It's the official one (3.4.0). It's tricky indeed. I got it working when I enabled supersampling in-game. Disabling that and it doesn't work. I haven't tried the depth buffer options now available in reshade btw, with supersampling disabled, so dunno if that can work around that. Did you try supersampling on ? (If you use gedosato to get a high resolution btw it will kill the depth buffer if you downsample through gedosato)
Thanks a lot! It indeed works in 3.4.1 with in-game supersampling enabled. I usually use DSR, so I had supersampling disabled, because using both of them is quite demanding on the GPU.
- Marty McFly
Also, since I finally had the opportunity to test the shader thanks to crosire's update, I cannot quite reproduce this:
OtisInf wrote: Performance is on par with qUINT ADOF, often faster (9 passes, 6 rings).
I used quality 6 (6 rings like in your text) and picked a scene with about 30% of the stuff in focus and 70% out of focus. I get 276 fps with no effects whatsoever, 176 fps with qUINT DOF and 50 fps with yours. Using 8 rings, the difference is even stronger: 153 vs 32 fps. There is no quality setting with which I can achieve comparable performance. You are rendering at 50% width and 50% height; did you by accident measure performance with my DOF rendered at full resolution? Because then the numbers are close (my DOF in fullres, quality 3 for both: my DOF 190ish fps, yours 170).
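Since the fps figures here get compared with per-effect timings in milliseconds later in the thread, it can help to convert them. The sketch below is plain arithmetic and rests on one loud assumption: that the whole difference in frame time between "no effects" and "effect enabled" is spent in the effect. The function names are hypothetical, and the results only describe the machine these fps numbers were measured on.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps

def effect_cost_ms(fps_with_effect, fps_without_effect):
    """Extra frame time with the effect enabled (assumes nothing else changed)."""
    return frame_time_ms(fps_with_effect) - frame_time_ms(fps_without_effect)

BASELINE_FPS = 276  # no effects whatsoever, from the post above

# 6 rings: 176 fps with qUINT DOF, 50 fps with the cinematic DOF shader
print("qUINT DOF:", round(effect_cost_ms(176, BASELINE_FPS), 2), "ms")
print("Cinematic DOF:", round(effect_cost_ms(50, BASELINE_FPS), 2), "ms")
```

Under that assumption the two effects cost roughly 2.1 ms and 16.4 ms per frame in this test. Frame times are the better unit for such comparisons because they add up linearly, while fps do not.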
- OtisInf
- Topic Author
Marty McFly wrote: Hey would you mind if I "borrow" that physical based focusing code at some point?
If you credit me, sure.
Marty McFly wrote: Also, since I finally had the opportunity to test the shader thanks to crosire's update, I cannot quite reproduce this:
OtisInf wrote: Performance is on par with qUINT ADOF, often faster (9 passes, 6 rings).
Used quality 6 (6 rings like in your text), picking a scene with about 30% stuff in focus and 70% out of focus, I get 276 fps with no effects whatsoever, 176 fps with qUINT DOF and 50 fps with yours. Using 8 rings, the difference is even stronger, 153 vs 32 fps. With no quality setting I can achieve comparable performance. You are rendering in 50% width 50% height, did you by accident measure performance with my DOF rendered in fullscreen because then the numbers are close (my DOF in fullres, quality 3 for both: my DOF 190ish fps, yours 170).
The numbers were indeed from a version older than the last one (and I measured in full res for both). Mine gives good results with 4-5 rings, no need to go higher in most cases. On 1920x1200 that runs at around 9-10ms. When I measured it, it was around 6-7ms, but that version contained some bugs I had to fix later on, and the fixes made it slower. Yours ran at that time, with the same results, at around 7-8ms, sometimes as high as 12ms. That's likely because I had the bokeh shape set to a value higher than needed.
I'm not rendering at 50% width/height though. Only the near plane blurred CoC texture is 50% width/height. All buffers are full res.
texture texCDBuffer1 { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RGBA8; };
texture texCDBuffer2 { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RGBA8; };
texture texCDBuffer3 { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RGBA8; };
(edit)
Tweaking things a bit, I noticed having a 'continue' is terrible for performance. I expected the compiler to emit fast code with jumps but apparently that's not the case!
So:
if(!someBool)
{
    continue;
}
// code
is much slower than
if(someBool)
{
    // code
}
As I don't want to cut any corners on quality, since that's my top priority, the performance is reasonable I think. It's taxing of course, but that's always the case with these kinds of effects at full res.
(edit) Now around 6.7ms for 5 rings. It's faster if I remove the if altogether: I always do the texture fragment read, and then multiply the weight by either 0 or 1, depending on the boolean expression of the if I removed.
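The multiply-by-0-or-1 idea can be sketched outside the shader. This Python version only demonstrates that both formulations produce the same weighted average; it says nothing about speed, because the speed difference is a GPU effect. All names are hypothetical, and the lists stand in for the per-tap color values, weights and boolean conditions of a blur loop.

```python
def accumulate_branching(samples, weights, include):
    """Weighted average that skips excluded samples with a branch."""
    total, total_weight = 0.0, 0.0
    for sample, weight, inc in zip(samples, weights, include):
        if not inc:
            continue  # the excluded sample is never used
        total += sample * weight
        total_weight += weight
    return total / total_weight if total_weight > 0.0 else 0.0

def accumulate_branchless(samples, weights, include):
    """Same average, but every sample is processed and excluded ones get weight 0."""
    total, total_weight = 0.0, 0.0
    for sample, weight, inc in zip(samples, weights, include):
        weight = weight * float(inc)  # True -> 1.0, False -> 0.0
        total += sample * weight
        total_weight += weight
    return total / total_weight if total_weight > 0.0 else 0.0

samples = [0.2, 0.8, 0.5]
weights = [1.0, 2.0, 1.0]
include = [True, False, True]
print(accumulate_branching(samples, weights, include))
print(accumulate_branchless(samples, weights, include))
```

The branchless variant does more arithmetic per sample (and in a shader also more texture reads), so whether it wins depends on how often the condition is false and on the hardware. Measuring, as done here, is the only reliable way to tell.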
github.com/FransBouma/reshade-shaders/bl...CinematicDOF.fx#L437
I knew if statements could slow things down, but that they're this inefficient is quite remarkable. I'll use it for a bit to see whether this introduced any more bugs.
- OtisInf
- Topic Author
Marty McFly wrote: You can tune your post gaussian to calculate the samples on the fly, otherwise you apply a large kernel on 2x2 pixels, that'll further improve performance.
Thing is, the gaussian post pass isn't very slow, 0.5ms at most. So optimizing it further won't help much.
btw:
Marty McFly wrote: Hey would you mind if I "borrow" that physical based focusing code at some point?
To clarify: you may, as it's licensed under the BSD2 license, but you have to obey that license when borrowing that code, so copy the whole license into your shader file as instructed. A simple "some code by Frans Bouma" won't cut it.
- crosire
OtisInf wrote: Tweaking things a bit, I noticed having a 'continue' is terrible for performance. I expected the compiler to emit fast code with jumps but apparently that's not the case!
You are writing code for the GPU, where branches are the single worst thing you can do (anything that splits the code path, so that's ifs, fors, whiles, ... unless they can be unrolled, which the compiler will try to do as best as possible). Because of its massive parallelism, the GPU always executes multiple threads as a group (on NVIDIA this is called a warp of 32 threads, on AMD I believe it's called a wavefront of 64 threads). These run instructions in lockstep, meaning they always run the same instruction at the same time.
As soon as you branch and one thread jumps to a different target than another in the same group, that thread has to wait for the first thread to finish before it continues, and then the first thread has to wait for that thread to finish going through its instructions, until all threads converge at the same location again (after the if/else). If you are very unlucky and a single thread in a warp branches to a different location than all other threads, you effectively force the other 31 threads to do absolutely nothing in the meantime. That kills performance.
So it's important to remember that GPUs work very differently from CPUs (which are optimized for very fast branching, with a lot of horsepower put into branch prediction).
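The lockstep behaviour can be illustrated with a toy cost model. This is a deliberately simplified Python sketch and not how any specific GPU schedules work: it assumes a group pays for the "if" side whenever at least one thread takes it, and for the "else" side whenever at least one thread takes that. The function name and the cycle counts are made up for illustration.

```python
WARP_SIZE = 32  # NVIDIA warp size mentioned above

def warp_cost(takes_branch, cost_if, cost_else):
    """Cycles one lockstep group needs to get past an if/else (toy model)."""
    any_if = any(takes_branch)        # someone needs the "if" side
    any_else = not all(takes_branch)  # someone needs the "else" side
    # Each side that is needed by anyone is executed once for the whole group,
    # with the threads that did not take it sitting idle.
    return cost_if * any_if + cost_else * any_else

all_if = [True] * WARP_SIZE
all_else = [False] * WARP_SIZE
one_odd_thread = [True] + [False] * (WARP_SIZE - 1)

# Assumed costs: the "if" side is expensive (100 cycles), the "else" side cheap (10).
print(warp_cost(all_if, 100, 10))
print(warp_cost(all_else, 100, 10))
print(warp_cost(one_odd_thread, 100, 10))
```

In this model a group in which every thread takes the cheap side pays only for the cheap side, but a single thread taking the expensive side makes the whole group pay for both. Real compilers may also turn small branches into arithmetic selects, which is essentially the multiply-by-0-or-1 trick done by hand.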