How does performance mode actually work?

6 months 1 week ago #1 by Daemonjax
How does performance mode actually work? was created by Daemonjax
Does it just glob all the shaders together into one big shader file and let the optimizing compiler sort it out?  Or does it do more than that?

I'm asking because I'm wondering what kind of performance gains I could expect from manually merging all the shaders I use for a specific game.

6 months 1 week ago #2 by lordbean
Replied by lordbean on topic How does performance mode actually work?
Performance mode doesn't merge separate shaders; each active shader is still compiled on its own. The biggest difference is that the shaders' setup options, along with any calculations whose only changeable input is a setup option, are turned into constants in the compiled code. For example, a global constant array with 4 values that is indexed by a setup option (a preset, essentially) compiles as the full array in setup mode, but compiles down to a single value in performance mode, dropping all unused array elements.
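
As a rough illustration of the preset case, here is a minimal ReShade FX sketch. It is not taken from HQAA or any shipped shader: QualityPreset, PRESET_STRENGTH, PresetDemoPS, and the array values are all made-up names and numbers.

```hlsl
#include "ReShade.fxh"

// Hypothetical setup option: a preset picked from a combo box in the UI.
uniform int QualityPreset <
    ui_type = "combo";
    ui_label = "Quality Preset";
    ui_items = "Low\0Medium\0High\0Ultra\0";
> = 2;

// Global constant array with one (made-up) value per preset.
static const float PRESET_STRENGTH[4] = { 0.25, 0.5, 0.75, 1.0 };

float4 PresetDemoPS(float4 vpos : SV_Position, float2 texcoord : TEXCOORD) : SV_Target
{
    float3 color = tex2D(ReShade::BackBuffer, texcoord).rgb;

    // Setup mode: QualityPreset is a runtime uniform, so all four
    // array elements are compiled in and the lookup is dynamic.
    // Performance mode: QualityPreset is baked in as a constant, so the
    // compiler can fold this lookup to one value (0.75 when the preset is 2).
    float strength = PRESET_STRENGTH[QualityPreset];

    // Arbitrary use of the value: darken the image slightly.
    color *= 1.0 - 0.1 * strength;
    return float4(color, 1.0);
}

technique PresetDemo
{
    pass
    {
        VertexShader = PostProcessVS;
        PixelShader = PresetDemoPS;
    }
}
```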

6 months 4 days ago - 6 months 4 days ago #3 by Daemonjax
Replied by Daemonjax on topic How does performance mode actually work?
Thanks for the reply.

I meant to come back earlier and answer my own question, but I just didn't.

So... yeah, merging shaders can give a very significant performance increase. The most obvious reason is that it eliminates redundant texture lookups.

The lowest-hanging-fruit example of this is merging the Vignette shader with virtually any other shader: you basically get Vignette for free, because after the tap it's only about 4 math instructions, saving about 0.170 ms of total shader time compared with using a separate Vignette shader. So should basically every shader include some simple Vignette code? Probably. ;)


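if (enableVignette)
{
     // Offset from the screen center, scaled on both axes by the buffer's width/height ratio
     float2 distance_xy = texcoord - float2(0.5f, 0.5f);
     distance_xy *= float2((BUFFER_RCP_HEIGHT / BUFFER_RCP_WIDTH), BUFFER_ASPECT_RATIO);
     // Squared distance from the center
     const float dist = dot(distance_xy, distance_xy);
     // Darken toward the edges (dist is already squared, so this falls off with the 4th power)
     color.rgb *= 1.0 - (dist * dist * 0.25);
}
return float4(color.rgb, 1.0f);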

The other way it increases performance (related, but different) is by giving the GPU more math to do while it waits for the remaining texture lookups, so you basically get that extra math for "free". I also believe there's a small performance hit for each pass, regardless of what it actually does, so fewer passes is better.
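
To make the single-tap idea concrete, here is a minimal sketch of a complete merged pass. The saturation adjustment is only a hypothetical stand-in for "virtually any other shader", and the Saturation uniform, MergedPS, and the technique name are invented for illustration; only the vignette math comes from the snippet above.

```hlsl
#include "ReShade.fxh"

uniform bool enableVignette < ui_label = "Enable Vignette"; > = true;

// Hypothetical host effect parameter (stand-in for any other shader's settings).
uniform float Saturation <
    ui_type = "slider";
    ui_min = 0.0; ui_max = 2.0;
    ui_label = "Saturation";
> = 1.2;

float4 MergedPS(float4 vpos : SV_Position, float2 texcoord : TEXCOORD) : SV_Target
{
    // The only texture tap in the pass; both effects share it.
    float3 color = tex2D(ReShade::BackBuffer, texcoord).rgb;

    // Host effect: blend between greyscale and the original color.
    float luma = dot(color, float3(0.2126, 0.7152, 0.0722));
    color = saturate(lerp(float3(luma, luma, luma), color, Saturation));

    // Merged vignette: a few math instructions, no extra tap, no extra pass.
    if (enableVignette)
    {
        float2 distance_xy = texcoord - float2(0.5, 0.5);
        distance_xy *= float2((BUFFER_RCP_HEIGHT / BUFFER_RCP_WIDTH), BUFFER_ASPECT_RATIO);
        const float dist = dot(distance_xy, distance_xy);
        color *= 1.0 - (dist * dist * 0.25);
    }

    return float4(color, 1.0);
}

technique MergedSaturationVignette
{
    pass
    {
        VertexShader = PostProcessVS;
        PixelShader = MergedPS;
    }
}
```

Run as two separate techniques, the same work would need a second pass and a second read of the back buffer just to apply the vignette.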

 
Last edit: 6 months 4 days ago by Daemonjax. Reason: yay I unfucked the formatting

6 months 4 days ago #4 by lordbean
Replied by lordbean on topic How does performance mode actually work?
That's actually a big part of why my HQAA shader includes optional built-in extras - several of the optionals work fine while squished into a single pass with each other, saving on texture lookups and, of course, on the processing time of extra passes.

The big counter-argument to this is, of course, that most people don't have a complete enough understanding of the GPU pipeline to realize why this sort of thing can be such a benefit. That's why HQAA includes pre-processor defines (disabled by default) that can be used to drop the extra code out when it's compiled.
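
For readers unfamiliar with the pattern, a minimal sketch of such a compile-time switch might look like the following. DEMO_DROP_VIGNETTE and DemoPS are hypothetical names, not HQAA's actual defines, and the brightness tweak is just a placeholder for the main effect.

```hlsl
#include "ReShade.fxh"

// Hypothetical define: 0 (the default) keeps the built-in extra,
// 1 strips its code out entirely at compile time.
#ifndef DEMO_DROP_VIGNETTE
    #define DEMO_DROP_VIGNETTE 0
#endif

float4 DemoPS(float4 vpos : SV_Position, float2 texcoord : TEXCOORD) : SV_Target
{
    float3 color = tex2D(ReShade::BackBuffer, texcoord).rgb;

    // Placeholder for the shader's main work.
    color = saturate(color * 1.05);

#if !DEMO_DROP_VIGNETTE
    // Optional extra sharing the same pass and the same tap.
    float2 distance_xy = texcoord - float2(0.5, 0.5);
    distance_xy *= float2((BUFFER_RCP_HEIGHT / BUFFER_RCP_WIDTH), BUFFER_ASPECT_RATIO);
    const float dist = dot(distance_xy, distance_xy);
    color *= 1.0 - (dist * dist * 0.25);
#endif

    return float4(color, 1.0);
}

technique DemoWithOptionalExtra
{
    pass
    {
        VertexShader = PostProcessVS;
        PixelShader = DemoPS;
    }
}
```

Unlike a uniform toggle, a define removes the code in setup mode as well, at the cost of a recompile whenever it is changed.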

6 months 3 days ago - 6 months 3 days ago #5 by Daemonjax
Replied by Daemonjax on topic How does performance mode actually work?
Cool, you get it.

Oftentimes I spend a lot of time optimizing other people's shaders (it's fun for me!), but I'll quickly hit a wall where it's just waiting for texture lookups to complete and further reductions in math cycles have no effect on shader time.

Another fun merge I just completed was Quint's Sharpener and NFAA.  I basically use NFAA as a cheap specular highlights antialiaser, but that's beside the point.

NFAA needs 9 texture taps, 5 of which are already done by Quint's Sharpener.  The others are calculation-dependent, so there's no getting around them.  So you get NFAA for basically the cost of 4 taps, because a lot of the math gets folded in (not as much as I'd like, but it is what it is), plus some compiler magic happens, giving a nice reduction in total instructions (129 total instruction slots used)...

So NFAA for only 0.030 ms shader time cost (normally it's 0.170 ms even after I hand-optimized the crap out of it)? Yes please!

There's definitely a lot of room still in that shader to squeeze in more math.

Plus if I ever want to do something with the depth buffer and NFAA, that would be basically free, too -- because I'm already using Quint's Sharpener's depth buffer feature (plus one I added myself because I don't want to sharpen distant objects very much).
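
The tap sharing described above can be sketched generically as follows. This is not Quint's Sharpener or NFAA: the sharpen, the edge estimate, the constants, and all names are invented purely to show two consumers reading the same five fetched samples.

```hlsl
#include "ReShade.fxh"

static const float3 LUMA = float3(0.2126, 0.7152, 0.0722);

float4 SharedTapsPS(float4 vpos : SV_Position, float2 texcoord : TEXCOORD) : SV_Target
{
    const float2 px = float2(BUFFER_RCP_WIDTH, BUFFER_RCP_HEIGHT);

    // Five taps (center plus a cross), fetched once.
    float3 c = tex2D(ReShade::BackBuffer, texcoord).rgb;
    float3 n = tex2D(ReShade::BackBuffer, texcoord + float2(0.0, -px.y)).rgb;
    float3 s = tex2D(ReShade::BackBuffer, texcoord + float2(0.0,  px.y)).rgb;
    float3 w = tex2D(ReShade::BackBuffer, texcoord + float2(-px.x, 0.0)).rgb;
    float3 e = tex2D(ReShade::BackBuffer, texcoord + float2( px.x, 0.0)).rgb;

    // Consumer 1: a simple cross-shaped unsharp mask.
    float3 blur = (n + s + w + e) * 0.25;
    float3 sharp = c + (c - blur) * 0.5;

    // Consumer 2: a luma gradient built from the same four neighbours,
    // used here as a crude edge estimate.
    float2 grad = float2(dot(e - w, LUMA), dot(s - n, LUMA));
    float edge = saturate(length(grad) * 4.0);

    // Sharpen flat areas, soften strong edges. Only math was added
    // for the second consumer; the tap count is unchanged.
    float3 result = lerp(sharp, blur, edge);
    return float4(saturate(result), 1.0);
}

technique SharedTapsDemo
{
    pass
    {
        VertexShader = PostProcessVS;
        PixelShader = SharedTapsPS;
    }
}
```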

 
Last edit: 6 months 3 days ago by Daemonjax.

6 months 2 days ago #6 by lordbean
Replied by lordbean on topic How does performance mode actually work?
Spending time hunting for optimizations is never a bad thing in my book. That's something I keep coming back to again and again in HQAA (it's a highly customized merger of SMAA and FXAA), since squishing several AA types together results in a slowish shader, no matter how fast each individual piece actually is. I'm constantly wondering in the back of my head where I can do things better to shave off some execution time.
