Optimize AttributeBuffer to OutputVertex conversion (#3283)

First I unrolled the inner loop, then I moved semantics validation
out of the hot loop.

I also added overflow slots to avoid conditional branches.
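
As a rough illustration of the overflow-slot idea (not the code from this commit), here is a minimal sketch. The names `Vertex`, `Sanitize`, `WriteAttribute`, `kNumSlots` and the slot counts are hypothetical stand-ins. Out-of-range semantics are remapped once, ahead of the hot path, to scratch slots past the real data, so the per-vertex stores need no bounds check.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the emulator's actual types.
constexpr std::size_t kNumSlots = 24;  // assumed number of real output components
constexpr std::size_t kNumOverflow = 4;  // one scratch slot per x/y/z/w component

struct Vertex {
    // The trailing scratch slots absorb writes made through invalid semantics.
    std::array<float, kNumSlots + kNumOverflow> slots{};
};

// Runs once, outside the hot loop: any semantic that is out of range is
// redirected to the scratch slot belonging to its component.
std::array<std::uint8_t, 4> Sanitize(const std::array<std::uint8_t, 4>& map) {
    std::array<std::uint8_t, 4> out{};
    for (std::size_t c = 0; c < 4; ++c) {
        out[c] = map[c] < kNumSlots ? map[c]
                                    : static_cast<std::uint8_t>(kNumSlots + c);
    }
    return out;
}

// Hot path: four unconditional stores, unrolled by hand, with no branches.
void WriteAttribute(Vertex& v, const std::array<std::uint8_t, 4>& sane,
                    const std::array<float, 4>& in) {
    v.slots[sane[0]] = in[0];
    v.slots[sane[1]] = in[1];
    v.slots[sane[2]] = in[2];
    v.slots[sane[3]] = in[3];
}
```

The trade-off in this sketch is a few extra floats per vertex in exchange for a branch-free store sequence; the scratch slots are simply ignored by whatever reads the vertex afterwards.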

Super Mario 3D Land's intro runs at almost full speed when compiled with
Clang, and there's a noticeable speed increase in MSVC. GCC hasn't been
tested, but I'm confident in its ability to optimize this code.
Dwayne Slater 2018-01-02 18:32:33 -05:00 committed by Yuri Kunde Schlesner
parent 3f7f2b42c0
commit 41929371dc
4 changed files with 34 additions and 18 deletions

@@ -87,6 +87,8 @@ struct RasterizerRegs {
BitField<8, 5, Semantic> map_y;
BitField<16, 5, Semantic> map_z;
BitField<24, 5, Semantic> map_w;
u32 raw;
} vs_output_attributes[7];
INSERT_PADDING_WORDS(0xe);