Optimize AttributeBuffer to OutputVertex conversion (#3283)
Optimize AttributeBuffer to OutputVertex conversion First I unrolled the inner loop, then I pushed semantics validation outside of the hotloop. I also added overflow slots to avoid conditional branches. Super Mario 3D Land's intro runs at almost full speed when compiled with Clang, and theres a noticible speed increase in MSVC. GCC hasn't been tested but I'm confident in its ability to optimize this code.
This commit is contained in:
parent
3f7f2b42c0
commit
41929371dc
4 changed files with 34 additions and 18 deletions
|
@ -87,6 +87,8 @@ struct RasterizerRegs {
|
|||
BitField<8, 5, Semantic> map_y;
|
||||
BitField<16, 5, Semantic> map_z;
|
||||
BitField<24, 5, Semantic> map_w;
|
||||
|
||||
u32 raw;
|
||||
} vs_output_attributes[7];
|
||||
|
||||
INSERT_PADDING_WORDS(0xe);
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue