Optimize AttributeBuffer to OutputVertex conversion (#3283)

First I unrolled the inner loop, then I moved semantics validation
out of the hot loop.

I also added overflow slots to avoid conditional branches.
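
As a rough illustration of the changes described above, here is a minimal,
self-contained sketch. It is not the code from this commit: the names
(SemanticsAreValid, ConvertVertex, ComponentMap, Attribute) and the slot
counts (24 real slots, sentinel value 31, 32 padded slots) are assumptions
chosen for the example. Validation runs once up front, the per-component loop
is written out by hand, and the destination array is padded so that the
"invalid" sentinel writes into a spare slot instead of needing a branch.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout for this sketch only.
constexpr std::size_t kRealSlots = 24;    // slots that end up in the vertex
constexpr std::uint8_t kInvalidSlot = 31; // sentinel meaning "discard this component"
constexpr std::size_t kPaddedSlots = 32;  // real slots plus overflow slots

using ComponentMap = std::array<std::uint8_t, 4>; // destination slot per component
using Attribute = std::array<float, 4>;           // x, y, z, w of one output register

// Runs once, outside the hot loop (for example when the mapping changes).
// Every entry must be a real slot or the sentinel.
bool SemanticsAreValid(const std::vector<ComponentMap>& maps) {
    for (const ComponentMap& map : maps) {
        for (std::uint8_t slot : map) {
            if (slot >= kRealSlots && slot != kInvalidSlot) {
                return false;
            }
        }
    }
    return true;
}

// Hot path. Precondition: SemanticsAreValid(maps) returned true.
std::array<float, kRealSlots> ConvertVertex(const std::vector<ComponentMap>& maps,
                                            const std::vector<Attribute>& attrs) {
    // Padded destination: a write to kInvalidSlot lands in an overflow slot.
    std::array<float, kPaddedSlots> padded{};
    const std::size_t count = std::min(maps.size(), attrs.size());
    for (std::size_t i = 0; i < count; ++i) {
        // Inner loop unrolled by hand; no validity check per component.
        padded[maps[i][0]] = attrs[i][0];
        padded[maps[i][1]] = attrs[i][1];
        padded[maps[i][2]] = attrs[i][2];
        padded[maps[i][3]] = attrs[i][3];
    }
    // Only the real slots are returned; the overflow slots are dropped.
    std::array<float, kRealSlots> out{};
    std::copy_n(padded.begin(), kRealSlots, out.begin());
    return out;
}
```

The trade-off in this sketch is a few wasted floats of storage in exchange
for a branch-free inner loop; it is only safe because the mapping was checked
before the hot path runs.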

Super Mario 3D Land's intro runs at almost full speed when compiled with
Clang, and there's a noticeable speed increase with MSVC. GCC hasn't been
tested, but I'm confident in its ability to optimize this code.
Dwayne Slater 2018-01-02 18:32:33 -05:00 committed by Yuri Kunde Schlesner
parent 3f7f2b42c0
commit 41929371dc
4 changed files with 34 additions and 18 deletions

@@ -50,6 +50,7 @@ struct OutputVertex {
INSERT_PADDING_WORDS(1);
Math::Vec2<float24> tc2;
static void ValidateSemantics(const RasterizerRegs& regs);
static OutputVertex FromAttributeBuffer(const RasterizerRegs& regs,
const AttributeBuffer& output);
};