Lately I have been experimenting with a kind of JIT engine that generates machine code; the graph above is an example of what I can do with it.
The graph shows the performance of the code as a function of the X and Y factors, so I can find the conditions that give the best speed.
The X axis is the number of unrolls (the number of floating-point registers used), and the Y axis is the maximum number of code blocks per loop.
The test runs 64000 int-to-float conversions with a floating-point scale factor.
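For reference, here is a plain C sketch of the kind of kernel being measured: each int is converted to float and multiplied by a scale factor. It is not the generated machine code. The function names `convert_scalar`, `convert_unroll4` and `compare_kernels` are hypothetical, 32-bit source ints are assumed, and the unroll factor is fixed at 4 only for illustration. The unrolled loop gives the compiler four independent conversions per iteration, which it may spread over separate floating-point registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference version: one int-to-float conversion per loop iteration. */
static void convert_scalar(const int32_t *src, float *dst, size_t n, float scale)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i] * scale;
}

/* Unrolled by 4: four independent conversions per iteration. */
static void convert_unroll4(const int32_t *src, float *dst, size_t n, float scale)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float a = (float)src[i]     * scale;
        float b = (float)src[i + 1] * scale;
        float c = (float)src[i + 2] * scale;
        float d = (float)src[i + 3] * scale;
        dst[i] = a; dst[i + 1] = b; dst[i + 2] = c; dst[i + 3] = d;
    }
    /* Tail loop for the elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = (float)src[i] * scale;
}

/* Runs both kernels on the same input and returns how many outputs differ,
   or (size_t)-1 if an allocation fails. */
static size_t compare_kernels(size_t n, float scale)
{
    int32_t *src = malloc(n * sizeof *src);
    float *ref = malloc(n * sizeof *ref);
    float *out = malloc(n * sizeof *out);
    size_t mismatches = (size_t)-1;

    if (src && ref && out) {
        for (size_t i = 0; i < n; i++)
            src[i] = (int32_t)i - (int32_t)(n / 2);   /* mix of negative and positive values */
        convert_scalar(src, ref, n, scale);
        convert_unroll4(src, out, n, scale);
        mismatches = 0;
        for (size_t i = 0; i < n; i++)
            if (ref[i] != out[i])
                mismatches++;
    }
    free(src);
    free(ref);
    free(out);
    return mismatches;
}
```

Calling `compare_kernels(64000, 0.5f)` checks that the unrolled variant gives the same results as the simple loop before any timing is done.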
So what you see is that more unrolls help, but if the code in the loop gets too big, the speed goes down.
Writing this kind of test by hand would take a month, but because I'm generating the code, I can try different combinations in a few seconds.
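To illustrate why generating the variants is so much faster than writing them by hand, here is a minimal sketch of a generator that emits the unrolled loop for any unroll count. It is a swapped-in, simpler technique: it writes C source text rather than machine code, and `emit_unrolled` and the generated `convert_uN` functions are hypothetical names. Each emitted variant could then be compiled and timed; the timing loop is left out because it is platform-specific.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: writes the C source of an int-to-float conversion
   loop unrolled `unroll` times into buf. Returns 0 on success, -1 if buf is
   too small. The output is always NUL-terminated on success. */
static int emit_unrolled(char *buf, size_t cap, int unroll)
{
    size_t len = 0;
    char line[128];
    int n;

/* Formats one piece of text and appends it to buf, failing if it won't fit. */
#define EMIT(...)                                            \
    do {                                                     \
        n = snprintf(line, sizeof line, __VA_ARGS__);        \
        if (n < 0 || (size_t)n >= sizeof line ||             \
            len + (size_t)n >= cap)                          \
            return -1;                                       \
        memcpy(buf + len, line, (size_t)n + 1);              \
        len += (size_t)n;                                    \
    } while (0)

    EMIT("void convert_u%d(const int *src, float *dst, size_t n, float s)\n{\n", unroll);
    EMIT("    size_t i = 0;\n");
    EMIT("    for (; i + %d <= n; i += %d) {\n", unroll, unroll);
    /* One register-sized temporary per unrolled conversion. */
    for (int k = 0; k < unroll; k++)
        EMIT("        float r%d = (float)src[i + %d] * s;\n", k, k);
    for (int k = 0; k < unroll; k++)
        EMIT("        dst[i + %d] = r%d;\n", k, k);
    EMIT("    }\n");
    /* Tail loop for leftover elements. */
    EMIT("    for (; i < n; i++)\n        dst[i] = (float)src[i] * s;\n}\n");
#undef EMIT
    return 0;
}
```

Looping `unroll` over 1, 2, 4, 8 and so on produces one source variant per point on the X axis, which is the kind of sweep that would be tedious to write by hand.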
The same kind of code-generator test can be done on any type of assembly code; it works with AltiVec, FPU, or CPU instructions.