Here is a little brain teaser .. First, the original code:
next = (next+1) % (RINGBUFFER_SIZE);
Next is an unsigned 16 bit int. The idea here is that there is a ring buffer and when next goes past the end of the ring buffer we need it to wrap around and become zero.
The compiler generates a DIV instruction for this which is nice and compact, and leaves the remainder in a register. But the DIV instruction is horrible - something like 80 cycles to compute.
So the next, improved version of the code looks like this:
next++;
if ( next == RINGBUFFER_SIZE ) next = 0;
And this is what the code gen looks like:
cmp
jne skiparound
mov next, 0
skiparound:
It is a few more bytes of code, but the jump at worst case takes around 16 cycles. This is a win.
Now to make this even faster, consider the following - if the jump is taken it costs approximately 16 cycles. But if the jump is not taken, the cost is only four cycles. You want to optimize for 'branch not taken'. Well even though the condition that I checked for above (next is at the end of the ring buffer) is less likely than next is a safe value, the generated code is opposite because the jmp instruction has to branch around the infrequently used code.
So the challenge is, how to rewrite this in such a way that the generated code causes a branch less often than it does now.
I want the compiler to generate the branch opposite of what it does now - it should branch if next == RINGBUFFER size to somewhere else, set next to zero there, and then execute another branch back. That would make the common case fall through the branch, which is far faster, at the cost of doing two branches in the uncommon case. (One branch to get you to the extra code, and one branch to get you back.) But I can't think of a clever way to do this without dropping into assembler.
(The situation with 'if-then-else' constructs is even worse. Even if you code the branch the correct way to make the common case fall though, you still need to jump around the else leg. This seems to be the way the compilers like to generate the code.)
Any ideas on how to write this in C without dropping into assembly?
Regards,
Mike
next = (next+1) % (RINGBUFFER_SIZE);
Next is an unsigned 16 bit int. The idea here is that there is a ring buffer and when next goes past the end of the ring buffer we need it to wrap around and become zero.
The compiler generates a DIV instruction for this which is nice and compact, and leaves the remainder in a register. But the DIV instruction is horrible - something like 80 cycles to compute.
So the next, improved version of the code looks like this:
next++;
if ( next == RINGBUFFER_SIZE ) next = 0;
And this is what the code gen looks like:
cmp
jne skiparound
mov next, 0
skiparound:
It is a few more bytes of code, but the jump at worst case takes around 16 cycles. This is a win.
Now to make this even faster, consider the following - if the jump is taken it costs approximately 16 cycles. But if the jump is not taken, the cost is only four cycles. You want to optimize for 'branch not taken'. Well even though the condition that I checked for above (next is at the end of the ring buffer) is less likely than next is a safe value, the generated code is opposite because the jmp instruction has to branch around the infrequently used code.
So the challenge is, how to rewrite this in such a way that the generated code causes a branch less often than it does now.
I want the compiler to generate the branch opposite of what it does now - it should branch if next == RINGBUFFER size to somewhere else, set next to zero there, and then execute another branch back. That would make the common case fall through the branch, which is far faster, at the cost of doing two branches in the uncommon case. (One branch to get you to the extra code, and one branch to get you back.) But I can't think of a clever way to do this without dropping into assembler.
(The situation with 'if-then-else' constructs is even worse. Even if you code the branch the correct way to make the common case fall though, you still need to jump around the else leg. This seems to be the way the compilers like to generate the code.)
Any ideas on how to write this in C without dropping into assembly?
Regards,
Mike