Can't beat BCD, but here's my attempt to optimize the binary version.

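Timings are the documented cycle counts plus 4 clocks per instruction byte, except for the instructions right after a DIV, since the prefetch queue has time to refill during the divide: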

Code:

; in:  DX:AX = unsigned integer less than 100_000_000
;      ES:DI => string buffer (direction flag must be clear)
; out: buffer filled (always 8 ASCII digits), DI advanced by 8
mov cx, 10000 ;16
div cx ;152
mov bx, dx ;2
xor dx, dx ;3
mov cx, 100 ;16
div cx ;152
mov cl, 10 ;4
div cl ;88
add ax, '00' ;4
stosw ;11
xchg ax, dx ;7
div cl ;88
add ax, '00' ;4
stosw ;11
xchg ax, bx ;7
mov cl, 100 ;12
xor dx, dx ;11
div cx ;152
mov cl, 10 ;4
div cl ;88
add ax, '00' ;4
stosw ;11
xchg ax, dx ;7
div cl ;88
add ax, '00' ;4
stosw ;11
; total = 957 clocks

Originally posted by **deathshadow**:
Though... wasn't there a trick for doing conversion to any bit-depth "better" using AAM?

AAM just divides AL by an immediate constant (10 in the documented encoding), leaving the quotient in AH and the remainder in AL. At 83 documented clocks it isn't any faster than a normal 8-bit DIV (80-90 clocks).
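
For comparison, here is a rough sketch of how one digit pair might look with AAM instead of DIV CL. It assumes AL holds a value from 0 to 99, ES:DI points at the buffer, and the direction flag is clear. The clock counts in the comments are the documented 8086 figures without any fetch overhead. Because AAM returns its results in the opposite registers from DIV CL, the digits have to be swapped before STOSW:

```
; AL = value 0..99 (AH is ignored on entry)
aam             ; AH = AL / 10, AL = AL mod 10      (83)
xchg al, ah     ; tens digit into AL, ones into AH  (4)
add ax, '00'    ; convert both digits to ASCII      (4)
stosw           ; store tens, then ones             (11)
```

AAM doesn't need CL loaded with 10, but the extra XCHG costs about what the MOV CL, 10 would, so it looks like a wash at best.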
