ATTENTION I:
This is based on Borland's C++ 4.02. Whenever possible I've checked it with any
other library / program containing the specific functions, but differences may
exist between this and your version of C. Also this is strictly 32-bit code,
Windows compiler. No DOS or UNIX.
ATTENTION II:
Size comparisons are extremely easy to do. Speed comparisons aren't. The differences
in speed I give are based on RDTSC timings, but they DON'T take into
account extreme cases. That's why I don't give exact clock cycles. Of course if
you need exact clock cycles for your Pentium II, you can always buy me one :)
Borland's C library offers three functions to convert an integer to ASCII (they
are not part of standard C):
char *itoa(int value, char *string, int radix);
char *ltoa(long value, char *string, int radix);
char *ultoa(unsigned long value, char *string, int radix);
_itoa and _ltoa do exactly the same thing, because an int and a long are both
32 bits wide in 32-bit code. Their code is not identical, though: _itoa carries
some completely useless code (in 16-bit code it would sign-extend value if
radix=10). The result is always the same, so from here on _ltoa means both
_ltoa and _itoa. _ultoa behaves exactly like _ltoa and _itoa, except when
radix=10 and value<0.
Anyway all these functions call this function:
___longtoa(value, *string, radix, signed, char10)
The first three parameters are passed 'as is'. signed is set to 1 by _ltoa if
radix=10, otherwise it is set to 0. char10 is the character that corresponds to
the digit 10 when radix>10; these functions always set it to 'a' (___longtoa is
also used by printf, which has an option for uppercase characters in hex).
___longtoa does the following (and it does it with badly written code):
- Checks that 2<=radix<=36; if not, returns an empty string (only the 0 byte)
- If signed=1 and value<0, adds a '-' to the string and negates the value
- Loop1: create a pseudo-string in the stack, reversed
- Loop2: convert and copy the pseudo-string into string
The check on radix is necessary because:
radix=0 would generate an INT0 (divide by zero)
radix=1 would put the program in an infinite loop, destroying the stack
radix=37 for value=36 would return '{', the character after 'z'
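The steps and the radix check described above can be sketched in C roughly as
follows. This is only an illustration, not Borland's source: the names
longtoa_sketch, ltoa_sketch, ultoa_sketch and is_signed are made up here, the
value is assumed to be 32 bits wide, and the empty string returned for a bad
radix follows the Asm listing given later in this article.

```c
#include <stdint.h>

/* Hypothetical C rendering of the steps listed above (not Borland's code). */
static char *longtoa_sketch(uint32_t value, char *string, int radix,
                            int is_signed, char char10)
{
    char digits[32];                /* at most 32 digits (radix 2) */
    char *out = string;
    int n = 0;

    if (radix < 2 || radix > 36) {  /* bad radix: return an empty string */
        *out = '\0';
        return string;
    }
    if (is_signed && (value & 0x80000000u)) {
        *out++ = '-';               /* negative: emit sign, negate value */
        value = 0u - value;
    }
    do {                            /* loop 1: digit values, reversed */
        digits[n++] = (char)(value % (uint32_t)radix);
        value /= (uint32_t)radix;
    } while (value != 0);
    while (n > 0) {                 /* loop 2: convert to chars and copy */
        int d = digits[--n];
        *out++ = (char)(d < 10 ? '0' + d : char10 + (d - 10));
    }
    *out = '\0';
    return string;
}

/* signed is 1 only for radix 10, as described in the text */
static char *ltoa_sketch(int32_t value, char *string, int radix)
{
    return longtoa_sketch((uint32_t)value, string, radix, radix == 10, 'a');
}

static char *ultoa_sketch(uint32_t value, char *string, int radix)
{
    return longtoa_sketch(value, string, radix, 0, 'a');
}
```

With these, ltoa_sketch(-1, buf, 10) gives "-1" while
ultoa_sketch(0xFFFFFFFF, buf, 10) gives "4294967295"; for radix 16 both give
"ffffffff".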
The two loops are necessary because of the way the conversion is done (see code
later). To implement a single-loop conversion, the number of digits should be
calculated in advance, which results in less efficient code (the number of
digits in value is n=(int)(log(value)/log(radix))+1, but using one more loop is
much faster).
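To make the trade-off concrete, here is a small C sketch of the 'count the
digits first' variant. The names count_digits and utoa_single_pass are
hypothetical, and the sketch assumes an unsigned 32-bit value and a radix
already checked to be in 2..36. Note that counting the digits needs a loop of
its own (or the log formula, e.g. value=255, radix=16 gives
(int)(1.998...)+1 = 2 digits), so nothing is really saved.

```c
#include <stdint.h>

/* Number of digits of value in the given radix, by repeated division.
   Assumes 2 <= radix <= 36; value 0 counts as one digit. */
static int count_digits(uint32_t value, uint32_t radix)
{
    int n = 1;
    while (value >= radix) {
        value /= radix;
        n++;
    }
    return n;
}

/* Writes the digits straight into string, from the last position back
   to the first, so no temporary reversed buffer is needed. */
static char *utoa_single_pass(uint32_t value, char *string, uint32_t radix)
{
    char *p = string + count_digits(value, radix);

    *p = '\0';
    do {
        uint32_t d = value % radix;
        *--p = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        value /= radix;
    } while (value != 0);
    return string;
}
```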
Including the disassembly of C's functions would create a really large article,
and anyway they're just examples of really bad code. So straight to the result:
ltoa proc
cmp dword ptr [esp+0Ch], 10
sete ch
mov cl, 'a'-'0'-10
jmp short longtoa
ultoa: mov cx, 'a'-'0'-10
longtoa: push ebx
push edi
push esi
sub esp, 24h
mov ebx, [esp+3Ch] ; radix
mov eax, [esp+34h] ; value
mov edi, [esp+38h] ; string
cmp ebx, 2
jl short _ret
cmp ebx, 36
jg short _ret
or eax, eax
jge short skip
cmp ch, 0 ; signed flag set (_ltoa with radix=10)?
jz short skip
mov byte ptr [edi], '-'
inc edi
neg eax
skip: mov esi, esp
loop1: xor edx, edx
div ebx
mov [esi], dl
inc esi
or eax, eax
jnz loop1
loop2: dec esi
mov al, [esi]
cmp al, 10
jl short nochar
add al, cl
nochar: add al, '0'
stosb
cmp esi, esp
jg short loop2
_ret: mov byte ptr [edi], 0
mov eax, [esp+38h]
add esp, 24h
pop esi
pop edi
pop ebx
ret
ltoa endp
This is a 3-in-1 procedure. ltoa and ultoa take the same parameters as the
standard C functions. longtoa was changed to take from the stack the same
parameters as ltoa and ultoa, while signed and char10 are passed in CH and CL
respectively. This way ltoa and ultoa 'see' longtoa as 'their' code, not as a
different procedure (this is to avoid a common problem in C, procedures that
just 'forward' their parameters to another function).
This code assembles to 102 bytes (and it could be optimized to save a few more
bytes), whereas the standard C code takes 270 bytes. Specifically:
function    C size    Asm size
itoa            60           0
ltoa            40          12
ultoa           27           4
longtoa        143          86
            ------      ------
total          270         102
It also runs 2x faster than the C ltoa. And of course, this is a fully
C-compatible version of ltoa and ultoa. It can be changed from C-compatible to
suit specific needs (e.g. make it stdcall instead of cdecl, or, if speed and
size matter, remove the check on the radix, and so on...).
Anyway, it is rather unlikely that you'll ever use values of radix other than 2,
8, 10 or 16. So if speed or size is of the essence, a better, more specific
routine can be written. For example, consider this routine, which stores the
value of EAX as a binary number (without leading zeros) at the address
specified by EDI:
ultob proc
mov ecx, 32
more1: shl eax, 1
dec ecx
jc more2
jnl more1
more2: setc dl
add dl, '0'
shl eax, 1
mov [edi], dl
inc edi
dec ecx
jnl more2
mov [edi], al
ret
ultob endp
This runs 14x faster than C ltoa, and 7x faster than Asm ltoa, and is only 29
bytes long. But this article is long enough, so wait for another article on
specific 'ltoa' functions (who knows, maybe if I decide to write a 'printf'
function in Asm, which would use them...).
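For readers who find the flag tricks in ultob hard to follow, its logic can be
written in C roughly like this. It is only a sketch to show what the routine
does (skip the leading zero bits, then emit one character per remaining bit);
the name ultob_sketch is made up, the value is assumed to be 32 bits wide, and
nothing is claimed about the speed or size of the compiled result.

```c
#include <stdint.h>

/* Hypothetical C equivalent of the ultob routine above. */
static char *ultob_sketch(uint32_t value, char *string)
{
    char *out = string;
    int bits = 32;

    /* skip leading zero bits, but always keep at least one digit */
    while (bits > 1 && !(value & 0x80000000u)) {
        value <<= 1;
        bits--;
    }
    /* emit the remaining bits, most significant first */
    while (bits-- > 0) {
        *out++ = (char)('0' + (value >> 31));
        value <<= 1;
    }
    *out = '\0';
    return string;
}
```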