The _itoa, _ltoa and _ultoa functions
Written by Xbios2   


ATTENTION I:
This is based on Borland C++ 4.02. Whenever possible I've checked it against other libraries and programs containing these functions, but differences may exist between this and your version of C. Also, this is strictly 32-bit code for a Windows compiler. No DOS or UNIX.

ATTENTION II:
Size comparisons are extremely easy to do. Speed comparisons aren't. The speed differences I give are based on RDTSC timings, but they DON'T take extreme cases into account. That's why I don't give exact clock cycles. Of course, if you need exact clock cycles for your Pentium II, you can always buy me one :)

The C language offers three functions to convert an integer to ASCII:

char *itoa(int value, char *string, int radix);
char *ltoa(long value, char *string, int radix);
char *ultoa(unsigned long value, char *string, int radix);

_itoa and _ltoa do exactly the same thing, because an int and a long are both 32 bits wide in 32-bit code. They are nevertheless implemented differently: _itoa contains some completely useless code (in 16-bit code it would sign-extend value if radix=10). The result is always the same, so from here on _ltoa means both _ltoa and _itoa. _ultoa behaves exactly like _ltoa and _itoa, except when radix=10 and value<0.

Anyway all these functions call this function:

___longtoa(value, *string, radix, signed, char10)

The first three parameters are passed 'as is'. signed is set to 1 by _ltoa if radix=10; otherwise it is set to 0. char10 is the character that corresponds to the digit 10 when radix>10; these functions always set it to 'a' (___longtoa is also used by printf, which has an option for uppercase characters in hex).

___longtoa does the following (and it does it with badly written code):

  1. Checks that 2<=radix<=36; if it isn't, returns '0'
  2. If signed=1 and value<0, adds a '-' to the string and negates the value
  3. Loop1: creates a pseudo-string on the stack, reversed
  4. Loop2: converts and copies the pseudo-string into string
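
The four steps can be sketched in C as follows. This is only an illustration of the steps as listed, not Borland's source: the names sketch_longtoa, sketch_ltoa, sketch_ultoa and is_signed are made up for the example, and the choice to return an empty string for an out-of-range radix is an assumption that follows the assembly version given later in this article.

```c
#include <limits.h>

/* Hypothetical C rendering of the four steps; not the library's code. */
static char *sketch_longtoa(unsigned long value, char *string, int radix,
                            int is_signed, char char10)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT]; /* worst case: radix 2 */
    char *out = string;
    int n = 0;

    /* Step 1: reject a radix outside 2..36 (assumption: empty string) */
    if (radix < 2 || radix > 36) {
        *out = '\0';
        return string;
    }

    /* Step 2: signed conversion of a negative value emits '-' and negates */
    if (is_signed && value > (unsigned long)LONG_MAX) {
        *out++ = '-';
        value = 0UL - value;
    }

    /* Step 3 (Loop1): collect digit values, least significant first */
    do {
        tmp[n++] = (char)(value % (unsigned long)radix);
        value /= (unsigned long)radix;
    } while (value != 0);

    /* Step 4 (Loop2): convert digit values to characters, in reverse */
    while (n > 0) {
        int d = tmp[--n];
        *out++ = (char)(d < 10 ? '0' + d : char10 + (d - 10));
    }
    *out = '\0';
    return string;
}

/* signed is 1 only for radix 10, as described in the text */
char *sketch_ltoa(long value, char *string, int radix)
{
    return sketch_longtoa((unsigned long)value, string, radix,
                          radix == 10, 'a');
}

char *sketch_ultoa(unsigned long value, char *string, int radix)
{
    return sketch_longtoa(value, string, radix, 0, 'a');
}
```

With a sufficiently large buffer buf, sketch_ltoa(-255, buf, 10) produces "-255" and sketch_ltoa(255, buf, 16) produces "ff".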

The check on radix is necessary because:

  radix=0 would generate an INT0 (divide by zero)
  radix=1 would put the program in an infinite loop, destroying the stack
  radix=37 for value=36 would return '{', the character after 'z'
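
The upper limit is easier to see from the digit-to-character mapping itself. A minimal sketch, assuming ASCII; digit_char is a hypothetical helper name, not a library function:

```c
/* Maps a digit value to its character: '0'..'9' for 0..9, then
   char10, char10+1, ... for 10 and up. Hypothetical helper. */
char digit_char(int digit, char char10)
{
    return (char)(digit < 10 ? '0' + digit : char10 + (digit - 10));
}
```

With char10='a', digit 35 maps to 'z', the last letter available, so a digit value of 36 (possible only if radix were 37 or more) lands on the ASCII character that follows 'z'.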

The two loops are necessary because of the way the conversion is done (see code later). To implement a single-loop conversion, the number of digits should be calculated in advance, which results in less efficient code (the number of digits in value is n=(int)(log(value)/log(radix))+1, but using one more loop is much faster).
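
For comparison, here is a rough sketch of the count-first alternative. It counts the digits by repeated division instead of using logarithms, which is a substitution made for this example to avoid floating-point rounding and the special case value=0. The names count_digits and ultoa_counted are hypothetical, and the caller is assumed to pass a radix between 2 and 36.

```c
/* Hypothetical helper: number of digits of value in the given radix. */
static int count_digits(unsigned long value, unsigned radix)
{
    int n = 1;
    while (value >= radix) {
        value /= radix;
        n++;
    }
    return n;
}

/* One conversion loop: digits are written from the last position
   backwards, so no reversed temporary string is needed.
   Assumes 2 <= radix <= 36 (no check is done here). */
char *ultoa_counted(unsigned long value, char *string, unsigned radix)
{
    char *p = string + count_digits(value, radix);

    *p = '\0';
    do {
        unsigned d = (unsigned)(value % radix);
        *--p = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        value /= radix;
    } while (value != 0);
    return string;
}
```

The counting pass performs as many divisions as the conversion pass, so the temporary buffer is saved at the price of doing the division work twice.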

Including the disassembly of C's functions would create a really large article, and anyway they're just examples of really bad code. So straight to the result:

ltoa proc

        cmp     dword ptr [esp+0Ch], 10 ; radix = 10 ?
        sete    ch                      ; CH = signed
        mov     cl, 'a'-'0'-10          ; CL = offset added to digits >= 10
        jmp     short longtoa

ultoa:
        mov     cx, 'a'-'0'-10          ; CH = 0 (unsigned), CL as above

longtoa:
        push    ebx
        push    edi
        push    esi
        sub     esp, 24h                ; room for the reversed pseudo-string
        mov     ebx, [esp+3Ch]          ; radix
        mov     eax, [esp+34h]          ; value
        mov     edi, [esp+38h]          ; string
        cmp     ebx, 2
        jl      short _ret
        cmp     ebx, 36
        jg      short _ret
        or      eax, eax
        jge     short skip
        cmp     byte ptr ch, 0          ; _ltoa ?
        jz      short skip
        mov     byte ptr [edi], '-'
        inc     edi
        neg     eax
skip:   mov     esi, esp
loop1:  xor     edx, edx
        div     ebx
        mov     [esi], dl               ; store the remainder (digit value)
        inc     esi
        or      eax, eax
        jnz     loop1
loop2:  dec     esi
        mov     al, [esi]
        cmp     al, 10
        jl      short nochar
        add     al, cl
nochar: add     al, '0'
        stosb
        cmp     esi, esp
        jg      short loop2
_ret:   mov     byte ptr [edi], 0       ; terminate the string
        mov     eax, [esp+38h]          ; return the string pointer
        add     esp, 24h
        pop     esi
        pop     edi
        pop     ebx
        ret

ltoa endp

This is a three-in-one procedure. ltoa and ultoa take the same parameters as the standard C functions. longtoa was changed to take the same parameters from the stack as ltoa and ultoa, while signed and char10 are passed in CH and CL respectively. This way ltoa and ultoa 'see' longtoa as 'their' code, not as a different procedure (this avoids a common problem in C: procedures that just 'forward' their parameters to another function).

This code compiles to 102 bytes (and it could be optimized to save a few more bytes), whereas the standard C code takes 270 bytes. Specifically:

function   C size   Asm size
--------   ------   --------
itoa           60          0
ltoa           40         12
ultoa          27          4
longtoa       143         86
--------   ------   --------
total         270        102

It also runs 2x faster than C's ltoa. And of course, this is a fully C-compatible version of ltoa and ultoa. It can be changed from C-compatible to suit specific needs (e.g. make it stdcall instead of cdecl, or, if speed and size matter, remove the check for the radix, and so on...)

Anyway, it is unlikely that you'll ever use values of radix other than 2, 8, 10 or 16. So if speed or size is of the essence, a better, more specific routine can be written. For example, consider this routine, which stores the value of EAX as a binary number at the address specified by EDI:

ultob proc

        mov     ecx, 32
more1:  shl     eax, 1
        dec     ecx
        jc      more2
        jnl     more1
more2:  setc    dl
        add     dl, '0'
        shl     eax, 1
        mov     [edi], dl
        inc     edi
        dec     ecx
        jnl     more2
        mov     [edi], al
        ret

ultob endp
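
For readers who prefer C, the behaviour of this routine can be approximated as below. It is a sketch of the same two phases (skip the leading zero bits, then emit the remaining bits), not a translation of the instruction-level tricks, and it makes no claim about speed. The name ultob_c and the pointer-based signature are assumptions for the example; the assembly routine takes its input in EAX and EDI.

```c
#include <stdint.h>

/* Writes value in binary, without leading zeros ("0" for zero).
   Hypothetical C counterpart of ultob; string must hold 33 chars. */
char *ultob_c(uint32_t value, char *string)
{
    char *out = string;
    int bit = 31;

    /* Phase 1: skip leading zero bits, but always keep bit 0 */
    while (bit > 0 && ((value >> bit) & 1u) == 0)
        bit--;

    /* Phase 2: emit the remaining bits, most significant first */
    for (; bit >= 0; bit--)
        *out++ = (char)('0' + ((value >> bit) & 1u));

    *out = '\0';
    return string;
}
```

For example, ultob_c(5, buf) writes "101" and ultob_c(0, buf) writes "0".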

This runs 14x faster than C ltoa, and 7x faster than Asm ltoa, and is only 29 bytes long. But this article is long enough, so wait for another article on specific 'ltoa' functions (who knows, maybe if I decide to write a 'printf' function in Asm, which would use them...).