I. INTRODUCTION
Beware: this is going to be long...
String handling in assembly is, in any case, a difficult subject. There are few
string-oriented x86 opcodes, and most of them are slow. There is no standard
library providing even basic functions. There is no string-specific syntax in
assembly, like C's printf("hello world") or, even worse, BASIC's a$=b$+"hello".
In short, if easy string-related programming is your goal, maybe you
should consider Perl, or another text-manipulation language.
Yet, string functions are really needed, since almost any program in assembly
uses text for I/O. (An alternative to this would be using animated paper-clips
to communicate with the user :)).
Furthermore, coding those functions in assembly allows for smaller and faster
functions. In fact, many of the string functions in C libraries were written in
assembly (e.g. strlen, strcat, strcpy). Those can be divided into two
categories:
-'Traditional' functions, using the x86 string instructions
-'Modern' functions, which run faster by being Pentium-optimized
Borland C++ 4.02 and KERNEL32.DLL only have traditional functions. Borland's
C++ Builder v1.0 (once given free as a demo) includes both types. MSVCRT.DLL
(version 5) contains 'modern' versions.
The three main aspects considered in these articles (and generally when comparing
different versions of the same function) are speed, size and common sense.
'Common sense' indicates how easy it is to understand the way a function
operates by reading the source code, how 'elegant' the code is. In a library
module distributed as a binary (in a 'static' reuse of code), common sense is
not important. It becomes important when the source code is distributed too,
because it allows 'dynamic' reuse. 'Elegant' code can be easily optimized for
specific needs or expanded to become a more general function.
'Size' is, obviously, the size of the resulting code. Besides creating smaller
files, small size has two interesting 'side-effects'. It (usually) creates more
elegant code and faster code: it decreases k, but it usually increases l (for
an explanation of k and l see 'speed'). For very small functions like strlen it
has the added advantage of allowing the code to be inlined without wasting too
much space, thus decreasing k even more.
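To make the inlining remark concrete, here is a minimal sketch in C rather than assembly. The name my_strlen is hypothetical and not taken from any of the libraries mentioned above. The comments mark which parts would count toward k and which toward l under the definitions given in 'speed'.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only. Being tiny and declared
   'static inline', the compiler may expand it at each call site, which
   removes the call/return overhead that would otherwise count toward k. */
static inline size_t my_strlen(const char *s)
{
    const char *p = s;          /* setup, done once: part of k */

    while (*p != '\0')          /* one test and one increment per character: l */
        p++;

    return (size_t)(p - s);     /* final subtraction, done once: part of k */
}
```

A call such as my_strlen("hello world") returns 11. Whether the compiler actually inlines the function depends on the compiler and its optimization settings, so this only shows the idea.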
'Speed' indicates the number of cycles needed to execute the function. For
simple string functions the number of cycles needed can be expressed as
c=k+l*n
where c is the total number of cycles, k is the number of cycles needed to
'prepare' the function, l is the number of cycles needed to process each character
and n is the number of characters in the string. It is obvious that small
values of c mean faster execution. In order to compare two versions of a
function that run at speeds of
c1=k1+l1*n and c2=k2+l2*n
the ratio of c1/c2 is calculated:
r = c1/c2 = (k1+l1*n)/(k2+l2*n)
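As a rough numeric illustration of this ratio, the sketch below evaluates c and r in C. The function names and the k and l values used in the note that follows are invented for the example; they are not measurements of any real routine.

```c
#include <stdio.h>

/* Cycle-count model from the text: c = k + l*n. */
double cycles(double k, double l, double n)
{
    return k + l * n;
}

/* Ratio r = c1/c2 for two versions processing an n-character string. */
double speed_ratio(double k1, double l1, double k2, double l2, double n)
{
    return cycles(k1, l1, n) / cycles(k2, l2, n);
}

/* Print r for a few string lengths, to show how it changes with n. */
void print_ratios(double k1, double l1, double k2, double l2)
{
    static const double lengths[] = { 1, 5, 100, 1000 };
    size_t i;

    for (i = 0; i < sizeof lengths / sizeof lengths[0]; i++)
        printf("n=%6.0f  r=%.3f\n", lengths[i],
               speed_ratio(k1, l1, k2, l2, lengths[i]));
}
```

Assume version 1 has a cheap setup but a slow loop (k1=5, l1=4) and version 2 the opposite (k2=20, l2=1). Calling print_ratios(5, 4, 20, 1) then shows that r depends on n: both versions cost 25 cycles at n=5, version 1 costs fewer cycles for shorter strings, and version 2 costs fewer for longer ones.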
if r=1 then both versions run at the same speed.
if r>1 then version 2 is faster.
if r<1 then version 1 is faster. [...] l2, c1
|