Lcc-win32 is a free C compiler system. It features an IDE, a resource compiler, a linker, a librarian, a windowed debugger, and other goodies.
Here, I would like to describe a special feature of lcc-win32 that will surely
be appreciated by colleagues who use assembly.
Lcc-win32 understands special macro definitions called intrinsics. These
constructs are seen as normal function calls by the front end of the
compiler, but are expanded inline by the back end.
You can add your own intrinsic macros to the system, allowing you to use the
power and speed of assembly language within the context of a more powerful and
safer high level language.
I will present two examples here, to give you an idea of what this looks like.
You will need the source code of lcc-win32, which can be obtained from the home
page, http://ps.qss.cz/lcc, or from ftp://ftp.cs.virginia.edu/pub/lcc-win32
Inlining the strlen function
Let's assume the strlen function of the C library is just too slow for you.
Instead of generating:
pushl Arg
call _strlen
addl $4,%esp
you would like to generate the following code inline:
; Inlined strlen. The input argument is in ECX and points to the
; character string
orl $-1,%eax
loop:
inc %eax
cmpb $0,(%ecx,%eax)
jnz loop
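For reference, the loop above computes the same thing as the portable C function below. It is only a model for checking the logic (the name strlen_model is ours, not part of lcc-win32): the variable n plays the role of EAX and s the role of ECX. EAX starts at -1 and is incremented before each test, so when the first zero byte is found it holds exactly the number of characters before it.

```c
/* Portable model of the inline strlen sequence (hypothetical helper,
   not part of lcc-win32). 'n' stands for EAX, 's' for ECX. */
static int strlen_model(const char *s)
{
    int n = -1;                 /* orl $-1,%eax                 */
    do {
        n++;                    /* inc %eax                     */
    } while (s[n] != '\0');     /* cmpb $0,(%ecx,%eax) ; jnz    */
    return n;                   /* the result is left in EAX    */
}
```

Starting at -1 rather than 0 lets the increment sit at the top of the loop, so the loop body needs only one conditional jump.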
The compiler should then inline this code. The C interface would be:
_strlen(str);
The prototype must be:
extern int _stdcall _strlen(char *);
The compiler recognizes intrinsic macros because they have an underscore as the
first character of their names, they are declared _stdcall, and they appear in
the intrinsics table. Few functions begin with an underscore, so this rule
avoids a lookup in the intrinsics table for every function call, which would
slow down compilation.
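The order of these checks is what keeps compilation fast: the cheap test on the first character rejects almost every call before any table search happens. Here is a minimal sketch of that filtering logic. The function is_intrinsic_call and the sample name list are hypothetical illustrations, not the code in intrin.c, and the _stdcall test is reduced to a boolean flag supplied by the caller.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of registered intrinsic names, NULL-terminated. */
static const char *intrinsic_names[] = { "_strlen", "_emms", NULL };

/* Returns true if a call to 'name' should be expanded inline.
   'is_stdcall' stands in for the compiler's calling-convention check. */
static bool is_intrinsic_call(const char *name, bool is_stdcall)
{
    if (name[0] != '_' || !is_stdcall)      /* cheap tests first */
        return false;
    for (int i = 0; intrinsic_names[i] != NULL; i++)   /* table search */
        if (strcmp(intrinsic_names[i], name) == 0)
            return true;
    return false;
}
```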
Next, take the file intrin.c from the lcc-win32 sources and modify the
intrinsics table. Its declaration is in the middle of the file and looks like
this:
static INTRINSICS intrinsicTable[] = {
{"fsincos",2, 0, fsincos, NULL },
{"bswap", 1, 0, bswap, bswapArgs },
/* ... many declarations omitted ... */
{"reduceLtb",3, 0, redCmpLtb, paddArgs },
{"mmxDotProduct",3,0, mmxDotProd, paddArgs },
{"_emms",0, 0, emms, NULL },
{NULL, 0, 0, 0, 0 }
};
Before the last line (the NULL entry that ends the table), add the following line:
{"_strlen",1, 0, strlenGen, strlenArgs },
This tells the system that you want an intrinsic called _strlen that takes one
argument, whose code will be generated by the function strlenGen(), and whose
argument will be assigned to its register by the function strlenArgs().
These two functions assign the registers in which the macro receives its
arguments, and generate the code for the body of the macro. Basically, these
macros are seen by the compiler as special calls: instead of generating a push
instruction for each argument, the compiler calls your arguments function,
which should set the right fields in each node passed to it so that the code
generator later emits a move into the specified register.
Note that all intrinsics should start with an underscore to avoid conflicting
with user space names.
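To make the role of each field concrete, here is a self-contained mock of such a table. The struct layout, the name MockIntrinsic, and the meaning given to the third field (an unused flags member, since the text does not describe it) are assumptions for illustration only; the real INTRINSICS type in intrin.c may differ, and the callbacks are simplified to return void. What the sketch does show is the NULL sentinel that ends the table and the two callbacks the compiler dispatches to: one for the arguments, one for the body.

```c
#include <stddef.h>
#include <string.h>

typedef struct Node *Node;      /* opaque in this mock */

/* Hypothetical mirror of one intrinsics table entry. */
typedef struct {
    const char *name;           /* intrinsic name, e.g. "_strlen"       */
    int nargs;                  /* number of arguments                  */
    int flags;                  /* third field: meaning assumed unused  */
    void (*gen)(Node);          /* emits the inline body                */
    void (*args)(Node);         /* assigns argument registers           */
} MockIntrinsic;

static int gen_calls, args_calls;
static void mockGen(Node p)  { (void)p; gen_calls++; }
static void mockArgs(Node p) { (void)p; args_calls++; }

static MockIntrinsic table[] = {
    { "_strlen", 1, 0, mockGen, mockArgs },
    { NULL,      0, 0, NULL,    NULL     }  /* sentinel: end of table */
};

/* Linear search that stops at the NULL sentinel. */
static MockIntrinsic *find_intrinsic(const char *name)
{
    for (MockIntrinsic *e = table; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

Because the search stops at the sentinel, a new entry must go before the NULL line, which is why the text adds _strlen just before the last line.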
When the compiler detects a call to this function, your arguments function is
called first, once for each argument pushed at the call site. Here is
strlenArgs():
static Symbol strlenArgs(Node p)
{
    Symbol r = NULL;
    // The global ArgumentsIndex is zero before each call. The compiler
    // takes care of that.
    switch (ArgumentsIndex) {
    case 0: // First argument pushed, from right to left!
        if (p->x.nestedCall == 0) {
            r = SetRegister(p, intreg[ECX]);
        }
        break;
    }
    // We have seen another argument
    ArgumentsIndex++;
    // Assign the register to this expression.
    if (p->x.nestedCall == 0 && r)
        p->syms[2] = r;
    // There should never be more than one argument: reset the counter.
    if (ArgumentsIndex == 1)
        ArgumentsIndex = 0;
    return r;
}
You see that in several places we have the test:
if (p->x.nestedCall == 0)
This checks whether there is a nested call within the arguments, as in the
following C expression:
strlen( SomeFunction() );
True, in the case of strlen this doesn't change anything important: there is
only one argument, and the result of the nested call will be in EAX anyway. But
suppose you defined a macro that takes two arguments, say some special form of
addition, sadd(a,b).
In this case we would assign the second argument (from left to right) to ECX,
and the first to EAX. Consider then the case of:
sadd( SomeFunction(),5);
If we simply assigned 5 to ECX, the call to SomeFunction() could destroy the
contents of ECX, since ECX is a scratch register that any called function may
use!
For this reason, when the compiler detects a call within the argument list, all
arguments WILL BE on the stack, and our code-generating function must take care
of popping them into the right registers before proceeding.
For strlen this is hardly an issue, but it is important to see how it works in
the general case.
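The order of the pops matters. Arguments are pushed from right to left, so the first (leftmost) argument is the last one pushed and sits on top of the stack. The sketch below simulates this for sadd(SomeFunction(), 5) with a small array-based stack; the helpers push, pop and load_sadd_args are hypothetical, and the "registers" are plain variables. The first pop must go to EAX and the second to ECX.

```c
/* Simulated machine stack; hypothetical helpers for illustration only. */
static int stack[16];
static int sp = 0;                  /* number of items on the stack */

static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }

/* Models sadd(a, b) when a nested call forces both arguments onto the
   stack, followed by the pops the generating function must emit. */
static void load_sadd_args(int a, int b, int *eax, int *ecx)
{
    push(b);                        /* rightmost argument is pushed first */
    push(a);                        /* leftmost argument ends up on top   */
    *eax = pop();                   /* popl %eax: first argument          */
    *ecx = pop();                   /* popl %ecx: second argument         */
}
```

Popping in the opposite order would silently swap the two arguments, a bug that only shows up at call sites that contain a nested call.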
Note too that the arguments function must increment the global counter
ArgumentsIndex for each argument, and reset it to zero after the last one.
Again, this matters little for strlen, but for macros that take more arguments
it is mandatory.
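The counter protocol can be modeled outside the compiler. In this sketch the names saddArgRegister and enum Reg are hypothetical, and registers are represented by enum codes. The function is called once per argument, right to left, and maps the counter to a register the way the sadd(a,b) example above requires (second argument to ECX, first to EAX). It resets the global counter after the last argument so that the next call site starts again at zero.

```c
/* Hypothetical register codes for this model. */
enum Reg { REG_NONE, REG_EAX, REG_ECX };

/* Global counter, zero before each call site. */
static int ArgumentsIndex = 0;

/* Called once per argument of sadd(a,b), from right to left. */
static enum Reg saddArgRegister(void)
{
    enum Reg r = REG_NONE;
    switch (ArgumentsIndex) {
    case 0: r = REG_ECX; break;     /* b: pushed first  */
    case 1: r = REG_EAX; break;     /* a: pushed second */
    }
    ArgumentsIndex++;
    if (ArgumentsIndex == 2)        /* last argument seen: reset */
        ArgumentsIndex = 0;
    return r;
}
```

If the reset were forgotten, the next call site would start at index 2, fall through the switch, and get no register at all.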
The SetRegister function takes care of the details of assigning a register.
Here is its short body:
Symbol SetRegister(Node p, Symbol r)
{
    Symbol w;
    w = p->kids[0]->syms[2];
    if (w->x.regnode == NULL || w->x.regnode->vbl == NULL)
        p->kids[0]->syms[2] = r;
    return r;
}
This function checks whether the left child of the given node already lives in
a register variable. Only if it does not does it assign the requested register
to the child directly; otherwise, the compiler will generate the move from the
variable's register into the requested one.
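The decision SetRegister() makes can be reproduced with stripped-down stand-ins for the compiler's data structures. The struct definitions below are hypothetical simplifications that keep only the fields the function touches (kids[0], syms[2], x.regnode and vbl); lcc's real Node and Symbol types carry much more.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for lcc's types. */
typedef struct symbol *Symbol;
typedef struct regnode { Symbol vbl; } *Regnode;  /* vbl: register variable */
struct symbol { const char *name; struct { Regnode regnode; } x; };
typedef struct node { struct node *kids[2]; Symbol syms[3]; } *Node;

/* Same logic as SetRegister(): retarget the child only if it is not
   already held in a register variable. */
static Symbol SetRegister(Node p, Symbol r)
{
    Symbol w = p->kids[0]->syms[2];
    if (w->x.regnode == NULL || w->x.regnode->vbl == NULL)
        p->kids[0]->syms[2] = r;
    return r;
}
```

In the second case the function still returns r, so the caller records the requested register and the move is left to the code generator.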
We now come to the heart of the matter: generating the code for the strlen
utility.
static void strlenGen(Node p)
{
    static int labelCount;
    /* OK, the first thing to do is to see if we should pop our argument.
       If that is the case, pop it into the right register. */
    if (p->x.nestedCall) {
        print("\tpopl\t%%ecx\n");
    }
    /*
    Here we generate the code for the strlen routine. Note that the % sign is
    used by the assembler of lcc-win32 to mark a register name, but our print()
    function also uses it (as printf does) to mark the beginning of an
    argument. We must double it to get around this collision.
    - Set the counter to minus one.
    */
    print("\torl\t$-1,%%eax\n");
    /*
    - Generate the label for this instance. All labels must be unique, and the
    easiest way to ensure that we always generate a new label is to number
    them consecutively using a counter. To avoid colliding with other labels,
    we also use a unique prefix.
    */
    print("$strlen%d:\n", labelCount);
    /*
    - Now generate the body of the loop that searches for the terminating
    zero character.
    */
    print("\tinc\t%%eax\n");
    /* - Note the dollar sign before the immediate constant. */
    print("\tcmpb\t$0,(%%ecx,%%eax)\n");
    /*
    - Generate the jump, incrementing the label counter afterwards.
    */
    print("\tjnz\t$strlen%d\n", labelCount++);
    /*
    Now we are done: the result is in EAX, as it should be. Note that no pops
    are needed at the end, since the one we (possibly) did at the beginning
    only compensates for the push the compiler generated.
    Note too that we must not emit a ret instruction, since this is a macro
    expanded inside the current function, which must not return here!
    */
}
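The two printf-style details in these comments (doubled percent signs and one label number per expansion) can be checked without rebuilding the compiler by mimicking the emission with snprintf into a buffer. The function emitStrlen below is a hypothetical test stand-in, not lcc's print(); it writes the same text strlenGen() prints. Percent signs in the format string are doubled, while text passed through %s is copied verbatim and needs no doubling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for strlenGen(): writes the same text into 'buf'
   with snprintf instead of calling lcc's print(). */
static int emitStrlen(char *buf, size_t size, int nestedCall)
{
    static int labelCount;              /* shared by all expansions */
    int n = snprintf(buf, size,
                     "%s"
                     "\torl\t$-1,%%eax\n"
                     "$strlen%d:\n"
                     "\tinc\t%%eax\n"
                     "\tcmpb\t$0,(%%ecx,%%eax)\n"
                     "\tjnz\t$strlen%d\n",
                     nestedCall ? "\tpopl\t%ecx\n" : "",
                     labelCount, labelCount);
    labelCount++;                       /* next expansion gets a new label */
    return n;                           /* snprintf's usual return value */
}

/* Small helper for inspecting the generated text. */
static int contains(const char *text, const char *piece)
{
    return strstr(text, piece) != NULL;
}
```

Two consecutive expansions produce the labels $strlen0 and $strlen1, so two strlen calls in the same function never define the same label twice.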
We rebuild the compiler and obtain a new compiler that recognizes the macro we
have just created. Compiling the compiler with itself is, of course, a good
test for your new function. This should be done at least three times to be sure
that your function is working correctly.
Register assignments
In general, you can use EAX, ECX and EDX as you wish. The contents of EBX,
ESI, EDI and EBP must always be preserved (saved and restored if you use them).
If you destroy them, unpredictable results will surely follow.
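A crude way to catch violations of this rule while developing a macro is to scan the text a generating function emits for the preserved registers. The helper touches_callee_saved below is a hypothetical development aid, not part of lcc-win32. It only knows the 32-bit AT&T names and flags any mention of EBX, ESI, EDI or EBP, including harmless reads or a correctly matched push/pop pair, so a hit means "look here", not necessarily "bug".

```c
#include <stdbool.h>
#include <string.h>

/* Registers the surrounding compiled code expects to survive a macro. */
static const char *callee_saved[] = { "%ebx", "%esi", "%edi", "%ebp" };

/* Returns true if the emitted assembly mentions any preserved register. */
static bool touches_callee_saved(const char *asm_text)
{
    for (size_t i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++)
        if (strstr(asm_text, callee_saved[i]) != NULL)
            return true;
    return false;
}
```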
Let's write a test program for our new compiler:
#include <stdio.h>
#ifdef MACRO
int _stdcall _strlen(char *);
#define strlen _strlen
#else
#include <string.h>
#endif
int main(int argc, char *argv[])
{
    if (argc > 1)
        printf("Length of \"%s\" is %d\n", argv[1],
               (int)strlen(argv[1]));
    return 0;
}
In the C source, the symbol MACRO selects whether we use our macro or just
generate a call to the normal strlen function, for comparison purposes. We
compile this with our new compiler, adding the -S option to see what it
generates:
lcc -S -DMACRO tstrlen.c
The assembly (that the compiler writes in tstrlen.asm) is then:
_main:
pushl %ebp
movl %esp,%ebp
pushl %edi
.line 9
.line 10
cmpl $1,8(%ebp)
jle _$2
.line 11
movl 12(%ebp),%edi
; Our argument gets assigned to ECX, as our strlenArgs function
; defined
movl 4(%edi),%ecx
; Here is the begin of our macro body
orl $-1,%eax
; This is our generated label
_$strlen0:
inc %eax
cmpb $0,(%ecx,%eax)
jnz _$strlen0
; Our macro ends here, leaving its results in EAX
pushl %eax
movl 12(%ebp),%edi
pushl 4(%edi)
pushl $_$4
call _printf
addl $12,%esp
_$2:
.line 12
xor %eax,%eax
.line 13
popl %edi
popl %ebp
ret
We see that there is absolutely no call overhead. The arguments are assigned to
the right registers in our function strlenArgs, and the body is expanded
in-line by strlenGen.
Next, we link our executable:
D:\lcc\src74\test>lcclnk tstrlen.obj
And we run a test:
D:\lcc\src74\test>tstrlen abcde
Length of "abcde" is 5
D:\lcc\src74\test>
Here is the strlenGen() function again for clarity.
static void strlenGen(Node p)
{
    static int labelCount;
    if (p->x.nestedCall) {
        print("\tpopl\t%%ecx\n");
    }
    print("\torl\t$-1,%%eax\n");
    print("$strlen%d:\n", labelCount);
    print("\tinc\t%%eax\n");
    print("\tcmpb\t$0,(%%ecx,%%eax)\n");
    print("\tjnz\t$strlen%d\n", labelCount++);
}
Another example: inlining the strchr function
To demonstrate a function with two arguments, we now inline the strchr function.
This function returns a pointer to the first occurrence of the given character
in a string, or NULL if the character doesn't appear in the string.
The implementation could look like this:
strchr:
movb (%eax),%dl ; read a character
cmpb %cl,%dl ; compare it with the character searched for
je strchrexit ; exit if found, with EAX pointing to the character
incl %eax ; move the pointer to the next character
orb %dl,%dl ; test for the end of the string
jne strchr ; if not zero, continue the loop
xorl %eax,%eax ; not found: the result is zero (NULL)
strchrexit:
We just scan the characters looking for either zero (end of the string) or the
given char. The pointer to the string will be in EAX, and the character to be
searched for will be in ECX. We use EDX as a scratch register.
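As with strlen, a portable C model of this loop pins down its exact behavior. The name strchr_model is ours; p stands for EAX, c for CL and d for DL. Because the comparison with the searched character comes before the end-of-string test, searching for '\0' returns a pointer to the terminator, which matches the behavior of the standard strchr.

```c
#include <stddef.h>

/* Portable model of the inline strchr loop (hypothetical helper). */
static const char *strchr_model(const char *p, char c)
{
    char d;
    do {
        d = *p;                 /* movb (%eax),%dl            */
        if (d == c)             /* cmpb %cl,%dl ; je exit     */
            return p;           /* EAX points to the match    */
        p++;                    /* incl %eax                  */
    } while (d != '\0');        /* orb %dl,%dl ; jne strchr   */
    return NULL;                /* xorl %eax,%eax             */
}
```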
The next step is to write the function that assigns the arguments,
strchrArgs(). Here it is:
static Symbol strchrArgs(Node p)
{
    Symbol r = NULL;
    switch (ArgumentsIndex) {
    case 0: // First argument (from right to left): char to be searched.
            // We put it in ECX
        if (p->x.nestedCall == 0) {
            r = SetRegister(p, intreg[ECX]);
        }
        break;
    case 1: // Second argument: pointer to the string. We put it in EAX
        if (p->x.nestedCall == 0) {
            r = SetRegister(p, intreg[EAX]);
        }
        break;
    }
    ArgumentsIndex++;
    if (p->x.nestedCall == 0)
        p->syms[2] = r;
    if (ArgumentsIndex == 2)
        ArgumentsIndex = 0;
    return r;
}
Finally, we write the generating function, strchrGen(). Note that it needs two
labels:
static void strchrGen(Node p)
{
    static int labelCount;
    if (p->x.nestedCall) {
        /* Both arguments are on the stack. They were pushed from right to
           left, so the string pointer (first argument) is on top. */
        print("\tpopl\t%%eax\n");
        print("\tpopl\t%%ecx\n");
    }
    print("$strchr%d:\n", labelCount);
    print("\tmovb\t(%%eax),%%dl\n");
    print("\tcmpb\t%%cl,%%dl\n");
    print("\tje\t$strchr%d\n", labelCount+1);
    print("\tinc\t%%eax\n");
    print("\torb\t%%dl,%%dl\n");
    print("\tjne\t$strchr%d\n", labelCount);
    print("\txorl\t%%eax,%%eax\n");
    print("$strchr%d:\n", labelCount+1);
    labelCount += 2;
}
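With several labels per expansion, a mismatch between a jump target and a label definition (for example a stray prefix on one of them) is an easy mistake to make, and the assembler will only complain about it later. The sketch below checks this mechanically. The helpers emitStrchr and jumpsResolve are hypothetical, not lcc code: emitStrchr produces text in the shape strchrGen() emits, using a consistent $strchr prefix and popping EAX before ECX in the nested-call case, and jumpsResolve verifies that every jump target is defined as a label. The check is deliberately crude: it looks for "TARGET:" followed by a newline anywhere in the text.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for strchrGen(): writes the text to 'buf'. */
static void emitStrchr(char *buf, size_t size, int nestedCall)
{
    static int labelCount;
    snprintf(buf, size, "%s"
             "$strchr%d:\n"
             "\tmovb\t(%%eax),%%dl\n"
             "\tcmpb\t%%cl,%%dl\n"
             "\tje\t$strchr%d\n"
             "\tinc\t%%eax\n"
             "\torb\t%%dl,%%dl\n"
             "\tjne\t$strchr%d\n"
             "\txorl\t%%eax,%%eax\n"
             "$strchr%d:\n",
             nestedCall ? "\tpopl\t%eax\n\tpopl\t%ecx\n" : "",
             labelCount, labelCount + 1, labelCount, labelCount + 1);
    labelCount += 2;
}

/* Returns true if every "\tjXX\tTARGET" line has a "TARGET:" definition. */
static bool jumpsResolve(const char *text)
{
    for (const char *line = text; *line != '\0'; ) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len > 2 && line[0] == '\t' && line[1] == 'j') {
            const char *tab = memchr(line + 1, '\t', len - 1);
            if (tab != NULL) {
                char want[64];
                int tlen = (int)(line + len - tab - 1);
                snprintf(want, sizeof want, "%.*s:\n", tlen, tab + 1);
                if (strstr(text, want) == NULL)
                    return false;       /* target is never defined */
            }
        }
        line = end ? end + 1 : line + len;
    }
    return true;
}
```

Running the checker on the output of each generating function, as part of the compiler's own test build, costs little and catches the problem before the assembler does.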
This facility is not common in compiler systems. It allows you to use assembly
language in the routines where it is really needed, leaving to the compiler the
tedious work of generating the assembly for the other 90% of the code, where
speed is not so important after all.
Another benefit is that you can't make simple mistakes when passing arguments
to your assembler macros, since they are understood as function calls by the
compiler and all prototype checking is done by the front end. If you attempt
to use the strchr macro with its arguments swapped, like this:
strchr('\n', string);
the compiler will issue an error.
The lcc-win32 system can be downloaded free of charge from
http://ps.qss.cz/lcc