Assembly language programming under Unix is largely undocumented. It is generally assumed that no one would ever want to use it because various Unix systems run on different microprocessors, so everything should be written in C for portability.
Now, we know that C portability is a myth. Even C programs need to be modified
when ported from one Unix to another, regardless of what processor each runs on.
I was pleasantly surprised when one of the FreeBSD hackers recently posted an
assembly language 'Hello, World' program on the web. See
{http://home.ptd.net/~tms2/hello.html} for what he has to say.
There were two things I did not like in his example:
First of all, he uses the GNU assembler with its AT&T syntax. Talk about lack
of portability! Ever since I got involved in Unix programming, I switched from
MASM to NASM and never looked back. NASM allows me to use the same code for
Windows and Unix with only minor modifications needed wherever system calls are
necessary. Everything else remains the same. I also like the fact I can use
dots in the middle of a label.
Secondly, he uses a separate procedure for the system call. It looks like this
(in AT&T syntax):
do_syscall:
        int $0x80               # Call kernel.
        ret
He says a direct use of int 80h would not work. I refused to believe it. And I
was right. The "problem" he is solving by using a separate procedure is that
int 80h is designed for use from C programs, which reach it through calls to
functions like write() and read(). Because they make a call, an extra DWORD
(the return address) is on the stack by the time int 80h is invoked.
His solution works, of course, but is unnecessary. All that is needed is
pushing an extra DWORD before invoking int 80h. The value pushed is irrelevant.
In my modification to his code, I simply pushed EAX and invoked int 80h. Then I
added an extra four bytes to ESP. I had to adjust ESP anyway because int 80h
uses the C calling convention of receiving parameters on the stack and leaving
them there for the caller to remove. It worked without a hitch.
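As an illustration of the direct approach just described, here is a minimal sketch of a 'Hello, World' that invokes int 80h without a helper procedure. It assumes 32-bit x86 FreeBSD and NASM syntax, and it uses the same system call numbers as the listing later in this article (4 for SYS_write, 1 for SYS_exit). The labels msg and msglen and the file name hello.asm are mine, chosen only for this example.

```nasm
section .data
msg     db      'Hello, World!', 0Ah
msglen  equ     $-msg

section .text
global  _start
_start:
        push    dword msglen    ; third argument: number of bytes
        push    dword msg       ; second argument: address of the text
        push    dword 1         ; first argument: stdout = 1
        mov     eax, 4          ; SYS_write
        push    eax             ; dummy DWORD where a return address would be
        int     80h             ; call the kernel directly
        add     esp, byte 16    ; drop three arguments plus the dummy DWORD

        push    dword 0         ; exit status
        mov     eax, 1          ; SYS_exit
        push    eax             ; dummy DWORD again
        int     80h             ; does not return
```

Saved as hello.asm, this should build with nasm -f elf hello.asm followed by ld -o hello hello.o on a 32-bit FreeBSD system. The only difference from the separate-procedure version is the extra push before each int 80h and the four additional bytes removed from the stack afterwards.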
I learned from his code that the value in EAX determines which system call int
80h makes. A list of these can be found in the C include file <sys/syscall.h>.
I then decided to experiment with his code a bit further, and create something
that actually does some work.
A typical Unix program is a filter which reads its input from stdin, writes its
output to stdout, and sends error messages to stderr. I decided to produce such
a filter for this article. Because I used tabs in my source code and needed to
convert them to spaces for this article, I made the filter convert tabs to
spaces. Because I started writing it under Windows and finished it under Unix,
I also made the filter strip any carriage returns.
It would be more useful if it could accept command line parameters, so you
could decide how many spaces a tab should expand to. Alas, I have no idea where
to find the command line under FreeBSD. If you know, please email me. For now,
the program simply assumes a tab stop is at every 8th position.
The program uses ESI as a counter of where on the line it is. To calculate the
number of blanks to insert, it moves ESI to EAX, negates EAX, ANDs it with 7,
and adds 1. This works very well. Suppose you are at the beginning of the line,
i.e., at the first position, so ESI is 1. Negating turns 1 into -1, i.e.,
0FFFFFFFFh. AND that with 7 and you get 7. Increment it, and you know you need
to write 8 spaces.
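The same arithmetic can be isolated in a few instructions. The fragment below is only a sketch: spaces_for_tab is a hypothetical helper name, not something the program later in this article defines, and it assumes ESI already holds the 1-based position of the tab character on the line.

```nasm
section .text

; Hypothetical helper, shown only to isolate the calculation.
; In:  ESI = 1-based position of the tab on the current line
; Out: EAX = number of spaces needed to reach the next 8-column tab stop
spaces_for_tab:
        mov     eax, esi        ; copy the position
        neg     eax             ; EAX = -position
        and     eax, 7          ; keep the low three bits: 0..7
        inc     eax             ; 1..8 spaces
        ret

; Worked values:
;   ESI = 1  ->  -1 = 0FFFFFFFFh, AND 7 = 7, +1 = 8 spaces
;   ESI = 5  ->  -5 = 0FFFFFFFBh, AND 7 = 3, +1 = 4 spaces
;   ESI = 8  ->  -8 = 0FFFFFFF8h, AND 7 = 0, +1 = 1 space
```

The program itself does this inline rather than through a call. It also adds the masked value to ESI before the final increment, so the counter ends up on the tab stop.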
I also used EDI as the pointer to the read/write buffer. I could have just
pushed its offset (push dword buffer) every time, but pushing a register
produces less code and is probably faster.
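To make the size difference concrete, here is a small fragment comparing the two forms. The byte counts in the comments are the standard 32-bit x86 encodings: PUSH of a register is a single opcode byte, while PUSH of a 32-bit immediate is an opcode byte followed by four bytes of data. The fragment says nothing about speed, and it is meant to be assembled and inspected, not run.

```nasm
section .data
buffer  times 8 db ' '

section .text
        mov     edi, buffer     ; 5 bytes, but executed only once

        push    edi             ; 1 byte:  57h
        push    dword buffer    ; 5 bytes: 68h followed by the 32-bit address
```

Assembling it with a listing file (for example nasm -f elf -l sizes.lst sizes.asm, where both file names are placeholders) should show the encodings side by side. With three pushes of the buffer address in the program, loading EDI once pays for itself.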
I chose ESI and EDI to hold persistent values (i.e., values that need to
survive the system call) because Unix system software uses the C convention of
preserving these two registers (as well as EBX and EBP).
In my first version I started the program with a PUSHAD and ended it with a POPAD.
This is certainly needed in Windows programs: An assembly language program will
crash Windows if it returns to Windows with any of the four aforementioned
registers modified.
Then I thought that surely FreeBSD would not allow such a serious security hole
in the system. I removed the PUSHAD and the POPAD, and the program worked
without a hitch.
The result is below.
;---------------------------------------------------------------------------
; File: tab2sp.asm
;
; A sample assembly language program for FreeBSD.
; It converts tabs to spaces. Nothing new, expand
; already does that and with more options.
;
; But it illustrates reading from stdin, and writing
; to stdout and stderr in assembly language.
;
; 05-May-2000
; Copyright 2000 G. Adam Stanislav
; All rights reserved
;
; {http://www.whizkidtech.net/}
; {http://www.redprince.net/}
;
; Assemble with nasm:
;
; nasm -f elf tab2sp.asm
; ld -o tab2sp tab2sp.o
section .data
buffer times 8 db ' '
errread db 'TAB2SP: Error reading input', 0Ah
erlen equ $-errread
align 4, db 0
errwrite db 'TAB2SP: Error writing output', 0Ah
ewlen equ $-errwrite
section .text
; ld expects every program to start with _start
global _start
_start:
; We use EDI and ESI to store persistent data
; because syscall will not modify them.
mov edi, buffer ; EDI = address of buffer
sub esi, esi ; ESI = counter
; NOTE:
;
; Because int 80h expects to be within a separate
; procedure, we need to push a fake return address
; before invoking it. It can be anything, so we
; just push EAX.
.read:
sub eax, eax
inc al
push eax ; size of "string"
push edi ; address of buffer
dec al
push eax ; stdin = 0
push eax ; "return address"
mov al, 3 ; SYS_read
int 80h ; syscall
add esp, byte 16 ; clean the stack after reading
or eax, eax
je .quit ; end of file reached
js .rerror ; read error...
; Decide what to do:
;
; If the byte is a carriage return, ignore it.
; If the byte is a newline, initialize ESI = 0.
; If the byte is a tab, convert it to spaces.
; Otherwise, just write it.
mov dl, [edi]
cmp dl, 0Dh ; carriage return
je .read
cmp dl, 0Ah ; new line
je .newline
inc esi
cmp dl, 09h ; tab
jne .write
; It's a tab. Expand it.
mov byte [edi], ' '
mov eax, esi
neg eax
and eax, 7
add esi, eax
inc eax
jmp short .write
.newline:
sub esi, esi
.write:
push eax ; size of "string"
push edi ; address of buffer
sub eax, eax
inc al
push eax ; stdout = 1
push eax ; "return address"
mov al, 4 ; SYS_write
int 80h ; system call
add esp, byte 16
or eax, eax
jns short .read
push dword ewlen
push dword errwrite
jmp short .err
.rerror:
push dword erlen
push dword errread
.err:
sub eax, eax
mov al, 2 ; stderr = 2
push eax
push eax ; "return address"
add al, al ; SYS_write
int 80h
add esp, byte 16
.quit:
sub eax, eax ; EAX = 0
push eax ; exit status
inc eax ; SYS_exit
push eax ; "return address"
int 80h
; Program ends here.
;--------------------------------------------------------------------------