Let me start by saying this: you cannot and will not ever write a
program that is not crackable. But that's not to say you can't make it
extremely difficult!
That being said, let's begin. In this document, we are going to
learn some anti-debugging tricks that you can incorporate in your own
applications.
In my first article for Osix I discussed self-modifying code and
attempts to throw off disassemblers. Now we'll talk about defeating (or
at least discouraging) the debuggers.
Brief history of debuggers
Once upon a time there was Debug.com, the original MS-DOS debugger. It
was part of the regular MS-DOS package, and today it's pretty much
useless except for those studying assembly. The first debuggers that
were even slightly applicable for cracking appeared with the 80286
processor. No real damage could be accomplished with them, however.
Then the situation changed with the 80386. Increasingly complex
software (Windows, for one) demanded better debuggers. It was at this
time that debuggers became a real threat to programmers, as they became
much more powerful. SoftIce emerged on the scene at the end of the '80s
and gave protected programs and their developers a lot of trouble.
Since then, SoftIce has been hands-down the hacker's debugger of
choice, although lately Olly is gaining popularity among the younger
crackers.
In view of the current possibilities for analyzing applications,
struggling against experienced hackers is a losing battle. However,
another serious threat comes from yesterday's beginners, who have read
a lot of different "how-to-crack-programs" FAQs. (Thank goodness they
are accessible to everyone.) These beginners are now looking for
something on which to test their powerful capabilities. They are the
ones we'll try to throw off.
How debuggers work
Struggling against a debugger without knowing how it works would be
useless, so it's important you understand exactly what the debugger
does. All debuggers fall into one of two categories:
- ones that use the processor's debugging capabilities
- and ones that emulate the processor independently, monitoring the execution of the program being tested
To date, there are no high-quality emulating debuggers that cannot be
detected or bypassed by the code. And it's not likely we'll see one
anytime soon either.
And is it even worth developing such an emulator when we can
already step through code, control execution of an instruction at a
given address, monitor references to a given memory location (or to
input-output ports), signal task switching, and so on? I don't think
so. Anyways, moving on...
Okay, I am going to get a little deep here so try to follow me. To
step through a program, a debugger sets the trap bit of the flags
register. While that bit is set, the processor automatically generates
an INT 1 debug exception after each instruction and passes control to
the debugger. From here, your code could detect tracing (debugging) by
analysing the flags register. So in order for a debugger to stay
invisible, it needs to recognize the instructions that read the flags
register, emulate their execution, and return zero for the value of the
trap flag. Much easier said than done, huh?
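To make the flag check concrete, here is a minimal sketch in plain C. It only decodes a saved copy of the flags register; actually capturing that copy (with PUSHF, for example) needs inline assembly and is left out. The helper names trap_flag_set and hide_trap_flag are my own, made up for illustration. The one hard fact used is that the trap flag is bit 8 (mask 0x100) of the x86 flags register.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 8 of the x86 flags register is the trap flag (TF). */
#define TRAP_FLAG_MASK 0x100u

/* Hypothetical helper: returns 1 if TF is set in a saved copy of the flags. */
int trap_flag_set(uint32_t flags)
{
    return (flags & TRAP_FLAG_MASK) != 0;
}

/* Hypothetical helper: the value a debugger that wants to stay invisible
 * would have to hand back when the program reads its own flags. */
uint32_t hide_trap_flag(uint32_t flags)
{
    uint32_t cleaned = flags & ~TRAP_FLAG_MASK;
    assert(!trap_flag_set(cleaned));
    return cleaned;
}
```

A protection would call something like trap_flag_set on the value it popped after PUSHF; a stealthy debugger has to apply the equivalent of hide_trap_flag to every such read.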
There are four debug address registers:
1. DR0
2. DR1
3. DR2
4. DR3
They store the linear addresses of four checkpoints. The condition for
each of these points is kept in another register, DR7. When any of the
conditions is TRUE, the processor raises an INT 1 exception and control
is passed to the debugger. There are four possible conditions:
1. An instruction is executed
2. The contents of a memory location are modified
3. A memory location is read or updated, but not executed
4. An input-output port is referenced
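The four conditions map onto a two-bit field per breakpoint in DR7. Below is a small sketch that decodes a DR7 value in plain C. The enum and function names are my own, and the code only interprets a number you hand it (reading the real DR7 is a privileged operation). The bit positions are the architectural ones: enable bits for breakpoint n at bits 2n and 2n+1, and the condition field at bits 16+4n and 17+4n.

```c
#include <assert.h>
#include <stdint.h>

/* Condition codes stored in DR7 for each of the four breakpoints. */
enum dr7_condition {
    DR7_EXECUTE    = 0, /* an instruction is executed */
    DR7_WRITE      = 1, /* a memory location is modified */
    DR7_IO         = 2, /* an input-output port is referenced */
    DR7_READ_WRITE = 3  /* memory is read or updated, but not executed */
};

/* Breakpoint n (0..3) is armed when its local (bit 2n) or
 * global (bit 2n+1) enable bit is set. */
int dr7_enabled(uint32_t dr7, int n)
{
    assert(n >= 0 && n <= 3);
    return ((dr7 >> (2 * n)) & 0x3u) != 0;
}

/* The condition for breakpoint n lives in bits 16+4n and 17+4n. */
enum dr7_condition dr7_condition_of(uint32_t dr7, int n)
{
    assert(n >= 0 && n <= 3);
    return (enum dr7_condition)((dr7 >> (16 + 4 * n)) & 0x3u);
}
```

For instance, a DR7 value of 0x00010001 describes a single armed breakpoint in DR0 that fires when the watched location is written.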
Now let's talk about software breakpoints. A software breakpoint is
the only thing that cannot be hidden without writing a full-scale
processor emulator. It is a single byte of code, 0xCC, written over the
first byte of an instruction, and it causes an INT 0x3 exception when
an attempt is made to execute it. To discover whether at least one
breakpoint has been set, it is enough for the program being debugged to
compute a checksum of its own code. To do this, it may use MOV, MOVS,
LODS, POP, CMP, CMPS, or any other instruction that reads memory; no
debugger is capable of tracing and emulating all of them (as far as I
know!).
Tracing, and how to overcome it
Since a completely invisible debugger is only a theoretical
possibility, most debuggers can be detected. Most of them use the
one-byte 0xCC breakpoint.
Let's take a look at a simple protection scheme:
Listing 1. A Simple Protection Mechanism in C++
#include <stdio.h>

int main(int argc, char* argv[])
{
    // The ciphered string "Hello, Free World!"
    char s0[]="\x0C\x21\x28\x28\x2B\x68\x64\x02\x36"
              "\x21\x21\x64\x13\x2B\x36\x28\x20\x65\x49\x4E";
    __asm
    {
    BeginCode:                  ; The beginning of the code being debugged
        pusha                   ; All general-purpose registers are saved.
        lea  ebx, s0            ; ebx = &s0[0]
    GetNextChar:                ; do
        xor  eax, eax           ; eax = 0;
        lea  esi, BeginCode     ; esi = &BeginCode
        lea  ecx, EndCode       ; The length of the code
        sub  ecx, esi           ; being debugged is computed.
    HarvestCRC:                 ; do
        lodsb                   ; The next byte is loaded into al.
        add  eax, eax           ; The checksum is computed.
        loop HarvestCRC         ; until (--ecx == 0)
        xor  [ebx], ah          ; The next character is decrypted.
        inc  ebx                ; A pointer to the next character
        cmp  byte ptr [ebx], 0  ; Until the end of the string
        jnz  GetNextChar        ; Continue decryption
        popa                    ; All registers are restored.
    EndCode:                    ; The end of the code being debugged
        nop                     ; A breakpoint is safe here.
    }
    printf("%s", s0);           // The string is displayed.
    return 0;
}
Let's examine this code closely (read the comments). After starting
the program, the words Hello, Free World! should appear on the screen.
But when the program is run under a debugger with even one breakpoint
set between BeginCode and EndCode, senseless garbage like
"Jgnnm."Dpgg"Umpnf#0" will show up on the screen! Kick ass, huh?!? Now
we are getting somewhere. You can even strengthen this protection by
putting the procedure that calculates the checksum in a different
thread.
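If you don't have an MSVC inline assembler handy, the checksum loop of Listing 1 can be modelled in portable C. The sketch below is my own model, not the compiled program: harvest_key and xor_decrypt are made-up names, and the "code" is just a byte array. The eight sample bytes in the comment are the ones that sit right before EndCode in the HIEW dump further down in this article (43, 80 3B 00, 75 E3, 66 61).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of the HarvestCRC loop: lodsb, then add eax, eax.
 * Feeding it the bytes 43 80 3B 00 75 E3 66 61 leaves 0x44C2 in eax,
 * so the key (ah) is 0x44. */
uint8_t harvest_key(const uint8_t *code, size_t len)
{
    uint32_t eax = 0;
    assert(code != NULL);
    for (size_t i = 0; i < len; i++) {
        eax = (eax & 0xFFFFFF00u) | code[i]; /* lodsb: al = next byte */
        eax += eax;                          /* add eax, eax */
    }
    return (uint8_t)(eax >> 8);              /* ah is the key */
}

/* Model of xor [ebx], ah applied up to the terminating zero byte. */
void xor_decrypt(uint8_t *s, uint8_t key)
{
    assert(s != NULL);
    for (; *s != 0; s++)
        *s ^= key;
}
```

Working through the arithmetic shows something worth knowing: with this particular checksum, ah ends up holding only the top bit of each of the last eight bytes. Overwrite the 0x66 byte of that tail with 0xCC and the key becomes 0x46 instead of 0x44, which turns the leading 'H' into the 'J' of the garbage string quoted above. A breakpoint placed further from EndCode would slip past this simple sum, so a real protection should use a stronger checksum.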
Speaking of threads, they demand a special approach.
It's kinda hard for us humans to realize that a program can run in
several places simultaneously. And commonly used debuggers have a weak
point: They debug each thread separately, never simultaneously. The
following example shows how this can be used for protection.
Listing 2. The Weakness of Debugging Threads Separately
#include <stdio.h>
#include <string.h>
#include <process.h>

// This function will be executing in a separate thread.
// Its purpose is to alter imperceptibly the case of the characters
// in the string that contains the user name.
void My(void *arg)
{
    // This is the index of the byte being encrypted.
    // Note that encryption is not carried out
    // from the first byte, since this allows the breakpoint
    // set at the beginning of the buffer to be bypassed.
    int p=1;
    // Until the line feed is encountered, execute.
    while ( ((char *) arg) [p] !='\n')
    {
        // If the next character is not initialized, wait.
        while( ((char *) arg) [p]<0x20 );
        // Bit 5 (0x20) is inverted.
        // This toggles the case of the Latin characters.
        ((char *) arg) [p] ^=0x20;
        // Move on to the next byte to be processed.
        p++;
    }
}
int main(int argc, char* argv[])
{
    char name[100];   // A buffer containing the user name
    char buff[100];   // A buffer containing the password
    // The buffer of the user name is stuffed with zeroes.
    // (Some compilers do this, but not all.)
    memset (&name[0], 0, 100);
    // The My routine is executed in a separate thread.
    // (The second argument is the stack size; 0 means the default.)
    _beginthread(&My, 0, (void *) &name[0]);
    // The user name is requested.
    printf("Enter name:"); fgets(&name[0], 66, stdin);
    // The password is requested.
    // Note: While the user enters the password, the second
    // thread has enough time to alter the case of all
    // characters of the user name. This is not evident
    // and does not follow from the analysis of the program,
    // especially if it is studied under a debugger that poorly
    // shows the mutual influence of the program's components.
    printf("Enter password:"); fgets(&buff[0], 66, stdin);
    // The user name and the password are compared
    // with the reference values.
    if (! (strcmp(&buff[0], "password\n")
    // Note: The second thread has toggled the case of the name
    // entered by the user, so it is a user who typed "Osix"
    // that passes the strcmp(&name[0], "OSIX\n") check below.
    // (This is not apparent at first glance.)
        || strcmp(&name[0], "OSIX\n")))
        // The correct name and password
        printf("USER OK\n");
    else
        // Error: Wrong user name or password
        printf("Wrong user or password!\n");
    return 0;
}
Take a look at the listing. What's interesting is that, judging by
the source, the program expects to receive OSIX:password. BUT the true
answer is Osix:password. Let's examine this a bit. After the user
enters the username, the second thread processes the buffer that
contains the username and toggles the case of every character except
the first. So you see, when one thread is traced, all the other threads
keep functioning independently. And these other threads may intervene
at any moment in the functioning of the thread being debugged (for
example, to modify its code). Ah....now this is beginning to have some
possibilities!
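To see exactly what the second thread does to the name, here is a single-threaded sketch of the same transformation in plain C. It is not the threaded code from Listing 2: the function name toggle_case_after_first is my own, and unlike the thread it does not wait for input, it simply stops at the line feed or at the end of the string.

```c
#include <assert.h>
#include <stddef.h>

/* Flips bit 0x20 of every character after the first one, up to
 * (but not including) the line feed. For Latin letters this toggles
 * upper and lower case. */
void toggle_case_after_first(char *name)
{
    assert(name != NULL);
    if (name[0] == '\0' || name[0] == '\n')
        return;                       /* nothing after the first byte */
    for (size_t p = 1; name[p] != '\n' && name[p] != '\0'; p++)
        name[p] ^= 0x20;
}
```

Feed it the buffer "Osix\n" and you get "OSIX\n" back, which is why the comparison in Listing 2 succeeds for a user who typed Osix.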
Now there's something to consider. The cracker can keep the threads
under control by setting a breakpoint in each one, but there are only
four debug registers. If the protection forces the cracker to watch
more than four places at once, the hardware breakpoints run out and the
only thing left is the 0xCC byte, which we already know from reading
above can be easily detected.
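The "only four" limit is easy to picture as a tiny bookkeeping structure. The sketch below is purely illustrative: struct hw_breakpoints and hw_bp_add are names I made up, and nothing here touches the real DR0 to DR3 registers. It just shows the moment a debugger has to fall back on 0xCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_SLOTS 4   /* one slot per register DR0..DR3 */

/* Hypothetical bookkeeping a debugger might keep for hardware breakpoints. */
struct hw_breakpoints {
    uint32_t addr[HW_SLOTS];
    int used;
};

/* Returns the debug register index (0..3) that was assigned, or -1 when
 * all four are taken and only a 0xCC software breakpoint is left. */
int hw_bp_add(struct hw_breakpoints *bps, uint32_t linear_addr)
{
    assert(bps != NULL);
    if (bps->used >= HW_SLOTS)
        return -1;
    bps->addr[bps->used] = linear_addr;
    return bps->used++;
}
```

The fifth request is the interesting one: it gets -1, and that is exactly where the checksum trick from Listing 1 starts to bite.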
This situation is made even worse by the debuggers themselves,
including the famous SoftIce, when they deal with programs that use
structured exception handling (SEH). The instruction that causes the
exception either "defeats" the debugger, releasing itself from the
debugger's control, or passes control to the library exception filter,
which only passes control to the application's handler after it calls
several service functions that may "drown" the cracker.
But, when compared to early SoftIce versions, this is progress.
Before, SoftIce strictly held on to certain interrupts. For example, it
would not allow the program to handle division by zero on its own.
Let's have a look at some more code, huh? If the following example
is run under any SoftIce version through 4.05, the debugger, having
reached the line with the division by (a-b), suddenly will abort
execution, losing control over this application. You could correct
this situation, though, by presetting a breakpoint on the first
instruction of the __except block. Then the question is how to find
the location of this block without looking in the source code, which
the hacker would not have.
Listing 3. An Example That Employs Structured Exception Handling
#include <windows.h>

int main(int argc, char* argv[])
{
    // A protected block
    __try{
        int a=1;
        int b=1;
        // This is an attempt to divide by zero.
        // Several statements are used because most compilers
        // return an error after encountering a statement like
        // int a=a/0;
        // When SoftIce executes this instruction, it loses
        // control over the program being debugged. It "falls off" to
        // code that never gains control but may be misleading.
        // If the a and b variables are assigned values
        // returned by some functions, not immediate values,
        // their equality will not be obvious when
        // the program is disassembled. As a result, the hacker
        // may waste time analyzing useless code, hee hee ha!
        int c=a/ (a-b);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        // This code will gain control when the exception "division by
        // zero" arises, but SoftIce will not recognize this
        // situation. Instead, it will ask that a breakpoint be
        // set manually on the first instruction of the block __except.
        // To determine an address of the block __except, the hacker
        // will have to figure out exactly how SEH support is
        // implemented in a particular compiler.
    }
    return 0;
}
In order for crackers to deal with a protection like this, they will
have to study in depth how structured exceptions are processed, both at
the operating system level and at the level of a particular compiler.
Too much work for the casual cracker, wouldn't ya say?
Since SEH is implemented differently in each compiler, it is not
surprising that SoftIce refuses to support it. Good for us programmers!
Bad for crackers.
Therefore, the previous example is highly resistant to breaking and,
at the same time, is easy to implement. It works equally well under all
operating systems of the Windows family, starting from Windows 95.
Getting Around Breakpoints
Breakpoints that are set on system functions are powerful
weapons in the hands of crackers. Suppose that protection tries to open
a key file. Under Windows, the only documented way of doing this is to
call the CreateFile function (actually, CreateFileA or CreateFileW for
the ANSI or Unicode name of the file, respectively). All other
functions inherited from early Windows versions, such as OpenFile,
serve as wrappers to CreateFile.
Knowing this, the hacker may set a breakpoint on the starting
address of this function (which is known to the hacker), and instantly
locate the protection code that calls it.
However, not all hackers know that the file can be opened in other ways:
by calling the ZwCreateFile (or NtCreateFile) function exported by
NTDLL.DLL, or by addressing the kernel directly via the INT 0x2E
interrupt. This is true not only for CreateFile, but for all
functions of the kernel. Interestingly, no privileges are needed for
this. Such a call can even originate from application code.
This little trick won't stop the devoted cracker for long. Damn.
But it is worth placing this little time bomb in there (the call of INT
0x2E in the __try block).
Now, what can be done with functions of the USER and GDI modules
(for example, GetWindowText) that are used to read the user-entered
key information (as a rule, a serial number or a password)? Since these
functions typically begin with the PUSH EBP / MOV EBP, ESP
instructions, the application code can execute those two instructions
itself and pass control not to the beginning of the function, but to
three bytes past it. (Since PUSH EBP modifies the stack, control must
be transferred using JMP instead of CALL.) The breakpoint set at the
beginning of the function will not produce any effect. Such a trick may
temporarily lead even a skilled hacker astray.
Breakpoints can be divided into two categories: breakpoints built
into the program by the developer and dynamic breakpoints set by the
debugger itself. The first category is clear: To stop the program and
pass control to the debugger at a certain place, it is necessary to
write __asm{int 0x3}.
It is more complex to set a breakpoint in an arbitrary place of the
program. The debugger should save the current value of the memory
location at the specified address, then write the code 0xCC there.
Before exiting the debug interrupt, the debugger should return
everything to its former place, and should modify IP saved in the stack
so that it points to the beginning of the restored instruction.
(Otherwise, it points to its middle.)
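The save, patch, restore and rewind steps can be played out on an ordinary byte array. This is a simulation only, under my own assumptions: struct soft_bp, soft_bp_set and soft_bp_hit are invented names, offsets into the array stand in for addresses, and no real interrupt is raised. It does rely on one fact about the processor: after INT 3 fires, the saved instruction pointer points one byte past the 0xCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu

/* What a debugger has to remember about each software breakpoint. */
struct soft_bp {
    size_t  addr;   /* offset of the patched instruction */
    uint8_t saved;  /* original first byte of that instruction */
};

/* Saves the original byte and writes 0xCC over it. */
struct soft_bp soft_bp_set(uint8_t *code, size_t addr)
{
    struct soft_bp bp;
    assert(code != NULL);
    bp.addr = addr;
    bp.saved = code[addr];
    code[addr] = INT3_OPCODE;
    return bp;
}

/* Models the debug interrupt: ip is the saved instruction pointer, which
 * points just past the 0xCC. Puts the original byte back and returns the
 * corrected ip, pointing at the start of the restored instruction. */
size_t soft_bp_hit(uint8_t *code, const struct soft_bp *bp, size_t ip)
{
    assert(code != NULL && bp != NULL);
    assert(ip == bp->addr + 1);
    code[bp->addr] = bp->saved;
    return ip - 1;
}
```

Between soft_bp_set and soft_bp_hit the code really is different from what the developer shipped, and that window is all a checksum needs.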
What are the drawbacks of the breakpoint mechanism of the 8086
processor? The most unpleasant is that the debugger must modify code
directly when it sets the breakpoints.
SoftIce implicitly places the breakpoint at the beginning of each
next instruction when it traces the program using Step Over (the
<F10> key). This distorts the checksum used by protection.
The simplest solution to this problem is instruction-by-instruction
tracing. Of course, this is a joke; it is necessary to set a hardware
breakpoint. In a similar situation, our ancestors (the hackers of the
1980s) usually decrypted the program manually and replaced the
decrypting procedure with the NOP instructions. As a result, debugging
the program did not present a problem (if there were no other traps in
protection). Before IDA appeared, the decrypting procedure had to be
written in C (Pascal, BASIC) as an independent program. Now this task
is easier, since decrypting has become possible in the disassembler
itself.
Decrypting is reduced to the reproduction of the decrypting
procedure in the IDA-C language. In this case, the checksum from
BeginCode to EndCode must be calculated by loading each byte into the
lower byte of the checksum and then adding the checksum to itself. The
second byte of the result (AH in Listing 1) is used to process the s0
string using the exclusive OR operation. All this can be done using the
following script (assuming that the appropriate labels are already in
the disassembled code):
Listing 4. Reproducing the Decrypting Mechanism in IDA-C
auto a; auto p; auto crc; auto ch;
for (p=LocByName("s0"); Byte(p) !=0; p++)
{
    crc = 0;
    for(a=LocByName("BeginCode"); a<(LocByName("EndCode")); a++)
    {
        ch = Byte(a);
        // Since IDA does not support the byte and word types
        // (which is a pity), it is necessary to engage in bit
        // operations. The lower byte of CRC is cleared,
        // and then the value of CH is copied to it.
        crc = crc & 0xFFFFFF00;
        crc = crc | ch;
        crc=crc+crc;
    }
    // The second byte (the counterpart of AH) is taken from CRC.
    crc = crc & 0xFFFF;
    crc = crc / 0x100;
    // The next byte of the string is decrypted.
    PatchByte(p, Byte(p) ^ crc);
}
If IDA is not available, HIEW can be used to carry out this operation as follows:
NoTrace.exe ?W PE 00001040 a32 <Editor> 28672 ? Hiew 6.04 (c)SEN
00401003: 83EC18 sub esp, 018 ;"$"
00401006: 53 push ebx
00401007: 56 push esi
00401008: 57 push edi
00401009: B905000000 000005
0040100E: BE30604000 [Byte/Forward ] 406030 ;" @'0"
00401013: 8D7DE8 1>mov bl, al | AX=0061 p][-0018]
00401016: F3A5 2 add ebx, ebx | BX=44C2
00401018: A4 3 | CX=0000
run from here: 4 | DX=0000
00401019: 6660 5 | SI=0000 [0FFFFFFE8]
0040101B: 8D9DE8FFFF 6 | DI=0000
00401021: 33C0
.0040101B: 8D9DE8FFFFFF
.00401021: 33C0 xor eax, eax
.00401023: 8D3519104000 lea esi, [000401019]; < BeginCode
.00401029: 8D0D40104000 lea ecx, [000401040]; < EndCode
.0040102F: 2BCE sub ecx, esi
.00401031: AC lodsb
00401032: 03C0 add eax, eax
00401034: E2FB loop 000001031
00401036: 3023 xor [ebx], ah
00401038: 43 inc ebx
00401039: 803B00 cmp b, [ebx], 000 ;" "
0040103C: 75E3 jne 000001021
0040103E: 6661 popa
to here:
00401040: 90 nop
00401041: 8D45E8 lea eax, [ebp][-0018]
00401044: 50 push eax
00401045: E80C000000 call 000001056
0040104A: 83C404 add esp, 004
1Help 2Size 3Direct 4Clear 5ClrReg 6 7Exit 8 9Store 10Load
At the first stage, the checksum is computed. The file is loaded in
HIEW, and the necessary fragment is found. Then, the <Enter> key
is pressed twice to switch to assembler mode, the
<F8>+<F5> key combination is pressed to jump to the entry
point, and the main procedure in the start code is found. Next, the
<F3> key is pressed to enable editing the file. The editor of the
decryptor is called using the <Ctrl>+<F7> key combination.
(This combination varies from one version to another.) Finally, the
following code is entered:
mov bl, al
add ebx, ebx
Any other register can be used instead of EBX, except EAX, since HIEW
clears EAX as it reads out the next byte. Now, the cursor is brought to
the 0x401019 line and the <F7> key is pressed to run the decryptor up
to, but not including, the 0x401040 line. If all is done correctly, the
high-order byte of BX should contain the value 0x44, precisely the
checksum.
In the second stage, the encrypted line is found (its offset,
.406030, is loaded into ESI), and XORed with 0x44. (The <F3> key
is pressed to toggle the editing mode, the <Ctrl>+<F8> key
combination is used to specify the key for encrypting, 0x44, then the
<F8> key is pressed to run the decryptor along the line.)
NoTrace.exe ?W PE 00006040 <Editor> 28672 ? Hiew 6.04 (c)SEN
00006030: 48 65 6C 6C-6F 2C 20 46-72 65 65 20-57 6F 72 6C Hello, Free World
00006040: 20 65 49 4E-00 00 00 00-7A 1B 40 00-01 00 00 00 eIN z$@ $
All that is left is to patch XOR in the 0x401036 line with the NOP
instructions; otherwise, when the program is started, XOR will spoil
the decrypted code (by encrypting it again), and the program will not
work.
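The final patch is nothing more than overwriting two bytes. Here is a minimal sketch on a byte array; nop_out is my own helper name, and a real patch would of course be applied to the file on disk at the matching offset rather than to an array in memory. The byte values come from the HIEW dump above: 30 23 is xor [ebx], ah, and 90 is nop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_OPCODE 0x90

/* Overwrites `length` bytes starting at `offset` with NOP instructions. */
void nop_out(uint8_t *code, size_t offset, size_t length)
{
    assert(code != NULL);
    memset(code + offset, NOP_OPCODE, length);
}
```

Applied to the three bytes 30 23 43 with an offset of 0 and a length of 2, it leaves 90 90 43: the XOR is gone and the following inc ebx is untouched.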
After the protection is removed, the program can be debugged without serious consequences for as long as necessary.
Well, that's it for now. Another long, but hopefully interesting article on assembly and debugging.