The Challenge
-------------
Write the smallest possible PE program (win32) that outputs its command line.
The Solution
------------
This problem looks like the one about the 11-byte .COM program solved in the
previous issue. Yet the method used to solve it is entirely different. This is
because, while .COM files contain just raw code and data, PE files include a
header with information about the file. It is this header that must be
'tweaked' to get a small file.
Before going on, some things must be made clear:
- This article relies heavily on "The PE File Format" by B. Luevelsmeyer
(whom I really thank). You are advised to find the .txt and read it. Of course
Microsoft provides its own documentation, but they would hardly ever say 'this
seems to be ignored' about their own format.
- If you think that PE (Portable Executable) is the format introduced by win95,
you're wrong. Not only was PE created for winNT, but it also seems that win95
is not 100% PE compatible. Anyway, this article was written for winNT, and
I don't expect any of it to run under windows 95.
- This article is based on a 'trial and error' method. Some solutions exist
only because they work, so don't ask why... (Actually the trial and error
resulted in two BSODs, thus proving that a program can crash windows NT without
even running its own code.)
- No, I'm not paranoid. I just like pushing things to their limit :)
Now, on to the solution...
The code to print the command line looks like this:
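----------------- normal.asm -----------------------
.386
.model flat
extrn GetCommandLineA:proc
extrn GetStdHandle:proc
extrn WriteFile:proc
.data?
dummy   db ?
.code
start:
        call  GetCommandLineA           ; eax -> command line string
        xor   ecx, ecx
        push  ecx                       ; lpOverlapped = NULL
loop1:  inc   ecx                       ; count characters up to the 0
        cmp   byte ptr [eax+ecx], 0
        jne   short loop1
        push  esp                       ; lpNumberOfBytesWritten -> dword pushed above
        push  ecx                       ; nNumberOfBytesToWrite
        push  eax                       ; lpBuffer
        push  -11                       ; STD_OUTPUT_HANDLE
        call  GetStdHandle
        push  eax                       ; hFile
        call  WriteFile
        ret
ends
end start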
Some comments on the code:
- The .data? section is present because I can't make TASM work without any data.
- There is no ExitProcess. In its place there is a simple 'ret'. This works
because the entry point is actually called by kernel32 with the following piece
of code:
        call  [ebp+8]    ; [ebp+8] holds the entry point address
        push  eax
        jmp   label
        ...
label:  call  ExitThread
This program assembles under TASM to a 4 KB file. Those 4096 bytes are divided
like this:
  Dos Stub             256
  PE Header            248
  4 section headers    160
  padding              872
  ------------------------
  code                  50
  padding              462
  imports              132
  padding              380
  reloc                 16
  padding             1520
This means that we have:
  16% header
   5% code / data
  79% padding
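The percentages can be re-derived from the byte counts. The following is only a small sanity check in Python (not part of the original toolchain); the grouping into "header", "content" and "padding" is my reading of the table.

```python
# Byte counts copied from the table above (TASM output, 4096 bytes in total).
layout = {
    "header":  [256, 248, 160],        # DOS stub, PE header, 4 section headers
    "content": [50, 132, 16],          # code, imports, reloc
    "padding": [872, 462, 380, 1520],
}

total = sum(sum(parts) for parts in layout.values())
percent = {name: round(100 * sum(parts) / total) for name, parts in layout.items()}

print(total)
print(percent)
```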
It seems that TASM can't create anything smaller. So, the code will have to be
written by hand in a hex editor. Actually you don't have to worry, as you'll
only have to write 192 bytes for the final program (believe it or not!).
In order to shrink the file, the following steps must be taken: Remove Padding,
Use a Single Section, Remove the DOS Stub, Tweak the PE Header, Squeeze the
Code, Squeeze the Imports, and 'ReAssemble' the Program.
1. Remove padding
By changing the 'FileAlignment' field in the PE header, all the padding can be
discarded. (Actually it seems that win95 won't allow this)
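To see why the alignment matters so much, here is a rough sketch in Python. It assumes that TASM used a FileAlignment of 512 bytes, which is what the padding figures for the code and imports in the size table suggest; `align_up` is a helper name made up for this illustration.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

# Raw sizes of the code and imports, taken from the size table.
sizes = {"code": 50, "imports": 132}

# Padding needed with an assumed 512-byte alignment versus a 2-byte one.
padding_512 = {name: align_up(size, 512) - size for name, size in sizes.items()}
padding_2 = {name: align_up(size, 2) - size for name, size in sizes.items()}

print(padding_512)
print(padding_2)
```

With an alignment of 2 the padding disappears for these two pieces, since both sizes are already even.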
2. Use one section
TASM creates the following sections:
.code : code
.data : initialized and uninitialized data
.idata : imports
.reloc : relocation info
- The .reloc section is not needed, as only DLLs get relocated.
- The .data section is only present because I can't have TASM create a normal
executable without a data section.
- The .idata section can then be merged with the .code section. Remember that
the name of each section does not depend on what the section contains, since
the OS finds things like imports, relocations or resources from the directory
in the PE header.
3. No DOS stub
All compilers that compile PE executables create a DOS stub that displays a
message like 'This program must be run under Win32'. Yet this is NOT required
by the PE format. What PE needs (as seen in [ntdll.dll]RtlImageNtHeader or
[imagehlp.dll]ImageNtHeader) is:
PIECE I: DOS HEADER
0000| 4D5A **** **** **** **** **** **** ****
0010| **** **** **** **** **** **** **** ****
0020| **** **** **** **** **** **** **** ****
0030| **** **** **** **** **** **** ???? ????
where ???? is the offset of the PE header from the beginning of the file
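In other words, only the 'MZ' signature and the dword at offset 3Ch are looked at. A minimal sketch of that lookup in Python follows; the function name and the sample value 40h are made up for the illustration and are not what the final program uses.

```python
import struct

def pe_header_offset(image: bytes) -> int:
    """Return the PE header offset stored in the last dword of the DOS header."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    (offset,) = struct.unpack_from("<I", image, 0x3C)
    return offset

# A 64-byte DOS header where everything except the two needed fields is zero.
dos_header = bytearray(0x40)
dos_header[0:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x40)   # hypothetical offset

print(hex(pe_header_offset(bytes(dos_header))))
```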
4. Tweaked PE header
The PE header consists of the following structures:
IMAGE_NT_SIGNATURE: 00004550h
IMAGE_FILE_HEADER:
WORD Machine ; >> 014Ch for Intel 386
WORD NumberOfSections ; 1 for this example
DWORD TimeDateStamp ; *
DWORD PointerToSymbolTable ; *
DWORD NumberOfSymbols ; *
WORD SizeOfOptionalHeader ; >> 70h (Opt. header + directories)
WORD Characteristics ; >> 0102h for 32bit executable
IMAGE_OPTIONAL_HEADER:
WORD Magic ; 010Bh (stored in the file as the bytes 0B 01)
BYTE MajorLinkerVersion ; *
BYTE MinorLinkerVersion ; *
DWORD SizeOfCode ; *
DWORD SizeOfInitializedData ; *
DWORD SizeOfUninitializedData ; *
DWORD AddressOfEntryPoint ; >> ???? RVA of entry point
DWORD BaseOfCode ; *
DWORD BaseOfData ; *
DWORD ImageBase ; >> 00100000h for this example
DWORD SectionAlignment ; 2
DWORD FileAlignment ; 2
WORD MajorOperatingSystemVersion ; *
WORD MinorOperatingSystemVersion ; *
WORD MajorImageVersion ; *
WORD MinorImageVersion ; *
WORD MajorSubsystemVersion ; >> 0004
WORD MinorSubsystemVersion ; >> 0000
DWORD Win32VersionValue ; *
DWORD SizeOfImage ; >> ????
DWORD SizeOfHeaders ; *
DWORD CheckSum ; *
WORD Subsystem ; 0003 for win32 console application
WORD DllCharacteristics ; *
DWORD SizeOfStackReserve ; 00100000h
DWORD SizeOfStackCommit ; 00001000h
DWORD SizeOfHeapReserve ; 00100000h
DWORD SizeOfHeapCommit ; 00001000h
DWORD LoaderFlags ; *
DWORD NumberOfRvaAndSizes ; 2 data directories (Exports & Imports)
...a number (actually 2) of the following:
IMAGE_DATA_DIRECTORY:
DWORD VirtualAddress ; 0 for exports, ???? for imports
DWORD Size ; 0 for exports, ???? for imports
...a number (actually 1) of the following:
IMAGE_SECTION_HEADER:
BYTE Name[8] ; * (Anything we like)
DWORD VirtualSize ; ?! (h.o. word must be zero??)
DWORD VirtualAddress ; >> ????
DWORD SizeOfRawData ; >> ????
DWORD PointerToRawData ; >> ????
DWORD PointerToRelocations ; *
DWORD PointerToLinenumbers ; *
WORD NumberOfRelocations ; *
WORD NumberOfLinenumbers ; *
DWORD Characteristics ; *
So the raw hex data for the PE header are:
PIECE II: PE HEADER
| 5045 0000 4C01 0100 **** **** **** ****
| **** **** 7000 0201 0B01 **** **** ****
| **** **** **** **** ???? ???? **** ****
| **** **** 0000 1000 0200 0000 0200 0000
| **** **** **** **** 0400 0000 **** ****
| ???? ???? **** **** **** **** 0300 ****
| 0000 1000 0010 0000 0000 1000 0010 0000
| **** **** 0200 0000 0000 0000 0000 0000
| ???? ???? ???? ???? **** **** **** ****
| **** **** ???? ???? ???? ???? ???? ????
| **** **** **** **** **** **** **** ****
NOTES
- ???? means that the value is needed but has to be filled in later, as it
depends on the code.
- **** means that the value is either completely ignored or can be set to
any value without raising an error.
- The main difference between this and a 'normal' PE header is that the size of
the optional header is 70h (112 bytes) instead of the standard 0E0h (224 bytes).
This is because there are only 2 directories instead of 16. This seems to be
the minimum number of directories possible, as there seems to be no way of
running an .exe that has no imports.
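The two sizes follow directly from the structure listing: the fixed fields from Magic through NumberOfRvaAndSizes add up to 96 bytes, and each IMAGE_DATA_DIRECTORY takes 8 more. A quick check in Python (the constant and function names are made up for this sketch):

```python
FIXED_FIELDS = 96      # bytes from Magic through NumberOfRvaAndSizes
DIRECTORY_ENTRY = 8    # VirtualAddress + Size, one dword each

def size_of_optional_header(directories):
    """SizeOfOptionalHeader for a given number of data directories."""
    return FIXED_FIELDS + DIRECTORY_ENTRY * directories

print(hex(size_of_optional_header(2)))    # the tweaked header
print(hex(size_of_optional_header(16)))   # a 'normal' header
```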
5. Squeezed code
Even though the code we have is already tight, it has one major drawback: It
invokes three API functions. To realize what this means just think that the
names of the functions are included in the imports section as normal ASCII
which means that only the names would take 36 bytes...
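The 36 bytes are just the three names added up. The sketch below (Python, illustration only) also estimates the cost once each name is stored as an IMAGE_IMPORT_BY_NAME, that is with a 2-byte Hint in front and a terminating 0; any alignment padding is ignored here.

```python
names = ["GetCommandLineA", "GetStdHandle", "WriteFile"]

# Characters only, as counted in the text.
raw_bytes = sum(len(name) for name in names)

# Hint word (2 bytes) + name + terminating zero for each entry.
with_overhead = sum(2 + len(name) + 1 for name in names)

print(raw_bytes, with_overhead)
```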
The solution here (since those functions are needed) is to call the functions
directly. This is possible because kernel32.dll is never relocated so the
function entry points are always the same (for a given version of windows).
For NT4 those values are:
GetStdHandle: 77F01CBB
WriteFile : 77F0D354
GetCommandLine is a special case since it has the format:
GetCommandLineA proc near
mov eax, [77F4657Ch]
retn
GetCommandLineA endp
so the final code will look like:
----------------- code.hex -----------------------
A17C65F477   mov  eax, [CommandLine]        ; pointer stored at 77F4657Ch
BEBB1CF077   mov  esi, offset GetStdHandle
33C9         xor  ecx, ecx
51           push ecx
41           inc  ecx
803C0800     cmp  byte ptr [eax+ecx], 0
75F9         jnz  -07                       ; back to 'inc ecx'
54           push esp
51           push ecx
50           push eax
6AF5         push -11                       ; StdOut
FFD6         call esi                       ; GetStdHandle
50           push eax
B854D3F077   mov  eax, offset WriteFile
FFD0         call eax
C3           ret
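Since the code has to be typed in by hand, it is worth double-checking the bytes. The following Python sketch (not part of the original method) joins the opcodes from the listing, measures the total length, decodes the little-endian address in the first instruction, and resolves the target of the `75F9` jump.

```python
import struct

# Opcode bytes in listing order, one string per instruction.
code = bytes.fromhex(
    "A17C65F477" "BEBB1CF077" "33C9" "51" "41" "803C0800" "75F9"
    "54" "51" "50" "6AF5" "FFD6" "50" "B854D3F077" "FFD0" "C3"
)

# The dword after the A1 opcode is the address the command line pointer is read from.
(command_line_ptr,) = struct.unpack_from("<I", code, 1)

# A short jump is relative to the end of the 2-byte jump instruction.
jnz_at = code.index(bytes.fromhex("75F9"))
displacement = int.from_bytes(code[jnz_at + 1:jnz_at + 2], "little", signed=True)
jnz_target = jnz_at + 2 + displacement

print(len(code), hex(command_line_ptr), displacement, hex(code[jnz_target]))
```

The jump lands on the byte 41h, the `inc ecx` that starts the counting loop.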
6. Squeezed imports
[Comment: read a text on PE format to better understand what's going on]
As mentioned earlier, the PE file must have an imports directory in order to
load properly. Yet, since we call the API functions directly, we only have to
specify one dummy import. A good choice (since it has a really short name) is
'Arc' from 'gdi32.dll'. To specify this imported function we would need:
IMAGE_IMPORT_DESCRIPTOR for gdi32.dll:
OriginalFirstThunk ; *
TimeDateStamp ; *
ForwarderChain ; *
Name ; >> ???? RVA of ASCII string 'gdi32.dll',0
FirstThunk ; >> ???? RVA described later...
IMAGE_IMPORT_DESCRIPTOR full of zeroes to specify end of imports
OriginalFirstThunk ; *
TimeDateStamp ; *
ForwarderChain ; *
Name ; 0 This is checked to see if it is the end...
FirstThunk ; *
'FirstThunk' is the RVA of a 0-terminated list of RVAs, one for each function
in the specified DLL. For this example we only need one RVA followed by a null
dword. This RVA will point to a structure IMAGE_IMPORT_BY_NAME:
WORD Hint ; *
BYTE Name[...] ; 'Arc',0
By putting all this together we would have:
PIECE III: IMPORTS
| **** **** **** **** **** **** -dword 1-
| -dword 2- -dword 3- 0000 0000 **** ****
| 0000 0000 **** ****
dwords 1 and 2 are the two RVAs for the IMAGE_IMPORT_DESCRIPTOR (Name and
FirstThunk). dword 3 is the RVA of the IMAGE_IMPORT_BY_NAME, so dword 2 is the
RVA of dword 3. We also need space for the two strings 'gdi32.dll',0 and
'Arc',0.
There is a way to use even fewer bytes for the imports. Just remember that the
imports are examined after the file has been mapped into memory. Since memory
is allocated in blocks, the space after the end of the file is full of zeroes.
So, by placing the three dwords in the last 12 bytes of the file, the two zero
dwords (the end of the FirstThunk list and the Name of the terminating
descriptor) come for free and need not be stored in the file.
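The trick can be tried out on a toy image. The Python sketch below is not the NT loader, just a simplified walker that follows Name and FirstThunk the way the text describes and treats every byte past the end of the image as zero. The 44-byte layout, the AAh filler and all function names are made up for the illustration.

```python
import struct

def dword_at(image, rva):
    """Read a little-endian dword; bytes past the end of the image read as zero."""
    return struct.unpack("<I", image[rva:rva + 4].ljust(4, b"\0"))[0]

def cstring_at(image, rva):
    """Read a zero-terminated ASCII string."""
    return image[rva:image.index(b"\0", rva)].decode("ascii")

def walk_imports(image, table_rva):
    """Return [(dll name, [function names])] for the descriptors at table_rva."""
    imports = []
    while True:
        name_rva = dword_at(image, table_rva + 12)       # Name
        if name_rva == 0:                                # end of imports
            return imports
        thunk_rva = dword_at(image, table_rva + 16)      # FirstThunk
        functions = []
        while dword_at(image, thunk_rva) != 0:
            hint_name_rva = dword_at(image, thunk_rva)
            functions.append(cstring_at(image, hint_name_rva + 2))  # skip Hint
            thunk_rva += 4
        imports.append((cstring_at(image, name_rva), functions))
        table_rva += 20                                  # next descriptor

toy = bytearray(4)                       # 00h: filler, so that no RVA is zero
toy += b"gdi32.dll\0"                    # 04h: DLL name
toy += b"\0\0Arc\0"                      # 0Eh: Hint + function name
toy += b"\xAA" * 12                      # 14h: three ignored descriptor fields
toy += struct.pack("<II", 0x04, 0x28)    # 20h: Name RVA, FirstThunk RVA
toy += struct.pack("<I", 0x0E)           # 28h: the single thunk, last dword
toy = bytes(toy)

print(len(toy), walk_imports(toy, 0x14))
```

Both terminators (the zero thunk and the zero Name of the second descriptor) are read from beyond the last byte of the toy image.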
7. 'Assemble' the program
The values marked as ???? will be:
Offset of PE header : 00000010
AddressOfEntryPoint : 00000002
SizeOfImage : 000000C0
Imports RVA : 000000A8
Imports Size : 00000028
Section VirtualAddress : 00000000
Section SizeOfRawData : 000000C0
Section PointerToRawData: 00000000
Dll Name RVA : 00000098
Dll FirstThunk RVA : 000000BC
Dll Function Hint/Name : 000000AE
Notice that the Section data and the Header (DOS and PE) are the same thing.
The section RVA is 0, so file offset and RVAs are the same. The code will be
broken in three pieces, connected by two jumps. The final result will be:
THE PROGRAM
0000| 4D5A A17C 65F4 77BE BB1C F077 33C9 EB08
0010| 5045 0000 4C01 0100 5141 803C 0800 75F9
0020| 5451 EB06 7000 0201 0B01 506A F5FF D650
0030| B854 D3F0 77FF D0C3 0200 0000 1000 0000
0040| **** **** 0000 1000 0200 0000 0200 0000
0050| **** **** **** **** 0400 0000 **** ****
0060| C000 0000 **** **** **** **** 0300 ****
0070| 0000 1000 0010 0000 0000 1000 0010 0000
0080| **** **** 0200 0000 0000 0000 0000 0000
0090| A800 0000 2800 0000 6764 6933 322E 646C
00A0| 6C00 0000 0000 0000 C000 0000 0000 0000
00B0| 4172 6300 9800 0000 BC00 0000 AE00 0000
Bytes shown as **** are meaningless, and can be set to any value.
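Before typing the bytes into a hex editor, the overlapping layout can be cross-checked against the values listed in step 7. The Python sketch below is only such a check: it rebuilds the 192 bytes from the dump, with the don't-care bytes written as `**` and filled with 00 (an arbitrary choice), and reads back a few fields. The helper names are made up.

```python
import struct

DUMP = """
4D5A A17C 65F4 77BE BB1C F077 33C9 EB08
5045 0000 4C01 0100 5141 803C 0800 75F9
5451 EB06 7000 0201 0B01 506A F5FF D650
B854 D3F0 77FF D0C3 0200 0000 1000 0000
**** **** 0000 1000 0200 0000 0200 0000
**** **** **** **** 0400 0000 **** ****
C000 0000 **** **** **** **** 0300 ****
0000 1000 0010 0000 0000 1000 0010 0000
**** **** 0200 0000 0000 0000 0000 0000
A800 0000 2800 0000 6764 6933 322E 646C
6C00 0000 0000 0000 C000 0000 0000 0000
4172 6300 9800 0000 BC00 0000 AE00 0000
"""
image = bytes.fromhex("".join(DUMP.split()).replace("*", "0"))

def u16(offset):
    return struct.unpack_from("<H", image, offset)[0]

def u32(offset):
    return struct.unpack_from("<I", image, offset)[0]

def short_jump_target(offset):
    """Target of the 'EB xx' short jump found at the given file offset."""
    assert image[offset] == 0xEB
    disp = int.from_bytes(image[offset + 1:offset + 2], "little", signed=True)
    return offset + 2 + disp

pe = u32(0x3C)                  # offset of the PE header
opt = pe + 24                   # signature (4) + IMAGE_FILE_HEADER (20)
section = opt + u16(pe + 20)    # section header follows the optional header

print(len(image), hex(pe), image[pe:pe + 4])
print(hex(u16(opt)), hex(u32(opt + 16)))            # Magic, AddressOfEntryPoint
print(hex(u32(opt + 104)), hex(u32(opt + 108)))     # imports RVA and size
print(image[section:section + 10])                  # section name doubles as DLL name
print(hex(short_jump_target(0x0E)), hex(short_jump_target(0x22)))
```

Note that the dword at 3Ch, which the DOS header needs as the PE header offset, is at the same time the ignored BaseOfCode field of the optional header.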
Wrapping Up
-----------
Well, if you managed to read up to here, and understood what happened, I guess
you need no more explanations. I just gave an idea (actually MANY ideas). Maybe
in another article I will start exploring the possibilities this 'experiment'
showed me...
Next Issue Challenge
--------------------
Write a routine for converting ASCII hex to binary in 6 bytes.