Systems research depends heavily on the ability to easily instrument, reverse engineer, monitor, and extend existing operating system and application functionality. When the source code is available, it is fairly easy to insert new instrumentation and extend the capabilities of the OS and its applications. In today's commercial environment, however, researchers seldom have access to the source code. A different mechanism, called API hooking, is commonly used to meet these requirements.
Although API hooking is a fairly well-established concept and popular among developers, in this paper I introduce a new DLL injection technique that overcomes the limitations of existing DLL injection techniques and the corresponding API hooking techniques.
The intended audience for this paper is Windows developers and researchers who use, or intend to use, API hooking in some way.
API Hooking Overview:
API hooking is a mechanism for executing user-defined code before or after the original APIs, or in place of them without executing the original API at all. It is basically a two-phase process:
- Injection
- Interception
We will briefly discuss each phase in turn.
Injection:
This is the first phase of API hooking. In this phase a DLL is injected into the target process, and that DLL then starts the second phase, “Interception”. Several injection techniques are known; they are briefly summarized below:
1. Using the Registry
In this technique, the DLL name is added to the following registry value:
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs
This value contains one or more DLL names separated by commas or spaces. The DLLs are loaded into each Windows-based application running within the current logon session. They are loaded as part of USER32’s initialization: while User32.dll handles DLL_PROCESS_ATTACH, it calls LoadLibrary() for each DLL listed in this value [1].
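To make the "separated by commas or spaces" rule concrete, here is a minimal sketch in portable C of how such a value could be split into individual DLL names. The helper name split_appinit_value and the MAX_NAME limit are assumptions made for illustration; this is not how USER32 itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 260  /* assumed per-entry limit for this sketch */

/* Splits an AppInit_DLLs-style string on commas and spaces.
 * Copies up to max_names entries into names[][] and returns how many
 * were stored. Hypothetical helper, not part of any Windows API. */
static int split_appinit_value(const char *value, char names[][MAX_NAME], int max_names)
{
    int count = 0;
    const char *p = value;

    assert(value != NULL && names != NULL);

    while (*p != '\0' && count < max_names) {
        size_t len, copy;

        p += strspn(p, ", ");        /* skip any run of separators */
        len = strcspn(p, ", ");      /* length of the next entry */
        if (len == 0)
            break;                   /* only separators were left */

        copy = (len < MAX_NAME) ? len : (size_t)(MAX_NAME - 1);
        memcpy(names[count], p, copy);
        names[count][copy] = '\0';
        count++;
        p += len;                    /* advance past the whole entry */
    }
    return count;
}
```

Because a space is itself a separator, an entry that contains a space would be split into two names by this sketch.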
2. Using System-wide Windows Hooks
In this technique a hook is installed by calling SetWindowsHookEx() with the appropriate parameters. The hook callback functions are implemented in a DLL. After the hook is installed, Windows maps the hook DLL into the address space of each client process that meets the requirements of the hook.
To remove the hook and unload the DLL from the processes into which it was loaded, UnhookWindowsHookEx() is called.
3. Using the CreateRemoteThread() API function
This is a very popular technique proposed by Jeffrey Richter in his article [6]. In this technique a DLL is forced to load in the target process using the CreateRemoteThread() API.
The signatures of LoadLibrary() and the thread function are compatible, since both take a single pointer-sized argument and use the WINAPI calling convention:
DWORD WINAPI ThreadProc(LPVOID lpParameter);
HMODULE WINAPI LoadLibrary(LPCTSTR lpFileName);
We can therefore pass the address of LoadLibrary() as the thread start routine, with the DLL name as its parameter. Because Kernel32.dll is always mapped at the same base address in each process, the address of LoadLibrary() is the same in all processes. This ensures that the address we pass to CreateRemoteThread() is valid in the target process.
The DLL name itself must reside in the target process. It is allocated there with VirtualAllocEx() and written with WriteProcessMemory(); once the write succeeds, its address is passed as the parameter to LoadLibrary().
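The sequence described above can be sketched in C roughly as follows. This is an untested illustration, not Richter's INJLIB code: the function name inject_with_remote_thread is hypothetical, the ANSI variant LoadLibraryA is chosen for simplicity, and the non-Windows branch exists only so the file still compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 0 if the remote thread ran LoadLibraryA, nonzero otherwise.
 * Hypothetical helper name; error handling is deliberately minimal. */
int inject_with_remote_thread(unsigned long pid, const char *dll_path)
{
#ifdef _WIN32
    int rc = 1;
    HANDLE process, thread = NULL;
    LPVOID remote;
    FARPROC load_library;
    SIZE_T size;

    if (dll_path == NULL)
        return 1;
    size = strlen(dll_path) + 1;

    process = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
                          PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
                          FALSE, (DWORD)pid);
    if (process == NULL)
        return 1;

    /* Place the DLL name inside the target process. */
    remote = VirtualAllocEx(process, NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (remote != NULL && WriteProcessMemory(process, remote, dll_path, size, NULL)) {
        /* Assumption from the text: kernel32.dll has the same base address
         * in both processes, so this local address is valid in the target. */
        load_library = GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
        if (load_library != NULL)
            thread = CreateRemoteThread(process, NULL, 0,
                                        (LPTHREAD_START_ROUTINE)load_library,
                                        remote, 0, NULL);
        if (thread != NULL) {
            WaitForSingleObject(thread, INFINITE);  /* wait for LoadLibraryA to return */
            CloseHandle(thread);
            rc = 0;  /* the thread ran; this does not prove the DLL loaded */
        }
    }
    if (remote != NULL)
        VirtualFreeEx(process, remote, 0, MEM_RELEASE);
    CloseHandle(process);
    return rc;
#else
    (void)pid;
    (void)dll_path;
    return -1;  /* not supported outside Windows */
#endif
}
```

Note that the remote thread runs alongside whatever threads the target already has, which is the root of the limitations discussed later in this paper.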
4. Implanting through BHO add-ins
Sometimes custom code needs to be injected into Internet Explorer only. Fortunately, Microsoft provides an easy and well-documented way to do this: Browser Helper Objects (BHOs). A BHO is implemented as a COM DLL. Once it is properly registered, each time IE is launched it loads all such registered COM components, which implement the IObjectWithSite interface.
5. MS Office add-ins
Similar to BHOs, we can implant our code in MS Office applications by using MS Office add-ins.
Interception:
Injection is only the initial phase of API hooking. Although it gives the user control inside the target process, it is not sufficient on its own; interception must also be performed. In this second phase, the address of each original API is replaced with the address of a user-defined function, so that the user-defined function is called first, regardless of whether the original API is called afterwards. Various interception techniques are known and are described in [2].
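The idea of replacing an address so that user-defined code runs first can be illustrated without any Windows API at all. The sketch below models an import table as a one-entry array of function pointers; all names (import_table, original_add, hooked_add, install_hook) are invented for this illustration, and a real IAT patch would additionally have to locate the table in the PE image and change page protection.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* Stands in for the original API. */
static int original_add(int a, int b) { return a + b; }

/* Stands in for an import table: callers always go through this slot. */
static add_fn import_table[1] = { original_add };

static add_fn saved_original = NULL;  /* filled in when the hook is installed */
static int hook_calls = 0;            /* counts how often the hook ran */

/* User-defined replacement: runs first, then forwards to the original. */
static int hooked_add(int a, int b)
{
    assert(saved_original != NULL);
    hook_calls++;
    return saved_original(a, b);
}

/* Interception: swap the table entry, remembering the old address. */
static void install_hook(void)
{
    if (import_table[0] != hooked_add) {  /* avoid hooking twice */
        saved_original = import_table[0];
        import_table[0] = hooked_add;
    }
}

/* A caller that, like real code, is unaware of the hook. */
static int call_through_table(int a, int b)
{
    return import_table[0](a, b);
}
```

After install_hook() runs, every call made through the table reaches hooked_add first, and the hook decides whether to forward to the saved original.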
Problem Context
Because API hooking is the combination of an injection technique and an interception technique, a problem in either one defeats the whole purpose of API hooking. In this paper I discuss the problems in the existing injection techniques and their solution:
1. Using the Registry
- To activate or deactivate the injection process, we need to reboot Windows.
- The DLL is mapped only into processes that use USER32.DLL. It is therefore not injected into console applications, since they usually don't import functions from USER32.DLL.
- We can’t control the injection process: the DLL is implanted into every single GUI application, whether we want it or not. This is redundant overhead, especially if we intend to hook only a few applications.
2. Using System-wide Windows Hooks
- Windows hooks can significantly degrade the performance of the entire system, because they increase the amount of processing the system must perform for each message.
- There is no deterministic way to tell when DLL injection into the target process has completed, which is not always acceptable.
- Last but not least, a hook may affect processing across the whole system, and under certain circumstances (a bug, say) we must reboot the machine in order to recover.
3. Using the CreateRemoteThread() API
- Initial API Patching Problem:
When we attempt to hook an API during process creation, CreateRemoteThread() runs LoadLibrary() as a thread in the target process, passing the DLL name as its parameter; the loaded DLL then performs the interception (e.g. IAT patching). But this LoadLibrary() call executes only once the process is already in its execution phase, and it runs as a thread in parallel with the main thread of the target process. As a result, the main thread sometimes calls the original API before, or at almost the same moment as, the DLL patches it. In such cases the CreateRemoteThread() technique cannot intercept the initial calls to the target API.
- Crash Problem:
Because the main thread and the interception run in parallel, the target process sometimes crashes. If the main thread calls the target API at the moment the address of that API is being changed, the main thread may read an incorrect address for the API, which leads to a crash.
4. Implanting through BHO add-ins and MS Office add-ins
- These work only for a limited set of applications, namely Internet Explorer and the MS Office applications.
Proposed Solution
The CreateRemoteThread() injection technique addresses the limitations of the other injection techniques. As mentioned above, however, it has a few major limitations of its own, so our focus is to remove them by proposing a better injection technique.
Before describing this new technique, I will briefly go through how a process is created in Windows. After that I will describe how we can safely inject our DLL into a newly created process.
Process Creation:
A Windows process is created using the CreateProcess, CreateProcessAsUser, CreateProcessWithTokenW, or CreateProcessWithLogonW API. Internally the process is created in stages. The main stages of creating a process in Windows are those described in Microsoft Windows Internals (by Mark E. Russinovich and David A. Solomon) [4].
I will not go into the detail of every stage, which is covered in [8]. A brief description of each stage follows:
Stage 1: Open the image file (.exe) to be executed inside the process and create section objects.
Stage 2: Create the Windows executive process object.
Stage 3: Create the initial thread (stack, context, and Windows executive thread object).
Stage 4: Notify the Windows subsystem of the new process so that it can set up for the new process and thread.
Stage 5: Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).
Stage 6: In the context of the new process and thread, complete the initialization of the address space (such as load required DLLs).
After stage 6 is completed the execution of the entry point to image begins.
DLL Injection:
Now we need to find a point at which we can safely inject a DLL into the target process before the process enters its execution phase. The diagram below shows where we can safely inject our stub, which contains the code for loading the DLL.
We can safely inject the DLL into the target process just after stage 6, i.e. when the final process/image initialization is done and before “Start executing at entry point to image”. By doing this we are able to patch the required API before the entry point of the image is executed.
Injection Using Debug APIs:
The problem now boils down to finding out when stage 6, i.e. the final process/image initialization, is complete and the actual execution phase is about to begin.
We can find this point in a slightly tricky way: we create the process in debug mode, i.e. the process runs as a debuggee inside a debugger. The debugger then receives an event called CREATE_PROCESS_DEBUG_EVENT [3], which is sent to the debugger during stage 6 (before any initialization and loading is done). If we read the value of EIP from the thread context during this event, it contains the address of BaseProcessStartThunk(), which is responsible for executing the entry point of the image (the C runtime initialization code that eventually calls either main() or WinMain()). The trick is that if we change this address in EIP to an address of our choosing, we are able to load the required DLL before the process enters its execution phase.
The address that replaces the value in EIP is the starting address of a stub. The stub is written into the process’s virtual memory and contains simple assembly instructions for loading the DLL. After loading the required DLL, it hands control back to BaseProcessStartThunk(), which starts execution of the process’s main thread. The details of the stub can be found in the existing code cave injection technique [5].
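As an illustration of what such a stub might look like, the following portable C sketch assembles a 22-byte sequence of 32-bit x86 instructions into a buffer. It is written in the spirit of the code cave approach and is not taken from [5]; the function name build_loader_stub, the exact instruction sequence, and the choice to return via push/ret are assumptions. The three addresses are supplied by the caller and are treated as opaque 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 22  /* total length of the instruction sequence below */

/* Stores v in little-endian order at out[pos..pos+3]; returns the new position. */
static size_t put_u32(uint8_t *out, size_t pos, uint32_t v)
{
    out[pos]     = (uint8_t)(v & 0xFF);
    out[pos + 1] = (uint8_t)((v >> 8) & 0xFF);
    out[pos + 2] = (uint8_t)((v >> 16) & 0xFF);
    out[pos + 3] = (uint8_t)((v >> 24) & 0xFF);
    return pos + 4;
}

/* Builds a 32-bit x86 stub that saves registers, calls the loader routine
 * with the DLL path, restores registers, and resumes at resume_eip.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t build_loader_stub(uint8_t *out, size_t cap, uint32_t dll_path_addr,
                                uint32_t load_library_addr, uint32_t resume_eip)
{
    size_t n = 0;

    if (out == NULL || cap < STUB_SIZE)
        return 0;

    out[n++] = 0x9C;                               /* pushfd                */
    out[n++] = 0x60;                               /* pushad                */
    out[n++] = 0x68;                               /* push dll_path_addr    */
    n = put_u32(out, n, dll_path_addr);
    out[n++] = 0xB8;                               /* mov eax, load_library */
    n = put_u32(out, n, load_library_addr);
    out[n++] = 0xFF; out[n++] = 0xD0;              /* call eax              */
    out[n++] = 0x61;                               /* popad                 */
    out[n++] = 0x9D;                               /* popfd                 */
    out[n++] = 0x68;                               /* push resume_eip       */
    n = put_u32(out, n, resume_eip);
    out[n++] = 0xC3;                               /* ret (jumps to resume) */

    assert(n == STUB_SIZE);
    return n;
}
```

Saving and restoring all registers and flags around the call keeps the thread state that the original start routine expects, so the stub is invisible to the code that runs after it.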
This raises an important question. At the time of CREATE_PROCESS_DEBUG_EVENT, kernel32.dll is not yet loaded into the target process’s address space. How, then, does our stub load the required DLL into the target process, given that the address of the LoadLibrary() API is not valid in that process context at that time?
The answer is that after CREATE_PROCESS_DEBUG_EVENT (i.e. once ContinueDebugEvent() is called after receiving this event), a user-mode APC that runs the image loader initialization routine (LdrInitializeThunk in Ntdll.dll) is queued. When the initial thread attempts to execute in user mode, this APC is delivered, and it calls LdrInitializeThunk in the context of the initial thread. This routine initializes the loader, heap manager, NLS tables, thread-local storage (TLS) array, and critical section structures. It then loads any required DLLs and calls the DLL entry points with the DLL_PROCESS_ATTACH function code. After LdrInitializeThunk returns, the initial thread executes BaseProcessStartThunk, which then calls the entry point of the image.
Because the redirected EIP takes effect only after LdrInitializeThunk returns, all implicitly linked DLLs (including kernel32.dll for Windows subsystem applications) are loaded before the stub loads the desired DLL.
One more subtle point: when the stub is injected into the target process, the address of the LoadLibrary() API that it contains is resolved in the context of the parent process rather than the target process. This is because the stub is injected from the parent process, where obtaining the address in the context of the target process would be difficult. It poses no problem for the execution of the stub, because kernel32.dll is loaded at the same base address in all processes.
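Putting the pieces together, a debugger-side loop along the lines described in this section might look like the sketch below. It is untested and makes several assumptions: a 32-bit x86 Windows build, the ANSI LoadLibraryA, hypothetical function names (write_stub, launch_with_dll), a stub layout chosen for this illustration, and a debugger that simply stays attached until the child exits. On other platforms the function only reports that it is unsupported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(_WIN32) && (defined(_M_IX86) || defined(__i386__))
#include <windows.h>

/* Writes the stub followed by the DLL path into the child.
 * Returns the stub's address in the child, or NULL on failure. */
static LPVOID write_stub(HANDLE process, const char *dll_path, DWORD resume_eip)
{
    /* pushfd; pushad; push <path>; mov eax,<LoadLibraryA>; call eax;
     * popad; popfd; push <resume>; ret */
    unsigned char stub[22] = { 0x9C, 0x60, 0x68, 0, 0, 0, 0, 0xB8, 0, 0, 0, 0,
                               0xFF, 0xD0, 0x61, 0x9D, 0x68, 0, 0, 0, 0, 0xC3 };
    SIZE_T path_len = strlen(dll_path) + 1;
    unsigned char *remote = (unsigned char *)VirtualAllocEx(
        process, NULL, sizeof stub + path_len,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    DWORD path_addr, load_library;

    if (remote == NULL)
        return NULL;
    path_addr = (DWORD)(DWORD_PTR)(remote + sizeof stub);
    /* Resolved in the parent: relies on kernel32.dll sharing one base address. */
    load_library = (DWORD)(DWORD_PTR)GetProcAddress(
        GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
    memcpy(stub + 3, &path_addr, 4);
    memcpy(stub + 8, &load_library, 4);
    memcpy(stub + 17, &resume_eip, 4);
    if (!WriteProcessMemory(process, remote, stub, sizeof stub, NULL) ||
        !WriteProcessMemory(process, remote + sizeof stub, dll_path, path_len, NULL))
        return NULL;
    return remote;
}
#endif

/* Starts command_line as a debuggee and redirects its initial EIP to the stub.
 * Returns 0 if the redirect was applied, nonzero otherwise. */
int launch_with_dll(char *command_line, const char *dll_path)
{
#if defined(_WIN32) && (defined(_M_IX86) || defined(__i386__))
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    int running = 1, injected = 0;

    if (command_line == NULL || dll_path == NULL)
        return 1;
    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    if (!CreateProcessA(NULL, command_line, NULL, NULL, FALSE,
                        DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    while (running && WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        switch (ev.dwDebugEventCode) {
        case CREATE_PROCESS_DEBUG_EVENT: {
            CONTEXT ctx;
            LPVOID stub;
            ZeroMemory(&ctx, sizeof ctx);
            ctx.ContextFlags = CONTEXT_CONTROL;
            if (GetThreadContext(ev.u.CreateProcessInfo.hThread, &ctx)) {
                stub = write_stub(ev.u.CreateProcessInfo.hProcess, dll_path, ctx.Eip);
                if (stub != NULL) {
                    ctx.Eip = (DWORD)(DWORD_PTR)stub;  /* run the stub first */
                    injected = SetThreadContext(ev.u.CreateProcessInfo.hThread, &ctx) != 0;
                }
            }
            if (ev.u.CreateProcessInfo.hFile != NULL)
                CloseHandle(ev.u.CreateProcessInfo.hFile);
            break;
        }
        case LOAD_DLL_DEBUG_EVENT:
            if (ev.u.LoadDll.hFile != NULL)
                CloseHandle(ev.u.LoadDll.hFile);
            break;
        case EXCEPTION_DEBUG_EVENT:
            /* Swallow breakpoints; pass every other exception to the debuggee. */
            if (ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                status = DBG_EXCEPTION_NOT_HANDLED;
            break;
        case EXIT_PROCESS_DEBUG_EVENT:
            running = 0;
            break;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
    }
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return injected ? 0 : 1;
#else
    (void)command_line;
    (void)dll_path;
    return -1;  /* this sketch only targets 32-bit x86 Windows */
#endif
}
```

The original EIP read from the thread context is baked into the stub as the resume address, so control returns to the normal start routine once the DLL has been loaded.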
This approach overcomes the limitations of the CreateRemoteThread() injection technique, as described below:
1. Solution to the Initial API Patching Problem:
Because the DLL is loaded before the entry point of the image is executed, the original API is patched before any call to it is made.
2. Solution to the Crash Problem:
Because the DLL (which does the patching) is loaded by the stub in the context of the main thread rather than a parallel thread, patching the API does not lead to a crash.
Limitations
This approach has a few limitations of its own:
1. Debugger detach in Windows 2000:
In Windows 2000 the DebugActiveProcessStop() API is not supported, so the debugger can’t be detached from the process. If the debugger is killed, the process being debugged is killed as well. A possible solution is to run the debugger in a thread and keep that thread running until the process terminates. When the process terminates, the debugger receives the EXIT_PROCESS_DEBUG_EVENT event, after which the debugger thread can be terminated. Because the debugger runs alongside the process it causes some overhead, but since the debugger does no processing during debug events, the process executes normally with only a little overhead.
2. In Windows 2000 the debugger can’t be detached from the debuggee, so no other debugger can attach to that process.
3. Because the stub is written in assembly language (as described in [9]), it is machine dependent and may need to change to suit the underlying architecture.
Conclusion
“DLL injection using debugger” gives the developer/researcher better control and flexibility than the other known techniques discussed here. The technique works on all Windows versions that let the parent process attach a debugger, get and set the thread context (through GetThreadContext() and SetThreadContext()), and write into the virtual memory of the child process (through WriteProcessMemory()).
Acknowledgements
I want to thank Mr. Alok Srivastava, my mentor, for helping me frame the structure of this paper. His probing questions and reviews made this paper technically sounder and easier to understand.
References
[1] Working with the AppInit_DLLs registry value, MSDN Knowledge Base Q197571
[2] API Hooking Revealed, Ivo Ivanov, December 2002
[3] Debugging Events
[4] Microsoft® Windows® Internals, Fourth Edition: Microsoft Windows Server™ 2003, Windows XP, and Windows 2000, by Mark E. Russinovich and David A. Solomon
[5] Dll Injection Tutorial, by Darawk
[6] "Load Your 32-bit DLL into Another Process's Address Space Using INJLIB", MSJ, May 1994, by Jeffrey Richter