TRunPE is a modified process hollowing technique capable of injecting entire PE files.


What is process hollowing?

Process hollowing, also known as RunPE, is a code injection technique that allows an arbitrary PE file to be run in the context of another, legitimate process. It is perhaps the most popular technique used by in-the-wild malware and is already very well documented, so I will only summarize it here.

In brief, the technique works as follows:

  • CreateProcess on a legitimate executable in a suspended state.
  • NtQueryInformationProcess/NtGetContextThread to find the PEB, which contains the imagebase.
  • NtUnmapViewOfSection if the imagebases overlap.
  • NtAllocateVirtualMemory at the requested imagebase.
  • NtWriteVirtualMemory/NtProtectVirtualMemory to write the headers and sections and set their protections.
  • NtWriteVirtualMemory to overwrite the imagebase in the PEB.
  • NtSetContextThread to update EAX to point to the new entrypoint.
  • NtResumeThread to resume the suspended thread.

After this, the Windows loader will take care of the rest of the injection for us. That got me thinking… what else could we possibly omit that the Windows loader would handle automatically?

TLS Callbacks

One of the more obscure and downright weird things about the PE file structure is the TLS (Thread Local Storage) directory. TLS callbacks have all sorts of strange properties, and can be used to do some crazy stuff. I highly recommend reading the TLS section of corkami’s research. The TLS directory has several fields, but only one matters here: AddressOfCallbacks.

AddressOfCallbacks is a pointer (an absolute virtual address, not an RVA) to a null-terminated array of TLS callback pointers. Each entry in that array is itself the absolute virtual address of a callback to be executed (e.g. some code we want to call).
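To make the layout concrete, here is a minimal parsing sketch for the 32-bit TLS directory. It assumes you already have the image laid out as it would be in memory (so that offset = virtual address - imagebase) and that you know the RVA of the TLS directory. The names `parse_tls_directory32` and `read_tls_callbacks` are mine, for illustration only; the field order follows IMAGE_TLS_DIRECTORY32.

```python
import struct
from typing import List, NamedTuple


class TlsDirectory32(NamedTuple):
    """Fields of IMAGE_TLS_DIRECTORY32, in on-disk order (six DWORDs)."""
    start_address_of_raw_data: int  # VA
    end_address_of_raw_data: int    # VA
    address_of_index: int           # VA
    address_of_callbacks: int       # VA of a null-terminated array of VAs
    size_of_zero_fill: int
    characteristics: int


_TLS_DIR32 = struct.Struct("<6I")  # 24 bytes, little-endian


def parse_tls_directory32(image: bytes, tls_dir_rva: int) -> TlsDirectory32:
    """Unpack the TLS directory found at `tls_dir_rva` in a memory-layout image."""
    return TlsDirectory32(*_TLS_DIR32.unpack_from(image, tls_dir_rva))


def read_tls_callbacks(image: bytes, image_base: int, tls_dir_rva: int,
                       limit: int = 64) -> List[int]:
    """Return the callback VAs listed by the TLS directory.

    `limit` is a hypothetical safety cap so a malformed array cannot
    make the loop run forever.
    """
    tls = parse_tls_directory32(image, tls_dir_rva)
    if tls.address_of_callbacks == 0:
        return []
    # The field is an absolute VA, so subtract the imagebase to index the buffer.
    offset = tls.address_of_callbacks - image_base
    callbacks: List[int] = []
    while len(callbacks) < limit:
        if offset < 0 or offset + 4 > len(image):
            break  # array runs outside the image; stop rather than guess
        (va,) = struct.unpack_from("<I", image, offset)
        if va == 0:
            break  # null terminator
        callbacks.append(va)
        offset += 4
    return callbacks
```

Because every value here is an absolute address, the array is only valid for the imagebase the file was built (or relocated) for.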

Two properties of TLS callbacks I noticed which could possibly be helpful to code injection are:

  1. TLS callbacks are executed before the entrypoint
  2. The loader blindly calls whatever virtual address each callback entry holds

Well that’s pretty sketchy, but it’s definitely intended behavior. So let’s figure out how to teach an old RunPE dog some new tricks.

RunPE Revisited

A critical part of the RunPE technique is updating the EAX register to point to the new entrypoint of the application. This is usually accomplished using NtSetContextThread, but can also be done by queuing an APC or creating a new thread. However it’s done, you’ve got to do some sort of highly suspicious thread modification, which in the realm of heuristic analysis is a big warning that you’re up to no good. What if we could make the Windows loader do this for us? Enter our TLS callbacks.

The entire point of manipulating the thread context is to get the payload entrypoint executed by the main thread. We can accomplish this in a novel way by inserting a TLS callback into our payload executable which calls the entrypoint. Because the TLS callback is called by the Windows loader before the entrypoint, the payload will function as intended. LdrInitializeThunk is the ntdll routine that drives process initialization, including the TLS callbacks, and it does not run until after we resume the thread with NtResumeThread. I verified this by consulting Windows Internals 6th Ed., pp. 386-387. Therefore our TLS callback will be called after we resume the main thread, but before the original entrypoint is called. Perfect! We now have a way to effectively call our payload entrypoint without any thread manipulation.

Putting it Together

The TRunPE code functions exactly as a normal RunPE does, except that after the remote imagebase has been determined, a TLS directory is appended with callback code which calls the remote imagebase + original entrypoint RVA. All of this happens without a single call to NtSetContextThread. The code provided is a proof-of-concept and does not handle a lot of cases. Some future improvements could include:

  • Modifying an existing TLS directory
  • Extending the IMAGE_SECTION_HEADER list if necessary
  • Placing the callback code in an already executable section
  • Relocation support

However, if you have a basic PE32 file with a fixed imagebase and no existing TLS directory, there should not be any major issues.
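For reference, those three properties can all be read straight out of the headers. The following is a rough inspection sketch, not part of the TRunPE source: `inspect_pe32` is a hypothetical helper name, and I am assuming that "fixed imagebase" can be approximated by the DYNAMIC_BASE (ASLR) flag being clear. It also shows the imagebase + entrypoint RVA arithmetic mentioned above, using the preferred imagebase from the file.

```python
import struct

IMAGE_NT_OPTIONAL_HDR32_MAGIC = 0x10B          # PE32 (not PE32+)
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image opts in to ASLR
IMAGE_DIRECTORY_ENTRY_TLS = 9                   # index into DataDirectory


def inspect_pe32(data: bytes) -> dict:
    """Report header facts about a PE32 file given its raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # Optional header follows the 4-byte signature and 20-byte file header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != IMAGE_NT_OPTIONAL_HDR32_MAGIC:
        raise ValueError("not a PE32 image")

    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", data, opt + 28)  # ImageBase
    (dll_chars,) = struct.unpack_from("<H", data, opt + 70)   # DllCharacteristics
    (num_dirs,) = struct.unpack_from("<I", data, opt + 92)    # NumberOfRvaAndSizes

    tls_rva = tls_size = 0
    if num_dirs > IMAGE_DIRECTORY_ENTRY_TLS:
        # Each data directory entry is (RVA, Size), 8 bytes, starting at +96.
        tls_rva, tls_size = struct.unpack_from(
            "<II", data, opt + 96 + 8 * IMAGE_DIRECTORY_ENTRY_TLS)

    return {
        "image_base": image_base,
        "entry_rva": entry_rva,
        # Absolute entrypoint at the preferred base: imagebase + RVA.
        "entry_va": image_base + entry_rva,
        "has_tls": tls_rva != 0 and tls_size != 0,
        "dynamic_base": bool(dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
    }
```

A file that reports `has_tls` as False and `dynamic_base` as False matches the "basic" case described here; anything else falls under the improvements listed above.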

This idea can (and will) be extended to other parts of process hollowing in future posts.

The fully commented source code is available on my GitHub.