TRunPE is a modified process hollowing technique capable of injecting entire PE files.

What is process hollowing?

Process hollowing, or RunPE, is a code injection technique that runs an arbitrary PE file in the context of another, legitimate process. It is perhaps the most popular technique used by in-the-wild malware and is well documented elsewhere, so I will not go over it in depth.

In brief, the technique is as follows:

  • CreateProcess on a legitimate executable in a suspended state.
  • NtQueryInformationProcess/NtGetContextThread to find PEB which contains the imagebase.
  • NtUnmapViewOfSection if imagebases overlap.
  • NtAllocateVirtualMemory at requested imagebase.
  • NtWriteVirtualMemory/NtProtectVirtualMemory headers and sections.
  • NtWriteVirtualMemory to overwrite imagebase in PEB.
  • NtSetContextThread to update EAX to point to entrypoint.
  • NtResumeThread to resume suspended thread.

After this, the Windows loader takes care of the rest of the injection for us. That got me thinking… what else could we omit that the Windows loader would handle automatically?

TLS Callbacks

One of the more obscure and downright weird things about the PE file structure is the TLS (Thread Local Storage) directory. TLS callbacks have all sorts of strange properties, and can be used to do some crazy stuff. I highly recommend reading the TLS section of corkami’s research. The TLS directory structure contains a few entries, but there is really only one you need to care about: AddressOfCallbacks.

AddressOfCallbacks is a pointer (not relative, it’s a complete virtual address) to a null-terminated array of TLS callback addresses. The first entry in that array is the first callback to be executed (e.g. some code we want to call).
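To make the layout concrete, here is a minimal sketch of the 32-bit TLS directory as described in the PE format. The struct is redeclared so the example compiles without the Windows SDK (winnt.h spells the field AddressOfCallBacks), and the helpers va_to_rva and count_callbacks are hypothetical names used only for illustration, for example by a tool that inspects a file on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the field order of IMAGE_TLS_DIRECTORY32 from the PE format.
// The first four fields hold full virtual addresses (VAs), not RVAs.
struct TlsDirectory32 {
    uint32_t StartAddressOfRawData;
    uint32_t EndAddressOfRawData;
    uint32_t AddressOfIndex;
    uint32_t AddressOfCallBacks;   // VA of a null-terminated array of callback VAs
    uint32_t SizeOfZeroFill;
    uint32_t Characteristics;
};
static_assert(sizeof(TlsDirectory32) == 24, "six 32-bit fields expected");

// Hypothetical helper: convert a VA stored in the directory to an RVA.
constexpr uint32_t va_to_rva(uint32_t va, uint32_t image_base) {
    return va - image_base;
}

// Hypothetical helper: count entries in a null-terminated callback array.
inline std::size_t count_callbacks(const uint32_t* callbacks) {
    assert(callbacks != nullptr);
    std::size_t n = 0;
    while (callbacks[n] != 0) ++n;
    return n;
}
```

Because the stored values are VAs, anything that reads them from a file has to subtract the preferred imagebase before mapping them to a file offset.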

Two properties of TLS callbacks I noticed which could possibly be helpful to code injection are:

  1. TLS callbacks are executed before the entrypoint
  2. The loader blindly calls whatever virtual addresses are listed as TLS callbacks

Well that’s pretty sketchy, but it’s definitely intended behavior. So let’s figure out how to teach an old RunPE dog some new tricks.

RunPE Revisited

A critical part of the RunPE technique is updating the EAX register to point to the new entrypoint of the application. This is usually accomplished using NtSetContextThread, but can also be done by queuing an APC to the thread or by creating a new thread. However it’s done, you’ve got to do some sort of highly suspicious thread modification, which in the realm of heuristic analysis is a big warning that you’re up to no good. What if we could make the Windows loader do this for us? Enter TLS callbacks.

The entire point of manipulating the thread context is to get the payload entrypoint executed by the main thread. We can accomplish this in a novel way by inserting a TLS callback into our payload executable which calls the entrypoint. Because the Windows loader invokes TLS callbacks before the entrypoint, the payload will function as intended. LdrInitializeThunk is the ntdll routine that handles TLS callbacks, and it does not run until after we signal thread resumption using NtResumeThread. I verified this by consulting Windows Internals, 6th Ed., pp. 386-387. Therefore our TLS callback will be called after we resume the main thread, but before the original entrypoint is called. Perfect! We now have a way to effectively call our payload entrypoint without any thread manipulation.

Putting it Together

The TRunPE code functions exactly as a normal RunPE does, except that after the remote imagebase has been determined, a TLS directory is appended with callback code which calls the remote imagebase + original entrypoint RVA, all without a call to NtSetContextThread. The code provided is a proof-of-concept and does not handle many cases. Some future improvements could include:

  • Modifying an existing TLS section
  • Extending the IMAGE_SECTION_HEADER list if necessary
  • Placing the callback code in an already executable section
  • Relocation support

However, if you have a basic PE32 file with a fixed imagebase and no existing TLS directory, there should not be any major issues.

This idea can (and will) be extended to other parts of process hollowing in future posts.

The fully commented source code is available on my GitHub.


ShellcodeStdio is an extensible framework for easily writing debuggable, compiler-optimized, position-independent x86 and x64 shellcode for Windows platforms.

I will be demonstrating how to write optimized, position-independent x86 and x64 shellcode using our ShellcodeStdio framework. This approach speeds up shellcode development, as ShellcodeStdio has distinct advantages over coding in pure assembly. The framework allows for better debugging by utilizing the Visual Studio environment, as opposed to an assembler-level debugger such as OllyDbg. In addition, the shellcode produced by the Visual Studio compiler is optimized at a sufficient level to be used in production environments. Most importantly, it allows one to develop exploit-ready shellcode using pure C/C++ code, with the convenience of pre-processed Win32 API macros and hard-coded string literals.

Shellcode 101

Shellcode, in an offensive security context, is a sequence of machine code that can execute at any memory location in a target process, assuming that location has the correct memory protections. To better understand what this means, let’s examine the disassembly of a simple program that displays a message box.

MessageBoxA(NULL, "Message", "Caption", MB_OK);
00A91013 6A 00                push        0  
00A91015 68 10 20 A9 00       push        offset string "Caption" (0A92010h)  
00A9101A 68 18 20 A9 00       push        offset string "Message" (0A92018h)  
00A9101F 6A 00                push        0  
00A91021 FF 15 00 20 A9 00    call        dword ptr [__imp__MessageBoxA@16 (0A92000h)]

From left to right, we can see the memory address, the machine code bytes, and the instructions. Since the string and import addresses are absolute and only valid where our already compiled executable was loaded, this code will fail to execute in an arbitrary memory context.

When string literals are compiled into a portable executable file, they are most often stored in a section with constant, read-only, initialized data attributes (e.g. the .rdata section). We can see the memory locations of our two strings referenced in the code excerpt above. In addition, we must consider the memory location of the imported MessageBoxA function. ShellcodeStdio addresses these problems in order to allow our C/C++ compiled code to be executed at any memory location.
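To make the dependence on the load address concrete, the short sketch below decodes the 32-bit immediate from one of the push instructions in the listing. The helper name read_imm32 and the array name kPushCaption are hypothetical and used only for illustration; the bytes are copied from the disassembly above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a little-endian 32-bit immediate from raw bytes.
inline uint32_t read_imm32(const uint8_t* p) {
    assert(p != nullptr);
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Bytes of `push offset string "Caption"` from the listing:
// opcode 0x68 followed by a 4-byte absolute address.
static const uint8_t kPushCaption[] = { 0x68, 0x10, 0x20, 0xA9, 0x00 };
```

Here read_imm32(kPushCaption + 1) evaluates to 0x00A92010, the absolute address of the string in the original image. If the same bytes ran from a different location, that address would no longer be guaranteed to point at "Caption".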

Making Data Position-Independent

In order to make our strings position-independent, ShellcodeStdio dynamically creates them on the stack. Any string (ANSI or Unicode) can be represented by pushing the individual bytes to the stack. A complication to be aware of when using this framework is that when one allocates very large objects on the stack, the Visual Studio compiler will replace the position-independent stack method with a call to HeapAlloc, making the resultant code position-dependent. A possible solution is increasing the SizeOfStackCommit field in the IMAGE_OPTIONAL_HEADER. However, since this does not stop the compiler from importing an extraneous Windows API function, it’s an interesting, but somewhat moot, approach. As of now, this issue limits the size of objects which one can allocate on the stack, but in practice it should not be a problem, as there is usually sufficient space for most operations.
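As a rough illustration of the idea, the sketch below initializes a string element by element in a local array instead of referencing a literal. The function name build_message is hypothetical, and whether the compiler really emits immediate stores rather than copying from a read-only section depends on the compiler and its flags, so treat this as the shape of the source code rather than a guarantee about the generated machine code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: build "Message" in a local (stack) array and copy it
// into a caller-provided buffer. Returns the number of characters written,
// not counting the terminator.
inline std::size_t build_message(char* out, std::size_t out_size) {
    assert(out != nullptr);
    // Element-wise initializer: the source contains no string literal here.
    const char msg[] = { 'M', 'e', 's', 's', 'a', 'g', 'e', '\0' };
    std::size_t i = 0;
    for (; i + 1 < out_size && msg[i] != '\0'; ++i) {
        out[i] = msg[i];
    }
    if (out_size > 0) {
        out[i] = '\0';   // always terminate when there is room
    }
    return i;
}
```

The string is copied out because returning a pointer to the local array itself would leave it dangling once the function returns.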

Making Windows API Calls Position-Independent

The second issue is resolving all calls to external libraries (i.e. doing anything useful). In the code above, we would need to resolve the call to the MessageBoxA function, which resides in user32.dll. In order to perform a runtime function call we need to know the following:

  • The name of the function to call (MessageBoxA).
  • The name of the library the function resides in (User32.dll).
  • Whether the library is already loaded into memory (and loading it if not).
  • The address of the library in memory.
  • The address of the function relative to the library.
  • The parameters of the function.

Normally, one can simply call the LoadLibrary function to load external DLLs at runtime. But, since our code is executing at an arbitrary memory location, there is additional work involved to first locate the base address of kernel32 and then walk the export directory to find the LoadLibrary function. In every portable executable with the Windows GUI or console subsystem, that is to say, non-system, it is safe to assume that both ntdll.dll and kernel32.dll are loaded into memory by the Windows loader. By navigating the Process Environment Block (PEB), one can reliably find the base address of every module loaded in the current process.

Navigating the PEB

A pointer to the PEB can be obtained as follows:

#ifndef _WIN64
	p = (PPEB)__readfsdword(0x30);	// x86: PEB pointer is at FS:[0x30]
#else
	p = (PPEB)__readgsqword(0x60);	// x64: PEB pointer is at GS:[0x60]
#endif

Once a pointer to the PEB is obtained, navigate to the PEB_LDR_DATA structure (the Ldr field), which contains linked lists of the loaded modules. The list entries point to LDR_MODULE structures, which contain information about each module, including the base address at which it was loaded into memory.

It is worthwhile to mention that ShellcodeStdio relies on the InMemoryOrder module list and not the other list entries because kernel32 will always be the 3rd module loaded in all executing environments (current image -> ntdll -> kernel32).
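The list mechanics can be sketched without any Windows headers. The example below uses simplified stand-in types (ListEntry and ModuleEntry are hypothetical and contain far fewer fields than the real loader structures) to show how a link embedded in the middle of a structure is turned back into a pointer to that structure, and how the n-th entry of a circular list is reached.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for LIST_ENTRY: a circular, doubly linked, intrusive list node.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

// Simplified, hypothetical stand-in for a loader table entry.
struct ModuleEntry {
    ListEntry   InLoadOrderLinks;
    ListEntry   InMemoryOrderLinks;
    void*       DllBase;
    const char* Name;
};

// Same idea as the CONTAINING_RECORD macro: subtract the link's offset
// within the structure to recover the structure's address.
inline ModuleEntry* entry_from_memory_link(ListEntry* link) {
    return reinterpret_cast<ModuleEntry*>(
        reinterpret_cast<char*>(link) - offsetof(ModuleEntry, InMemoryOrderLinks));
}

// Returns the n-th entry (1-based) of the memory-order list, or nullptr
// if the list has fewer than n entries. `head` is the list head itself,
// which is not embedded in any ModuleEntry.
inline ModuleEntry* nth_in_memory_order(ListEntry* head, std::size_t n) {
    assert(head != nullptr);
    std::size_t i = 0;
    for (ListEntry* cur = head->Flink; cur != head; cur = cur->Flink) {
        if (++i == n) {
            return entry_from_memory_link(cur);
        }
    }
    return nullptr;
}
```

With a list populated in the order described above, nth_in_memory_order(&head, 3) would return the entry whose DllBase is the third module's base address.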

Walking the Export Table

To find the relative virtual address (RVA) of LoadLibrary, and consequently any other function, we must walk kernel32’s export table. We can obtain a pointer to the export directory from the IMAGE_DATA_DIRECTORY structure.


The RVA of a desired function can be found by accessing a specific index of the AddressOfFunctions array. A procedural approach involves iterating through each function exported by name, comparing its name to the one we want, and, on a match, using the corresponding name ordinal as the index into the AddressOfFunctions array. This is expressed more concisely in the following code snippet, which compares hashed values of the strings:

// Hash of the exporting module's name, combined with each function name below.
char* moduleName = (char*)(baseAddress + ied->Name);
DWORD moduleHash = rt_hash(moduleName);
DWORD* nameRVAs = (DWORD*)(baseAddress + ied->AddressOfNames);

for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
	char* functionName = (char*)(baseAddress + nameRVAs[i]);
	// hash is the target value being looked up (defined outside this excerpt).
	if (hash == moduleHash + rt_hash(functionName)) {
		// The i-th name maps to an ordinal, which indexes AddressOfFunctions.
		WORD ordinal = ((WORD*)(baseAddress + ied->AddressOfNameOrdinals))[i];
		DWORD functionRVA = ((DWORD*)(baseAddress + ied->AddressOfFunctions))[ordinal];
		return baseAddress + functionRVA;
	}
}

This will yield the address of any non-forwarded function exported by name that we need (ShellcodeStdio does support forwarded functions). Combine this with some neat C++ constant expression evaluation magic, and we can call Windows API functions pretty much as we normally do, without the need to define every function as an individually typed pointer. To see this part of the source, visit my repository on GitHub.
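For readers unfamiliar with the constant expression trick, here is a minimal sketch of how a string hash can be evaluated at compile time and matched against a runtime hash. It uses 32-bit FNV-1a purely as an assumption for illustration; it is not necessarily the algorithm behind rt_hash, and the names fnv1a, fnv1a_rt and api_key are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compile-time FNV-1a (32-bit), written recursively so it is valid C++11
// constexpr. Illustrative only; not necessarily what rt_hash implements.
constexpr uint32_t fnv1a(const char* s, uint32_t h = 2166136261u) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, (h ^ static_cast<uint8_t>(*s)) * 16777619u);
}

// Runtime counterpart that must produce the same values as fnv1a.
inline uint32_t fnv1a_rt(const char* s) {
    assert(s != nullptr);
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= static_cast<uint8_t>(*s);
        h *= 16777619u;
    }
    return h;
}

// Combined key in the same shape as the `moduleHash + rt_hash(functionName)`
// comparison in the snippet above.
constexpr uint32_t api_key(const char* module, const char* function) {
    return fnv1a(module) + fnv1a(function);
}

// Evaluated by the compiler: the empty string hashes to the offset basis.
static_assert(fnv1a("") == 2166136261u, "FNV-1a offset basis");
```

When a value such as api_key("user32.dll", "MessageBoxA") is used in a constant expression, only the resulting 32-bit number needs to appear in the compiled code, while the lookup loop hashes the names it encounters at runtime and compares the numbers.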

Putting it All Together

The final step is to adjust our compiler settings. There are some that must be changed in order for ShellcodeStdio to emit proper shellcode.

C/C++ -> Optimization -> /O1, /Ob2, /Oi, /Os, /Oy-, /GL
C/C++ -> Code Generation -> /MT, /GS-, /Gy
Linker -> General -> /INCREMENTAL:NO