Skip to content

Designing Emulation Resistant Control Flow

Antimalware emulators have the Sisyphean task of implementing a complete and accurate clone of the Windows environment. My previous research focused on a generic way to detect the presence of such emulators based upon Windows API artifacts. This is and will continue to be an effective technique in one’s arsenal. One of the most effective ways in which to utilize this tactic for evasion is to modify control flow in a non-trivial way using the artifacts.

Trivial Implementation
push arg1
push arg2
call EmulatedAPI
xor eax, expected_value
jnz .exit_process

Non-Trivial Implementation
push arg1
push arg2
call EmulatedAPI
xor eax, expected_value
add eax, control_flow_value 
jmp eax

The trivial example will not be enough to successfully evade some antimalware emulators. An inquisitive emulator may see that the control flow branch leads to termination and choose to emulate the branch not taken.

In the non-trivial example, the artifact value is used to calculate the next control flow location at runtime. A non-trivial implementation forces emulators to correctly emulate this value. I wanted to create a version of this idea which is not dependent upon the Windows API. While detection of emulators using Windows API artifacts is indeed effective, it relies heavily on specific version of Windows.

To begin, let’s examine an obfuscation technique known as control flow flattening. Consider the following CFG-flattened code.

Control Flow Flattening Process:

  • Split up code logic into blocks.
  • Create a state dispatcher which can reach all blocks.
  • Create a state chain which will preserve the runtime order of the blocks.

This results in the omission of natural conditional constructs in the code.

Image Credit:
uint state = 0;
while(state != 3) {
	switch(state) {
		case 0:
			// CodeBlock0
			state = 1;
		case 1:
			// CodeBlock1
			state = 2;
		case 2: 
			// CodeBlock2
			state = 3;

Originally, what I was doing in order to make the code more opaque was to encrypt the variable in each block with a different API artifact. By encrypting the next state like this, it not only necessitates correct emulation, but also renders the rest of the code entirely unreachable to an emulator.

Using highly Windows version dependent artifacts works, but let’s pivot to a more generic approach. We can exploit another “by-design” limitations of the emulator schema — speed (no, it’s not a timing attack).

Enter… Bitcoin?

Well not really. But at the heart of Bitcoin’s block-generation is a proof-of-work algorithm known as HashCash. HashCash was originally designed as way to prevent e-mail spam and prevent DoS attacks. A proof-of-work algorithm, as defined on the Bitcoin wiki is as follows.

A proof of work is a piece of data which is difficult (costly, time-consuming) to produce but easy for others to verify and which satisfies certain requirements. Producing a proof of work can be a random process with low probability so that a lot of trial and error is required on average before a valid proof of work is generated.

To better understand how HashCash works, let’s examine a simple use case.

  • I want to send login to a website.
  • There is a hidden field which contains a random string.
  • Upon login request, there is a simple task to solve involving the random string. The browser will take the random string, then concatenate with another random string. After concatenation, the browser takes a hash of this value. If the resultant hash starts with a certain bitstring, usually just defined as a n 0s, then the task is complete. If not, this step is repeated until a string and resulting hash is found.
  • The plaintext string (the concatenated random strings) is sent along with the login information.
  • To see if the login request has completed proof-of-work, the server takes the hash of the string and ensures that it starts with the agreed upon bitstring.

This is a simple and high-level overview of the HashCash algorithm, however it is sufficient for understanding the purpose for building emulation resistant control flow.

Encrypting State Variables

Refer to the original control flow flattening code. The schema for designing our evasive control flow is like this.

  • Perform the HashCash algorithm on the current state (in the context of the block).
  • Take the resultant concatenated string and hash it.
  • Encrypt the next state using the hash value.

At runtime, the evasive application, performs this operation in reverse, using the same seed for your RNG. Aside from brute forcing, the only way for emulation to find the next control flow state is to perform the HashCash proof-of-work algorithm. As stated before, the CPU intensity of the algorithm can be adjusted by either increasing or decreasing the length of the bitstring.

Real World Application

I want to decrypt some shellcode in memory. I want to make sure the emulator cannot see the decrypted shellcode in memory. Note: this is a different attack/defense scenario than a memory scanner.

I can split up the decryption of the shellcode into a control flow flattened model. Here I can control each state variable and encrypt them at runtime using the process described above. For the emulator to reach the next block, it must emulate this CPU intense proof-of-work algorithm. If this process takes 30 seconds in a non-emulated environment, then it will take at least 30 seconds in an emulated environment, if not significantly more. Additionally, I can use the same principal to encrypt the shellcode.

Combining everything a routine which utilizes this method might look something like the code below.

Let’s examine a real world scenario where this proved useful. I anecdotally think ESET has one of the better AVs available (combination of ML-based runtime packer detection, memory scanning, and the infamous Kryptik.XXX detection family).

I will be building a stub program which decrypts and calls Assembly.Load/EntryPoint.Invoke on @harmj0y‘s SafetyKatz, which is detected as MSIL/Riskware.Mimikatz.A by ESET NOD32.

When I first ran this, it was detected via AMSI catching the Assembly.Load(SafetyKatz). I generated 3 1024 bit encryption keys using the methodology described above. Each key took around 5-10 seconds to generate at runtime. I took a public AMSI bypass which was also detected at runtime, then encrypted the strings, “amsi.dll” and “AmsiScanBuffer” and code patch bytes using these keys. This proved effective at mitigating runtime detections due to AMSI.

I then generated another 1024 bit key which was used to encrypt the SafetyKatz payload. This was decrypted and invoked. As you can see below, it successfully bypassed emulation layer defenses.


Source code can be found on my GitHub.

The example project contains a demonstration of how it can be used to create emulation resistant control flow patterns. You can apply this technique in anyway you see fit and that better suits your use case(s).

Additional Reading