Designing Emulation Resistant Control Flow

Dec 17, 2020

—

Antimalware emulators have the Sisyphean task of implementing a complete and accurate clone of the Windows environment. My previous research focused on a generic way to detect the presence of such emulators based upon Windows API artifacts. This is and will continue to be an effective technique in one’s arsenal. One of the most effective ways in which to utilize this tactic for evasion is to modify control flow in a non-trivial way using the artifacts.

Trivial Implementation

push arg1
push arg2
call EmulatedAPI
xor eax, expected_value
jnz .exit_process

Non-Trivial Implementation

push arg1
push arg2
call EmulatedAPI
xor eax, expected_value
add eax, control_flow_value 
jmp eax

The trivial example will not be enough to successfully evade some antimalware emulators. An inquisitive emulator may see that the control flow branch leads to termination and choose to emulate the branch not taken.

In the non-trivial example, the artifact value is used to calculate the next control flow location at runtime. A non-trivial implementation forces emulators to correctly emulate this value. I wanted to create a version of this idea which is not dependent upon the Windows API. While detection of emulators using Windows API artifacts is indeed effective, it relies heavily on specific version of Windows.

To begin, let’s examine an obfuscation technique known as control flow flattening. Consider the following CFG-flattened code.

Control Flow Flattening Process (Simplified):

Split up code into basic blocks tracking entry and exit, transforming cyclic dependencies when necessary
Create a dispatcher responsible for directing control flow by replacing branching with a switch table corresponding to entry and exits of basic blocks
Ensure no unreachable code

uint state = 0;
while(state != 3) {
	switch(state) {
		case 0:
			// CodeBlock0
			state = 1;
			break;
		case 1:
			// CodeBlock1
			state = 2;
			break;
		case 2: 
			// CodeBlock2
			state = 3;
			break;
	}
}

Resulting in the omission of natural conditional constructs in the code.

Originally, what I was doing in order to make the code more opaque was to encrypt the variable in each block with a different API artifact. By encrypting the next state like this, it not only necessitates correct emulation, but also renders the rest of the code entirely unreachable to an emulator.

Using highly Windows version dependent artifacts works, but let’s pivot to a more generic approach. We can exploit another “by-design” limitations of the emulator schema — speed (no, it’s not a timing attack).

Enter… Bitcoin?

Well not really. But at the heart of Bitcoin’s block-generation is a proof-of-work algorithm known as HashCash. HashCash was originally designed as way to prevent e-mail spam and prevent DoS attacks. A proof-of-work algorithm, as defined on the Bitcoin wiki is as follows.

A proof of work is a piece of data which is difficult (costly, time-consuming) to produce but easy for others to verify and which satisfies certain requirements. Producing a proof of work can be a random process with low probability so that a lot of trial and error is required on average before a valid proof of work is generated.

To better understand how HashCash works, let’s examine a simple use case.

I want to send login to a website.
There is a hidden field which contains a random string.
Upon login request, there is a simple task to solve involving the random string. The browser will take the random string, then concatenate with another random string. After concatenation, the browser takes a hash of this value. If the resultant hash starts with a certain bitstring, usually just defined as a n 0s, then the task is complete. If not, this step is repeated until a string and resulting hash is found.
The plaintext string (the concatenated random strings) is sent along with the login information.
To see if the login request has completed proof-of-work, the server takes the hash of the string and ensures that it starts with the agreed upon bitstring.

This is a simple and high-level overview of the HashCash algorithm, however it is sufficient for understanding the purpose for building emulation resistant control flow.

Encrypting State Variables

Refer to the original control flow flattening code. The schema for designing our evasive control flow is like this.

Perform the HashCash algorithm on the current state (in the context of the block).
Take the resultant concatenated string and hash it.
Encrypt the next state using the hash value.

At runtime, the evasive application, performs this operation in reverse, using the same seed for your RNG. Aside from brute forcing, the only way for emulation to find the next control flow state is to perform the HashCash proof-of-work algorithm. As stated before, the CPU intensity of the algorithm can be adjusted by either increasing or decreasing the length of the bitstring.

Real World Application

I want to decrypt some shellcode in memory. I want to make sure the emulator cannot see the decrypted shellcode in memory. Note: this is a different attack/defense scenario than a memory scanner.

I can split up the decryption of the shellcode into a control flow flattened model. Here I can control each state variable and encrypt them at runtime using the process described above. For the emulator to reach the next block, it must emulate this CPU intense proof-of-work algorithm. If this process takes 30 seconds in a non-emulated environment, then it will take at least 30 seconds in an emulated environment, if not significantly more. Additionally, I can use the same principal to encrypt the shellcode.

Combining everything a routine which utilizes this method might look something like the code below.

public static byte[] GenBuffer()
{
    List<byte> buffer = new List<byte>();
    var HashCash = HashCashWorker.Create(1837627802, 1);
    uint state = 0x40337D16;

    while (state != 0x7085114B)
    {
        switch (state)
        {
            case 0xA0BDCDE5:
                buffer.Add(0x69);
                HashCash.Solve(new byte[] { 0xE5, 0xCD, 0xBD, 0xA0 });
                state = BitConverter.ToUInt32(HashCash.DecryptWithSolution(new byte[] { 0x6F, 0xDD, 0x6B, 0x90 }), 0);
                break;
            case 0xF4E5647C:
                buffer.Add(0xC1);
                HashCash.Solve(new byte[] { 0x7C, 0x64, 0xE5, 0xF4 });
                state = BitConverter.ToUInt32(HashCash.DecryptWithSolution(new byte[] { 0xF3, 0x14, 0x60, 0xD0 }), 0);
                break;
            case 0x810B71D3:
                buffer.Add(0x2B);
                HashCash.Solve(new byte[] { 0xD3, 0x71, 0x0B, 0x81 });
                state = BitConverter.ToUInt32(HashCash.DecryptWithSolution(new byte[] { 0x71, 0x80, 0x22, 0x6E }), 0);
                break;
            case 0xA7D853BA:
                buffer.Add(0x50);
                HashCash.Solve(new byte[] { 0xBA, 0x53, 0xD8, 0xA7 });
                state = BitConverter.ToUInt32(HashCash.DecryptWithSolution(new byte[] { 0x47, 0x80, 0xD6, 0x3C }), 0);
                break;
                // and so on...

Source

Source code can be found on my GitHub.

The example project contains a demonstration of how it can be used to create emulation resistant control flow patterns. You can apply this technique in anyway you see fit and that better suits your use case(s).

Additional Reading

https://www.crestcon.org/wp-content/uploads/2019/11/MattWixey.pdf

http://www.hashcash.org/papers/hashcash.pdf

Evasion