Modern packers use API obfuscation techniques to obstruct malware sandboxes and reverse engineers. In such packers, API call instructions are replaced with equivalent but lengthy and complex code. API obfuscation techniques fall into two categories according to when the obfuscation takes place: static and dynamic. Static obfuscation embeds the obfuscated instructions in the executable file. Dynamic obfuscation allocates a new memory block at runtime and copies obfuscated API function code into the newly allocated block.
For dynamic obfuscation, I propose memory access analysis. Previous approaches rely on pattern matching of the obfuscating code or on code optimization of an instruction trace. Both are fragile, because the obfuscation patterns change as new versions of the packers are released. My approach instead exploits the API function obfuscation process, which is harder to change than the obfuscation pattern. The obfuscator embedded in the packed file obfuscates each API function at runtime by reading the original API function code and writing the obfuscated API code to a newly allocated memory block. Memory access analysis relates the memory reads of each API function to the corresponding memory writes, and produces a map from obfuscated API function addresses to the original API functions. Obfuscated API calls are found by matching the obfuscated call pattern at the original entry point (OEP). Each obfuscated call instruction is then replaced by a deobfuscated API call whose target is resolved through the map from memory access analysis. This deobfuscation method is implemented with Intel Pin, which records each memory read, write, and execute of the packed binary.

For static obfuscation, I propose an iterative run-until-API method. Previous approaches used code emulators to identify obfuscated API calls. However, most code emulators are not suited to deobfuscation because they are designed to emulate a whole operating system. Developing a custom emulator is time-consuming because it requires implementing complex runtime behavior, such as the exception-based branches and multithreading that modern packers use. I instead use a dynamic binary instrumentation tool, Intel Pin, which can monitor the process without being detected by the protection mechanisms of the packers. After executing the packed binary up to the original entry point, the tool sets the instruction pointer to the address of an obfuscated API call. Execution then continues until the instruction pointer reaches the real API function.
The original API function is thereby identified, but the function itself is not executed. To confirm that the identified API function is correct, the tool also checks the integrity of the stack pointer and the stack data. This process is repeated for each obfuscated API call instruction. To find the obfuscated API calls, the tool searches for all call instructions whose target address lies in another section of the process.

With these two deobfuscation methods, the obfuscated API calls of binaries packed with Themida 32/64 can be deobfuscated. The deobfuscated binary can then be analyzed with common reversing tools such as x64dbg, OllyDbg, and IDA Pro.
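
The memory access analysis for dynamic obfuscation can be illustrated with a small Python sketch. It is not the Pin-based tool itself, only a toy model under stated assumptions: the trace is a list of hypothetical ("R" or "W", address) records, the original API code ranges and the range of the newly allocated block are known in advance, and the obfuscator finishes one API function before starting the next. Every write into the allocated block is attributed to the API whose code was read most recently, and the lowest address written in each run is taken as the entry of the obfuscated function. All names, addresses, and the trace format below are illustrative.

```python
from bisect import bisect_right


def make_api_lookup(api_ranges):
    """Return a function mapping an address to the API containing it, or None.

    api_ranges: {api_name: (start_address, size)} (hypothetical layout).
    """
    spans = sorted((start, start + size, name)
                   for name, (start, size) in api_ranges.items())
    starts = [span[0] for span in spans]

    def lookup(addr):
        i = bisect_right(starts, addr) - 1
        if i >= 0 and spans[i][0] <= addr < spans[i][1]:
            return spans[i][2]
        return None

    return lookup


def build_api_map(trace, api_ranges, heap_range):
    """Relate reads of original API code to writes into the allocated block.

    Returns {obfuscated_entry_address: api_name}.
    """
    lookup = make_api_lookup(api_ranges)
    heap_lo, heap_hi = heap_range
    api_map = {}
    current_api = None                 # API whose code was read most recently
    run_api, run_entry = None, None    # current run of attributed writes

    for kind, addr in trace:
        if kind == "R":
            name = lookup(addr)
            if name is not None:
                current_api = name
        elif kind == "W" and current_api is not None and heap_lo <= addr < heap_hi:
            if current_api != run_api:
                # A different API is now being read: close the previous run.
                if run_api is not None:
                    api_map[run_entry] = run_api
                run_api, run_entry = current_api, addr
            else:
                run_entry = min(run_entry, addr)

    if run_api is not None:
        api_map[run_entry] = run_api
    return api_map


def resolve_calls(obfuscated_calls, api_map):
    """Resolve {call_site: obfuscated_target} to {call_site: api_name}."""
    return {site: api_map[target]
            for site, target in obfuscated_calls.items()
            if target in api_map}


# Toy data: two APIs are read and their obfuscated copies written in turn.
api_ranges = {"CreateFileW": (0x7000, 0x40), "ReadFile": (0x7100, 0x30)}
trace = [
    ("R", 0x7000), ("W", 0x20000), ("R", 0x7004), ("W", 0x20008),
    ("R", 0x7100), ("W", 0x20100), ("R", 0x7102), ("W", 0x20110),
]
api_map = build_api_map(trace, api_ranges, (0x20000, 0x21000))
resolved = resolve_calls({0x401000: 0x20100, 0x401020: 0x20000}, api_map)
print(resolved)
```

In this toy run, the call at 0x401000 resolves to ReadFile and the call at 0x401020 to CreateFileW. A real trace would be much noisier, so attributing writes by the most recent read is only a simplification of relating reads to writes.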
Seokwoo Choi is a Senior Member of the Engineering Staff at The Attached Institute of ETRI. He previously was a member of the research staff at Korea Telecom. He earned his PhD in Computer Science from the Korea Advanced Institute of Science and Technology in 2009. He was a speaker at Black Hat Asia 2015.