Malicious file analysis is well beyond the days when the humble PE32 file was all researchers needed to contend with. The use of malicious PDF, Office, and other files present a far more diverse threat than our defensive tools were originally designed to handle. To make matters worse, the sheer volume of files over time to analyze presents a meaningful logistical problem which becomes increasingly complex as analytical methods move from static to dynamic analysis. When the point in time problem is considered (the fact that historical discoveries can be viewed differently in the light of new analytical techniques or information), the problem seems all but intractable.
To this end, we have developed TOTEM, a system which is capable of coordinating, orchestrating, and scaling malware analytics across multiple cloud providers and thousands of running instances. It is easy to add new capabilities to and can intelligently segregate work based on features, such as filetype, analytic duration, and computational complexity. TOTEM supports dynamic analysis through DRAKVUF, a novel open-source dynamic malware analysis system which was designed specifically to achieve unparalleled scalability, while maintaining a high level of stealth and visibility into the executing sample. Building on the latest hardware virtualization extensions found in Intel processors and the Xen hypervisor, DRAKVUF remains completely hidden from the executing sample and requires no special software to be installed within the sandbox. Further addressing the problem of monitoring kernel-mode rootkits as well as user-space applications, DRAKVUF significantly raises the bar for evasive malware to remain undetected.This talk will discuss the design, implementation, and practical deployment of TOTEM and DRAKVUF to analyze tremendous numbers of binary files.