Due to the exploding number of unique malware binaries on the Internet and the slow process required to analyze these binaries manually, security practitioners today have only limited visibility into the functionality implemented by the global population of malware. To date, little work has focused explicitly on quickly and automatically detecting the broad range of high-level malware functionality, such as the ability of malware to take screenshots, communicate via IRC, or surreptitiously operate users’ webcams.
To address this gap, we debut CrowdSource, an open-source, machine-learning-based reverse engineering tool. CrowdSource approaches the problem of malware capability identification in a novel way: by training a malware capability detection engine on millions of technical documents from the web. Our intuition for this approach is that malware reverse engineers already rely heavily on the web “crowd” (performing web searches to discover the purpose of obscure function calls and byte strings, for example), so automated approaches that use the tools of machine learning should also take advantage of this rich and as yet untapped data source.
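To make the intuition concrete, the following is a minimal sketch of the general idea, not the CrowdSource algorithm itself: it learns which tokens co-occur with which capabilities in a small document corpus, then lets the symbols extracted from a binary vote for capabilities. The toy corpus, the capability labels, and the names `train` and `rank_capabilities` are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (capability label, document text) pairs,
# standing in for technical documents collected from the web.
DOCS = [
    ("screenshot", "Use BitBlt and GetDC to capture the screen into a bitmap"),
    ("screenshot", "CreateCompatibleBitmap and BitBlt copy pixels from the desktop"),
    ("irc", "An IRC client sends NICK, JOIN and PRIVMSG commands over a socket"),
    ("irc", "Reply to PING with PONG, then JOIN the channel and send PRIVMSG"),
    ("webcam", "capCreateCaptureWindowA opens the webcam through avicap32"),
]

# Identifier-like tokens of at least three characters.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")


def tokenize(text):
    """Split text into lowercase identifier-like tokens."""
    return [t.lower() for t in TOKEN.findall(text)]


def train(docs):
    """Count how often each token appears in documents of each capability."""
    counts = defaultdict(Counter)
    for label, text in docs:
        counts[label].update(tokenize(text))
    return counts


def rank_capabilities(symbols, counts):
    """Rank capabilities by summing, over the symbols found in a binary,
    the fraction of each symbol's corpus occurrences seen under that label.
    Symbols that never appear in the corpus contribute nothing."""
    scores = {label: 0.0 for label in counts}
    for sym in symbols:
        tok = sym.lower()
        total = sum(c[tok] for c in counts.values())
        if total == 0:
            continue
        for label, c in counts.items():
            scores[label] += c[tok] / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


model = train(DOCS)
# Symbols such as imported function names or strings pulled from a binary.
ranking = rank_capabilities(["BitBlt", "GetDC", "CreateFileA"], model)
print(ranking[0][0])
```

In this sketch the unknown symbol `CreateFileA` is simply ignored. A realistic system would need a far larger corpus, a principled statistical model, and some way to decide when the evidence is too weak to report a capability at all.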
As a novel malware capability detection approach, CrowdSource does the following:
CrowdSource is funded under the DARPA Cyber Fast Track initiative, is being developed by the machine learning and malware analysis group at Invincea Labs, and is scheduled for a beta, open-source release to the security community this October. In this presentation, we will give complete details of the CrowdSource algorithm as it stands, including compelling results demonstrating that CrowdSource can already rapidly reverse engineer a variety of currently active malware variants.