We could all have predicted this with our magical Big Data analytics platforms, but it seems that machine learning is the new hotness in Information Security. A great number of start-ups with 'cy' and 'threat' in their names that claim that their product will defend or detect more effectively than their neighbors' product "because math." And it should be easy to fool people without a PhD or two that math just works.
Indeed, math is powerful and large scale machine learning is an important cornerstone of much of the systems that we use today. However, not all algorithms and techniques are born equal. Machine learning is a very powerful tool box, but not every tool can be applied to every problem and that's where the pitfalls lie.
This presentation will describe the different techniques available for data analysis and machine learning for information security, and discuss their strengths and caveats. The ghost of marketing past will also show how similar the unfulfilled promises of deterministic and exploratory analysis were, and how to avoid making the same mistakes again.
Finally, the presentation will describe the techniques and feature sets that were developed by the presenter in the past year as a part of his ongoing research project on the subject, in particular he'll present some interesting results obtained since the last presentation at Black Hat USA 2013, and some ideas that could improve the application of machine learning for use in information security, specially in its use as a helper for security analysts in incident detection and response.