Automated Twitter accounts have been making headlines for their ability to spread spam and malware and to significantly influence online discussion and sentiment. In this talk, we explore the economy around Twitter bots and demonstrate how attendees can track down bots through a three-step methodology: building a dataset, identifying common attributes of bot accounts, and building a classifier to accurately identify bots at scale.
We first demonstrate how to amass a large dataset of public Twitter accounts using the Twitter API, gathering basic profile information as well as public activity from each account. We go on to gather and map the "social graph" of each account: which accounts it follows and, likewise, which accounts follow it.
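As a rough illustration of the social-graph step, the following is a minimal sketch of a breadth-first crawl, not the code used in the talk. The function names, the `max_accounts` budget, and the `fetch_following` callable are all hypothetical; in practice `fetch_following` would wrap whichever Twitter API client is used and handle rate limits, which this sketch ignores.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_social_graph(
    seeds: Iterable[int],
    fetch_following: Callable[[int], List[int]],  # hypothetical API wrapper
    max_accounts: int = 1000,                     # hypothetical crawl budget
) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Breadth-first crawl starting from the seed account IDs.

    Returns (accounts, edges), where an edge (a, b) means "a follows b".
    Edges are only recorded between accounts that are inside the crawl.
    """
    seen: Set[int] = set()
    queue: deque = deque()
    for seed in seeds:
        if seed not in seen and len(seen) < max_accounts:
            seen.add(seed)
            queue.append(seed)

    edges: Set[Tuple[int, int]] = set()
    while queue:
        account = queue.popleft()
        for followed in fetch_following(account):
            if followed not in seen:
                if len(seen) >= max_accounts:
                    continue  # budget exhausted: ignore accounts not yet seen
                seen.add(followed)
                queue.append(followed)
            edges.add((account, followed))
    return seen, edges


def followers_by_account(edges: Set[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Invert "follows" edges to get, for each account, who follows it."""
    followers: Dict[int, Set[int]] = {}
    for follower, followed in edges:
        followers.setdefault(followed, set()).add(follower)
    return followers
```

Storing only "follows" edges and inverting them on demand keeps a single source of truth for both directions of the graph, at least among the accounts already collected.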
After this dataset has been obtained, we explore how to identify bots within it. We show common techniques that real-world bot operators use to keep their bots "under the radar", which in many cases can themselves be used to fingerprint the bot. Finally, we demonstrate how we can tackle the bot problem at scale using data science to build a classifier that accurately identifies bots across our large global dataset.
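To make the classification step concrete, here is a small sketch that turns profile fields into numeric features and fits a logistic regression by plain gradient descent. It is an illustration under assumptions, not the classifier presented in the talk: the three features are guesses at plausible bot signals, the account dictionaries assume fields named `followers_count`, `friends_count`, `screen_name`, and `default_profile_image`, and the four labeled accounts are invented toy data.

```python
import math
from typing import Dict, List, Tuple


def extract_features(account: Dict) -> List[float]:
    """Map a profile dict to three features, each scaled to [0, 1].

    The feature choice is illustrative only, not taken from the talk.
    """
    followers = account["followers_count"]
    following = account["friends_count"]
    name = account["screen_name"]
    following_share = following / (following + followers + 1)
    digit_fraction = sum(c.isdigit() for c in name) / max(len(name), 1)
    default_image = 1.0 if account["default_profile_image"] else 0.0
    return [following_share, digit_fraction, default_image]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_bot_probability(x: List[float], w: List[float], b: float) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 5000
) -> Tuple[List[float], float]:
    """Fit weights and bias with full-batch gradient descent on log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_bot_probability(xi, w, b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Invented toy accounts for illustration only (label 1 = bot, 0 = human).
TOY_ACCOUNTS = [
    ({"followers_count": 5, "friends_count": 2000,
      "screen_name": "jane84629173", "default_profile_image": True}, 1),
    ({"followers_count": 10, "friends_count": 1500,
      "screen_name": "bot_00112233", "default_profile_image": False}, 1),
    ({"followers_count": 400, "friends_count": 300,
      "screen_name": "alice_w", "default_profile_image": False}, 0),
    ({"followers_count": 900, "friends_count": 150,
      "screen_name": "bob1987", "default_profile_image": False}, 0),
]
X = [extract_features(account) for account, _ in TOY_ACCOUNTS]
y = [label for _, label in TOY_ACCOUNTS]
```

Calling `train_logistic(X, y)` and then `predict_bot_probability` on a new account's features gives a score between 0 and 1. A real dataset would call for far more labeled accounts, held-out evaluation, and most likely an established machine learning library rather than a hand-written optimizer.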