Our world is instrumented with countless sensors. While many are outside of our direct control, there is an incredible amount of publicly available information being generated and gathered all the time. While much of this data goes by unnoticed or ignored it contains fascinating insight into the behavior and trends that we see throughout society. The trick is being able to identify and isolate the useful patterns in this data and separate it from all the noise.
Previously, we looked at using sites such as Craigslist to provide a wealth of wonderfully categorized information and then used that to answer questions such as "What job categories are trending upward?", "What cities show the most (or the least) promise for technology careers?", and "What relationship is there between the number of bikes for sale and the number of prostitution ads?" After achieving initial success looking at a single source of data, the challenge becomes to generate more meaningful results by combining separate data sources that each views the world in a different way. Now we look across multiple, disparate sources of such data and attempt to build models based on the trends and relationships found therein.
The initial inspiration for this work was a fantastic talk at DC13, "Meme Mining for Fun and Profit". It also builds upon a similar talk I presented at DC18. And once again seeks to inspire others to explore the exploitation of such publicly available sensor systems.