Finding predictable patterns in data can lead to new insights, better products, new profits, and even new cures for diseases. But creating predictive algorithms can be hard due to the novelty of the data, noise in the data, and the nuances of the domain. So, some data-intensive companies have turned to collaborative innovation to find better algorithms. These companies post large data sets gleaned from customer interactions and host an open competition to see who can create the best algorithm and win a lavish prize — $1 million in the case of Netflix. Teams form, collaborate, develop algorithms, and compete to refine their ideas until a winner is found. This new process of crowdsourcing the development of sophisticated, innovative algorithms seems extremely promising.
The general roadmap for any innovative new process is that a few individual companies try it first. If it works, another company comes along and creates a service that makes it easy for other companies to employ that process, too. The crowdsourcing of algorithm development for complex "big-data" prediction problems is no different.
To that end, Kaggle.com created a service that lets companies run these kinds of competitions. The service makes it easy for a company to upload a big dataset, define the terms of the competition, and turn it loose on Kaggle's hundreds of far-flung teams of experts in machine learning, statistics, and algorithm development. The submitting company gains access to experts in over 100 countries and 200 universities. The competitors get an interesting problem to work on. And the winner receives a nice prize in exchange for giving the winning algorithm to the company.
To date, some two dozen organizations have used Kaggle to collaboratively create algorithms that predict HIV disease progression, map dark matter, detect drowsy drivers, and forecast tourism, among other tasks.