Crowdsourcing Algorithms to Solve Complex Problems
By Shandana Mufti
More than 615,000 patent applications were filed with the United States Patent and Trademark Office (USPTO) in 2014. While the number of applications is staggering on its own, the volume of paperwork that comes with each application is more staggering still. Sorting through all those documents and images is no easy task.
Enter crowdsourcing. While companies are increasingly turning to crowdsourcing to increase the rate of innovation, it’s still a fairly novel process for the federal government. Alongside several collaborators, Christoph Riedl, a Northeastern professor with dual appointments in the D’Amore-McKim School of Business and CCIS, designed and ran a crowdsourcing contest for the USPTO in 2013. The results of the contest were published in the International Journal on Document Analysis and Recognition, a peer-reviewed academic journal in the image-processing field.
“Rather than just trying to solve every problem inside the company, you reach out to others who might have special expertise that you might not have,” Riedl explains. “There’s a bigger, overall project run by the federal government to push these types of innovative approaches among the federal government.”
Participants competed in teams of two to design the best algorithm for identifying the figures and part labels in technical drawings, then linking those labels to their textual descriptions so that a tooltip pops up when a user hovers the cursor over a given part of the image.
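As a rough illustration – the data model below is hypothetical, not the contest’s actual output format – each detected label would need to carry its position in the drawing and the matching description from the patent text, which a viewer could then surface as a tooltip:

```python
# A minimal sketch, assuming a hypothetical annotation format: each detected
# part label is tied to a bounding box in the drawing and to the sentence in
# the patent text that describes it.
from dataclasses import dataclass

@dataclass
class PartAnnotation:
    label: str                        # the label as printed, e.g. "3B"
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in image pixels
    description: str                  # matching sentence from the patent text

annotations = [
    PartAnnotation("3B", (412, 230, 28, 18),
                   "spring 3B presses the valve against its seat"),
]

def tooltip_at(annotations, x, y):
    """Return the description of the part label under the cursor, if any."""
    for a in annotations:
        bx, by, bw, bh = a.bbox
        if bx <= x <= bx + bw and by <= y <= by + bh:
            return a.description
    return None

print(tooltip_at(annotations, 420, 240))  # -> "spring 3B presses the valve..."
```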
The challenge was a difficult one: performing text recognition within technical drawings. The teams were provided with the images and the texts of the patents. Access to the texts meant participants could correct errors as they worked – if a part labeled “3D” was detected but no such part exists in the text, the corresponding part might actually be “3B”. At the end of the month-long contest, the submitted algorithms were run against a test dataset to measure their accuracy.
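A sketch of that correction step might look like the following – illustrative only, since the teams’ actual methods were their own. Labels detected in the drawing are checked against the labels the patent text actually mentions, and unconfirmed ones are snapped to the closest confirmed label:

```python
# A minimal sketch of text-based label correction, assuming labels follow the
# common "number plus optional letter" pattern (e.g. "12", "3B"). Not taken
# from any contest entry.
import re
from difflib import get_close_matches

def labels_in_text(patent_text):
    """Collect every part label the patent text actually mentions."""
    return set(re.findall(r"\b\d+[A-Z]?\b", patent_text))

def correct_label(detected, valid_labels):
    """Keep a detected label if the text confirms it; otherwise snap it to
    the closest confirmed label, or drop it if nothing is plausibly close."""
    if detected in valid_labels:
        return detected
    close = get_close_matches(detected, valid_labels, n=1, cutoff=0.5)
    return close[0] if close else None

text = "A spring 3B presses against the plate 12 and the lever 7A."
print(correct_label("3D", labels_in_text(text)))  # -> "3B"
```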
“[The algorithms] really had very high accuracy rates with regards to correctly identifying these parts and figure labels in these drawings,” Riedl says of the submissions. Still, a clear winner emerged, and other top submissions had strengths of their own. “That’s actually another strength of crowdsourcing. Not only did we get a clear winner who submitted a good solution, but we could use that and combine it with the second best and third best solution and pick out the bits and pieces of those solutions that did really well.”
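One simple way to combine submissions – a hedged sketch, not the paper’s actual ensembling procedure – is to pool the detections of the top algorithms and keep those that a majority agrees on:

```python
# A minimal majority-vote ensemble over hypothetical detection sets; each
# submission reports the (figure_id, label) pairs it found in the drawings.
from collections import Counter

def combine(submissions, min_votes=2):
    """Keep detections reported by at least `min_votes` of the submissions."""
    votes = Counter(d for sub in submissions for d in sub)
    return {d for d, n in votes.items() if n >= min_votes}

first  = {("fig1", "3B"), ("fig1", "12"), ("fig2", "7A")}
second = {("fig1", "3B"), ("fig2", "7A"), ("fig2", "9")}
third  = {("fig1", "3B"), ("fig1", "12")}
print(combine([first, second, third]))
# -> {("fig1", "3B"), ("fig1", "12"), ("fig2", "7A")}
```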
A total of 232 teams participated, of which 70 submitted algorithms. To mitigate the effort-reducing effects of facing too many competitors at once, the contest was split into 22 virtual competitions of roughly 10 teams each. A grand prize winner took home $10,000 and the runner-up team won $5,000, and prizes of $1,000 and $250 were up for grabs in each group. “That makes these contests work much better than if you throw everyone in one pool,” Riedl explains.
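The splitting itself is straightforward – here is a hedged sketch that shuffles the field and deals it into 22 virtual rooms (the paper describes the actual assignment procedure):

```python
# A minimal sketch of random room assignment; with 232 teams and 22 rooms,
# each virtual competition ends up with 10 or 11 teams. (The contest's real
# assignment procedure is described in the paper.)
import random

def assign_rooms(teams, n_rooms=22, seed=42):
    """Randomly partition the teams into `n_rooms` near-equal rooms."""
    teams = list(teams)
    random.Random(seed).shuffle(teams)
    return [teams[i::n_rooms] for i in range(n_rooms)]

rooms = assign_rooms(range(232))
print(len(rooms), sorted(len(r) for r in rooms)[:3])  # 22 rooms of 10-11 teams
```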
Riedl’s contributions to the contest and to the paper focused more heavily on structuring and running the crowdsourcing competition than on poring over algorithms. While his research looks at crowdsourcing in general, one specific focus is crowdsourcing in a competitive setting. The twist on crowdsourcing in this contest was having participants work in teams of two instead of as individuals.
“We know a lot of how to incentivize individuals in competition settings, but once you’re competing in a team, the incentive is team-based,” Riedl says. “Either the team wins or the team doesn’t win. That’s what my research focuses on.”
The paper, titled “Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms,” covers both the design of the contest – the use of crowdsourcing to solve innovation problems – and the solution to the USPTO’s image-processing problem. Co-authors included researchers from the USPTO, Harvard Business School, London Business School, and TopCoder, the platform on which the contest was hosted.
Ultimately, the contest was short – participants had just one month to perfect their algorithms. It was successful – a clear winner emerged. And it was inexpensive – the total prize money of $50,000 was far less than the cost of hiring a full-time software developer.
“All that again goes to show how this model of running online crowdsourcing contests can really be a successful model for companies or organizations like the Patent Office to get access to outside information and solve some of their hardest algorithm or software development problems,” Riedl says.