Optimal Allocation of Crowdsourced Resources for Information Retrieval Evaluation
Fri 02.19.16
Evaluating the performance of information retrieval systems, such as search engines, is critical to their effective development. Current “gold standard” performance evaluation methodologies generally rely on expert assessors to judge the quality of documents or web pages retrieved by search engines, at great cost in time and expense. The advent of “crowdsourcing,” such as that available through Amazon’s Mechanical Turk service, holds out the promise that these performance evaluations can be performed more rapidly and at far less cost through the use of many (though generally less skilled) “crowd workers”; however, the quality of the resulting performance evaluations generally suffers greatly. The thesis of this project is that one can obtain the best of both worlds (performance evaluations with the quality of experts but at the cost of crowd workers) by optimally leveraging both experts and crowd workers, asking the “right” assessor the “right” question at the “right” time. For example, one might ask inexpensive crowd workers the questions that are likely to be “easy” while reserving those that are likely to be “hard” for the expensive experts. While the project focuses on the performance evaluation of search engines as its use case, the techniques developed will be more broadly applicable to many domains where one wishes to efficiently and effectively harness experts and crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will be developed within which one can quantify the uncertainty about a performance evaluation as well as the cost and expected utility of asking any assessor (expert or crowd worker) any question (e.g., a nominal judgment for a document or a preference judgment between two documents) at any time. The goal is then to ask the “right” question of the “right” assessor at each point in time so as to maximize the expected utility gained per unit cost incurred, and then to optimally aggregate the responses in order to efficiently and effectively evaluate performance.
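As a rough illustration of the "expected utility per unit cost" idea, the following Python sketch picks the next question under several simplifying assumptions that are not taken from the project itself: judgments are binary (relevant or not), each assessor is modeled by a single hypothetical `accuracy` and `cost`, and utility is measured as the expected reduction in entropy of the belief that a document is relevant. All function names and numbers are illustrative only.

```python
import math

def entropy(p):
    """Binary entropy in bits; defined as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def posterior(p, accuracy, says_relevant):
    """Bayes update of P(relevant) after one answer from an assessor
    who is assumed to be correct with probability `accuracy`."""
    like_rel = accuracy if says_relevant else 1 - accuracy
    like_non = 1 - accuracy if says_relevant else accuracy
    evidence = like_rel * p + like_non * (1 - p)
    # An answer with zero probability cannot occur; keep the prior.
    return p if evidence == 0.0 else like_rel * p / evidence

def expected_gain(p, accuracy):
    """Expected entropy reduction (the assumed utility) of one question."""
    p_yes = accuracy * p + (1 - accuracy) * (1 - p)
    after = (p_yes * entropy(posterior(p, accuracy, True))
             + (1 - p_yes) * entropy(posterior(p, accuracy, False)))
    return entropy(p) - after

def best_question(beliefs, assessors):
    """Return the (document, assessor) pair with the highest
    expected gain per unit cost."""
    def gain_per_cost(pair):
        doc, name = pair
        a = assessors[name]
        return expected_gain(beliefs[doc], a["accuracy"]) / a["cost"]
    pairs = [(doc, name) for doc in beliefs for name in assessors]
    return max(pairs, key=gain_per_cost)

# Hypothetical beliefs P(relevant) and assessor profiles.
beliefs = {"d1": 0.5, "d2": 0.99}
assessors = {
    "crowd": {"accuracy": 0.7, "cost": 1.0},
    "expert": {"accuracy": 0.95, "cost": 10.0},
}
choice = best_question(beliefs, assessors)
```

With these made-up numbers, the selection favors the cheap crowd worker on the most uncertain document, since the expert's larger information gain does not offset a tenfold cost. A fuller treatment would also model question difficulty, preference judgments, and the uncertainty of the evaluation measure itself rather than of individual documents.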
For further information, visit the project website.