Paper Requirement for Shahzad Rajput

Paper:
Shahzad K. Rajput, Virgil Pavlu, Peter B. Golbus, and Javed A. Aslam. A Nugget-based Test Collection Construction Paradigm. Submitted to the 20th ACM CIKM, 2011, Glasgow, UK.
Description:
The problem of building test collections is central to the development of information retrieval systems such as search engines. The primary use of test collections is the evaluation of IR systems. The widely employed "Cranfield paradigm" dictates that the information relevant to a topic be encoded at the level of documents, therefore requiring effectively complete document relevance assessments. As this is no longer practical for modern corpora, numerous problems arise, including scalability, reusability, and applicability.
We propose a new method for relevance assessment based on relevant information, not relevant documents. Once the relevant information is collected, any document can be assessed for relevance, and any retrieved list of documents can be assessed for performance. Starting with a few relevant "nuggets" of information manually extracted from existing TREC corpora, we implement and test a method that finds and correctly assesses the vast majority of relevant documents found by TREC assessors, as well as up to four times more additional relevant documents. We then show how these inferred relevance assessments can be used to perform IR system evaluation. Our main contribution is a methodology for producing test collections that are highly accurate, more complete, scalable, reusable, and can be generated with similar amounts of effort as existing methods, with great potential for future applications.
Summary of SIGIR'11 reviews and points addressed in CIKM'11 submission:
- Evidence that a small number of nuggets can cover a large and diverse set of documents was missing:
  The revised paper contains results on ClueWeb09, a web-based corpus with a large and diverse set of documents. These results validate that a small number of nuggets can cover a large and diverse set of documents.
- Argument that nugget-based relevance evaluation can handle dynamic collections was missing:
  We extract nuggets from a sample (training set) and then, using those nuggets, infer the relevance of documents outside that sample (test set). The test set does not need to be a fixed set of documents: new documents may be added and older documents may be modified. Therefore, by design, our method handles dynamic collections.
- Argument that the expanded set is not largely a collection of redundant documents was missing:
  Our methodology works under the assumption that a large number of documents may contain the "nugget(s)". By design, the matching algorithm can match a nugget to a document even if the nugget does not appear verbatim in the document. This allows us to infer a large number of non-duplicate documents as relevant. This assumption has been validated by a user study, and the argument has been included in the paper.
- Argument about the cost-effectiveness of the nugget-based method was missing:
  Assuming TREC assessors spent one minute per document, the entire TREC-8 qrel took about 36 man-weeks overall. SampleAdHoc, which is about 11% of the size of the entire TREC-8 qrel, by proportion required about 4 man-weeks for binary relevance assessments. For the relevant documents found in the sample, we spent an additional 2.1 man-weeks on extracting nuggets; thus the total human effort required for our method on SampleAdHoc is about 6.2 man-weeks.
  Under the same assumption, TREC spent about 11 man-weeks creating the entire ClueWeb09 qrel. SampleWeb, which is about 38% of the size of the entire ClueWeb09 qrel, required about 4 man-weeks. Nugget extraction from relevant documents in the sample took another 1.6 man-weeks, for a total human effort on SampleWeb of about 5.6 man-weeks.
  This argument has been added to the paper.
- It was suggested that "nugget MAP (excluding 10 systems)" be compared with "TREC MAP" in the "Reusability" results:
  The plot has been added to the paper.
- Why is the proposed method underestimating top systems?
  The top systems are underestimated mainly for one reason: some of the unique relevant documents brought into the pool by these systems are inferred to be non-relevant due to a missing aspect. The argument has been added to the paper.
- A comparison of inter-assessor agreement with past studies missing:
  The comparison has been added to the paper.
- There were some typos in the paper submitted to SIGIR.
  The majority of the typos have been fixed in the paper.
- Appreciation of the work by reviewers:
  "It is addressing an important an interesting topic, but I just found it very hard to understand at the correct level of detail in important places."
  "It is a straightforward idea with good ultimate performance, but the paper would be stronger with more analysis and discussion of some issues."
  "Surprisingly good results from a simple method, and a paper which would be likely to stimulate lots of discussion. More discussion and analysis would strengthen this submission."
- Complete reviews can be seen here.
Advisor's statement on the student's contributions to the paper and how these contributions provide evidence of research potential:

This paper proposes a new paradigm for assessing and encoding relevant information in Information Retrieval, with applications to search engine training and evaluation. Shahzad was the lead author on this work: he contributed a majority of the ideas, he conducted all of the experiments, and he contributed a majority of the writing. The experiments, in particular, involved the creation of a user study, the implementation of a user interface for this study, and an extensive analysis of the results obtained, all conducted entirely by Shahzad.
This paper represents quality publishable work, in my opinion. The paper was submitted to SIGIR'11, where it was not accepted; however, SIGIR is the premier venue for IR research and a most difficult conference in which to be published. (SIGIR has an historical average acceptance rate of 18%.) The paper will be resubmitted to CIKM'11, and I have no doubt that it will be published at some point.
Given the above, I believe that this paper demonstrates Shahzad's research potential, in terms of ideas, execution, and writing.
-- Prof. Javed A. Aslam