Paper Requirement for Shahzad Rajput

  1. Paper:

    Shahzad K. Rajput, Virgil Pavlu, Peter B. Golbus, and Javed A. Aslam. "A Nugget-based Test Collection Construction Paradigm." Submitted to the 20th ACM CIKM, 2011, Glasgow, UK.

  2. Description:

    The problem of building test collections is central to the development of information retrieval (IR) systems such as search engines. The primary use of test collections is the evaluation of IR systems. The widely employed "Cranfield paradigm" dictates that the information relevant to a topic be encoded at the level of documents, which in effect requires complete document relevance assessments. As this is no longer practical for modern corpora, problems arise with scalability, reusability, and applicability.

    We propose a new method for relevance assessment based on relevant information, not relevant documents. Once the relevant information is collected, any document can be assessed for relevance, and any retrieved list of documents can be assessed for performance. Starting with a few relevant "nuggets" of information manually extracted from existing TREC corpora, we implement and test a method that finds and correctly assesses the vast majority of relevant documents found by TREC assessors, as well as up to four times more additional relevant documents. We then show how these inferred relevance assessments can be used to perform IR system evaluation. Our main contribution is a methodology for producing test collections that are highly accurate, more complete, scalable, and reusable, and that can be generated with an amount of effort similar to that of existing methods, with great potential for future applications.
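
    To make the idea concrete, the following is a minimal illustrative sketch, not the method from the paper. It assumes a simple token-overlap match between a nugget and a document, a hypothetical threshold of 0.8, and precision at k as the evaluation measure; the function names and the sample texts are invented for illustration. It shows how, once nuggets are collected, any document can be assessed and any ranked list scored without further manual judgments.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def nugget_match(nugget, document):
    """Fraction of the nugget's tokens that also occur in the document.

    Assumption: plain token overlap stands in for whatever matching
    function a real system would use.
    """
    nugget_tokens = tokens(nugget)
    if not nugget_tokens:
        return 0.0
    return len(nugget_tokens & tokens(document)) / len(nugget_tokens)


def infer_relevance(document, nuggets, threshold=0.8):
    """Hypothetical rule: a document is relevant if any nugget matches
    it at or above the threshold."""
    return any(nugget_match(n, document) >= threshold for n in nuggets)


def precision_at_k(ranked_docs, nuggets, k, threshold=0.8):
    """Precision at rank k, using inferred rather than manual judgments."""
    top = ranked_docs[:k]
    if not top:
        return 0.0
    relevant = sum(infer_relevance(d, nuggets, threshold) for d in top)
    return relevant / len(top)


# Invented example data.
nuggets = ["the treaty was signed in 1990"]
ranked = [
    "Historians note that the treaty was signed in 1990 after long talks.",
    "A recipe for lentil soup.",
]

print(precision_at_k(ranked, nuggets, k=2))
```

    Because relevance is inferred from the nuggets rather than stored per document, the same nugget set can score documents that no assessor has seen, which is the property the description above relies on for reusability.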

  3. Summary of SIGIR'11 reviews and points addressed in CIKM'11 submission:

  4. Advisor's statement on the student's contributions to the paper and how these contributions provide evidence of research potential:

    This paper proposes a new paradigm for assessing and encoding relevant information in Information Retrieval, with applications to search engine training and evaluation. Shahzad was the lead author on this work: he contributed a majority of the ideas, he conducted all of the experiments, and he contributed a majority of the writing. The experiments, in particular, involved the creation of a user study, the implementation of a user interface for this study, and an extensive analysis of the results obtained, all conducted entirely by Shahzad.

    In my opinion, this paper represents quality, publishable work. The paper was submitted to SIGIR'11, where it was not accepted; however, SIGIR is the premier venue for IR research and a most difficult conference in which to be published. (SIGIR has a historical average acceptance rate of 18%.) The paper will be resubmitted to CIKM'11, and I have no doubt that it will be published at some point.

    Given the above, I believe that this paper demonstrates Shahzad's research potential, in terms of ideas, execution, and writing.

    -- Prof. Javed A. Aslam