Created: Tue 11 May 2010
Last modified:
Assigned: Tue 15 Jun, 2010
Proposals Due: Fri 18 Jun, 2010
Presentations Held: Wed/Thu 23-24 Jun, 2010
Projects Due: Fri 25 Jun, 2010
Required of Graduate Students Only
Your term project consists of either a third
project or an in-class presentation on an IR topic.
You must submit a short (one-paragraph) proposal acknowledging and/or
describing the project or presentation topic you have chosen by
the "proposal due" date listed above. (E-mail or hardcopy is fine.)
Projects
Your task is to complete an IR project similar in scope to the projects
already assigned in this course. One such example project is to
build a metasearch system as described in the Metasearch Project
document.
Examples of other projects include:
- implementing and testing a clustering algorithm,
- implementing and testing a collaborative filtering system,
- implementing and testing a compression utility,
- and so on...
Any reasonable implementation topic relevant to the course is
acceptable. The metasearch project described above is the most fully
specified; if you are interested in another topic (such as those
listed above), please contact me so that I can provide guidance and
resources.
Presentations
Your task is to investigate an area of Information Retrieval research
that you find interesting. You will read 2-5 research papers from
(mostly) refereed publications in order to get a sense of what has
been done (your presentation may end up covering fewer of them). Your
in-class presentation should run approximately 15 minutes and should
highlight the interesting or exciting parts of the work you
explored.
Guidelines
An excellent presentation will make it clear that you have looked at
the few papers in sufficient depth to have noticed something
interesting and intriguing. The summary of the work will be succinct
and demonstrate that you've thought about it sufficiently to distill
it to its essence. Slides for the presentation will be well executed
and easy to read. The presentation itself will be energetic and fun
(this will not count as heavily as it might, since not everyone is
comfortable in front of an audience, let alone energetic and fun).
The audience should be left anxious to read your paper(s).
More thoughts on the content of the presentation and
paper are listed below.
Source material
You should be using primarily refereed papers (e.g., conferences and
journals). Here are some useful sources and how to get ahold of them:
- Proceedings of the SIGIR, CIKM, and ECIR conferences.
- ACM Transactions on Information Systems journal.
- Journal of the American Society for Information Science and Technology.
- Information Processing and Management.
You may get some of your information from the TREC proceedings.
However, TREC proceedings are not refereed and often are rather sparse
in the details presented. If you use a TREC paper, you should find
some refereed version of the results to confirm that what was
presented is accurate. The proceedings are available from the TREC
homepage.
Many papers are available via the ACM Digital Library, Google Scholar,
and CiteSeer.
Topics
Here are some topics that could make good presentations,
roughly grouped into affinity areas. Some have been discussed in class,
meaning that you'd have a better starting point. Others would be new to you
unless you have encountered them elsewhere. You should not feel
entirely constrained by this list, though most people will end up choosing
from it.
- Evaluation
  - Techniques for finding relevant documents more quickly using the
    pooled approach
  - Comparison of evaluation measures, their stability, and how they scale
  - The TREC robust track (trying to get rid of or recognize
    poorly-performing queries)
- Question answering
  - Methods for finding passages that might contain an answer
  - Description of some of the better QA systems
  - Dialogue in question answering
- Other sources of data
  - Retrieving spoken documents (speech recognizer output; an old TREC
    track)
  - Retrieving documents (an old TREC track)
  - Web retrieval (a TREC track)
  - Terabyte-scale retrieval (a TREC track)
  - Genomics retrieval (a TREC track)
- Multimedia indexing and retrieval
  - Direct retrieval of images and/or video
  - Retrieval via surrounding or descriptive text
  - Retrieval via image/video annotation
- Summarization
  - Summarizing a single document
  - Summarizing multiple documents
  - Headline-type summaries
  - Summarizing in other languages
- Cross-language and multi-lingual retrieval
  - Research coming out of CLEF (European Cross-Language Evaluation
    Forum)
  - Details on some techniques
  - Sparse language issues (recent special issues in ACM Transactions on
    Asian Language Information Processing)
- Other interesting stuff not touched on in class
  - Semantic Web
  - Human interaction issues
  - Structured documents (XML)
  - String search algorithms
- A more in-depth look at some aspect of a topic from class
Topics are first-come, first-served. Two (or more) people can have
the same topic only if they specify in advance how they will be
specializing their presentations.
Presentation
The goal of the presentation is to find and talk about something that
is intriguing. It could be something that runs counter to something
said in class, or that pushes an idea from class in an interesting
way. It could be something that was never mentioned in class, but
that is pretty cool and slick. It could be an outrageous claim that,
now that you're most of the way through the course, you don't believe.
It could be an open problem that you think would be exciting to
tackle.
Remember that this is something you think is intriguing. Find
some way to make it clear that it is intriguing, so your
audience understands why you picked it. What makes it exciting?
You have about 15 to 20 minutes for a presentation. In that time,
you'll need to provide just enough background for your tidbit to make
sense, and to present your tidbit. A good rule of thumb is 1 to 3
minutes per slide, so you shouldn't expect to use many more than 10 to
15 slides to fit within the time available. You should, of course,
practice your presentation to ensure that it runs roughly 15 to 20
minutes.
DO NOT PLAGIARIZE. If you copy any text from any other source, you must
put it in quotation marks and/or indent it, and indicate exactly where it
came from, including a complete citation and a page number. This applies
regardless of whether the source is one that you used, whether you include
it in your bibliography, whether it is published, whether it is readily
available on the Web, or anything else.