Midterm Details for March 11 Midterm
ISU535 Information Retrieval - Spring 2004 - Prof. Futrelle
Updated 7 March
There are four major topics on the Midterm.
My approach to the Midterm is to ask you rather specific questions
about two of the papers we read. The other questions will require that
you know the algebraic expressions for three different topics.
I'm asking for this because I want everyone to have some "hard"
knowledge of information retrieval, rather than producing general,
often vague and hard-to-grade answers about the non-quantitative topics.
You'll be asked for formulas and for calculations with them, so the only
way you can hope to do well on the Midterm is to memorize the formulas
and practice putting numerical values into them. This approach to studying
is really quite simple and commonsensical.
In addition, read the material explaining the ideas behind the equations.
Don't just memorize the equations and look at nothing else.
You may be asked questions that require that you understand the purpose
and use of the various expressions being evaluated.
As for preparing to write answers to questions about the papers,
be sure you can spell the requisite technical words correctly.
Again, practice: write a bit about each of the two papers without
having them in front of you, and you'll be much better prepared for
the Midterm. (Years ago, my girlfriend chastised me for misspelling
"immediately". I replied to her letter, no email then, with a letter
in which I wrote it, spelled correctly, a hundred times. I've never
misspelled it since.)
- tf-idf. Section 2.5.3.
This was on Quiz #1, but many did not give concise and
accurate answers. You will be asked to write out the equations involved,
explain them and evaluate them for a few examples. You may be asked, for
example, to compare cases with and without stop words.
You should also understand how the weights computed via tf-idf
are used in the Vector Model inner product as discussed in this section.
You should at least memorize the first line of the sim(dj,q) formula
on page 27 as well as its interpretation on the next page.
- Recall and Precision. Section 3.2.1.
You will need to be able to explain both of
these via correct formulas and via diagrams, as well as evaluate
them for numerical values I will give you.
- Of the three papers we studied, you will be asked a question or two about
two of them, the Vannevar Bush paper and the Jim Gray paper. See the
three papers assignment page
for a list. I've given you a copy of the Bush paper and you can easily
retrieve Gray's paper through the ACM access you get through our library.
- In Chapter 6 you must memorize Zipf's law as applied to word
frequencies, page 146, and know how to compute frequencies from it.
For example, given the frequency of the top-ranked word "the" (j=1)
you should be able to compute the frequencies of a few words of lower rank.
On page 147 you need to memorize Heaps' law and be able to estimate
the number of distinct words, V, in a document for values of the parameters
K and beta that I'll give you. Both laws are trivial to apply if you
memorize them correctly, understand what the terms correspond to, and
can do simple arithmetic correctly.
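For the tf-idf topic, here is a minimal sketch of the kind of practice calculation described above. It assumes the common form of the weight, (term frequency / largest term frequency in the same text) times log(N / n_i), and uses base-10 logarithms only to keep hand arithmetic easy. The function names, the tiny collection and all counts are made up for illustration, and the query is weighted the same way as the document for simplicity. Check the exact expressions against Section 2.5.3 before relying on this.

```python
import math

def tf_idf_weights(term_counts, doc_freq, num_docs):
    """Weight each term as (freq / max freq in this text) * log10(N / n_i)."""
    max_count = max(term_counts.values())
    return {
        term: (count / max_count) * math.log10(num_docs / doc_freq[term])
        for term, count in term_counts.items()
    }

def cosine_similarity(doc_weights, query_weights):
    """Inner product of the two weight vectors divided by the product of their lengths."""
    dot = sum(w * query_weights.get(term, 0.0) for term, w in doc_weights.items())
    doc_norm = math.sqrt(sum(w * w for w in doc_weights.values()))
    query_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0
    return dot / (doc_norm * query_norm)

# Made-up collection: N = 1000 documents; n_i = number of documents containing term i.
NUM_DOCS = 1000
DOC_FREQ = {"retrieval": 10, "index": 100, "the": 1000}

doc = tf_idf_weights({"retrieval": 2, "index": 1, "the": 4}, DOC_FREQ, NUM_DOCS)
# retrieval: (2/4) * log10(100) = 1.0
# index:     (1/4) * log10(10)  = 0.25
# the:       (4/4) * log10(1)   = 0.0  (a word found in every document gets zero weight)

query = tf_idf_weights({"retrieval": 1, "index": 1}, DOC_FREQ, NUM_DOCS)
print(doc)
print(query)
print(cosine_similarity(doc, query))
```

To practice the stop-word comparison, drop "the" from the document counts and recompute: the largest frequency falls from 4 to 2, so the remaining weights double (2.0 and 0.5), while the cosine similarity is unchanged because every weight is scaled by the same factor.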
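For the Recall and Precision topic, the following sketch works one numerical example. It assumes the usual set definitions: recall is the number of relevant documents retrieved divided by the number of relevant documents, and precision is the number of relevant documents retrieved divided by the number of documents retrieved. The document identifiers and the function name are hypothetical.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for a set of relevant and a set of retrieved documents."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: 5 relevant documents; the system returns 8, of which 4 are relevant.
relevant_docs = {"d1", "d2", "d3", "d4", "d5"}
retrieved_docs = {"d1", "d2", "d3", "d4", "d6", "d7", "d8", "d9"}

precision, recall = precision_recall(relevant_docs, retrieved_docs)
print(precision, recall)  # precision = 4/8, recall = 4/5
```

Drawing the two sets as overlapping circles and labeling the overlap is a good way to check that you have the two denominators the right way round.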
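For the Chapter 6 topic, here is a short sketch of both calculations. It assumes Zipf's law in the form "the word of rank j occurs 1/j^theta times as often as the top-ranked word" (theta = 1 in the simplest case) and Heaps' law in the form V = K * n^beta, where n is the number of words in the text. The function names, theta, n and all numerical values are illustrative, not values from the book or the exam.

```python
def zipf_frequency(top_frequency, rank, theta=1.0):
    """Estimated frequency of the word at the given rank, from the rank-1 frequency."""
    return top_frequency / rank ** theta

def heaps_vocabulary(num_words, k, beta):
    """Estimated number of distinct words V in a text of num_words words."""
    return k * num_words ** beta

# Made-up example: "the" (rank 1) occurs 60,000 times.
top = 60000
for rank in (2, 3, 10):
    print(rank, zipf_frequency(top, rank))  # 30000, 20000 and 6000 when theta = 1

# Made-up example: a text of 1,000,000 words with K = 20 and beta = 0.5.
vocabulary = heaps_vocabulary(1_000_000, k=20, beta=0.5)
print(vocabulary)  # 20 * sqrt(1,000,000) = 20 * 1000
```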