(1) Document Corpus and Query

DOCUMENTS:

d1: for english language model retrieval we have a relevance model while vector space model retrieval doesn't
d2: R-precision measure is relevant to average precision measure
d3: most efficient retrieval models are language model and vector space model
d4: english is the most efficient language
d5: retrieval efficiency is measured by average precision

==============================================================================

TERM by DOCUMENT TABLE (ignoring stopwords)

                  d1    d2    d3    d4    d5       term in corpus
english           1                 1                 2    
language          1           1     1                 3          
model             3           3                       6
retrieval         2           1           1           4
relevance         1     1                 1           3
vector            1           1                       2
space             1           1                       2
R                       1                             1
most                          1     1                 2
efficient                           1     1           2
measure                 2                 1           3
average                 1                 1           2
precision               2                 1           3

DOC LENGTH       10    7     8     4     6           35

==============================================================================

T=35 (total term occurrences in the corpus)
D=5 (number of documents)
U=13 (number of unique non-stopword terms)
avg_doc_length=35/5=7
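
These statistics can be reproduced directly from the term-by-document table. Below is a minimal Python sketch; the dictionary simply transcribes the counts above, the variable names T, D, U follow the notation used here, and the other names are illustrative.

# Term-by-document counts transcribed from the table above (stopwords removed).
counts = {
    "english":   {"d1": 1, "d4": 1},
    "language":  {"d1": 1, "d3": 1, "d4": 1},
    "model":     {"d1": 3, "d3": 3},
    "retrieval": {"d1": 2, "d3": 1, "d5": 1},
    "relevance": {"d1": 1, "d2": 1, "d5": 1},
    "vector":    {"d1": 1, "d3": 1},
    "space":     {"d1": 1, "d3": 1},
    "R":         {"d2": 1},
    "most":      {"d3": 1, "d4": 1},
    "efficient": {"d4": 1, "d5": 1},
    "measure":   {"d2": 2, "d5": 1},
    "average":   {"d2": 1, "d5": 1},
    "precision": {"d2": 2, "d5": 1},
}

D = 5                                                          # number of documents
T = sum(sum(per_doc.values()) for per_doc in counts.values())  # total term occurrences
U = len(counts)                                                # unique (non-stopword) terms
avg_doc_length = T / D

print(T, D, U, avg_doc_length)  # 35 5 13 7.0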

==============================================================================

QUERY:

"efficient retrieval model efficient"

(2) Vector Space Model

RAW TF
binary QUERY WEIGHTS (i.e. a term either occurs in the query or not)
Dot Product Similarity

                  d1    d2    d3    d4    d5    QUERY
model             3           3                 1
retrieval         2           1           1     1
efficient                           1     1     1
                  5     0     4     1     2
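
Each score in the bottom row is a dot product between the document's raw TF values for the query terms and the binary query weights. A minimal Python sketch, assuming the counts from the table above (the tf, query, and dot_product names are illustrative):

# Raw TF of the query terms in each document, transcribed from the table above.
tf = {
    "d1": {"model": 3, "retrieval": 2},
    "d2": {},
    "d3": {"model": 3, "retrieval": 1},
    "d4": {"efficient": 1},
    "d5": {"retrieval": 1, "efficient": 1},
}

# Binary query weights: a term gets weight 1 if it occurs in the query at all.
query = {"model": 1, "retrieval": 1, "efficient": 1}

def dot_product(doc_weights, query_weights):
    # Sum of document weight * query weight over the query terms.
    return sum(doc_weights.get(t, 0) * w for t, w in query_weights.items())

for d, weights in tf.items():
    print(d, dot_product(weights, query))  # d1: 5, d2: 0, d3: 4, d4: 1, d5: 2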

==============================================================================

ROBERTSON's TF = TF/(TF+k), k=1
binary QUERY WEIGHTS
Dot Product Similarity

                  d1    d2    d3    d4    d5    QUERY
model             3/4        3/4                 1
retrieval         2/3        1/2          1/2    1
efficient                           1/2   1/2    1

                  17/12  0    5/4   1/2   1
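
As a check on the d1 score: model 3/(3+1) = 3/4 and retrieval 2/(2+1) = 2/3, so the dot product is 3/4 + 2/3 = 9/12 + 8/12 = 17/12 ≈ 1.42.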

==============================================================================

OKAPI TF = TF/[TF+k+c*(doclen/avglen)], k=0.5, c=1.5
binary QUERY WEIGHTS
Dot Product Similarity

                  d1    d2    d3    d4    d5    QUERY
model             0.53        0.57              1
retrieval         0.43        0.31        0.36  1
efficient                           0.42  0.36  1

                  0.96  0     0.88  0.42  0.72
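
The Okapi TF values come from plugging each document's length (and the corpus average of 7) into the formula. A minimal Python sketch, assuming the raw counts and document lengths from part (1); names like okapi_tf and doc_len are illustrative. Note that the table above sums values already rounded to two decimals, so an exact computation gives 0.89 rather than 0.88 for d3.

# Raw TF of the query terms and document lengths, transcribed from part (1).
tf = {"d1": {"model": 3, "retrieval": 2}, "d2": {}, "d3": {"model": 3, "retrieval": 1},
      "d4": {"efficient": 1}, "d5": {"retrieval": 1, "efficient": 1}}
doc_len = {"d1": 10, "d2": 7, "d3": 8, "d4": 4, "d5": 6}
avg_len, k, c = 7.0, 0.5, 1.5
query = {"model": 1, "retrieval": 1, "efficient": 1}  # binary query weights

def okapi_tf(raw_tf, dl):
    # Length-normalised saturating TF: tf / (tf + k + c * dl / avg_len)
    return raw_tf / (raw_tf + k + c * dl / avg_len)

for d in ["d1", "d2", "d3", "d4", "d5"]:
    score = sum(okapi_tf(raw, doc_len[d]) * query[t] for t, raw in tf[d].items())
    print(d, round(score, 2))  # d1: 0.96, d2: 0.0, d3: 0.89, d4: 0.42, d5: 0.72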

==============================================================================

OKAPI TF= TF/[TF+k+c*(doclen/avglen)], k=0.5, c=1.5
raw TF QUERY WEIGHTS ("efficient" occurs twice in the query, so its query weight is 2)
Dot Product Similarity

                  d1    d2    d3    d4    d5    QUERY
model             0.53        0.57              1
retrieval         0.43        0.31        0.36  1
efficient                           0.42  0.36  2

                  0.96  0     0.88  0.84  1.08
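
With raw TF query weights only d4 and d5 change, because "efficient" is the only query term that occurs more than once: d4 becomes 0.42*2 = 0.84 and d5 becomes 0.36*1 + 0.36*2 = 1.08.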

==============================================================================

IDF WEIGHTS = log2(N/n_t), where N=5 is the number of documents and n_t is the number of documents containing the term

english           log(5/2)=1.32
language          log(5/3)=0.73
model             log(5/2)=1.32
retrieval         log(5/3)=0.73
relevance         log(5/3)=0.73
vector            log(5/2)=1.32
space             log(5/2)=1.32
R                 log(5/1)=2.32
most              log(5/2)=1.32
efficient         log(5/2)=1.32
measure           log(5/2)=1.32
average           log(5/2)=1.32
precision         log(5/2)=1.32

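A minimal Python sketch of the IDF computation, with n_t taken from the term-by-document table in part (1); the df and idf names are illustrative:

import math

# Document frequency n_t of each term (number of documents containing it).
df = {"english": 2, "language": 3, "model": 2, "retrieval": 3, "relevance": 3,
      "vector": 2, "space": 2, "R": 1, "most": 2, "efficient": 2,
      "measure": 2, "average": 2, "precision": 2}
N = 5  # number of documents in the corpus

idf = {t: math.log2(N / n_t) for t, n_t in df.items()}
print(round(idf["model"], 2), round(idf["retrieval"], 2), round(idf["R"], 2))
# 1.32 0.74 2.32  (the table above truncates log2(5/3) to 0.73)
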
==============================================================================

OKAPI TF*IDF
TF QUERY
Dot Product Similarity

            d1          d2    d3          d4          d5          QUERY
model       0.53*1.32   0     0.57*1.32   0           0           1
retrieval   0.43*0.73   0     0.31*0.73   0           0.36*0.73   1
efficient   0           0     0           0.42*1.32   0.36*1.32   2

            1.01        0    0.97        1.10        1.21
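
As a check on the top-scoring document d5: retrieval contributes 0.36*0.73*1 = 0.26 and efficient contributes 0.36*1.32*2 = 0.95, giving 1.21, so the final Okapi TF*IDF ranking is d5 > d4 > d1 > d3 > d2.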

(3) Language Models and Smoothing