CS6200 Information Retrieval, SUMMER-FULL 2020


* Schedule and materials subject to change
Week / Topics Lecture Reading Assignment
  • Week 1   5/4 - 5/11  Retrieval Models / Intro
  • IR intro
  • Queries and documents
  • Matching scores
  • Cosine
  • ElasticSearch demo

  • HW 1
  • Due: Fri 5/22
  • Week 2   5/11 - 5/18  Retrieval Models / Vector Space
ElasticSearch demo
AP89 data
Tf-Idf, BM25
Co-occurrence, bigrams
trec_eval
Metasearch


  • Week 3    5/18 - 5/25   Retrieval Models / Language Models
  • Generative Language Models
  • Query Likelihood
  • Model Divergence
  • Smoothing
  • Relevance Feedback




 [PAPER] A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval by Chengxiang Zhai and John Lafferty

  • Week 4    5/25 - 6/1  Indexing / Index Construction
  • Inverted Index
  • Stopping, stemming
  • Index Construction
  • Ngrams, Skipgrams, Finding Blurbs
  • Co-occurrence
  • MON 5/25 Memorial Day, NO CLASS



Notes: Indexing


  • HW 2
  • Due: Wed 6/10
  • Week 5    6/1 - 6/8  Indexing / Storage
  • Compression
  • Zipf's and Heaps' laws
  • Index Storage
  • Distributed Indexes


Notes: Lempel-Ziv

  • Week 6    6/8 - 6/15  Crawling
  • Crawling Basics
  • HTTP Links
  • Graph BFS recap
  • Frontier/Queue
  • Duplicates
  • MON 6/8 NEU Reflection Day, NO CLASS


Web Crawler - Wikipedia
Web Crawling Tutorial by Christopher Olston and Marc Najork
  • Week 7   6/15 - 6/22  Crawling / Merging
  • Link Graph
  • Freshness vs. Coverage
  • Vertical Search




  • Week 8    6/22 - 6/29  Link Graph
  • PageRank
  • Topical PageRank
  • HITS
  • SALSA
  • Intro: IR Evaluation



  • HW4
  • Due: Tue 7/13
  • Week 9    6/29 - 7/6   IR Evaluation / Measures
  • IR ranking performance
  • Set measures: Precision, Recall, F1, Accuracy, ROC, confusion matrix
  • Ranking measures: R-prec, AP, nDCG, Reciprocal Rank
  • Significance tests



Paper: IR Metrics vs. Users
Wikipedia: ROC
  • Week 10    7/6 - 7/13  IR Evaluation / Assessments
  • Relevance Assessments
  • Diversity eval using subtopics
  • Assessors, Crowdsourcing, Cost
  • Assessor Interface
  • Test Collection Construction







  • Week 11    7/13 - 7/20  Machine Learning / Features
  • Document understanding
  • Features
  • Extracting Query Features
  • Similarity
  • How to measure ML
  • ML algorithms



  • Week 12    7/20 - 7/27  ML / Algorithms / Ranking
  • Text Classification with unigrams
  • Sparse matrix
  • How/Why Learning Works
  • What to expect
  • Learning to Rank
  • LambdaMART
  • Pairwise Models


  • Online Class, Decision Trees (annotated pdf)
  • Cheng's note, sparse format, Learning Code for HW7

    Bingyu's demo for HW7


    Paper: From RankNet to LambdaRank to LambdaMART
    Wikipedia: Learning To Rank
    Paper: Yahoo Learning to Rank challenge

    Paper: AdaBoost and RankBoost
    Paper: RankNet
    Paper: LambdaMART, LambdaMART2


    LSTM
    • HW7
    • Due: Mon 8/10
    • Week 13   7/27 - 8/3  ML / Clustering / Topic Models
    • Clustering
    • LDA/PCA
    • Topic Models



    • Slides: Topic Models/LDA (Blei)

    • https://github.com/JohnLangford/vowpal_wabbit/wiki
    • http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html
    • https://ariddell.org/lda.html
    • https://pypi.python.org/pypi/lda
    • https://cran.r-project.org/web/packages/lda/index.html
    • http://programminghistorian.org/lessons/topic-modeling-and-mallet

    • HW8
      (no credit, but can make up 50 points from past HWs)
    • Due: Tue 8/15
    • Week 14    8/3 - 8/10  Adversarial IR, Spam
    • Spam
    • Link Farms