CS 7800/4810 – Course Schedule
Topics and Agenda
This schedule will be updated regularly as the course progresses. Check back
frequently. We will usually post lecture slides by the end of the day
following a lecture. Lectures will not be recorded.
Part 1: Fundamental Data Structures
Part 2: String Algorithms
Part 3: Hashing Techniques
Part 4: Filters and Sketches
- (): Filters (Bloom/Quotient/Cuckoo) Notes
- (): Adaptive filters Notes
Scribe: Yiyang Zhang
- (): Range filters Notes
Scribe: Peter Li
- (): Misra-Gries, Count Sketch, Count-Min Sketch Notes
Scribe: Brianna Marshall
- (): Cardinality Estimation (HyperLogLog) Notes
Scribe: Michael Pimble
- (): Cardinality Estimation (Mash) Notes
Scribe: Jenish
Part 5: Similarity and Nearest Neighbor Search
- (): Nearest Neighbor Notes
Scribe: Eric Chen
- (): Locality Sensitive Hashing Notes
Scribe: Reilly teeter
- (): IVF & HNSW & Vamana Notes
Scribe: Sanjana Injamuri
Part 6: Distributed Systems
- (): Consistent Hashing Notes
Scribe: Brady Aber
- (): Distributed Hash Table (CHORD) Notes
Scribe: Sai Srikanth
Part 8: Review and Presentations
- (): Final Project Presentations
Literature List
Part 1: Fundamental Data Structures
Part 2: String Algorithms
Part 3: Hashing Techniques
Part 4: Filters and Sketches
- A General-Purpose Counting Filter: Making Every Bit Count (P. Pandey, M.A. Bender, R. Johnson & R. Patro, SIGMOD 2017)
- Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design (P. Pandey et al., SIGMOD 2021)
- Cuckoo Filter: Practically Better Than Bloom (B. Fan, D.G. Andersen, M. Kaminsky & M.D. Mitzenmacher, CoNEXT 2014)
- Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters (T.M. Graf & D. Lemire, Journal of Experimental Algorithmics 2020)
- Range Emptiness in Constant Time and Optimal Space (Authors TBD)
- SuRF: Practical Range Query Filtering with Fast Succinct Tries (H. Zhang, H. Lim, V. Leis, D.G. Andersen, M. Kaminsky, K. Keeton & A. Pavlo, SIGMOD 2018)
- GraNite: Graph Neural Network Inference for Transferable Power Flow (Authors TBD)
- Broom: An Open-Addressing Quotient Filter (M.A. Bender et al., ESA 2021)
- Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier (Z. Dai & A. Shrivastava, NeurIPS 2020)
- An Improved Data Stream Summary: The Count-Min Sketch and its Applications (G. Cormode & S. Muthukrishnan, Journal of Algorithms 2005)
- Finding Frequent Items in Data Streams (G.S. Manku & R. Motwani, VLDB 2002)
- Finding Repeated Elements (J. Misra & D. Gries, Science of Computer Programming 1982)
- Methods for Finding Frequent Items in Data Streams (G. Cormode & M. Hadjieleftheriou, VLDB Journal 2009)
- Mash: Fast Genome and Metagenome Distance Estimation Using MinHash (B.D. Ondov, T.J. Treangen, P. Melsted, A.B. Mallonee, N.H. Bergman, S. Koren & A.M. Phillippy, Genome Biology 2016)
- Dashing: Fast and Accurate Genomic Distances with HyperLogLog (D.N. Baker & B. Langmead, Genome Biology 2019)
- Dashing 2: Genomic Distances with Sketching (D.N. Baker & B. Langmead, BMC Bioinformatics 2022)
Part 5: Similarity and Nearest Neighbor Search
- Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality (P. Indyk & R. Motwani, STOC 1998)
- Space-Saving: A Framework for On-Line Computation of Quantiles (A. Metwally, D. Agrawal & A. El Abbadi, VLDB 2005)
- b-Bit Minwise Hashing (P. Li & A.C. König, WWW 2010)
- Locality-Sensitive Hashing Scheme Based on p-Stable Distributions (M. Datar, N. Immorlica, P. Indyk & V.S. Mirrokni, SCG 2004)
- Count-Min Sketch (Class Notes) (Course Material)
- On the Resemblance and Containment of Documents (A.Z. Broder, SEQUENCES 1997)
- Similarity Search in High Dimensions via Hashing (A. Gionis, P. Indyk & R. Motwani, VLDB 1999)
- Weighted MinHash on GPU (S. Ioffe, 2010)
- One Permutation Hashing (P. Li, A. Owen & C. Zhang, NIPS 2012)
Part 6: Graph Algorithms
- Ligra: A Lightweight Graph Processing Framework for Shared Memory (J. Shun & G.E. Blelloch, PPoPP 2013)
- The GAP Benchmark Suite (S. Beamer, K. Asanović & D. Patterson, arXiv 2015)
- Low-Latency Graph Streaming using Compressed Purely-Functional Trees (L. Dhulipala, G.E. Blelloch & J. Shun, PLDI 2019)
- Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs (P. Pandey, B. Wheatman, H. Xu & A. Buluc, SIGMOD 2021)
- How to Apply de Bruijn Graphs to Genome Assembly (P.E.C. Compeau, P.A. Pevzner & G. Tesler, Nature Biotechnology 2011)
- ReadJoiner: A Fast and Memory Efficient String Graph-Based Sequence Assembler (G. Gonnella & S. Kurtz, BMC Bioinformatics 2012)
- Computational Pan-Genomics: Status, Promises and Challenges (The Computational Pan-Genomics Consortium, Briefings in Bioinformatics 2018)
- Variable-Order de Bruijn Graphs (C. Boucher, A. Bowe, T. Gagie, S.J. Puglisi & K. Sadakane, DCC 2015)
Part 7: Distributed Systems
Acknowledgements
The lecture notes in the course are derived from course material from Prof. Erik Demiane MIT, Prof. Michael A. Bender SBU, Prof. Martin Farach-Colton Rutgers, Prof. Ben Langmead JHU.