CS 7800/4810 – Course Schedule
Topics and Agenda
This schedule will be updated regularly as the course progresses. Check back
frequently. We will usually post lecture slides by the end of the day
following a lecture. Lectures will not be recorded.
Part 1: Fundamental Data Structures
Part 2: String Algorithms
Part 3: Hashing Techniques
Part 4: Filters and Sketches
- (): Filters (Bloom/Quotient/Cuckoo)
Scribe: Peter Li
- (): Range filters
Scribe: Daniel Long
- (): Adaptive Filters
Scribe: Anh Nguyen
- (): Misra-Gries, Count Sketch, Count-Min Sketch
Scribe: Yiyang Zhang
- (): Cardinality Estimation (HyperLogLog)
Scribe: Brianna Marshall
Part 5: Similarity and Nearest Neighbor Search
- (): Nearest Neighbor
Scribe: Jenish
- (): Locality Sensitive Hashing
Scribe: Reilly teeter
- (): Min Hashing & Similarity Search
Scribe: Sanjana Injamuri
Part 6: Graph Algorithms
- (): Graphs Representation & Computations
Scribe: Yuvraj Gohil
- (): Streaming Graphs & Incremental Computations
Scribe: Shreyaan Pathak
- (): Graphs in Comp. Bio.
Scribe: Sai Srikanth
Part 7: Distributed Systems
- (): Consistent Hashing
Scribe: Brady Aber
- (): Distributed Hash Tables
Scribe: Eric Chen
Part 8: Review and Presentations
- (): Revision Lecture
- (): Final Project Presentations
Literature List
Part 1: Fundamental Data Structures
Part 2: String Algorithms
Part 3: Hashing Techniques
Part 4: Filters and Sketches
- A General-Purpose Counting Filter: Making Every Bit Count (P. Pandey, M.A. Bender, R. Johnson & R. Patro, SIGMOD 2017)
- Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design (P. Pandey et al., SIGMOD 2021)
- Cuckoo Filter: Practically Better Than Bloom (B. Fan, D.G. Andersen, M. Kaminsky & M.D. Mitzenmacher, CoNEXT 2014)
- Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters (T.M. Graf & D. Lemire, Journal of Experimental Algorithmics 2020)
- Range Emptiness in Constant Time and Optimal Space (Authors TBD)
- SuRF: Practical Range Query Filtering with Fast Succinct Tries (H. Zhang, H. Lim, V. Leis, D.G. Andersen, M. Kaminsky, K. Keeton & A. Pavlo, SIGMOD 2018)
- GraNite: Graph Neural Network Inference for Transferable Power Flow (Authors TBD)
- Broom: An Open-Addressing Quotient Filter (M.A. Bender et al., ESA 2021)
- Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier (Z. Dai & A. Shrivastava, NeurIPS 2020)
- An Improved Data Stream Summary: The Count-Min Sketch and its Applications (G. Cormode & S. Muthukrishnan, Journal of Algorithms 2005)
- Finding Frequent Items in Data Streams (G.S. Manku & R. Motwani, VLDB 2002)
- Finding Repeated Elements (J. Misra & D. Gries, Science of Computer Programming 1982)
- Methods for Finding Frequent Items in Data Streams (G. Cormode & M. Hadjieleftheriou, VLDB Journal 2009)
- Mash: Fast Genome and Metagenome Distance Estimation Using MinHash (B.D. Ondov, T.J. Treangen, P. Melsted, A.B. Mallonee, N.H. Bergman, S. Koren & A.M. Phillippy, Genome Biology 2016)
- Dashing: Fast and Accurate Genomic Distances with HyperLogLog (D.N. Baker & B. Langmead, Genome Biology 2019)
- Dashing 2: Genomic Distances with Sketching (D.N. Baker & B. Langmead, BMC Bioinformatics 2022)
Part 5: Similarity and Nearest Neighbor Search
- Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality (P. Indyk & R. Motwani, STOC 1998)
- Space-Saving: A Framework for On-Line Computation of Quantiles (A. Metwally, D. Agrawal & A. El Abbadi, VLDB 2005)
- b-Bit Minwise Hashing (P. Li & A.C. König, WWW 2010)
- Locality-Sensitive Hashing Scheme Based on p-Stable Distributions (M. Datar, N. Immorlica, P. Indyk & V.S. Mirrokni, SCG 2004)
- Count-Min Sketch (Class Notes) (Course Material)
- On the Resemblance and Containment of Documents (A.Z. Broder, SEQUENCES 1997)
- Similarity Search in High Dimensions via Hashing (A. Gionis, P. Indyk & R. Motwani, VLDB 1999)
- Weighted MinHash on GPU (S. Ioffe, 2010)
- One Permutation Hashing (P. Li, A. Owen & C. Zhang, NIPS 2012)
Part 6: Graph Algorithms
- Ligra: A Lightweight Graph Processing Framework for Shared Memory (J. Shun & G.E. Blelloch, PPoPP 2013)
- The GAP Benchmark Suite (S. Beamer, K. Asanović & D. Patterson, arXiv 2015)
- Low-Latency Graph Streaming using Compressed Purely-Functional Trees (L. Dhulipala, G.E. Blelloch & J. Shun, PLDI 2019)
- Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs (P. Pandey, B. Wheatman, H. Xu & A. Buluc, SIGMOD 2021)
- How to Apply de Bruijn Graphs to Genome Assembly (P.E.C. Compeau, P.A. Pevzner & G. Tesler, Nature Biotechnology 2011)
- ReadJoiner: A Fast and Memory Efficient String Graph-Based Sequence Assembler (G. Gonnella & S. Kurtz, BMC Bioinformatics 2012)
- Computational Pan-Genomics: Status, Promises and Challenges (The Computational Pan-Genomics Consortium, Briefings in Bioinformatics 2018)
- Variable-Order de Bruijn Graphs (C. Boucher, A. Bowe, T. Gagie, S.J. Puglisi & K. Sadakane, DCC 2015)
Part 7: Distributed Systems
Acknowledgements
The lecture notes in the course are derived from course material from Prof. Erik Demiane MIT, Prof. Michael A. Bender SBU, Prof. Martin Farach-Colton Rutgers, Prof. Ben Langmead JHU.