CS 7280/4973 – Data Str & Alg Scalable Comp

Lectures: TBD

Instructor: Prashant Pandey

Teaching Assistant: TBD

We will use Piazza for all Q&A.

Course Overview

This course studies advanced data structures and algorithms for handling scalability challenges in large-scale data management and machine learning pipelines. It will cover modern hashing techniques, filters and sketching algorithms, locality-sensitive hashing, succinct data structures, string algorithms, graph algorithms, external memory algorithms, and ML-based learned indexes. This course is appropriate for both undergraduate and graduate students with intermediate data structure and algorithm skills. The course will also require intermediate programming skills in C/C++.

Prerequisites

There is no official prerequisite for the course.

Please email me directly if you're interested in the course but don't meet the prerequisites in the system.

PhD program: The course fulfills PhD breadth requirements for the "Artificial Intelligence and Data Science" and "Systems and Security" breadth areas.

Course Topics

Assignments

Projects

Scribing

Useful Resources

Please refer to this brief overview of asymptotic notation, The Asymptotic Cheat Sheet. It will help you follow the theoretical analyses in the course.

Assignments, scribe notes, and final projects must be typeset in LaTeX. If you are not familiar with LaTeX, see this introduction. Here's a quick Overleaf tutorial.

Grading

Late submission policy

Collaboration and Plagiarism

Everyone needs to read the Northeastern University Policy on Academic Misconduct.

Working with others on assignments is a good way to learn the material, and we encourage it. However, there are limits to the degree of cooperation that we will permit.

When working on programming assignments, you must work only with others whose understanding of the material is approximately equal to yours. In this situation, working together to find a good approach for solving a programming problem is cooperation; listening while someone dictates a solution is cheating. You must limit collaboration to a high-level discussion of solution strategies, and stop short of actually writing down a group answer. Anything that you hand in, whether it is a paper report or a computer program, must be written in your own words. If you base your solution on any other written solution, you are cheating.

If you collaborate with other students to discuss a problem and then write your own solution, declare up front in your write-up the names of all the students you collaborated with.

Never look at another student's code or share your code with any other student.

You must not make your code public (on GitHub or by any other means).

Using tools like GitHub Copilot or ChatGPT, or copying code from sites like Stack Overflow, also constitutes cheating. Do not write code with Copilot enabled in this course.

We do not distinguish between cheaters who copy others' work and cheaters who allow their work to be copied. If you cheat, you will be given an E in the course and referred to the University Student Behavior Committee.

Any attempt to subvert the ordinary grading process also constitutes cheating.

If you have any questions about what constitutes cheating, please ask first.