Covers big-data-analysis techniques that scale out with an increasing number of compute nodes, e.g., for cloud computing. Focuses on approaches for problem and data partitioning that distribute work effectively while keeping the total cost of computation and data transfer low. Deterministic and randomized algorithms from a variety of domains, including graphs, data mining, linear algebra, and information retrieval, are studied and analyzed in terms of their cost, scalability, and robustness against skew. Coursework emphasizes hands-on programming experience with modern, state-of-the-art big-data-processing technology. Students who do not meet the course prerequisites may seek the instructor's permission.
Most aspects of the course are managed through Canvas (https://canvas.northeastern.edu/). Do not email course-related questions; post them on the Piazza discussion board instead. To reach only the instructor and TAs, e.g., with a grading-related question, make your Piazza post private and set the appropriate visibility.
Please read the syllabus carefully.
Go to this page for the online modules. Please make sure you go through the material before the week in which it is discussed in class.
See Canvas.
Week | Start date | Comments |
---|---|---|
1 | Jan 18 | |
2 | Jan 25 | |
3 | Feb 1 | |
4 | Feb 8 | |
5 | Feb 15 | |
6 | Feb 22 | |
7 | Mar 1 | |
8 | Mar 8 | |
9 | Mar 15 | |
10 | Mar 22 | |
11 | Mar 29 | |
12 | Apr 5 | Exam week (exam on Tuesday via Canvas) |
13 | Apr 12 | |
14 | Apr 19 | |
15 | Apr 26 | Project presentations (by default on both class meeting days) |