Covers big-data-analysis techniques that scale out with an increasing number of compute nodes, e.g., for cloud computing. Focuses on approaches for problem and data partitioning that distribute work effectively while keeping the total cost of computation and data transfer low. Deterministic and random algorithms from a variety of domains, including graphs, data mining, linear algebra, and information retrieval, are studied and analyzed in terms of their cost, scalability, and robustness against skew. Coursework emphasizes hands-on programming experience with modern state-of-the-art big-data-processing technology. Students who do not meet the course prerequisites may seek permission of the instructor.
Most aspects of the course are managed through Canvas (https://canvas.northeastern.edu/), including homework submission, self-test quizzes, the exam, and the discussion board. Do not email course-related questions; post them on the discussion board instead. Unfortunately, the Canvas discussion board does not seem to offer private posts. To reach only the instructor and TAs, e.g., with a grading-related question, use the Canvas messaging feature via the Inbox link and select the intended recipients of your message there.
Please read the syllabus carefully.
Go to this page for the online modules. Please make sure you go through the material before the week it is discussed in class.
Mirek: Tuesday 1:30-3:30pm on Zoom. I am also available right after class. If you cannot make it to office hours, request an appointment through a private Canvas message (Inbox).
TBA: day/time TBD; location: Zoom
Week | Start date | Comments |
---|---|---|
1 | Sep 7 | |
2 | Sep 14 | |
3 | Sep 21 | |
4 | Sep 28 | |
5 | Oct 5 | |
6 | Oct 12 | |
7 | Oct 19 | |
8 | Oct 26 | |
9 | Nov 2 | |
10 | Nov 9 | |
11 | Nov 16 | |
12 | Nov 23 | Exam week (exam on Tuesday via Canvas). No classes Nov 25-29 (Thanksgiving) |
13 | Nov 30 | |
14 | Dec 7 | |
15 | Dec 14 | Project presentations (by default on both class meeting days) |