This course is a comprehensive study of the internals of modern database systems and the challenges of indexing and querying large-scale data in the context of continuously evolving hardware. It will cover the core concepts and fundamentals of indexing and hashing data structures, concurrency control, storage, file organization, and query processing. The course will study both in-memory and disk-based database systems and will use examples from modern key-value stores. All class projects will be in the context of real in-memory and disk-based database systems. The course is appropriate for graduate students in software systems and for advanced undergraduates with systems programming skills.
Prerequisites:
There are no formal prerequisites for this course, but you should have a basic understanding of databases and how to use them.
Project work will require programming in C++. You’re expected to either know C++ or be willing to learn it independently. Here’s a good C++ tutorial.
PhD Students: This course satisfies the "Systems and Security" breadth requirement for the PhD program.
Time/location
Instructors: Prashant Pandey
Contact: Please use Piazza (accessed directly from within Canvas) for all questions related to lectures, coursework, and the project. Note that you can post questions anonymously to other students, or even anonymously to the instructors.
Course Topics
Projects
The projects will vary in both scope and topic, but they must satisfy this criterion. We will discuss this in more depth during class, though students are encouraged to start thinking early about projects that interest them. If a group is unable to come up with its own project idea, the instructor will suggest interesting topics.
Paper Reading
There is a set of assigned paper readings for the course. The reading list is designed to provide additional information and insight into the current state of the art in database systems research. Each student is required to pick five papers from the reading list and turn in a one-paragraph synopsis of each. There will be five deadlines throughout the semester, and a synopsis is due at each one. Late submissions will not be accepted without prior approval from the instructor.
Each review must include the following information:
These reading reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated.
Useful Resources
Please refer to this brief overview of asymptotic notation: The Asymptotic Cheat Sheet. It will help you follow the theoretical analyses in the course.
Grading
Late submission policy
Everyone needs to read the SoC Policy on Academic Misconduct.
Working with others on assignments is a good way to learn the material, and we encourage it. However, there are limits to the degree of cooperation that we will permit.
When working on programming assignments, you must work only with others whose understanding of the material is approximately equal to yours. In this situation, working together to find a good approach for solving a programming problem is cooperation; listening while someone dictates a solution is cheating. You must limit collaboration to a high-level discussion of solution strategies, and stop short of actually writing down a group answer. Anything that you hand in, whether it is a paper report or a computer program, must be written in your own words. If you base your solution on any other written solution, you are cheating.
If you collaborate with other students to discuss a problem and then write your own solution, make sure to declare up front in the write-up the names of all the students you collaborated with.
Never look at another student's code or share your code with any other student.
You must not make your code public (on GitHub or by any other means).
Using tools like GitHub Copilot or ChatGPT, or copying code from sites like Stack Overflow, also constitutes cheating. Do not write code with Copilot enabled in this course.
We do not distinguish between cheaters who copy others' work and cheaters who allow their work to be copied. If you cheat, you will be given an E in the course and referred to the University Student Behavior Committee.
Clearly, any attempt to subvert the ordinary grading process constitutes cheating.
If you have any questions about what constitutes cheating, please ask first.
Prof. Andy Pavlo (CMU), Prof. Manos Athanassoulis (BU), Prof. Arun Kumar (UCSD).
The lecture slides used in the course are taken from Prof. Andy Pavlo's CMU 15-721 course, Prof. Manos Athanassoulis's CAS CS561 course, and Prof. Arun Kumar's CSE 232A.
The website template is taken from Prof. Wolfgang Gatterbauer's CS 7240.