Assignment: vEB tree

Assignment Overview

The first assignment will teach you how to implement and empirically evaluate van Emde Boas tree. The primary goal of this assignment is to become familiar with van Emde Boas trees and understand the asymptotic behavior. All the code in this programming assignment must be written in C/C++. If you have not used C++ before, here's a short tutorial on the language. Even if you are familiar with C++, go over this guide for additional information on writing code in the system.

If you want to use another programming language for this assignment please ask the instructor first.

This is a single-person project that will be completed individually (i.e., no groups).

  • Release date: Wed, January 15

  • Due date: Mon, February 3

Implementation Details

There are three steps in this assignment:

  1. Implement vEB tree using O(u) space
  2. Write a test program for vEB tree implementation
  3. Empirical performance evaluation
  4. Write a report

Step #1 - Implement vEB tree using O(u) space

The first step is to implement the van Emde Boas (vEB) tree. You can refer to the lecture notes for the pseudo code. You do not need to implement space-optimized X-fast trees/Y-fast trees. Your vEB tree implementation can take O(u) space, where u is the universe size.

In your vEB tree implementation you need to support the following API:

We have a universe U of size u = |U| and a set S of size |S| = n, S ⊂ U, and we want to implement the following operations:

  • Insert (x): S ← S ⋃ {x}
  • Query (x): Return whether x ∈ S
  • Successor (x): Return the minimum y ∈ S,such that y ≧ x

Note: that x ∈ U but x need not be in S.

Step #2 - Write a test program for vEB tree implementation

Your next step is to write a test program to validate the correctness of the above operations.

As a reference, you can find a test program and a Makefile in C++ in Zip file. This program creates a binary search tree (BST) using std::set and performs insertions and queries (find, successor) for N items generated uniformly at random from universe U.

To build and run the test program you need follow the following instructions including installing openSSL library for generating input items.

sudo apt install libssl-dev
make
./test NUM_ITEMS

The test program also records the time to perform N insertions and queries and reports it using std::chrono library in C++.

Step #3 - Empirical performance evaluation

You need to evaluate the empirical performance of the vEB tree implementation and compare it against a binary search tree (BST). You should use an existing search tree implementation from the standard library (e.g., std::set).

You need to write a benchmark to measure the running time of the above operations. In the benchmark, you will insert and query (find, successor) N 32-bit integers (universe U of 32-bit integers) in the data structure and measure the running time. You can use the std::chrono to measure the time. If you are using a different programming language than C/C++ you can find a different timing function in that language.

You can extend the test program for benchmarking. Similar to the std::set in the program, you need to use the vEB tree implementation.

Step #4 - Write a report

In the report, you need to plot the performance of the vEB tree and BST. The x-axis of the plot will be the number of items (N) and y-axis will be the time to perform the operation.

For x-axis you need to vary the number of items. You should perform evaluation for: N: {1M, 2M, 4M, 8M, 16M, 32M, 64M, 128M, 256M, 512M} (M: Million).

Instructions

Students can ssh to either login.khoury.northeastern.edu or use the following based on the MOTD in login.

Linux at Khoury College:

You may SSH to alternative linux machines. Alternative linux machines are available if connected to NUwave, or if connected to Northeastern VPN.
The hostnames are linux-[071-085].khoury.northeastern.edu

Submission

You need to submit a .zip file of your source code to canvas.

You should also include a report.pdf in your submission that contains:

  1. A plot showing the performance of vEB tree and BST for insertions.
  2. A plot showing the performance of vEB tree and BST for queries.
  3. A plot showing the performance of vEB tree and BST for successors.
  4. A brief analysis of the performance of the two data structures.
  5. You also need to include a discussion about the theoretical guarantees of these data structures and empirical performance.

We will evaluate the correctness and the performance of your implementation off-line after the project due date.

Collaboration Policy

  • Every student has to work individually on this assignment.
  • Students are allowed to discuss high-level details about the project with others.
  • Students are not allowed to copy the contents of a white-board after a group meeting with other students.
  • Students are not allowed to copy the solutions from another colleague.
  • You can not copy code from the internet.

If you have any questions please contact the instructor.