Representation learning algorithms automatically learn the features of data. In modern machine learning, these representations are typically learned by a deep neural network (DNN). When the input is graph data, representation learning algorithms, such as DeepWalk, node2vec, and GraphSAGE, must first sample the graph to produce input that is suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing graph analytics, mining, and representation learning systems do not efficiently parallelize sampling.
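For concreteness, the sketch below shows the kind of sampling step these algorithms perform: a DeepWalk-style uniform random walk over a graph stored in CSR form. The struct and function names are illustrative assumptions, not NextDoor's API; the point is that every step dereferences a different, data-dependent neighbor list, which is what makes the workload irregular.

```cuda
#include <cstdint>
#include <random>
#include <vector>

// Illustrative CSR graph: rowPtr[v]..rowPtr[v+1] indexes into colIdx,
// which holds v's neighbors. (Hypothetical names, not NextDoor's API.)
struct CSRGraph {
  std::vector<int64_t> rowPtr;
  std::vector<int32_t> colIdx;
};

// DeepWalk-style uniform random walk of fixed length from one root vertex.
// Each step reads a different, data-dependent neighbor list, which is the
// irregular access pattern that makes GPU parallelization hard.
std::vector<int32_t> randomWalk(const CSRGraph& g, int32_t root,
                                int walkLength, std::mt19937& rng) {
  std::vector<int32_t> walk{root};
  int32_t cur = root;
  for (int step = 0; step < walkLength; ++step) {
    int64_t begin = g.rowPtr[cur], end = g.rowPtr[cur + 1];
    if (begin == end) break;                      // dead end: no out-edges
    std::uniform_int_distribution<int64_t> pick(begin, end - 1);
    cur = g.colIdx[pick(rng)];                    // jump to a random neighbor
    walk.push_back(cur);
  }
  return walk;                                    // one row of the DNN mini-batch
}
```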
Sampling is an “embarrassingly parallel” problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to perform graph sampling efficiently on GPUs. NextDoor employs a new approach to graph sampling, transit-parallelism, which enables load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications and show that NextDoor runs them orders of magnitude faster than existing systems.
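As a rough illustration of the transit-parallel idea, the hedged CUDA sketch below groups all samples whose next step must be drawn from the same “transit” vertex into one thread block, so the transit's edge list can be staged in shared memory once and reused by every thread, and block sizes follow the number of samples per transit. The kernel name, its parameters, and the grouping arrays are assumptions for illustration only, not NextDoor's actual implementation.

```cuda
#include <cstdint>
#include <curand_kernel.h>

// Hedged sketch of transit-parallel sampling (illustrative names, not
// NextDoor's kernel). One block handles one transit vertex; the launcher
// is assumed to size dynamic shared memory for that transit's degree and
// to split very high-degree transits across blocks.
__global__ void transitParallelStep(
    const int64_t* rowPtr, const int32_t* colIdx,   // CSR graph
    const int32_t* transits,                        // one transit per block
    const int32_t* samplesByTransit,                // sample IDs grouped by transit
    const int64_t* groupStart,                      // group offsets (numTransits + 1)
    int32_t* nextVertex,                            // output: next step per sample
    uint64_t seed) {
  extern __shared__ int32_t edgeCache[];            // transit's edges, shared by the block

  int32_t transit = transits[blockIdx.x];
  int64_t eBegin  = rowPtr[transit];
  int64_t degree  = rowPtr[transit + 1] - eBegin;

  // Cooperatively load the transit's neighbor list once per block.
  for (int64_t i = threadIdx.x; i < degree; i += blockDim.x)
    edgeCache[i] = colIdx[eBegin + i];
  __syncthreads();

  // Each thread advances one sample currently sitting at this transit.
  int64_t s = groupStart[blockIdx.x] + threadIdx.x;
  if (s >= groupStart[blockIdx.x + 1] || degree == 0) return;

  curandState rng;
  curand_init(seed, s, 0, &rng);
  int32_t sampleId     = samplesByTransit[s];
  nextVertex[sampleId] = edgeCache[curand(&rng) % degree];  // uniform neighbor pick
}
```

Grouping work by transit rather than by sample is what lets threads in a block share edge reads (the caching the abstract refers to) and lets block sizes track per-transit work (the load balancing).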