ROCKET: Scalable and Agile Analysis of Mass Spectrometry Experiments
Lead PI
Co-PIs
Abstract
Mass spectrometry is a diverse and versatile technology for the high-throughput functional characterization of proteins, small molecules, and metabolites in complex biological mixtures. The technology evolves rapidly and generates datasets of ever-increasing size and complexity. Statistical methods and tools for analyzing these data must evolve just as quickly. Ideally, new statistical methods should leverage the rich resources of the more than 12,000 packages available in the R programming language and its Bioconductor project. However, technological limitations, such as the need to load entire datasets into memory, currently hinder the use of these packages in mass spectrometric research. In response, ROCKET builds an enabling technology for working with large mass spectrometric datasets in R and for rapidly developing new algorithms, while benefiting from advances made in other areas of science. The project also offers an opportunity to recruit and retain Native American students in R-based technology and research, and helps prepare them for careers in STEM.
Rather than implementing yet another data processing pipeline, ROCKET extends the scalability of R and streamlines the manipulation of large files in complex formats. First, to serve the diverse mass spectrometric community, ROCKET supports both scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory) and scaling up (i.e., executing a workflow on a multiprocessor machine or in the cloud). Second, ROCKET generates an efficient mix of R and target code, which is compiled in the background for the specific deployment platform. By ensuring compatibility with open, mass-spectrometry-specific data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general-purpose analytical methods. In this way, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists and to create a blueprint for a new paradigm of R-based computing with large datasets. The outcomes of the project will be documented and made publicly available on the Olga Vitek lab web page.