Data Scientists and Life Scientists Come Together at the May Institute

The fifth annual May Institute on Computation and Statistics in Mass Spectrometry and Proteomics was held from April 29 to May 10 at Northeastern’s Shillman Hall. Hosted by Khoury College of Computer Sciences, the program combined keynote presentations, introductory lectures, practical training, and informal discussions, all focused on the computational and statistical aspects of quantitative mass spectrometry-based proteomics. Recent participants have traveled from as far away as Europe, South Africa, Australia, and Brazil.

May Institute ’19 participants, with co-organizers Meena Choi and Olga Vitek, lower right corner
Funded by the NIH, the conference brought together leading experts with both beginner and experienced scientists who wanted to bolster their computational and statistical expertise. Participants not only had many opportunities to ask questions but were also able to present their own research.

Dr. Olga Vitek, who organized the event with Meena Choi and Brendan MacLean, describes the program as a union of data science and life science, or more concisely, “Data science in practice.” As she puts it, “What we teach here are the skills needed to change how research is approached.”

Olga Vitek, director of Khoury College’s MS in Data Science program, and Ruedi Aebersold, keynote speaker
Vitek, associate professor and director of Khoury College’s MS in Data Science program, believes the program’s social aspect has a tremendous impact on participants. “New researchers are often isolated in their labs,” she says. “Here, they find a whole community of people with the same interests and the same problems they have.” Another benefit, she adds, is “the contacts that participants can make in their field.”

Participant Kaushal Paneri, an MSDS student and member of Vitek’s lab, elaborated on what makes the May Institute stand out for students and scientists: “They can go to other data science conferences, but here we speak their language.”

Many conferences don’t give participants enough time to immerse themselves fully in a topic, but that’s not the case at the two-week-long May Institute. Brian Searle, whose three-day session “Introduction to Data Independent Acquisition (DIA)” ranged widely across DIA experiments, said, “I’m used to having only time to teach a one-day program, so three days is fantastic!”

Data sharing was a significant topic. Laurent Gatto, associate professor of bioinformatics at Université Catholique de Louvain in Brussels, Belgium, is a third-time presenter at the May Institute. Aleksandra Petelski, a Northeastern bioengineering doctoral candidate, praised Gatto’s “Interactive Data Visualization” class for helping her learn how to use open-source software for statistical computing and graphics in her research. “This has been a great introduction to data analysis and visualization,” she says. “It’s so good to learn about tools you didn’t know before.”

Sai Srikanth Lakkimsetty, a student in the MSDS program, learned about the May Institute through an independent study with Vitek. He praised the “industry experience and cutting-edge leaders.” A highlight of the institute curriculum for him was a lecture on “Communication of Scientific Data through Information Design,” by Steven Braun, data analytics and visualization specialist at Northeastern University Library. “Since my area is data science,” Lakkimsetty explains, “the way I communicate my analysis through visuals is very important, and Braun taught me what makes a good visualization.”

To cap off one day of the Institute, participants presented posters they’d sketched to represent their work. In a brief, one-minute talk, each defined their research area and motivating question or problem, gave an overview of their workflow, and described how they conducted their research experimentally and statistically. Jann Schultchaus, a post-doc at the Office of Naval Research, discussed how she and her colleagues use proteomics to answer questions about the barnacle and to sequence its genome. Commenting on the value of the May Institute, she says that “learning workflow and formal methods” has been transformative to her development as a scientist.

The keynote speaker was Dr. Ruedi Aebersold, a pioneer in the field of proteomics and a professor at ETH Zurich. Vitek credits him with “enormous contributions to the field.” Aebersold, who received the prestigious Karger Medal from Northeastern in 2017, finds that proteomic studies frequently fail at the computational level, and says programs such as the May Institute teach people “how to avoid common mistakes.”

Professor Cristina Clement of the City University of New York came to the May Institute primarily to hear Dr. Aebersold’s lecture, “Case Studies in Data-Independent Acquisition (DIA).” “It’s quite an interesting time to be in the field of data science,” Clement explains. She emphasizes that being able to talk personally with Aebersold after his lecture was an incredible highlight of the institute experience.

According to Vitek, the program has educated more than 350 scientists, including doctoral students and faculty from academia as well as researchers from industry. Her hope is that “computer science students get to see the open data science problems in this area and how they can make a contribution in a very unique way.” She stresses that the May Institute “is not the same every year; it evolves with the technology of the times.”