Computational Biology at Khoury College of Computer Sciences

Uncovering the secrets of life through the power of computation


Computational biology bridges the gap between computer science and the life sciences. Khoury College researchers explore both fundamental and applied questions across the study of living systems. Their interdisciplinary approach brings together multiple computer science disciplines, including machine learning, computer vision, statistics, data science, and data visualization, as well as connections to programming languages, mathematics, physics, and network science.

Computational biology uses the power of large and complex datasets generated by modern biotechnologies to make complex biological processes legible and to provide insight for life science researchers, clinicians, policymakers, and educators. 

Using biological data to explore fundamental questions


Computational biology fuels discoveries across a wide range of fields, including studies of genomes, proteomes, and metabolomes; diagnosis and prognosis of disease; discovery of molecular mechanisms of disease, vaccines, and therapeutics; and advances toward personalized medicine. Computational biology also provides insights into agriculture, climate science, and ecosystem analysis.

Sample research areas

  • Causal inference in biomolecular systems
  • Computational mass spectrometry-based proteomics
  • Macromolecular structure and function
  • Network biology and medicine
  • Studies of molecular mechanisms of disease
  • Variant and genome interpretation
  • Systems biology
  • Genomics
  • Biomedical imaging

Current project highlights

May Institute on Computation and Statistics for Spectrometry and Proteomics

Northeastern’s Barnett Institute for Chemical and Biological Analysis sponsors the national May Institute in computation and statistics for mass spectrometry and proteomics.

Machine Learning Approaches Towards Risk Assessment and Prediction of Adverse Pregnancy Outcomes

This research explores which molecular, clinical, and genetic factors increase the risk of adverse pregnancy outcomes. By applying machine learning to large datasets from pregnant women, this work has the potential to make a direct impact on maternal health.

Recent research publications

Eliater: A Python package for estimating outcomes of perturbations in biomolecular networks (Bioinformatics, 2024)
Authors: Sara Mohammad-Taheri, Pruthvi Prakash Navada, Charles Tapley Hoyt, Jeremy Zucker, Karen Sachs, Benjamin M. Gyori, Olga Vitek

This research introduces and showcases Eliater, a Python package for estimating the effect of perturbation of an upstream molecule on a downstream molecule in a biomolecular network.

Beyond protein lists: AI-assisted interpretation of proteomic investigations in the context of evolving scientific knowledge (Nature Methods, 2024)
Authors: Benjamin M. Gyori, Olga Vitek

Mass spectrometry-based proteomics provides broad and quantitative detection of the proteome, but its results are mostly presented as protein lists. Artificial intelligence approaches will exploit prior knowledge from literature and harmonize fragmented datasets to enable mechanistic and functional interpretation of proteomics experiments.

An MSstats workflow for detecting differentially abundant proteins in large-scale data-independent acquisition mass spectrometry experiments with FragPipe processing (Nature Protocols, 2024)
Authors: Devon Kohler, Mateusz Staniak, Fengchao Yu, Alexey I. Nesvizhskii, Olga Vitek

Khoury College is leading work on a new open-source software tool, MSstats, designed for researchers working with quantitative mass spectrometry-based proteomics, a technology used to measure protein levels in samples.

Improving transparency of computational tools for variant effect prediction (Nature Genetics, 2024)
Authors: Rachel Karchin, Predrag Radivojac, Anne O’Donnell-Luria, Marc S. Greenblatt, Michael Y. Tolstorukov, Dmitriy Sonkin

Efforts to integrate computational tools for variant effect prediction into the process of clinical decision-making are in progress. However, for such efforts to succeed and help to provide more informed clinical decisions, it is necessary to enhance transparency and address the current limitations of computational predictors.

Cardinal v.3: a versatile open-source software for mass spectrometry imaging analysis (Nature Methods, 2023)
Authors: Kylie Ariel Bemis, Melanie Christine Föll, Dan Guo, Sai Srikanth Lakkimsetty, Olga Vitek

Mass spectrometry imaging (MSI) analyzes spatial distributions of analytes from complex biological samples, such as tissues, at cellular resolution. New research presents Cardinal v.3, open-source software for reproducible analysis of MSI experiments and a major update over its previous versions.

Automated assembly of molecular mechanisms at scale from text mining and curated databases (Molecular Systems Biology, 2023)
Authors: John A. Bachman, Benjamin M. Gyori, Peter K. Sorger

Researchers describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA).

Faculty members

  • Benjamin Gyori

    Benjamin Gyori is an associate professor at Khoury College, jointly appointed with the College of Engineering. His research combines computational modeling, machine learning, natural language processing, and human–machine interaction to improve our understanding of human biology and facilitate advances in health care.

  • Wengong Jin

    Wengong Jin is an assistant professor at Khoury College. His research aims to use geometric and generative AI models to improve the costly, time-consuming process of drug discovery.

  • Prashant Pandey

    Prashant Pandey is an assistant professor at Khoury College. He researches scalable data systems with robust theoretical foundations for efficient data management, and tackles every level of that challenge, from the theoretical aspects of data structures to the practical issues of scaling data systems.

  • Predrag Radivojac

    Predrag Radivojac is a professor and associate dean of research at Khoury College. His work strives to grasp the molecular basis for higher-level phenotypes and genetic disorders, and to develop algorithms and analysis techniques related to the function of biological macromolecules, mass spectrometry proteomics, genome interpretation, and precision health.

  • Olga Vitek

    Olga Vitek is the Raymond Bradford Bradstreet Professor at Khoury College and the director of the Barnett Institute for Chemical and Biological Analysis. Her lab, which has been recognized with multiple major awards, uses statistical science, machine learning, and large-scale mass spectrometry to understand the functioning of living organisms.