How a Data Science Professor May Help Your Doctor Cure What Ails You

Picture the vast amount of healthcare data collected by scientists and clinicians on everything from clinical trials to doctors’ notes. Now imagine the difficulty doctors face in providing patients with the most up-to-date and reliable treatment plans.

Comprehending such a volume of data may sound impossible to most of us, but not to Byron Wallace, assistant professor and director of the BS in Data Science program at Khoury College. (He also holds an adjunct appointment at Brown University’s Center for Evidence Synthesis in Health.) Wallace believes – and demonstrates in his research – that medicine should be data-driven. The ultimate aim? “To uncover all clinical trials and determine what treatments that the evidence indicates will likely work,” he says.

With the goal of better treatment in mind, Wallace combines machine learning and natural language processing methods to “ingest and digest articles,” making sense of language and text in order to determine the best treatments for patients. Although doctors are the people best equipped to interpret medical data, Wallace points out that “they don’t have the time or means to process vast amounts of data.” His objective is to help doctors sort through the evidence, lessening the human workload while providing the information they need to determine effective treatments.

Wallace’s program, RobotReviewer, “works by applying various extraction and classification models to the text within full-text articles describing the conduct and results of clinical trials. For example, it classifies every sentence as describing the study population (or not).” One way to understand this is to think about how email software distinguishes spam from other messages: it finds correlations between the words in a message and how often messages containing those words turn out to be spam. RobotReviewer, explains Wallace, “appraises the full text to classify an article as being at high or low statistical ‘risks of bias’, which is a proxy for the reliability of the underlying evidence reported.”
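For readers who want to see the spam-filter analogy in action, here is a minimal sketch in Python. It is not RobotReviewer’s actual code or model: it assumes a simple word-counting naive Bayes classifier, and the class name `NaiveBayesSentenceClassifier`, the labels, and the six training sentences are all invented for illustration. The sketch learns which words tend to appear in sentences that describe a study population, then labels new sentences accordingly.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentenceClassifier:
    """Toy word-count classifier (hypothetical; not RobotReviewer's model)."""

    def __init__(self):
        self.word_counts = {}          # label -> Counter of word frequencies
        self.sentence_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.sentence_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(sentence):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        total_sentences = sum(self.sentence_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_sentences in self.sentence_counts.items():
            # Start from the log prior: how common this label is overall.
            score = math.log(n_sentences / total_sentences)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(sentence):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one smoothing keeps unseen word/label pairs above zero.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training sentences, for illustration only.
train_sentences = [
    "We enrolled 120 adults with type 2 diabetes.",
    "Participants were women aged 40 to 65 with hypertension.",
    "Eligible patients were children with asthma.",
    "The primary outcome was change in blood pressure.",
    "Randomization was performed using sealed envelopes.",
    "Results were analyzed with a mixed effects model.",
]
train_labels = ["population"] * 3 + ["other"] * 3

clf = NaiveBayesSentenceClassifier().fit(train_sentences, train_labels)

print(clf.predict("Patients were adults aged 18 to 70 with asthma."))
print(clf.predict("The primary outcome was analyzed using a model."))
```

With these six sentences, the first test sentence is labeled “population” and the second “other.” A real system would learn from thousands of annotated sentences and would likely use far richer models, but the underlying idea of tying words to labels is the same.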

Wallace first became interested in research as an undergraduate at the University of Massachusetts Amherst with Professor James Kurose. Later, as a computer science graduate student at Tufts University, Wallace’s interest in the health field was sparked by Carla Brodley, his advisor and professor at Tufts and now dean of Khoury College, and Thomas Trikalinos, M.D., who is now at Brown. It was in Brodley’s Biomedical Machine Learning class that Wallace realized the vast amount of health-related data seemingly “buried” in texts.

Wallace now serves as a mentor to four PhD students at Northeastern and co-advises a fifth student. For his co-op job last year, undergraduate student Eric Lehman (BSCS ’20) worked with Wallace on a project concerning medical trial papers in which different outcomes were indicated for the same kinds of trial. Lehman facilitated the collection of all the data and worked with machine learning models to evaluate the tasks given to doctors.

Lehman found Wallace to be outstanding as a mentor: “He gave me the opportunity to do the kinds of research that is usually reserved for graduate students with more experience.” When their findings were published last year in a paper at the conference of the North American Chapter of the Association for Computational Linguistics (NAACL), a major and highly selective venue in natural language processing, Wallace gave Lehman first authorship of the paper, a remarkable feat for a college sophomore.

A native of western Massachusetts, Wallace spent two years at the University of Texas at Austin before joining the Northeastern faculty in 2013. His work at Khoury College has been supported with grants from the National Institutes of Health, the National Science Foundation, the Army Research Office, Seton Hospital, and Amazon, with seed funds from Brown University.

The recipient of the 2018 Early Career Award from the Society for Research Synthesis Methods, Wallace has published numerous papers. He finds Khoury College a very exciting place to work because research in his area usually takes place in hospitals, not universities. It is his hope that his research “contributes to continued progress towards teaching machines to make sense of language and text.”