Machine learning research is inherently interdisciplinary. For one Khoury grad student, that’s precisely the appeal

Author: Milton Posner
Date: 07.19.21

It’s fall 2014 at Princeton University. David Liu, a freshman from the Washington, D.C. suburbs, has begun an undergraduate chemical engineering track. He’s also covering every sport in sight for the Daily Princetonian.
David Liu, doctoral candidate
How he wound up as a Khoury College of Computer Sciences doctoral student, a published researcher, and the recipient of a prestigious grant to study machine learning equity is a story of personal drive and interdisciplinary curiosity.

“What drives a lot of my work is being aligned with the purpose of the work and what it’s being used for,” he says. “Instead of defining a new definition of fairness, let’s take an existing one but also confront the realities and challenges of the present day.”

Liu charts his own interdisciplinary path 
Chemical engineering made sense; after all, it was Liu’s primary academic interest in high school. But it wouldn’t stay that way.

“Because of how big computer science departments have grown, there’s a lot of diversity of perspective and background,” Liu explains. “And because the world is becoming more and more technological, it’s applicable to many different domains. When I was an undergrad choosing what to study, I lost the misconception that computer science was this niche field that only a small group of people cared about.”

This diversity of academic fields crept into the rest of Liu’s Princeton pursuits. Take his Daily Princetonian work. Because sports coverage orbits around weekend games, Liu turned to his science background for meaningful midweek material.

"I was already engrossed in the data of Princeton athletics," he recalls. "Especially in college sports, it's very complete data and very painstakingly accumulated. So I thought, 'There's not a lot of people who are taking advantage of the data. Let's see what we can do.'" He wrote about gender-based salary gaps among Ivy League coaches and correlations between fan attendance and basketball team success, among other topics.

The science informed the journalism, then the journalism informed the science. In his first research project, spurred by his interdisciplinary digital humanities seminar, Liu dug into more than a century’s worth of archived Princetonian headlines to make historical, linguistic, and cultural trends more digestible.

The journalism–computer science mix continued with software jobs at the New York Times and Bloomberg, born of a desire to take a different path than his peers.

“I had a software engineering experience, but also a firsthand view of how news was being written and created. That was very eye-opening,” he says of the Times job. “At Bloomberg, I was looking to develop my technical skills. It was less so Bloomberg’s media newsroom that drew me; it was the scale of their financial operation.” 

Liu began the Bloomberg job after graduating in 2018 and held it for two years. In 2019, he published his undergraduate thesis, which examined the difficulties of reproducing computational social science experiments. 

"You can't reproduce something that's not available or accessible. You're working on proprietary data that you can't distribute, or people just are not releasing their code," he explains, adding, "You're rewarded for publishing your work, for getting new ideas out there. But how good your code is, how well documented it is, how usable it is — all of that is not rewarded."

Liu had combined his machine learning coursework with his engineering experience on a project he cared about. But it wouldn’t be his last interdisciplinary research effort.  

Ethical computer science 

Liu joined Khoury College last September, citing — unsurprisingly — its interdisciplinary bona fides as a motivator. He found the same ethos at the Network Science Institute, which draws high-powered faculty from several Northeastern colleges. 

“The institute is a unifying body for network science research and is inherently interdisciplinary because you’re bringing people from different backgrounds,” Liu notes. “My social inclination definitely merged with existing networks research.”  

He'd soon have even more going his way. Liu applied for and won a prestigious National Science Foundation Graduate Research Fellowship, which provides a $34,000 annual stipend for three years plus an education allowance paid to the university. The program recognizes and supports outstanding students, like Liu, who are pursuing research-based graduate degrees.

“AI is only going to increase in ubiquity and integration into our lives, and we need people thinking about the implications, who it is harming, and how it can be used in productive ways,” Liu says. “That is what I wrote about in my NSF application and also what I intend to use [the award] for.” 

Those implications are tricky to tackle. Machine learning algorithms can perpetuate and amplify unfair inputs (e.g., demographically skewed data) in ways that are difficult to identify and remedy. This has contributed to discriminatory hiring, housing, and criminal sentencing — just to name a few — and has prompted challenging questions about how we should use machine learning.

Liu notes that data ought to be seen as a life cycle, which involves asking how the data is obtained, what it represents, who it benefits, what it’s trying to accomplish, and how algorithmic bias is quantified and combated. So, for his first major Northeastern research effort — presented virtually at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in May — he teamed up with his doctoral advisor Tina Eliassi-Rad, incoming Khoury doctoral student Zohair Shafi, Amherst College professor Scott Alfeld, and … a philosopher?

“We were taking notions of fairness from philosophy and operationalizing that in a computer science setting,” he explains. “The end product — we call it RAWLSNET — can help to take your existing policy mechanism and show you how to tweak it to better fit the Rawlsian notion of fairness.”

This notion — advanced by philosopher John Rawls and explained to the research team by co-author and Khoury postdoctoral fellow Will Fleisher — is called fair equality of opportunity (FEO). It posits that people with the same talent and willingness to use it should have the same chance for achievement regardless of background circumstances.

“It was not an easy thing to have a philosopher in the room, computer scientists in the room, and get us to talk in a language where we could understand each other,” Liu admits. But they got the job done.

RAWLSNET alters Bayesian networks — inherently interpretable models that quantify relationships between variables — to maximize fairness while abiding by system constraints. It can also help to identify sources of bias in machine learning by generating bias-free “aspirational data.”
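To make the idea concrete, here is a minimal sketch of what the FEO condition looks like in a Bayesian network, using toy binary variables and made-up numbers; this is an illustration of the fairness constraint, not the RAWLSNET code itself, and every variable and figure in it is hypothetical.

```python
# Toy illustration of Rawls's fair equality of opportunity (FEO) in a
# tiny Bayesian network. FEO holds when P(opportunity | talent, background)
# does not depend on background. All numbers here are invented.
import numpy as np

# Binary variables: B = background (0 = disadvantaged, 1 = advantaged),
# T = talent and willingness to use it, O = opportunity (e.g., admission).
p_b = np.array([0.6, 0.4])             # P(B)
p_t_given_b = np.array([[0.7, 0.3],    # P(T | B=0)
                        [0.5, 0.5]])   # P(T | B=1)

# CPT for the opportunity node: rows index B, columns index T.
# In this toy policy, background leaks into outcomes.
p_o = np.array([[0.10, 0.40],   # P(O=1 | B=0, T)
                [0.25, 0.70]])  # P(O=1 | B=1, T)

def feo_gap(cpt):
    """Worst-case spread of P(O=1 | T, B) across backgrounds, per talent
    level. FEO is satisfied when every entry is zero."""
    return cpt.max(axis=0) - cpt.min(axis=0)

print("FEO gap before:", feo_gap(p_o))   # nonzero -> FEO violated

# One naive "aspirational" repair: for each talent level, replace the
# background-specific probabilities with their population-weighted average,
# so equally talented people get the same chance regardless of background.
p_bt = p_b[:, None] * p_t_given_b            # joint P(B, T)
w = p_bt / p_bt.sum(axis=0, keepdims=True)   # P(B | T)
aspirational = (w * p_o).sum(axis=0)         # P(O=1 | T), background-blind
p_o_fair = np.tile(aspirational, (2, 1))

print("FEO gap after: ", feo_gap(p_o_fair))  # zero by construction
```

In practice the repair cannot simply average away the differences as this sketch does; real systems carry constraints (capacities, budgets, fixed mechanisms), which is why RAWLSNET frames the adjustment as tweaking an existing policy within those limits.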

It’s an exciting step, but Liu cautions that no one piece of software can solve all our problems.

“Rawls’s definition of FEO is very explicit about the domain it covers and the context it is defined under,” he says. “Rawls is operating in this ideal society framework of ‘If we defined a fair world, what would it look like?’ Then you take that and move to reality — messy data, messy systems. What is the bridge there? RAWLSNET is one bridge, but it’s not the only bridge, it’s not even the correct bridge for certain applications.”

RAWLSNET, he says, “is really good for simulating hypothetical scenarios. It can recommend policy in certain situations. But to honor the name of Rawls itself, we need to follow the context that this definition of fairness was created under.”

Following that context means using the right kinds of machine learning in the right ways for the right challenges, and prioritizing fairness. And with an estimated four years remaining in his Khoury journey, David Liu figures to follow that context to new and interesting places.
