Natural Language Processing at Khoury College of Computer Sciences

Helping computers understand and work with human language

Natural language processing (NLP) and information retrieval research helps bridge the gap between the ways people use language every day and the way computer programs work with it. Human languages are vastly complex with varied syntax, grammar, usage, and contexts. NLP research aims to help computers understand this complexity and interact with users successfully, with inputs and outputs that sound like regular conversation.

Information retrieval (IR) research focuses on searching and retrieving content from large sets of data, for instance text collections with millions of documents. This presents significant computational challenges, including understanding the nature of the text in the collections, categorizing it, and making it searchable. Khoury College researchers are studying how natural language approaches to analyzing the way people search could improve results.
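To make "searchable" concrete, here is a minimal sketch of an inverted index, a classic IR data structure that maps each word to the documents containing it. The sample documents and the function names build_index and search are illustrative assumptions, not taken from any Khoury College system. Real collections with millions of documents would also need tokenization, ranking, and compression.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's documents, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical toy collection used only for illustration.
docs = {
    1: "historical newspapers reprinted viral texts",
    2: "medical research papers on palliative care",
    3: "summarizing medical texts with language models",
}
index = build_index(docs)
print(search(index, "medical texts"))
```

Looking up words in a prebuilt index avoids scanning every document for every query, which is what makes search over very large collections practical.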


Building innovative semantics systems

Khoury College researchers in NLP and IR work in a range of areas. They build the knowledge that makes search engines more effective and AI smarter about context, and they help tools that provide automated services, such as mental health counseling, become more conversational and helpful. Khoury researchers also work in digital humanities, helping improve search for large collections of digitized cultural resources.

Recent Khoury College research has the potential to help services like ChatGPT do a better job of summarizing medical content from multiple sources, making it more reliable and useful to nonexpert users looking for information.

Sample research areas

Information retrieval
Machine translation
Social dynamics of language
Inferring latent social networks
Brain signal transcription
Machine learning
Natural language processing
Large language models
Computational linguistics
Semantic systems

Meet researcher Timothy Bickmore

Bickmore discusses how his lab’s research aims to increase health-care access for patients and decrease the cost of care.

Current project highlights

The Viral Texts Project

Can large language models synthesize medical information?

Northeastern investigators used computational linguistics to test whether a large language model, GPT-3, could understand and summarize medical research papers. Although GPT-3 did a good job summarizing single papers written in plain language, it struggled to combine information accurately from multiple studies.


Creating a computer agent to provide palliative care

Evaluating the zero-shot robustness of instruction-tuned language models

Users working with AI tools based on large language models can steer results by giving the model specific instructions, even for tasks the model has not been trained on with specific examples (the zero-shot setting). However, researchers at Northeastern have discovered that LLMs struggle to provide the right information if instructions have even minor changes in phrasing. In response, they are developing a way to help models understand the core meaning behind instructions, regardless of wording.
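One simple way to see this kind of sensitivity is to phrase the same instruction several ways and check how often the answers agree. The sketch below is an illustrative measure only, not the method used in the Northeastern study. The function toy_model is a hypothetical stand-in for a real LLM call, made deliberately brittle so the effect is visible.

```python
from collections import Counter

def consistency(model, instructions, task_input):
    """Fraction of instruction phrasings that yield the most common answer."""
    answers = [model(instruction, task_input) for instruction in instructions]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)

def toy_model(instruction, text):
    """Hypothetical stand-in for an LLM: only reacts to the word 'sentiment'."""
    if "sentiment" in instruction.lower():
        return "positive" if "good" in text else "negative"
    return "unknown"

# Three phrasings of the same request; the third avoids the keyword.
paraphrases = [
    "Classify the sentiment of this review.",
    "What is the sentiment of the following review?",
    "Is this review favorable or unfavorable?",
]
score = consistency(toy_model, paraphrases, "a good movie")
print(f"Agreement across phrasings: {score:.2f}")
```

A score of 1.0 would mean every phrasing produced the same answer. Lower scores indicate that wording, rather than meaning, is driving the output.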


Recent research publications

‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Automatic Science Journalism
Authors: Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou

This research explores how to help computers generate accurate and coherent science articles for the general public.

Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children’s Fairy Tales
Authors: Paulina Toro Isaza, Guangxuan Xu, Toye Oloko, Yufang Hou, Nanyun Peng, Dakuo Wang

This research uses computational methods to analyze stories for bias by looking at how characters participate in events throughout the story.

Detecting Manuscript Annotations in Historical Print: Negative Evidence and Evaluation Metrics
Authors: Jacob Murel, David Smith

This research uses computers to find and analyze handwritten annotations in early books to shed light on reading practices and history.

Conversational Assessment of Mild Cognitive Impairment with Virtual Agents
Authors: Emily E. Hurstak, Stefan Olafsson, Teresa K. O’Leary, Howard J. Cabral, Michael Paasche-Orlow, Timothy Bickmore

This research explored how computer-based virtual assessment of cognitive ability could be used to screen for dementia and found that this could be a useful tool for identifying people who might need further evaluation for cognitive impairment.

Related labs and groups

Faculty members

Malihe Alikhani

Malihe Alikhani is an assistant professor at Khoury College. Both enthusiastic about and wary of the transformative power of AI, Alikhani teaches courses and conducts research on AI ethics and equitable natural language processing.

Silvio Amir

Silvio Amir is an assistant professor at Khoury College. By applying natural language processing, machine learning, and information retrieval methods to personal and user-generated data, he aims to improve the reliability, interpretability, and fairness of predictive models and analytics.

Javed Aslam

Javed Aslam is a professor at Khoury College. His research emphasizes machine learning and information retrieval, with forays into human computation, transportation, computer security, wireless networking, and medical informatics.

Ricardo Baeza-Yates

Ricardo Baeza-Yates is a professor of the practice and the director of research at Northeastern’s Institute for Experiential AI. He has held leadership positions in tech companies on three continents, taught in Spain and Chile, and co-wrote the best-selling textbook Modern Information Retrieval, among more than 600 other publications.

David Bau

David Bau is an assistant professor at Khoury College and the lead principal investigator of the National Deep Inference Fabric project. His research centers on human–computer interaction and machine learning, including the gap between the efficacy of AI and scientists’ ability to explain it.

Kenneth Church

Kenneth Church is a professor of the practice at Khoury College and a senior principal research scientist at Northeastern’s Institute for Experiential AI. His research focuses on natural language processing and information retrieval, artificial intelligence, and machine learning.

Mai ElSherief

Mai ElSherief is an assistant professor at Khoury College. Her research strives to minimize harm and improve prosocial behavior online by detecting and mitigating biases in natural language processing systems.

Virgil Pavlu

Virgil Pavlu is an associate teaching professor at Khoury College. His research focuses on information retrieval and organization, and the potential to use machine learning algorithms for the discovery and indexing of text data.

David Smith

David Smith is an associate professor at Khoury College. His research spans the fields of natural language processing, computational linguistics, information retrieval, machine learning, digital libraries, digital humanities, and political science.

Weiyan Shi

Weiyan Shi is an assistant professor at Khoury College, jointly appointed with the College of Engineering. She is interested in NLP for social influence dialogue systems that handle tasks such as persuasion, negotiation, and recommendation, as well as privacy-preserving NLP applications.

Byron Wallace

Byron Wallace is the Sy and Laurie Sternberg Interdisciplinary Associate Professor and director of the undergraduate data science program at Khoury College. He applies machine learning and natural language processing methods in the health informatics space, with the goal of developing hybrid human–AI systems and streamlining the synthesis of biomedical information.