Natural Language Processing at Khoury College of Computer Sciences

Helping computers understand and work with human language

Natural language processing (NLP) and information retrieval research helps bridge the gap between the way people use language every day and the way computer programs work with it. Human languages are vastly complex, with varied syntax, grammar, usage, and context. NLP research aims to help computers understand this complexity and interact with users successfully, with inputs and outputs that sound like regular conversation.

Information retrieval (IR) research focuses on searching and retrieving content from large sets of data — for instance, text collections with millions of documents. This presents significant computational challenges, including understanding the nature of the text in the collections, how to categorize it, and how to make it searchable. Khoury College researchers are shedding light on how natural language approaches to analyzing the way we search could improve results.

Building innovative semantics systems

Khoury College researchers in NLP and IR work in a range of areas: building the knowledge that makes search engines more effective and AI smarter about context, and enabling tools that provide automated services, such as mental health counseling, to be more conversational and helpful. Khoury researchers also do breakthrough work in the digital humanities, helping improve search across large collections of digitized cultural resources.

Recent Khoury College research has the potential to help services like ChatGPT do a better job of summarizing medical content from multiple sources, making it more reliable and useful to nonexpert users looking for information.

Sample research areas

  • Information retrieval
  • Machine translation
  • Social dynamics of language
  • Inferring latent social networks
  • Brain signal transcription
  • Machine learning
  • Natural language processing
  • Large language models 
  • Computational linguistics
  • Semantic systems

Meet researcher Timothy Bickmore

Bickmore discusses how his lab’s research aims to increase health-care access for patients and decrease the cost of care.

Current project highlights

The Viral Texts Project

Using computational linguistics and natural language processing approaches, Northeastern researchers are uncovering what made content go viral in nineteenth-century newspapers.

Can large language models synthesize medical information?

Northeastern investigators used computational linguistics to test whether a large language model, GPT-3, could understand and summarize medical research papers. Although GPT-3 did a good job summarizing single papers written in plain language, it struggled to combine information accurately from multiple studies.

Creating a computer agent to provide palliative care

The Northeastern Relational Agents Group relied on a range of approaches, including natural language processing research, to create a conversational agent that can help terminally ill patients lessen suffering and improve quality of life.

Evaluating the zero-shot robustness of instruction-tuned language models

Users working with AI tools based on large language models can refine results by giving them specific instructions, even when the model hasn't been trained on task-specific examples (a zero-shot setting). However, researchers at Northeastern have discovered that these models struggle to provide the right information if instructions have even minor changes in phrasing. In response, they are developing a way to help models understand the core meaning behind instructions, regardless of wording.

Recent research publications

‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Automatic Science Journalism
Authors: Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou

This research explores how to help computers generate accurate and coherent science articles for the general public.

Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children’s Fairy Tales
Authors: Paulina Toro Isaza, Guangxuan Xu, Toye Oloko, Yufang Hou, Nanyun Peng, Dakuo Wang

This research uses computer methods to analyze stories for bias by looking at how characters participate in events throughout the story.

Detecting Manuscript Annotations in Historical Print: Negative Evidence and Evaluation Metrics
Authors: Jacob Murel, David Smith

This research uses computational methods to find and analyze handwritten annotations in early printed books, shedding light on reading practices and history.

Conversational Assessment of Mild Cognitive Impairment with Virtual Agents
Authors: Emily E. Hurstak, Stefan Olafsson, Teresa K. O’Leary, Howard J. Cabral, Michael Paasche-Orlow, Timothy Bickmore

This research explored how computer-based virtual agents could assess cognitive ability to screen for dementia, finding that such assessments could be a useful tool for identifying people who might need further evaluation for cognitive impairment.

Faculty members

  • Ricardo Baeza-Yates

    Ricardo Baeza-Yates is a professor of the practice and the director of research at Northeastern’s Institute for Experiential AI. He has held leadership positions in tech companies on three continents, taught in Spain and Chile, and co-wrote the best-selling textbook Modern Information Retrieval — among more than 600 other publications.

  • Kenneth Church

    Kenneth Church is a professor of the practice at Khoury College and a senior principal research scientist at Northeastern’s Institute for Experiential AI. His research focuses on natural language processing and information retrieval, artificial intelligence, and machine learning.

  • Silvio Amir

    Silvio Amir is an assistant professor at Khoury College. By applying natural language processing, machine learning, and information retrieval methods to personal and user-generated data, he aims to improve the reliability, interpretability, and fairness of predictive models and analytics.

  • Javed Aslam

    Javed Aslam is a professor at Khoury College. His research emphasizes machine learning and information retrieval, with forays into human computation, transportation, computer security, wireless networking, and medical informatics.

  • David Bau

    David Bau is an assistant professor at Khoury College and the lead principal investigator of the National Deep Inference Fabric project. His research centers on human–computer interaction and machine learning, including the gap between the efficacy of AI and scientists’ ability to explain it.