Fascinated by the creep of misinformation, Ph.D. student Shan Jiang uses computational tools to measure its effects online
Thu 05.13.21 / Madelaine Millar
“It’s a pretty big problem,” said Shan Jiang, a Ph.D. candidate at Khoury College who’s studying online misinformation. “There are a considerable portion of people who are affected by misinformation, whether they are believing in the false claims, supporting false claims, or spreading them, or whether they themselves were being triggered, being angered (by false stories).”
Jiang is fascinated by the ways misinformation spreads and evolves, and by how both individuals and platforms (like Facebook or YouTube) respond to it. His research uses computational tools to attempt to answer questions of social science: for instance, are conservatives really more heavily moderated on YouTube than liberals? This spring, as part of the college’s Speaker Spotlight Series, he shared some of his findings in a virtual talk, “Measuring the Misinformation Ecosystem.”
To investigate his question about YouTube’s content moderation, Jiang collected 84,068 comments from 258 videos posted in January and June of 2018. Using natural language processing, he made sense of a data set containing more words than most books, tagging each comment as left- or right-leaning and as either moderated or left live on the site. He also tagged whether a comment included anything that would warrant justifiable, nonpartisan moderation, such as hate speech.
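For readers curious what that kind of tagging can look like in practice, here is a minimal sketch of a comment classifier built with scikit-learn. It is an illustration only, not Jiang’s actual pipeline; the training examples, labels, and model choice are all assumptions made for the sake of the example.

```python
# Minimal sketch: tag comments by political lean with a simple text classifier.
# Training data, labels, and model choice are placeholders for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled examples: 0 = left-leaning, 1 = right-leaning
train_comments = [
    "We need universal healthcare and stronger climate policy",
    "Lower taxes and smaller government work best",
    "Expand voting access and protect workers' unions",
    "Secure the border and defend the second amendment",
]
train_labels = [0, 1, 0, 1]

# TF-IDF features feeding a simple linear classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_comments, train_labels)

# Tag new, unlabeled comments scraped from video comment sections
new_comments = ["This policy will bankrupt hardworking taxpayers"]
print(model.predict(new_comments))  # array of 0s (left) and 1s (right)
```

In a real study, the training set would be thousands of hand-labeled comments and the classifier would be validated far more carefully before any conclusions were drawn.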
“Our initial hypothesis was that this claim (that conservative viewpoints are subject to more moderation) is kind of true, but it’s kind of exaggerated — it’s actually not true at all,” said Jiang. “(Moderation is) evenly distributed, after factoring in other things that we consider to be justified.”
The study didn’t just look at partisanship; Jiang also found that comments on politically extreme YouTube videos were 50 percent more likely to be moderated than comments on politically centrist videos. Comments on factually true videos are significantly less likely to be moderated, while comments posted on untrue videos after a fact check are slightly more likely to be moderated.
Another research question Jiang discussed was whether audiences believe misinformation, to what extent, and whether that belief can be changed. He used similar natural language processing methods but applied them to Twitter, examining whether the passage of time or the release of a fact check affected whether audiences believed a false claim. He found that “waiting out” a piece of misinformation was not effective — statements expressing disbelief increased by just 0.001 percent per day, while statements expressing belief decreased by only 0.002 percent per day. A fact check had a far greater impact: after one was released, statements expressing disbelief in the false claim increased by 5 percent, while statements expressing belief decreased by 3.4 percent.
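To make the before-and-after comparison concrete, the sketch below tallies the share of belief and disbelief statements on either side of a fact check’s publication date. The dates, stance labels, and fact-check date are invented for illustration; the actual study’s data and methods are considerably more involved.

```python
# Sketch: compare the share of belief vs. disbelief statements before and
# after a fact check. Stances are assumed to be labeled by an upstream NLP
# model; every date and label below is made up for illustration.
from collections import Counter
from datetime import date

FACT_CHECK_DATE = date(2018, 3, 1)  # hypothetical publication date

labeled_tweets = [
    (date(2018, 2, 25), "belief"),
    (date(2018, 2, 27), "belief"),
    (date(2018, 3, 2), "disbelief"),
    (date(2018, 3, 4), "disbelief"),
    (date(2018, 3, 5), "belief"),
]

def stance_shares(tweets):
    """Return the fraction of belief and disbelief statements in a tweet set."""
    counts = Counter(stance for _, stance in tweets)
    total = sum(counts.values()) or 1
    return {stance: counts[stance] / total for stance in ("belief", "disbelief")}

before = [t for t in labeled_tweets if t[0] < FACT_CHECK_DATE]
after = [t for t in labeled_tweets if t[0] >= FACT_CHECK_DATE]

print("before fact check:", stance_shares(before))
print("after fact check: ", stance_shares(after))
```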
Jiang also looked at whether the type of misinformation being generated by storytellers has changed over time. There is a wide range of different types of misinformation, from humor to mistakes, scams to legends, clickbait to fully fabricated stories. How much misinformation is simply a misprint or a joke in poor taste, compared to an intentionally or maliciously misleading fabrication?
He approached this question by applying natural language processing techniques to fact-checking articles, which often contain keywords like “common misconception” or “satire” that help categorize their subjects, and by comparing the relative prevalence of different types of misinformation over time. Jiang’s analysis found that over the last 10 years, legends, scams, and mistakes all became less common. Fabricated and altered content, on the other hand, increased as new technologies made it easier to create believable fakes. Conspiracies have also been on the rise.
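A rough sense of how keyword cues in fact-checking articles can be turned into categories is given by the sketch below. The keyword lists, category names, and sample articles are placeholders, not the taxonomy used in Jiang’s research.

```python
# Sketch: bucket fact-check articles into misinformation types by keyword.
# The keyword lists and sample text are placeholders, not the real taxonomy.
import re
from collections import Counter

CATEGORY_KEYWORDS = {
    "satire": ["satire", "parody", "joke"],
    "legend": ["legend", "common misconception"],
    "scam": ["scam", "phishing", "fraud"],
    "fabricated": ["fabricated", "made up", "no evidence"],
    "conspiracy": ["conspiracy"],
}

def categorize(article_text: str) -> str:
    """Return the first category whose keywords appear in the article, else 'other'."""
    text = article_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            return category
    return "other"

fact_check_articles = [
    "This viral claim originated as satire on a parody news site.",
    "The photo was fabricated; there is no evidence the event occurred.",
]
print(Counter(categorize(a) for a in fact_check_articles))
```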
He was also able to compare the types of misinformation that plagued specific events; for instance, hoaxes, satire, and clickbait were all more common during the 2016 presidential election than during the 2020 cycle. Coverage of the 2009-2010 H1N1 pandemic had more mistakes and legends than coverage of the current COVID-19 pandemic, but fewer conspiracy theories.
“A lot of people will think a lot of misinformation is purposefully generated to sort of confuse people, to make people believe a certain sense,” said Jiang. “But in my research I also discovered a lot of this kind of misinformation is initially generated in a very benign kind of way, just kind of a regular mistake or someone just making a joke or something like that, and then people writing that without context. And then other people can go, you know, ‘this is true, this is a true story’, and then that becomes misinformation.”
“These issues (of online misinformation) are so big that they are driving national-level politics,” explained Jiang’s advisor Christo Wilson, an associate professor at Khoury College whose research focuses on big data, security and privacy, and algorithm auditing. “Are the platforms ‘censoring’ political speech or doing their best to moderate toxic misinformation? Is online misinformation really driving polarization and radicalization, in which case regulators should act immediately, or is this a tempest in a teapot? The CEOs do their best to deflect criticism, but nobody believes them. Research like Shan’s is what will ultimately help us answer these questions.”
So what should individuals do to avoid consuming or spreading misinformation online? Jiang gave two recommendations.
“A very high-level suggestion would be doing a little bit more research. You’ll see one news article – just, you know, copying the title and doing some Google-ing and searching even related articles…that’s sort of the first thing you might want to do.”
His other suggestion requires more scrutiny. “When we know (there’s) some kind of event that’s happening, we usually look for news across the political spectrum,” said Jiang. “You’ll have these very neutral resources like AP or Reuters, and you’ll have media outlets that, it’s arguable (that) they’re a little bit left leaning, like the New York Times, and then you have (more right leaning) Fox News.” For one specific, widely covered event, he advised readers to “look at how the reporting goes across the political spectrum.”
Growing up in China, Jiang was always fascinated by the relationships between rumor, propaganda, and misinformation; he quickly found it to be an international phenomenon. After completing an undergraduate degree studying information systems management at Beijing University of Posts and Telecommunications, Jiang was drawn to Northeastern and Khoury College’s graduate program by the opportunity to work with Wilson.
“At Northeastern, I have my advisor, Christo Wilson, and I also work closely with Alan Mislove. I think they are amazing people,” said Jiang. “They have research interests aligned with my own, and they also give students research freedom to explore.”
After he graduates in July, Jiang will join Facebook as a research scientist on the company’s Integrity team, where he will work to improve the detection of harmful content, such as posts selling guns or drugs. He is excited to keep exploring the intersection of AI and social good.
“These are very hard problems,” said Jiang, “but I think we have to take it one step at a time to try to solve these problems.”