There are other fake news detectors out there. What makes this one different?

Previous fake news detectors have centered on authors and sources that are known producers of fake news. But what if the source is unknown? This detector is unique in that it analyzes patterns in the language of the article body, regardless of source or author. Previous detectors were trained and tested on the same news topics, so they may simply capture a topic-level bias in the dataset (for example, there are more fake articles about "Trump" than about "soccer").

In this paper, we study a fake news detector that leverages deep neural networks, and by evaluating it on a topic that is not included in the training dataset, we demonstrate that it can capture patterns that are not subject-specific. We also address a well-known problem of deep learning: the lack of transparency in the decision-making process, also known as the black-box problem. Not knowing what underlies the decisions made by the network leaves uncertainty as to whether the algorithm can be trusted. Our analysis reveals that the network learns to detect language patterns in fake news articles that generalize to fake news covering distinct topics. This is a first step toward understanding the reliability of deep-learning-based fake news detectors.
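The idea of evaluating on a topic that is absent from training can be sketched roughly as follows. This is only an illustration: the function name `split_by_topic`, the dictionary layout of an article, and the use of simple keyword matching to decide what counts as a "topic" are all assumptions, not the procedure used in the manuscript.

```python
from typing import Dict, List, Tuple

Article = Dict[str, str]  # hypothetical record layout: {"text": ...}


def split_by_topic(
    articles: List[Article], held_out_keyword: str
) -> Tuple[List[Article], List[Article]]:
    """Split articles so the held-out topic never appears in training.

    Assumption: an article belongs to the topic if its text contains the
    keyword (case-insensitive). A real study may define topics differently.
    """
    keyword = held_out_keyword.lower()
    train: List[Article] = []
    test: List[Article] = []
    for article in articles:
        if keyword in article["text"].lower():
            test.append(article)   # held-out topic: evaluation only
        else:
            train.append(article)  # all other topics: training
    return train, test
```

A detector that still performs well on the held-out set cannot be relying only on topic words it saw during training, which is the point of the experiment described above.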

How did the researchers train the detector?

We trained our neural network on approximately 24,000 news articles published between October 26 and November 25, 2016. These dates were chosen because they span the period directly before, during, and after the 2016 United States presidential election. Our fake news dataset consisted of approximately 12,000 articles pulled from Kaggle, which maintains a blacklist of fake news source websites. Our real news dataset consisted of 9,000 Guardian articles and over 2,000 New York Times articles.
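As a rough sketch of how a corpus could be restricted to that publication window, the snippet below filters records by date. The window boundaries come from the text above; the record layout (a `published` date and a `text` field) and the helper names are assumptions made for illustration only.

```python
from datetime import date
from typing import Dict, List

# Publication window stated above (inclusive on both ends, by assumption).
WINDOW_START = date(2016, 10, 26)
WINDOW_END = date(2016, 11, 25)


def in_window(published: date) -> bool:
    """Return True if the publication date falls inside the window."""
    return WINDOW_START <= published <= WINDOW_END


def filter_by_window(articles: List[Dict]) -> List[Dict]:
    """Keep only articles whose hypothetical 'published' field is in range."""
    return [a for a in articles if in_window(a["published"])]
```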

What is the highlighted text?

Based on its training (as described above), the detector searches for language patterns that it has taught itself to associate with fake news and real news. It identifies up to 128 patterns per scan and highlights up to 20 of the top-ranked phrases, each in the color of the class it is associated with: red for fake news and green for real news.
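The selection step can be pictured with a minimal sketch like the one below. The limits of 128 patterns and 20 highlights come from the text above; everything else is an assumption for illustration, including the `Pattern` type, the signed score (positive leaning fake, negative leaning real), and ranking by absolute score. The detector's actual ranking rule is not described here.

```python
from typing import List, NamedTuple, Tuple


class Pattern(NamedTuple):
    phrase: str
    score: float  # hypothetical: > 0 leans fake, otherwise leans real


def top_highlights(
    patterns: List[Pattern],
    max_patterns: int = 128,
    max_highlights: int = 20,
) -> List[Tuple[str, str]]:
    """Return (phrase, color) pairs for the strongest patterns.

    Assumption: strength is the absolute value of the score.
    """
    considered = patterns[:max_patterns]  # at most 128 patterns per scan
    ranked = sorted(considered, key=lambda p: abs(p.score), reverse=True)
    return [
        (p.phrase, "red" if p.score > 0 else "green")
        for p in ranked[:max_highlights]  # at most 20 highlighted phrases
    ]
```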

How can I learn more about this detector?

For more information, read this MIT news story or take a look at our original manuscript, "The Language of Fake News: Opening the Black-Box of Deep Learning Based Detectors," which we presented at a workshop called "AI for Social Good" at the 32nd Conference on Neural Information Processing Systems in Montreal, Canada.

Disclaimer

THIS WEBSITE IS FOR ONGOING RESEARCH PURPOSES ONLY AND SHOULD NOT BE CONSTRUED IN ANY WAY, SHAPE OR FORM AS A VALID MEASURE FOR THE ACCURACY, LEGITIMACY OR ANY OTHER NATURE OF ANY TEXT THAT IS SUBMITTED TO THIS WEBSITE.

This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.