In this project, we aim to develop the first system that can detect fake news on Arabic social media, analyze its spread among online users, and identify the key influencers behind it. We plan to integrate the system with the AlJazeera network, our industrial partner and end user.
Fake news is everywhere nowadays. For a long time, such news propagated through newspapers and traditional mainstream media; however, the recent emergence and popularity of social media platforms (e.g., Twitter) has made it very easy to spread a rumor across continents in just a few hours or even seconds. That usually happens even before the news is picked up by the mainstream media. The spread of false information can have a strong negative influence, not only on the individuals involved, but also on large communities and even countries. As a clear example, fake news has played a major role in the current GCC crisis since its beginning, and it continues to be a big factor.
With the advances in artificial intelligence, machine learning, and several related fields, the problem of fake news detection has been studied over the past few years, with a marked increase in activity most recently. However, most of this work has focused on English (and a few other languages), while no attention has been directed towards Arabic. In this project, we propose to design, implement, and deploy an end-to-end system that monitors Arabic social media (in particular Twitter) to detect fake news early, analyze its propagation, and provide supporting and/or refuting evidence that can be understood and verified by the end user. The entire process would be very lengthy, costly, and tedious if performed manually.
The project has five major scientific objectives: (1) early detection of fake news/claims that are propagating through social media but have not yet reached the news, (2) effective verification of such claims by estimating their veracity, (3) analysis of the propagation patterns of the detected claims and identification of the key influencers behind them, (4) construction of an end-to-end system that serves an end user in the media field with a dashboard summarizing detected fake news over time, and (5) achieving all of that over Arabic content, with all of the challenges of the Arabic language and its dialects, which are widely used on social media.
Our proposed solution decomposes the problem into five sub-problems: (1) topic tracking, to identify posts relevant to given topics of interest, (2) detection of check-worthy claims, to identify them among the claims emerging within the relevant posts, (3) credibility estimation of the users and posts related to a claim, (4) veracity classification of the claim, and (5) spread analysis, to discover the propagation dynamics of the claims. We adopt an approach that integrates natural language processing, information retrieval, social analytics, image processing, and machine learning techniques, leveraging signals from the textual and visual content of social media, social networks, the history of social posts, the history of news articles, the Web, and user feedback.
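To make the decomposition concrete, the five sub-problems can be read as stages of a pipeline. The following is only a minimal sketch under strong assumptions: the `Post` type, all function names, and the toy heuristics (keyword matching, repeated identical text, averaged user scores, a fixed threshold) are hypothetical placeholders chosen for illustration, not the methods the project will use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical minimal post record; real posts carry far more metadata.
    user: str
    text: str
    reshared_from: Optional[str] = None  # user whose post was reshared, if any


def track_topic(posts, keywords):
    """(1) Topic tracking: keep posts that mention any topic keyword."""
    return [p for p in posts if any(k in p.text for k in keywords)]


def detect_claims(posts, min_posts=2):
    """(2) Check-worthy claims: placeholder rule, texts repeated >= min_posts times."""
    counts = Counter(p.text for p in posts)
    return [text for text, n in counts.items() if n >= min_posts]


def estimate_credibility(claim, posts, user_scores):
    """(3) Credibility: mean prior score of the users who posted the claim."""
    users = {p.user for p in posts if p.text == claim}
    # Unknown users get a neutral prior of 0.5 (an assumption).
    return sum(user_scores.get(u, 0.5) for u in users) / len(users)


def classify_veracity(credibility, threshold=0.5):
    """(4) Veracity: placeholder threshold on the credibility score."""
    return "likely true" if credibility >= threshold else "likely false"


def analyze_spread(claim, posts):
    """(5) Spread: rank users by how often their post of the claim was reshared."""
    reshared = Counter(
        p.reshared_from for p in posts if p.text == claim and p.reshared_from
    )
    return reshared.most_common()


def run_pipeline(posts, keywords, user_scores):
    """Chain the five stages and return one report per detected claim."""
    relevant = track_topic(posts, keywords)
    reports = []
    for claim in detect_claims(relevant):
        credibility = estimate_credibility(claim, relevant, user_scores)
        reports.append({
            "claim": claim,
            "credibility": credibility,
            "verdict": classify_veracity(credibility),
            "influencers": analyze_spread(claim, relevant),
        })
    return reports
```

In a real system each stage would be a learned model drawing on the signals listed above (text, images, network structure, news archives, user feedback); the sketch only shows how the outputs of one stage feed the next.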
We expect our project to make several contributions. First, we propose the first fake news detection system over Arabic content. Second, our proposed system provides both a confidence score and a justification for supporting/refuting the detected claims, which allows the user to understand the system's decision and verify it. Third, we tackle the problem in a personalized mode, where the system adapts its algorithms based on the user’s (continuous) implicit/explicit feedback. Fourth, our system integrates both textual and visual features to detect and verify claims. Fifth, we provide several labelled datasets as evaluation testbeds for further research in the area, especially on Arabic. Finally, we plan to conduct a case study on the most widely propagated fake news during the Gulf crisis.
By the end of the project, we envision several major outcomes. Our research will advance the state of the art in fake news detection in general and over Arabic in particular; several conference and journal publications at top venues are planned. We also plan to release annotated data on Arabic fake news (the first of its kind). A real-time end-to-end system will be developed and integrated with end users (e.g., journalists); we plan to deliver a system at technology readiness level (TRL) 6-7. Moreover, our system will provide a handy news verification tool that ordinary users can use too. Finally, two graduate students will be co-supervised within the project team, and one or two patents will potentially be filed with the USPTO and/or EPO to protect the developed technology.
Our project is expected to have a clear impact on society and the journalism profession, in addition to the research community. Our system has the potential to change the way journalists work today by adding another source of evidence. We also aim to allow ordinary users to verify news and to significantly decrease the propagation of fake news. Identifying the key influencers can also inspire proactive measures against possible future incidents. Research on fake news detection is still in its infancy; therefore, we plan to organize a workshop or a shared task in either SemEval (a top NLP evaluation forum) or ICWSM (a top social computing conference), and to make our annotated Arabic data available on Kaggle to further promote research in this emerging area.
A real-time fake news detection system would clearly benefit news agencies around the world. We envision our system being used by journalists as a source of evidence when newly emerging claims appear on social media and have not yet been verified or even picked up by mainstream media. A direct beneficiary of our proposed system is the AlJazeera network in Qatar, a leading news outlet with recognized worldwide influence. In this project, we will work very closely with AlJazeera as an end user.