Fake News Detection Using Machine Learning - uliege.be


Fake News Detection Using Machine Learning

Author: Simon Lorent
Supervisor: Ashwin Itoo

A thesis presented for the degree of Master in Data Science
University of Liège, Faculty of Applied Science, Belgium
Academic Year 2018-2019

Contents

1 Introduction
  What are fake news?
  Fake News Characterization
  Feature Extraction
  News Content Features
  Social Context Features
  News Content Models
  … models
  … Model
  Social Context Models
  Related Works
  Fake News Detection
  State of the Art Text Classification
  Conclusion
2 Related Works
  Introduction
  Supervised Learning for Fake News Detection [12]
  CSI: A Hybrid Deep Model for Fake News Detection
  Some Like it Hoax: Automated Fake News Detection in Social Networks [16]
  Fake News Detection Using Stacked Ensemble of Classifiers
  Convolutional Neural Networks for Fake News Detection [19]
  Conclusion
3 Data
  Introduction
  Datasets
  Fake News Corpus
  Liar, Liar Pants on Fire
  Dataset statistics
  Fake News Corpus
  Liar-Liar Corpus
  Visualization With t-SNE
  Conclusion
4 Machine Learning
  Introduction
  Text to vectors
  Methodology
  Metrics
  Models
  Naïve-Bayes [7]
  SVM
  Decision Tree [36]
  … Classifier
  Models on liar-liar dataset
  … SVC
  Decision Tree
  … Classifier
  … Feature Number
  Models on fake corpus dataset
  SMOTE: Synthetic Minority Over-sampling Technique [37]
  … selection without using SMOTE
  … selection with SMOTE
  Results on testing set
  Conclusion
5 Attention Mechanism
  Introduction
  Text to Vectors
  LSTM
  Attention Mechanism
  Results
  … dataset results
  … Mechanism
  … Analysis
  Attention Mechanism on fake news corpus
  … Selection
  Conclusion
6 …
  Result analysis
  Future works
TF-IDF max features row results on liar-liar corpus
Weighted Average Metrics
Per Class Metrics
TF-IDF max features row results for fake news corpus without SMOTE
Training plot for attention mechanism

Master thesis: Fake News Detection Using Machine Learning, Simon Lorent

Acknowledgement

I would like to start by thanking my family, who have always been supportive and have always believed in me. I would also like to thank Professor Itoo for his help and for the opportunity he gave me to work on this very interesting topic. In addition, I would like to thank all the professors of the Faculty of Applied Science for what they taught me during these five years at the University of Liège.

Abstract

For some years, mostly since the rise of social media, fake news has become a societal problem, in some cases spreading further and faster than true information.

In this paper, I evaluate the performance of an attention mechanism for fake news detection on two datasets, one containing traditional online news articles and the other containing news from various sources. I compare the results on both datasets, as well as the results of the attention mechanism against LSTMs and traditional machine learning methods. The results show that the attention mechanism does not work as well as expected. In addition, I made changes to the attention mechanism of the original paper [1] by using word2vec embeddings, which proved to work better on this particular task.

Chapter 1. Introduction

What are fake news?

Definition

Fake news has quickly become a societal problem, being used to propagate false information or rumours in order to change people's behaviour.

It has been shown that the propagation of fake news had a non-negligible influence on the 2016 US presidential election [2]. A few facts on fake news in the United States: 62% of US citizens get their news from social media [3], and fake news had more shares on Facebook than mainstream news [4]. Fake news has also been used to influence the Brexit referendum in the United Kingdom. In this paper, I experiment with detecting fake news based only on textual information, by applying traditional machine learning techniques [5, 6, 7] as well as a bidirectional LSTM [8] and an attention mechanism [1] to two different datasets that contain different kinds of news. In order to work on fake news detection, it is important to understand what fake news is and how it is characterized.

The following is based on "Fake News Detection on Social Media: A Data Mining Perspective" [9]. The problem is divided into two parts: the first is characterization, or what fake news is, and the second is detection. In order to build detection models, it is necessary to start with characterization; indeed, one needs to understand what fake news is before trying to detect it.

Fake News Characterization

The definition of fake news is made of two parts: authenticity and intent. Authenticity means that fake news contains false information that can be verified as such, which means that conspiracy theories are not included in fake news, as they are difficult to prove true or false in most cases. The second part, intent, means that the false information has been written with the goal of misleading the reader.
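The two-part definition can be read as a simple decision rule. The sketch below only illustrates that rule and is not a detection method from this thesis: the `Claim` record, its fields and the `is_fake_news` helper are hypothetical names, and the sketch assumes that the outcome of fact-checking and the author's intent are already known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """Hypothetical record of a news claim after fact-checking."""
    text: str
    # True if shown to be false, False if shown to be true,
    # None if it cannot be proven either way.
    verifiably_false: Optional[bool]
    # Whether it was written with the goal of misleading the reader.
    intends_to_mislead: bool


def is_fake_news(claim: Claim) -> bool:
    """Two-part definition: authenticity (verifiably false) AND intent."""
    # Authenticity requires a verified falsehood, so unverifiable claims
    # (e.g. most conspiracy theories) are excluded.
    authentic_falsehood = claim.verifiably_false is True
    # Intent: false information without the goal of misleading is excluded.
    return authentic_falsehood and claim.intends_to_mislead


examples = [
    Claim("Fabricated story presented as real news", True, True),
    Claim("Conspiracy theory that cannot be checked", None, True),
    Claim("Honest reporting error", True, False),
]
for c in examples:
    print(f"{c.text}: {is_fake_news(c)}")
```

Only the first example satisfies both parts. The second shows why unverifiable claims fall outside the definition, and the third shows that false information published without the intent to mislead does not qualify either.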

Figure: Fake news on social media: from characterization to detection. [9]

Definition 1: Fake news is a news article that is intentionally and verifiably false.

Feature Extraction

News Content Features

Now that fake news has been defined and the target has been set, it is necessary to analyse which features can be used to classify fake news. Starting with news content, it can be seen that it is made of four principal raw components:
- Source: where the news comes from, who wrote it, and whether this source is reliable or not.
- Headline: a short summary of the news content that tries to attract the reader.
- Body text: the actual text content of the news.
- Image/Video: textual information is usually supplemented with visual information such as images or videos.

Features will be extracted from these four basic components, the main ones being linguistic-based and visual-based.
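To make the four raw components and the linguistic-based features more concrete, here is a minimal sketch, assuming a hypothetical `NewsArticle` record and a hypothetical `lexical_features` helper (neither comes from the thesis). It computes the kind of lexical statistics discussed in this section: the total number of words, the frequency of large words and the number of unique words. The tokenisation (runs of ASCII letters and apostrophes) and the seven-character threshold for a "large" word are arbitrary assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsArticle:
    """Hypothetical container for the four raw components of a news item."""
    source: str     # where the news comes from / who wrote it
    headline: str   # short summary meant to attract the reader
    body_text: str  # the actual text content
    media: List[str] = field(default_factory=list)  # image/video references


def lexical_features(text: str, large_word_len: int = 7) -> dict:
    """Compute a few simple lexical statistics for a piece of text.

    Assumptions: words are runs of ASCII letters/apostrophes, and a "large"
    word has at least `large_word_len` characters.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    large = sum(1 for w in words if len(w) >= large_word_len)
    return {
        "total_words": total,
        # Share of words that are "large"; 0.0 for empty text.
        "large_word_freq": large / total if total else 0.0,
        "unique_words": len(set(words)),
    }


article = NewsArticle(
    source="unknown-blog.example",  # hypothetical source
    headline="SHOCKING: you won't believe this",
    body_text="Experts are stunned. You will not believe what happened next!",
)
print(lexical_features(article.headline + " " + article.body_text))
```

Such statistics would be computed per article, for the headline and body text separately or together, and combined with other features before classification; the `source` and `media` fields are not used by this particular helper.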

As explained before, fake news is used to influence the consumer, and to do so it often uses specific language in order to attract readers. Non-fake news, on the other hand, will mostly stick to a different, more formal language register. These are linguistic-based features, to which lexical features can be added, such as the total number of words, the frequency of large words or the number of unique words.

The second kind of features that needs to be taken into account is visual features. Indeed, modified images are often used to add more weight to the textual information. For example, the two images in the deforestation figure [10] are supposed to show the progress of deforestation, but they actually come from the same original image; in addition, the WWF logo makes them look as if they came from a trusted source.

Social Context Features

In the context of news sharing on social media, multiple aspects can be taken into account, such as the user aspect, the post aspect and the group aspect.

For instance, it is possible to analyse the behaviour of specific users and use their metadata to find out whether a user is at risk of trusting or sharing false information.

Figure: The two images provided to show deforestation between two dates are from the same image taken at the same time. [10]

This metadata can be, for example, the user's centres of interest, their number of followers, or anything else that relates to them. The post aspect is in a sense similar to the user aspect: it can use post metadata to provide useful information, but in addition to metadata, the actual content of the post can be used. It is also possible to extract features from the content using latent Dirichlet allocation (LDA) [11].
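The LDA-based content features can be sketched as follows. This is a minimal, dependency-free implementation of LDA fitted with collapsed Gibbs sampling, swapped in for illustration only: this section does not specify an LDA implementation or inference method, and in practice a library implementation (for example scikit-learn's `LatentDirichletAllocation`, which uses variational Bayes) would normally be preferred. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy posts are all assumptions. Each row of the returned `theta` is a topic-proportion vector that could serve as a feature vector for a post.

```python
import random


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, vocab), where theta[d][k] is the estimated share of
    topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional
                # p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior mean of the document-topic proportions.
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab


posts = [
    "vaccine hoax doctors hide truth about vaccine",
    "election fraud ballots stolen election",
    "doctors hide vaccine truth",
    "stolen ballots prove election fraud",
]
docs = [p.split() for p in posts]
theta, vocab = lda_gibbs(docs, n_topics=2)
for post, dist in zip(posts, theta):
    print([round(p, 2) for p in dist], post)
```

Real posts would first be lower-cased and stripped of stop words; with only four short toy posts and a fixed seed, the resulting topics are illustrative only.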