Spam detection in web pages using machine learning methods (Bachelor thesis)

Γραβάνης, Γεώργιος


The main goal of this thesis is to implement machine learning algorithms that classify whether a web page contains spam content. The algorithms used in this study are Support Vector Machines (SVMs) with linear, Gaussian and polynomial kernels, and Naïve Bayes. We also try a Convolutional Neural Network implementation (LeNet5), as proposed by Yann LeCun. The evaluation metric is Area Under the Curve (AUC); the best method in our case scores 78%. The code is written in Python, and the development environment is Eclipse Mars. The SVM and Naïve Bayes implementations come from the Scikit library. To improve the results, we also wrote some methods for data preprocessing, such as balance_dataset() and softmax(). Finally, the thesis includes a short review of web spam collections.
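The abstract names balance_dataset(), softmax() and the AUC metric but does not show how they work. The following is a minimal sketch in plain Python, not the thesis code. The function bodies, signatures and the `seed` parameter are assumptions made for illustration; in particular, random undersampling of the majority class is only one plausible reading of what balance_dataset() does.

```python
import math
import random


def balance_dataset(samples, labels, seed=0):
    """Randomly undersample the majority class so both classes are equal in size.

    Assumption: the thesis does not describe its balance_dataset(); this
    undersampling strategy and the signature are hypothetical.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]  # spam indices
    neg = [i for i, y in enumerate(labels) if y == 0]  # non-spam indices
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [samples[i] for i in keep], [labels[i] for i in keep]


def softmax(values):
    """Map a list of real scores to positive weights that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def auc(labels, scores):
    """Area under the ROC curve, computed pairwise.

    It equals the fraction of (spam, non-spam) pairs in which the spam
    page receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with labels [1, 1, 0, 0] and scores [0.9, 0.4, 0.5, 0.1], three of the four spam/non-spam pairs are ranked correctly, so auc() returns 0.75. The pairwise form is quadratic in the number of pages and is meant only to make the definition concrete; a library routine would be the practical choice on a real collection.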
Institution and School/Department of submitter: School of Technological Applications - Department of Informatics - Postgraduate Program Intelligent Internet Technologies - Web Intelligence
Subject classification: Machine learning
Data protection
Spam filtering (Electronic mail)
Keywords: Machine learning;Web spam;SVM;Support Vector Machines;Naïve Bayes;CNN;Convolutional Neural Network;Content Based Features
Description: Master's thesis -- STEF (School of Technological Applications) -- Department of Informatics, Postgraduate Program: Intelligent Internet Technologies, 2016 (serial no. 8065)
URI: http://195.251.240.227/jspui/handle/123456789/13237
Appears in Collections: Master's Theses




