Year | 2024-07-23 |
---|---|
Specialization | Master's in Software Engineering |
Title | Automated detection of cyberbullying in Arabic content |
Main supervisor | ثامر سامي حسين الروسان | Thamer Al-Rousan |
Co-supervisor | عدي عبدالحليم معايطة | Adi Abdelhalim Maaita |
Student name | محمد ابراهيم عبدالله الزحيمات | Mohammed Ibrahim AL-Zhaimat |
Abstract | Automated detection of cyberbullying is important because of its negative effects on individuals and society, especially now that cyberbullying has emerged as an issue of concern that needs to be addressed. This research explores effective models for automated detection of cyberbullying in Arabic content. The study used two datasets of Arabic-language content collected from several social media sites, including YouTube, Facebook, and Twitter. The first dataset consisted of 13244 comments, balanced between content that represents cyberbullying and content that does not. The second dataset contained 15050 comments with an uneven class distribution. Two preprocessing methods were applied to maximize the effectiveness of the models, and their effects on classifier performance were compared: stemming with the Tashaphyne library and segmentation with Farasa. The classifiers examined were ensemble classifiers, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks. Their effectiveness was measured with recall, precision, accuracy, and F1 score, and further assessed using confusion matrices and Receiver Operating Characteristic (ROC) curves. On Dataset 1, the voting classifier ensemble achieved an accuracy of 97.96% in Scenario 1 (stemming) and 98.50% in Scenario 2 (segmentation). The CNN, RNN, and LSTM models also performed strongly, with accuracy ranging from 98.27% to 98.73% on Dataset 1. On Dataset 2, the voting classifier achieved an accuracy of 76.31% and 78.20% in Scenarios 1 and 2, respectively, while the CNN, RNN, and LSTM models achieved 76.80% to 77.70% in Scenario 1 and 78.60% to 79.23% in Scenario 2. Segmentation preprocessing gave marginally better results than stemming. This study contributes to the field of automatic detection of cyberbullying in Arabic content by applying different machine learning algorithms. The results indicate the importance and impact of preprocessing on the accuracy of cyberbullying detection. The models can be used as automated tools to identify and combat cyberbullying, which in turn promotes a safer online environment for Arabic-speaking users. Keywords: Cyberbullying, Arabic content detection, Automated detection, Machine learning, Ensemble method. |
Derived publications |
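
The abstract reports results for a voting ensemble evaluated with accuracy, precision, recall, and F1 score derived from confusion matrices. As a rough illustration of how those pieces fit together, the sketch below combines the predictions of three classifiers by hard (majority) voting and computes the four metrics from the confusion-matrix counts. Everything here is an assumption made for illustration: the record does not say whether the thesis used hard or soft voting, the function names `majority_vote` and `binary_metrics` are hypothetical, the label convention (1 = cyberbullying, 0 = not) is assumed, and the toy predictions are invented and are not data from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, pick the label most models predicted.

    Assumes an odd number of models with binary labels, so ties cannot occur.
    """
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions_per_model)
    ]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented toy labels (1 = cyberbullying, 0 = not); NOT data from the thesis.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]
model_b = [1, 0, 0, 1, 0, 0, 1, 1]
model_c = [1, 1, 1, 1, 0, 0, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(binary_metrics(y_true, ensemble))
```

On an imbalanced dataset such as the second one described in the abstract, accuracy alone can be misleading, which is one reason precision, recall, and F1 are usually reported alongside it.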