Hate Speech Identification in West Africa Using Machine Learning Techniques
Abstract
West Africa has witnessed an unprecedented surge in hate speech as a result of the sharp increase in social media usage over the past decade. The tense climate this has created constantly puts the region's unity in jeopardy. Existing efforts by security agencies to monitor hate speech on social media, which rely on human monitors and site spiders to determine what constitutes hate speech, are inadequate. This study proposes a detection model built with machine learning techniques as a solution to this problem. After the dataset was cleaned, features were extracted using word embeddings, Count Vectorizer, and Term Frequency-Inverse Document Frequency (TF-IDF). Five classifiers were trained on the dataset: Logistic Regression (LR), Naïve Bayes (NB), Extreme Gradient Boosting (XGBoost), a Deep Neural Network (DNN), and a Bidirectional Long Short-Term Memory network (Bi-LSTM). The best result was obtained by the Bi-LSTM fitted on GloVe embeddings, which achieved an accuracy of 92% and an F1-score of 83% on the test set. In general, the machine learning models performed well on the test data, indicating that they generalised from the training set to previously unseen data.
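To make the feature-extraction step concrete, the following is a minimal, standard-library sketch of how count-based and TF-IDF features can be derived from short texts. It is an illustration under stated assumptions, not the study's implementation: the whitespace tokenizer, the function names (`tokenize`, `count_vectors`, `tfidf_vectors`), and the toy posts are all hypothetical. The smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation is one common variant (it is the default in scikit-learn's `TfidfVectorizer`); the abstract does not state which variant was used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_vectors(docs):
    """Bag-of-words counts: one row per document, one column per vocabulary term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF weights with a smoothed IDF, L2-normalised per document."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF (assumed variant): rarer terms receive larger weights.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


# Toy, made-up posts for illustration only (not from the study's dataset).
posts = [
    "we welcome our neighbours",
    "we distrust our neighbours",
    "markets open early today",
]
vocab, counts = count_vectors(posts)
_, tfidf = tfidf_vectors(posts)
```

In this sketch, a term that appears in only one post (such as "welcome") receives a higher TF-IDF weight than a term shared across posts (such as "we"), which is the property that makes TF-IDF features more discriminative than raw counts for classifiers such as LR or NB.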