Online Sexism Detection Using Support Vector Machine and Naïve Bayes

DIYANK SHABIRA, SARIFUDDIN MADENDA, AL HAFIZ AKBAR MAULANA SIAGIAN, SLAMET RIYANTO

Abstract

Online sexism has become a significant issue on social media: it affects the development of the internet, causes harm, and poses a serious threat to the women it targets. This study uses machine learning to detect sexism in English sentences, using two algorithms: Support Vector Machine (SVM) and Naive Bayes. Grid search is applied to each model to find the hyperparameter combination that yields the best score. Training is divided into two tasks: (1) training the models on the data without any handling of class imbalance and (2) training the models on data resampled with SMOTE. In training, the SVM+SMOTE model achieves the highest average best F1 score, 0.96. On the test data, the SVM+SMOTE model also achieves the highest F1 score, 0.90: 1467 'not sexist' sentences are correctly classified, 47 'not sexist' sentences are classified as 'sexist', 189 'sexist' sentences are correctly classified, and 297 'sexist' sentences are classified as 'not sexist'.

Keywords: Sexism, Detection, SVM, Naive Bayes, SMOTE
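The four test counts in the abstract form a 2x2 confusion matrix, so per-class precision, recall and F1 can be recomputed from them. The sketch below does this in plain Python. The abstract does not say which class is treated as positive or how F1 is averaged, so computing the score once per class, plus a macro average and accuracy, is an assumption made here for illustration; the names `counts` and `class_metrics` are hypothetical.

```python
# Confusion-matrix counts reported in the abstract for SVM+SMOTE on the test data.
# Key: (true label, predicted label) -> number of sentences.
counts = {
    ("not sexist", "not sexist"): 1467,
    ("not sexist", "sexist"): 47,
    ("sexist", "sexist"): 189,
    ("sexist", "not sexist"): 297,
}
LABELS = ("not sexist", "sexist")


def class_metrics(counts, positive):
    """Precision, recall and F1 when `positive` is treated as the positive class."""
    tp = counts[(positive, positive)]
    # Sentences of another class that were predicted as `positive`.
    fp = sum(n for (true, pred), n in counts.items()
             if pred == positive and true != positive)
    # Sentences of class `positive` that were predicted as something else.
    fn = sum(n for (true, pred), n in counts.items()
             if true == positive and pred != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_class = {label: class_metrics(counts, label) for label in LABELS}
# Assumption: an unweighted (macro) average over the two classes.
macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(LABELS)
total = sum(counts.values())
accuracy = sum(counts[(label, label)] for label in LABELS) / total

for label, (p, r, f1) in per_class.items():
    print(f"{label:>10}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
print(f"macro F1={macro_f1:.3f} accuracy={accuracy:.3f} (n={total})")
```

With these counts, the F1 score of the 'not sexist' class rounds to 0.90, which matches the figure reported in the abstract, while the F1 score of the 'sexist' class is about 0.52. Whether the reported 0.90 refers to that class or to some other averaging is not stated in the abstract.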

DOI: https://doi.org/10.26760/mindjournal.v8i2.254-266


____________________________________________________________

ISSN (print): 2338-8323   |  ISSN (electronic): 2528-0902

Published by:

Informatika Institut Teknologi Nasional Bandung

Address: Gedung 2, Jl. PHH. Mustofa 23, Bandung 40124

Contact: Tel. 7272215 (ext. 181)  Fax. 7202892

Email : mind.journal@itenas.ac.id

____________________________________________________________

This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.