Cyberbullying Classification of Sundanese-Language Tweets Using a Hybrid Learning Model

Anisa Putri Setyaningrum, Muhammad Fahmy Nadhif

ABSTRACT
Cyberbullying in Sundanese is increasingly common on social media, taking forms such as physical insults, body shaming, and threats that can seriously affect victims. Automatic detection remains challenging, however, mainly because datasets are limited and the language is difficult to process effectively. This study aims to develop a Sundanese cyberbullying detection system that combines stemming with hybrid learning models. The researchers applied two machine learning models, random forest and Support Vector Machine (SVM), and three deep learning models: convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM), CNN, and BiLSTM. Each model was evaluated experimentally by measuring its accuracy and F1-score. The hybrid learning model performed best, with an accuracy of 97.3% and an F1-score of 97%. In addition, CNN-BiLSTM trained faster than the other models, at approximately 30 seconds per epoch.

Keywords: Sundanese, Cyberbullying, Hybrid Learning
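
As an illustration of the two evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy and F1-score from predicted labels. It assumes binary labels in which 1 marks a cyberbullying tweet; the function names and the toy labels are hypothetical and are not taken from the study, which does not state how its F1-score was averaged.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (assumption: 1 = cyberbullying, 0 = not).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")
```

Reporting both metrics is useful when classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the positive class.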






DOI: https://doi.org/10.26760/jrh.v9i1.58-69




Editorial and administrative office address:

Lembaga Penelitian dan Pengabdian Masyarakat Institut Teknologi Nasional
Faculty, Building 14, 3rd Floor
Jl. PHH. Mustapa 23 Bandung 40124
Tel. 022-7272215 ext. 159, Fax 022-7202892
E-mail: hrekayasa@itenas.ac.id



This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
