Improving Speech Emotion Recognition for the Indonesian Language

FATAN KASYIDI; RIDWAN ILYAS; NIDA MUTHI ANNISA

doi:10.26760/mindjournal.v6i2.194-204


Abstract (Indonesian)

Human-computer interaction is a continually growing phenomenon, accompanied by the increasing use of computers in the human social sphere. When people interact, they involve emotions in order to understand one another, and human emotions are often expressed through the way a person speaks. Much research on speech emotion recognition has already been carried out, but there is still room for improvement. In particular, the corpus is one of the factors that keep emotion recognition from reaching optimal accuracy, especially because of imbalanced data. This study aims to improve emotion recognition performance on five emotion classes, namely happiness, anger, sadness, contentment, and neutral, using a boosting algorithm. CNN and RNN models were also used for comparison, and SMOTE was applied to the corpus. The experiments yielded a recognition accuracy of 65% on the test data, using a 22050 Hz sampling rate, MFCC features, and SMOTE oversampling.

Keywords: Imbalanced data, Boosting algorithm, CNN, RNN, SMOTE
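The abstract attributes part of the accuracy gain to SMOTE oversampling of the imbalanced corpus. SMOTE creates a synthetic minority-class sample by picking a minority sample, choosing one of its k nearest neighbours from the same class, and interpolating at a random point on the segment between them. The plain-Python sketch below illustrates that idea under stated assumptions: each utterance is already reduced to a fixed-length feature vector (for example, summary statistics of its MFCCs), every minority class is raised to the size of the largest class, and the function name `smote` and its parameters are hypothetical, not the authors' code. A practical pipeline would more likely call an existing implementation such as imbalanced-learn's `SMOTE(...).fit_resample(X, y)`.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Hypothetical SMOTE sketch: oversample every minority class up to the
    majority-class count.

    ASSUMPTION: X holds precomputed fixed-length feature vectors (e.g. MFCC
    statistics per utterance); this is not the authors' implementation.
    Each synthetic point is x + lam * (nn - x), where nn is one of the k
    nearest same-class neighbours of a random minority sample x and lam is
    drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, count in counts.items():
        members = [list(row) for row, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue  # already the majority, or no neighbour to interpolate toward
        kk = min(k, len(members) - 1)
        for _ in range(target - count):
            x = rng.choice(members)
            # k nearest neighbours of x within its own class, excluding x itself
            neighbours = sorted(
                (m for m in members if m is not x),
                key=lambda m: math.dist(x, m),
            )[:kk]
            nn = rng.choice(neighbours)
            lam = rng.random()
            X_out.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
            y_out.append(label)
    return X_out, y_out
```

As a general precaution (not stated in the abstract), oversampling is usually applied to the training split only, so that the test set contains only real recordings and no synthetic neighbours of test utterances end up in training.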

Abstract

Human-computer interaction is a growing phenomenon, accompanied by the increasing use of computers in human social activities. Humans interact with one another by involving emotions. Much research on speech emotion recognition has been conducted. Nevertheless, there is still room to improve speech emotion recognition, especially regarding the corpus, which is one of the factors that keep models from performing optimally, in particular because of imbalanced data. This study aims to improve emotion recognition performance on five emotion classes: happiness, anger, sadness, contentment, and neutral. We employed CNN, RNN, and boosting algorithms, and applied SMOTE to the corpus. In the experiments, accuracy reached 65% using a 22050 Hz sampling rate, MFCCs, and SMOTE oversampling.

Keywords: Data Imbalance, Boosting Algorithms, CNN, RNN, SMOTE
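The abstract names a boosting algorithm as the main classifier without saying which variant. As an illustration only, the sketch below implements SAMME, the multi-class extension of AdaBoost, with single-feature decision stumps as weak learners, in plain Python. The choice of SAMME, the stump learner, the number of rounds, and the names `fit_stump`, `fit_samme`, and `predict_samme` are all assumptions; the authors may have used a different boosting method or a library such as scikit-learn's `AdaBoostClassifier`.

```python
import math

# ASSUMPTION: SAMME with decision stumps is an illustrative choice; the
# boosting variant used in the paper is not specified in the abstract.


def fit_stump(X, y, w, classes):
    """Pick the (feature, threshold) split with the lowest weighted error.

    Each side of the split predicts its weighted-majority class.
    Returns (weighted_error, feature_index, threshold, left_class, right_class).
    """
    best = None
    total = sum(w)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            lc = max(left, key=left.get)
            rc = max(right, key=right.get)
            err = total - left[lc] - right[rc]  # weight of misclassified samples
            if best is None or err < best[0]:
                best = (err, j, t, lc, rc)
    return best


def stump_predict(stump, row):
    _, j, t, left_class, right_class = stump
    return left_class if row[j] <= t else right_class


def fit_samme(X, y, rounds=10):
    """Multi-class AdaBoost (SAMME) with decision stumps; needs >= 2 classes."""
    classes = sorted(set(y))
    k = len(classes)
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w, classes)
        err = stump[0] / sum(w)
        if err >= 1.0 - 1.0 / k:
            break  # weak learner is no better than random guessing
        alpha = math.log((1.0 - err) / max(err, 1e-10)) + math.log(k - 1)
        ensemble.append((alpha, stump))
        if err <= 1e-12:
            break  # perfect stump: further rounds would not change the weights
        # Up-weight misclassified samples, then renormalise to sum to 1.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict_samme(ensemble, row):
    """Return the class with the largest alpha-weighted vote."""
    votes = {}
    for alpha, stump in ensemble:
        c = stump_predict(stump, row)
        votes[c] = votes.get(c, 0.0) + alpha
    return max(votes, key=votes.get)
```

The extra log(K - 1) term is what distinguishes SAMME from binary AdaBoost: it keeps a learner's weight positive as long as its error is below 1 - 1/K, the error of random guessing among K classes, so a five-class emotion label set can be boosted directly without a one-vs-rest decomposition.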
