Improving Speech Emotion Recognition for the Indonesian Language
Abstract (translated from Indonesian)
Human-computer interaction continues to grow as computers are increasingly used in human social life. People draw on emotion to understand one another when they interact, and emotion is often conveyed through the way a person speaks. Speech emotion recognition has been studied extensively, yet efforts to improve it continue. The corpus in particular, and especially its data imbalance, is one of the factors that keeps recognition accuracy below optimal. This study aims to improve recognition performance on five emotion classes (happy, angry, sad, contentment, and neutral) using boosting algorithms. CNN and RNN models are also used for comparison, and SMOTE is applied to the corpus. In the experiments, recognition accuracy reached 65% on the test data with a configuration of a 22050 Hz sampling rate, MFCC features, and SMOTE oversampling.
Keywords: Imbalanced data, Boosting algorithms, CNN, RNN, SMOTE
Abstract
Human interaction with computers is a growing phenomenon, accompanied by the increasing use of computers in human social activities. Humans interact with one another by involving emotions. Plenty of research on speech emotion recognition has been conducted. Nevertheless, efforts to enhance speech emotion recognition continue, especially regarding the corpus, which is one of the factors that keep models from performing optimally, particularly because of data imbalance. This study was conducted to enhance the performance of emotion recognition for five emotion classes: happiness, anger, sadness, contentment, and neutral. We employed CNN, RNN, and boosting algorithms, and we applied SMOTE to the corpus. In the experiments, accuracy reached 65% with a configuration of a 22050 Hz sampling rate, MFCCs, and SMOTE oversampling.
Keywords: Data Imbalance, Boosting Algorithms, CNN, RNN, SMOTE
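The abstract names SMOTE oversampling as the remedy for class imbalance in the corpus. As a rough illustration of the idea only, and not the authors' implementation, the sketch below generates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, the parameter `k`, and the toy two-dimensional vectors (standing in for per-utterance MFCC features) are all assumptions made for this example.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy feature vectors for an under-represented emotion class (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that the test data remains untouched.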
DOI: https://doi.org/10.26760/mindjournal.v6i2.194-204
____________________________________________________________
ISSN (print): 2338-8323 | ISSN (online): 2528-0902
Published by:
Informatics, Institut Teknologi Nasional Bandung
Address: Building 2, Jl. PHH. Mustofa 23, Bandung 40124
Contact: Tel. 7272215 (ext. 181), Fax 7202892
Email: mind.journal@itenas.ac.id
____________________________________________________________
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.