HiVAD: A Voice Activity Detection Application Based on Deep Learning
Abstract
This paper presents real-time voice activity detection (VAD) on smartphones using a convolutional neural network (CNN). Reducing computation time was an open problem in previous studies, and earlier work still had many shortcomings despite using machine learning approaches. The audio signal is converted into an image by a log-mel energy spectrogram, and that image is fed into the CNN to classify human voice versus noise. In the test results, HiVAD outperformed the other VAD methods, namely G729B, Sohn, and RF, as shown by average SHR accuracy figures of 15.89%, 28.98%, and 42.13% at 0 dB; 8.67%, 16.29%, and 17.63% at 5 dB; and 1.35%, 7.72%, and 5.14% at 10 dB. In addition, a multi-threading mechanism keeps the computation efficient enough for real-time operation. This study shows that the CNN architecture in HiVAD significantly improves the accuracy of voice activity detection.
Keywords: VAD app, voice detection, deep learning, CNN
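As an illustration of the feature extraction step described in the abstract, the following is a minimal sketch of how a log-mel energy spectrogram can be computed with NumPy. It is not the authors' implementation: the function names, the 16 kHz sample rate, the 25 ms frames with 50% overlap, the 512-point FFT, the 40 mel bands, and the HTK-style mel formula are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention, not taken from the paper)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge points equally spaced on the mel scale, mapped back to Hz
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)

    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        low, center, high = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400, hop=200,
                        n_fft=512, n_mels=40):
    """Return a (n_mels, n_frames) array of log mel-band energies."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame (frames are zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Pool FFT bins into mel bands, then take the log of the band energies
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10).T
```

Calling `log_mel_spectrogram` on one second of 16 kHz audio yields a 40 x 79 array under these assumed settings. A two-dimensional array of this kind is what would be treated as the "image" passed to a CNN classifier.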
DOI: https://doi.org/10.26760/elkomika.v9i4.856
ISSN (print): 2338-8323 | ISSN (electronic): 2459-9638
Published by:
Teknik Elektro Institut Teknologi Nasional Bandung
Address: Gedung 20, Jl. PHH. Mustofa 23, Bandung 40124
Contact: Tel. 7272215 (ext. 206), Fax 7202892
Email: jte.itenas@itenas.ac.id
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.