Audio Conversion for Music Genre Classification Using Short-Time Fourier Transform and Inception V3
Abstract
This research examines the development of music genres and the application of technology to music genre recognition through the Music Information Retrieval (MIR) approach. Automatic music genre labeling is expected to reduce the human effort involved in labeling music by genre. This research proposes using the Mel spectrogram as a frequency-domain representation of audio, together with a Convolutional Neural Network (CNN), specifically the Inception V3 architecture. CNN was chosen for its ability to recognize complex, hierarchical patterns, which correspond to the musical features represented in the spectrogram. Transfer learning and fine-tuning of models pre-trained on large datasets were applied to improve accuracy. This study uses a dataset of 1000 audio files in .wav format, with each genre represented by 100 files, to evaluate the performance and effectiveness of the proposed method for music genre classification.
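The front end named in the title can be illustrated with a minimal sketch of a Short-Time Fourier Transform in plain Python. This is not the authors' implementation: the window type (Hann), the frame size `n_fft=64`, the hop of 32 samples, the 8000 Hz sample rate, the synthetic 1000 Hz test tone, and the HTK-style mel formula in `hz_to_mel` are all assumptions chosen for illustration. A real pipeline would more likely rely on an audio library and a pre-trained Inception V3 model.

```python
import cmath
import math


def hann(n_fft):
    # Periodic Hann window (assumed here; the abstract does not name a window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft_magnitude(signal, n_fft=64, hop=32):
    """Return one magnitude spectrum (n_fft // 2 + 1 bins) per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        # Cut out one frame and taper it with the window.
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def hz_to_mel(hz):
    # HTK-style mel mapping (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


# Illustrative input: a synthetic 1000 Hz sine, not data from the study.
sample_rate = 8000
n_fft = 64
signal = [math.sin(2 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]

spec = stft_magnitude(signal, n_fft=n_fft, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / n_fft
print(len(spec), len(spec[0]), peak_hz, round(hz_to_mel(peak_hz), 1))
```

Each row of `spec` is one time frame and each column one frequency bin. Pooling the bins onto a mel-spaced axis and stacking the frames gives the image-like array that a CNN can take as input.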
DOI: https://doi.org/10.26760/elkomika.v13i1.%25p
ISSN (print) : 2338-8323 | ISSN (electronic) : 2459-9638
Publisher:
Department of Electrical Engineering Institut Teknologi Nasional Bandung, Indonesia
Address: 20th Building Institut Teknologi Nasional Bandung PHH. Mustofa Street No. 23 Bandung 40124, Indonesia
Contact: +627272215 (ext. 206)
Email: jte.itenas@itenas.ac.id
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.