Audio Conversion for Music Genre Classification Using Short-Time Fourier Transform and Inception V3

DEWI ROSMALA, MOHAMMAD NOER FADHILAH

Abstract


This research examines the development of music genres and the application of technology to music genre recognition through the Music Information Retrieval (MIR) approach. Automatic music genre labeling is expected to reduce the human effort that manual labeling requires. The proposed method uses the Mel spectrogram as a frequency-domain representation of the audio, together with a Convolutional Neural Network (CNN), specifically the Inception V3 architecture. CNNs were chosen for their ability to recognize complex, hierarchical patterns, which matches the musical features represented in a spectrogram. Transfer learning and fine-tuning of a model pre-trained on a large dataset were applied, which made it possible to improve accuracy. The performance and effectiveness of the method for music genre classification are evaluated on a dataset of 1000 audio files in .wav format, with 100 files per genre.
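To make the audio representation concrete, the following is a minimal sketch of how a Short-Time Fourier Transform power spectrum can be mapped onto mel bands to form a log-mel spectrogram. It is not the authors' implementation. The HTK-style mel formula, the frame size, hop length, number of mel bands, the synthetic 1 kHz test tone, and all function names are assumptions chosen for illustration. The naive DFT is used only to keep the example dependency-free; a real pipeline would normally rely on an FFT-based audio library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def stft_power(signal, n_fft, hop):
    """Power spectrum of each Hann-windowed frame (naive DFT, for clarity)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    # Precompute the complex exponentials used by the DFT.
    twiddle = [cmath.exp(-2j * math.pi * k / n_fft) for k in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * twiddle[(k * n) % n_fft] for n in range(n_fft))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mel_spectrogram_db(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Log-mel spectrogram in decibels, shaped frames x n_mels."""
    power = stft_power(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    result = []
    for spectrum in power:
        row = []
        for weights in bank:
            energy = sum(w * p for w, p in zip(weights, spectrum))
            row.append(10.0 * math.log10(energy + 1e-10))  # small offset avoids log(0)
        result.append(row)
    return result


# Synthetic input (an assumption): a quarter second of a 1 kHz sine at 8 kHz.
sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2.0 * math.pi * tone_hz * n / sample_rate) for n in range(2000)]

mel_db = mel_spectrogram_db(signal, sample_rate)
middle_frame = mel_db[len(mel_db) // 2]
peak_band = max(range(len(middle_frame)), key=middle_frame.__getitem__)
```

For the pure tone, the loudest band in each frame is the one whose triangular filter covers 1 kHz. The resulting frames-by-bands matrix is the kind of two-dimensional, image-like input that an image classifier such as Inception V3 can take once it is resized and scaled to the network's expected input format.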


Keywords


Mel Spectrogram; CNN Inception V3


DOI: https://doi.org/10.26760/elkomika.v13i1.%25p


ISSN (print) : 2338-8323 | ISSN (electronic) : 2459-9638

Publisher:

Department of Electrical Engineering, Institut Teknologi Nasional Bandung, Indonesia

Address: 20th Building, Institut Teknologi Nasional Bandung, PHH. Mustofa Street No. 23, Bandung 40124, Indonesia

Contact: +627272215 (ext. 206)

Email: jte.itenas@itenas.ac.id



This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
