Methods of automatic recognition of The Beatles’ songs recordings based on their intoned parts using neural networks and time-frequency analysis
Master's thesis, 2024
Author: Rafał Gorecki
Supervisor: Jakub Wagner
Abstract
The thesis “Methods of automatic recognition of The Beatles’ songs recordings based on their intoned parts using neural networks and time-frequency analysis” tests how effectively music processing tools based on the Fourier and Hilbert-Huang transforms, combined with convolutional neural networks, recognise musical patterns (such as melodies or riffs) in non-original recordings of songs. Because of the scope of the thesis and the limited resources available, the dataset is restricted to a subset of songs from The Beatles’ discography, which has been widely used in machine learning research because of its diversity and the availability of high-quality reference annotations.
Independently produced recordings were the basis for two datasets of images that were later fed into several artificial neural network architectures. The first dataset consists of spectrograms: plots of magnitude as a function of time and frequency, computed with the Short-Time Fourier Transform (STFT) algorithm. The second consists of holospectra, which are the outputs of Empirical Mode Decomposition, comprising the sifting operation and the Hilbert-Huang Transform. The thesis compares the networks’ performance depending on which dataset was used for training. The architectures are a selection of pretrained and non-pretrained models available in the popular Python library PyTorch, together with some architectures implemented manually. The aim of the thesis was to find which dataset gives better results (and by how much) and, hence, which computation algorithm is more suitable for classifying visual representations of intoned melody parts. As it turned out, the spectrogram dataset gave the better results.
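As a rough illustration of the first representation, the sketch below computes a magnitude spectrogram with a hand-written STFT in NumPy. It is not the code used in the thesis: the function name `stft_magnitude`, the frame length, the hop size, the Hann window, the sample rate and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Return a magnitude spectrogram of shape (frequency bins, frames).

    frame_len and hop are illustrative defaults, not values from the thesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sample_rate = 8000                           # Hz, assumed for the example
t = np.arange(sample_rate) / sample_rate     # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)          # synthetic 1 kHz test tone

spec = stft_magnitude(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)
# Frequency bin with the highest average magnitude over time.
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

Plotting `spec` with time on the horizontal axis and `freqs` on the vertical axis gives the kind of image that can be passed to an image classifier; for the test tone, the energy concentrates in the bin nearest 1 kHz.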
Successful results of the proposed solution may lead to a tool for classifying songs from self-made recordings, which would extend the concept behind Shazam. It could also be useful in anti-plagiarism systems. In addition, given a suitable song library, the tool could help find patterns (such as riffs, solos or passages) similar to existing ones, which may be of help to professional musicians.
