Paper available here.
In this project, artificial neural networks were trained to classify a range of musical instruments, and a set of comparative experiments measured how much different characteristics of an instrument's sound affect the resulting accuracy. The experiments used, in turn:
- The whole sample
- The attack of the sound
- Everything but the attack of the sound
- The initial 100 Hz of the frequency spectrum
- The following 900 Hz of the frequency spectrum
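The five inputs above can be read as different slices of the same recording. The following Python sketch is illustrative only and is not the repository's MATLAB code. The helper names, the `attack_seconds` parameter, and the band edges (0 to 100 Hz, then 100 to 1000 Hz) are assumptions, since the text here does not state how the attack is delimited or how the bands are cut.

```python
def split_attack(samples, sample_rate, attack_seconds):
    """Split a mono signal (list of floats) into (attack, remainder).

    attack_seconds is a hypothetical parameter: the attack length used
    in the project is not stated in this README.
    """
    n = int(round(attack_seconds * sample_rate))
    n = max(0, min(n, len(samples)))  # clamp to the signal length
    return samples[:n], samples[n:]


def band_slice(magnitudes, bin_hz, low_hz, high_hz):
    """Keep the spectrum bins with low_hz <= frequency < high_hz.

    magnitudes[k] is assumed to be the magnitude at frequency k * bin_hz.
    """
    return [m for k, m in enumerate(magnitudes) if low_hz <= k * bin_hz < high_hz]


def first_100_hz(magnitudes, bin_hz):
    """Assumed reading of 'the initial 100 Hz': 0 Hz up to (not including) 100 Hz."""
    return band_slice(magnitudes, bin_hz, 0.0, 100.0)


def following_900_hz(magnitudes, bin_hz):
    """Assumed reading of 'the following 900 Hz': 100 Hz up to (not including) 1000 Hz."""
    return band_slice(magnitudes, bin_hz, 100.0, 1000.0)
```

Keeping the time-domain split and the frequency-domain slice as separate helpers mirrors the two kinds of experiment: attack versus sustain, and low band versus the band above it.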
The dataset used in this project was the London Philharmonic Orchestra Dataset, consisting of recorded samples from 20 different musical instruments. For each instrument, the samples cover its entire range of tones in every octave, played at different dynamics (piano, forte) and with different durations. The dataset also includes samples played with different techniques, such as vibrato, tremolo, pizzicato and ponticello.
To limit the scope of this project, the following eight instruments were selected to train the model: Banjo, Cello, Clarinet, English horn, Guitar, Oboe, Trumpet and Violin. This set was chosen because the samples are of high quality and the instruments span the three instrument families: brass, string and woodwind.
To avoid having to handle harmonics that may differ for the same tone across octaves, only samples recorded in the fourth octave were used.
Index | Instruments | Samples |
---|---|---|
1 | Banjo | 23 |
2 | Cello | 166 |
3 | Clarinet | 131 |
4 | English Horn | 234 |
5 | Guitar | 29 |
6 | Oboe | 155 |
7 | Trumpet | 140 |
8 | Violin | 366 |
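The class sizes in the table are uneven (Violin has roughly sixteen times as many samples as Banjo), so it helps to know what accuracy a trivial classifier would reach. The Python sketch below is an illustration, not part of the repository. It uses only the counts from the table and computes each class's share and the majority-class baseline, which comes out at about 29.4% (366 of 1244 samples). Whether the project rebalances the classes before training is not stated here.

```python
# Sample counts per instrument, copied from the table above.
SAMPLES = {
    "Banjo": 23,
    "Cello": 166,
    "Clarinet": 131,
    "English Horn": 234,
    "Guitar": 29,
    "Oboe": 155,
    "Trumpet": 140,
    "Violin": 366,
}


def class_shares(counts):
    """Fraction of the dataset that belongs to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())
```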
The model used for training was a Multilayer Perceptron with Early Stopping, using Resilient Back Propagation as the learning heuristic. The network consisted of:
- 50 inputs
- 30 hidden nodes
- One output node (with eight different outputs, one for each instrument)
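For readers unfamiliar with Resilient Back Propagation (Rprop), the Python sketch below shows the core idea: each weight keeps its own step size, which grows while the gradient keeps its sign and shrinks when the sign flips, and only the sign of the gradient is used for the update. This is a minimal illustration of one common variant (often called iRprop-), with the commonly cited default constants. It is an assumption that the project uses comparable settings, and this is not the repository's MATLAB code.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Apply one in-place Rprop update (iRprop- variant) to all weights."""
    for i in range(len(weights)):
        g = grads[i]
        agreement = g * prev_grads[i]
        if agreement > 0:
            # Same sign as last time: take a larger step.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif agreement < 0:
            # Sign flipped, so a minimum was overshot: shrink the step
            # and skip the update for this weight in this iteration.
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g


def minimise(grad_fn, weights, iterations=100, initial_step=0.1):
    """Run Rprop for a fixed number of iterations and return the new weights."""
    weights = list(weights)
    prev_grads = [0.0] * len(weights)
    steps = [initial_step] * len(weights)
    for _ in range(iterations):
        rprop_step(weights, grad_fn(weights), prev_grads, steps)
    return weights
```

In a real training loop, `grad_fn` would return the gradient of the network error with respect to every weight, and early stopping would end the loop once the validation error stops improving.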
The following table displays the results of the training, averaged over 10 sessions.
Experiment | Accuracy |
---|---|
Base experiment | 93.5% |
Only Attack | 80.2% |
Without Attack | 73.2% |
First 100 Hz | 64.2% |
Following 900 Hz | 90.6% |
Below is a confusion matrix from one of the training sessions, showing an example of the accuracy for each class:
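A confusion matrix of this kind can be computed from true and predicted class labels, as in the following Python sketch. It is an illustration rather than the repository's MATLAB code. The convention used (rows are true classes, columns are predicted classes) and the 0-based class indices are assumptions.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count predictions: matrix[t][p] is how often true class t was predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was classified correctly.

    Returns None for a class that has no samples.
    """
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(row[i] / total if total else None)
    return result


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else None
```

With eight instruments, `n_classes` would be 8. Per-class accuracy is worth checking alongside the overall figure because the classes differ considerably in size.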
To get started, simply run `neural_network.m` in MATLAB.
In that script, you can also select which dataset to train the network on by uncommenting the corresponding line:
load 'datasets/<dataset>.mat'
The available processed datasets are:
Dataset | Filename |
---|---|
Base experiment | 1-dataset_avgvalue_chunk50.mat |
Only Attack | 2-dataset_avgvalue_chunk50_only_attack.mat |
Without Attack | 3-dataset_avgvalue_chunk50_without_attack.mat |
First 100 Hz | 4-dataset_avgvalue_chunk50_100hz.mat |
Following 900 Hz | 5-dataset_avgvalue_chunk50_900hz.mat |
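The `avgvalue_chunk50` part of the filenames, together with the network's 50 inputs, suggests that each sample is reduced to 50 values by averaging over 50 chunks. The README does not spell this out, so the Python sketch below is an assumption about the preprocessing, and the function name is hypothetical. It is not the repository's MATLAB code.

```python
def chunk_average(values, n_chunks=50):
    """Reduce a sequence to n_chunks values by averaging consecutive chunks.

    Chunks are as equal in length as possible; when len(values) is not a
    multiple of n_chunks, some chunks are one element longer than others.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    n = len(values)
    if n < n_chunks:
        raise ValueError("need at least one value per chunk")
    features = []
    for i in range(n_chunks):
        start = i * n // n_chunks
        end = (i + 1) * n // n_chunks
        chunk = values[start:end]
        features.append(sum(chunk) / len(chunk))
    return features
```

Applied to a magnitude spectrum, or to one of its frequency bands, this yields a fixed-length vector that matches the 50 network inputs regardless of how long the original recording is.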