This is the code used to benchmark different feature sets for symbolic music, including musif. If you use it, please cite:
Simonetta F., Llorens A., Serrano M., García-Portugués E., Torrente A., "Optimizing Feature Extraction for Symbolic Music", ISMIR 2023.
- Python 3.10 (e.g. via conda or pyenv)
- pdm; you have three options:
  - `pipx install pdm` (requires pipx; recommended)
  - `pip install pdm` (environment-specific)
  - see https://pdm.fming.dev/latest/ for other alternatives
- `pdm sync` to create the environment and install the Python packages
  - As an alternative to pdm, see `cluster.md` for a bare venv approach
- MuseScore: download the AppImage (4.0.1 has a bug; use 3.6.2 instead)
- Java: install it using your OS package manager and check that the `javac` command is available in the PATH
- jSymbolic 2.2: download and unzip
- GCC and make: install them using your OS package manager
- humdrum: run `git submodule update`, then `cd humdrum-tools`, `make update`, `make`
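
Before running `pdm sync`, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository, that checks the Python version and looks up the listed commands on the PATH with `shutil.which`. The command names come from the list above; the helper names `check_prerequisites` and `python_is_310` are hypothetical.

```python
import shutil
import sys

# Commands named in the setup list above. MuseScore and jSymbolic are
# configured by path in settings.py, so they are not looked up here.
REQUIRED_COMMANDS = ["pdm", "javac", "gcc", "make"]


def check_prerequisites(commands=REQUIRED_COMMANDS):
    """Hypothetical helper: map each command to its resolved path, or None if missing."""
    return {cmd: shutil.which(cmd) for cmd in commands}


def python_is_310():
    """Return True if the running interpreter is Python 3.10, as the setup expects."""
    return sys.version_info[:2] == (3, 10)


if __name__ == "__main__":
    print("Python 3.10:", python_is_310())
    for cmd, path in check_prerequisites().items():
        print(f"{cmd}: {path or 'NOT FOUND'}")
```

Any command reported as `NOT FOUND` still needs to be installed or added to the PATH.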
In `symbolic_features/settings.py`, set the paths to the MuseScore and jSymbolic executables.
Download the following datasets and set the path to the root of each one in `symbolic_features/settings.py`:
- Josquin - La Rue
- ASAP
- Didone
- EWLD
- String quartets:
  - Haydn
  - Mozart
  - Beethoven
  - unzip the above three archives into one directory, e.g. `quartets/haydn`, `quartets/mozart`, `quartets/beethoven`
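
The variable names used in `symbolic_features/settings.py` are not listed here, so the sketch below uses hypothetical placeholders (`MUSESCORE_PATH`, `JSYMBOLIC_PATH`, `DATASET_ROOTS`); check the real file for the actual names. It only illustrates the kind of values expected (paths to the two executables and to each dataset root) together with a small function that reports any configured path that does not exist.

```python
from pathlib import Path

# Hypothetical placeholder names and values: the real variables in
# symbolic_features/settings.py may be named differently.
MUSESCORE_PATH = Path("/path/to/MuseScore-3.6.2.AppImage")
JSYMBOLIC_PATH = Path("/path/to/jSymbolic_2_2")
DATASET_ROOTS = {
    "josquin_la_rue": Path("/path/to/josquin_la_rue"),
    "asap": Path("/path/to/asap"),
    "didone": Path("/path/to/didone"),
    "ewld": Path("/path/to/ewld"),
    "quartets": Path("/path/to/quartets"),  # contains haydn/, mozart/, beethoven/
}


def missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, path in paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    configured = {"musescore": MUSESCORE_PATH, "jsymbolic": JSYMBOLIC_PATH, **DATASET_ROOTS}
    for name in missing_paths(configured):
        print(f"missing: {name}")
```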
Fix invalid file names: `pdm fix_names`. This fixes file names containing `,` and `;`, which cause errors in CSV files.
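
The idea behind this step can be sketched as follows. This is not the implementation of `pdm fix_names`: it assumes that offending characters are replaced with `_` and that files are renamed in place, and the function names `clean_name` and `sanitize_names` are hypothetical. Only files are renamed, and an existing target is never overwritten.

```python
from pathlib import Path

# Characters that break CSV rows when they appear in a file name.
BAD_CHARS = ",;"


def clean_name(name, replacement="_"):
    """Replace every ',' and ';' in a file name with `replacement` (an assumed choice)."""
    for ch in BAD_CHARS:
        name = name.replace(ch, replacement)
    return name


def sanitize_names(root, replacement="_"):
    """Rename files under `root` whose names contain ',' or ';'.

    Returns a list of (old_path, new_path) pairs; files whose cleaned
    name already exists are skipped rather than overwritten.
    """
    renamed = []
    # Collect the paths first so renaming does not disturb the directory walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = clean_name(path.name, replacement)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            continue
        path.rename(target)
        renamed.append((path, target))
    return renamed
```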
Convert files to MIDI: `pdm convert2midi`. If you are running without a display (e.g. in a remote SSH session), first run `Xvfb :99 & export DISPLAY=:99`.
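
A single conversion of this kind can be done with MuseScore's command-line export option `-o`, which chooses the output format from the file extension. The sketch below is an assumption about how such a conversion could be scripted, not a description of what `pdm convert2midi` does internally; `musescore_path` stands for the AppImage configured in `settings.py`, and the `DISPLAY` default mirrors the Xvfb tip above. The function names are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def midi_command(musescore_path, score_path):
    """Build a MuseScore command that exports `score_path` to a .mid file next to it."""
    score_path = Path(score_path)
    out_path = score_path.with_suffix(".mid")
    return [str(musescore_path), "-o", str(out_path), str(score_path)]


def convert_to_midi(musescore_path, score_path, display=":99"):
    """Run the export. If DISPLAY is unset, point it at `display` (e.g. a running Xvfb)."""
    env = dict(os.environ)
    env.setdefault("DISPLAY", display)
    subprocess.run(midi_command(musescore_path, score_path), check=True, env=env)
```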
Reproduce feature extraction: `./extract_all.sh`

Detailed commands:

- jSymbolic:
  - `pdm extract --jsymbolic --extension .mid`
- musif:
  - `pdm extract --musif --extension .mid`
  - `pdm extract --musif --extension .xml`
  - `pdm extract --musif --extension .krn`
- music21:
  - `pdm extract --music21 --extension .mid`
  - `pdm extract --music21 --extension .xml`
  - `pdm extract --music21 --extension .krn`
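
The seven extraction commands follow a simple grid: jSymbolic runs on `.mid` only, while musif and music21 run on `.mid`, `.xml` and `.krn`. The sketch below is not the contents of `extract_all.sh`; it only regenerates the same commands from that grid and runs them in sequence with `subprocess` (with `dry_run=True` it just prints them). The names `EXTRACTION_GRID`, `extraction_commands` and `run_all` are hypothetical.

```python
import subprocess

# Feature-set flag -> extensions it is run on, as in the commands above.
EXTRACTION_GRID = {
    "--jsymbolic": [".mid"],
    "--musif": [".mid", ".xml", ".krn"],
    "--music21": [".mid", ".xml", ".krn"],
}


def extraction_commands(grid=EXTRACTION_GRID):
    """Return every `pdm extract` invocation as an argument list."""
    return [
        ["pdm", "extract", flag, "--extension", ext]
        for flag, extensions in grid.items()
        for ext in extensions
    ]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it (stopping on failure)."""
    for cmd in extraction_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```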
Reproduce experiments: `pdm validation`

Detailed commands:

- `pdm classification`: run all experiments with the original features
- `pdm classification --use_first_10_pc`: run all experiments with the first 10 principal components of each task (where a task is a combination of dataset, feature set, and extension)
- `pdm plot`: plot the AutoML optimization score over time
- `pdm classification --featureset='music21' --dataset='EWLD' --extension='mid' --automl_time=60`: run an experiment on a single task for 60 seconds
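
With `--use_first_10_pc`, each task's feature matrix is replaced by its first 10 principal components. The repository's exact preprocessing (scaling, handling of missing values) is not described here, so the following is only an illustrative NumPy sketch: it assumes the features are standardized per column and then projected onto the top 10 right singular vectors, which is PCA on standardized data. The function name `first_k_pcs` is hypothetical.

```python
import numpy as np


def first_k_pcs(features, k=10):
    """Project an (n_samples, n_features) matrix onto its first k principal components."""
    X = np.asarray(features, dtype=float)
    # Center and scale each column (an assumption); constant columns stay at zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(k, Vt.shape[0])
    return Z @ Vt[:k].T
```

Applied per task, this would run once for each (dataset, feature set, extension) feature matrix, so the 10 components are computed separately for every task rather than shared across them.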