fast structure-informed positional encoding for music generation
📔 Paper: PDF
This is the companion website to our ICASSP 2025 submission F-StrIPE: Fast Structure-informed Positional Encoding for Music Generation.
The samples on this page come from the setting: train on 16 bars + test on 16 bars. We provide samples for four methods: X-StrIPE, SPE, F-StrIPE:C and F-StrIPE:SFF:C. Of these, F-StrIPE:C and X-StrIPE are our methods, while SPE and F-StrIPE:SFF:C are baselines (please refer to the paper for a full description).
For each method, we give the sample with the best score on each of the four metrics we report in the paper:
- Chroma Similarity (best = highest)
- Grooving Similarity (best = highest)
- Note Density Distance (best = lowest)
- Self-Similarity Matrix Distance (best = lowest)
We render each MIDI file into audio with muspy. Note that the sample played for a given metric can differ from one method to another: we chose to always play the best example for each method, which also limits the number of audio examples on the demo page. We plot the melody track (in red) and the accompaniment track (in blue) as a pianoroll and display the plot below, together with the audio. We also provide the melody-only audio, so you can hear what the added accompaniment contributes to the song.
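As a rough illustration of this rendering step, the sketch below reads a MIDI file with muspy and writes two audio files: the full song and a melody-only version. It is not the script used for this page. The function name, the file paths and the assumption that the melody sits in track 0 are hypothetical; only `muspy.read_midi` and `muspy.write_audio` are taken from the muspy library.

```python
import copy


def render_full_and_melody(midi_path, full_wav_path, melody_wav_path,
                           melody_track_index=0):
    """Render a MIDI file to audio twice: all tracks, then melody only.

    Assumption: the melody is stored in the track at `melody_track_index`.
    """
    # Third-party dependency, imported here so that the function can be
    # defined even on a machine where muspy is not installed.
    import muspy

    music = muspy.read_midi(midi_path)

    # Full arrangement (melody + accompaniment).
    muspy.write_audio(full_wav_path, music)

    # Melody-only copy: keep a single track and render again.
    melody_only = copy.deepcopy(music)
    melody_only.tracks = [melody_only.tracks[melody_track_index]]
    muspy.write_audio(melody_wav_path, melody_only)
```

muspy synthesizes audio through FluidSynth, so FluidSynth and a soundfont need to be available; `muspy.download_musescore_soundfont()` fetches the default soundfont.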
We hope that using several metrics will give you a holistic way to assess our methods against the baselines. In particular, we observe that chroma similarity and grooving similarity demonstrate the difference between our methods and the baselines very effectively!
▶️ You can download the MIDI files here.
At this link, you can also find samples for the setting: train on 16 bars + test on 64 bars.
⚠️ This demo webpage is best viewed on a desktop! You might need to zoom in a little to see the details of the pianoroll plots.
Chroma Similarity
SPE
F-StrIPE:SFF:C
X-StrIPE
F-StrIPE:C
Grooving Similarity
SPE
F-StrIPE:SFF:C
X-StrIPE
F-StrIPE:C
Note Density Distance
SPE
F-StrIPE:SFF:C
X-StrIPE
F-StrIPE:C
Self-Similarity Matrix Distance
Quick side-note: It’s not a bug, it’s a feature. This metric measures the mean squared error between the self-similarity matrix of the target and the self-similarity matrix of the prediction. The self-similarity matrices themselves are calculated using cosine similarity. Here, we come across a case from the testing set where the target contains no accompaniment! This leads to a perfect SSMD score when the network predicts an empty accompaniment track, which is why the pianorolls of the predictions below do not contain any accompaniment.
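The effect described in this side-note can be reproduced with a small, dependency-free sketch. It is an illustration under stated assumptions, not the implementation from the paper: the per-frame feature vectors (here, 12 pitch-class counts per frame), the function names, and the convention that an all-zero frame has cosine similarity 0 with every frame are all choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros
    (a convention assumed for this sketch)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def self_similarity_matrix(frames):
    """Pairwise cosine similarity between all frames of one track."""
    return [[cosine(u, v) for v in frames] for u in frames]


def ssm_distance(target_frames, predicted_frames):
    """Mean squared error between the two self-similarity matrices."""
    target_ssm = self_similarity_matrix(target_frames)
    predicted_ssm = self_similarity_matrix(predicted_frames)
    n = len(target_ssm)
    squared_errors = [
        (target_ssm[i][j] - predicted_ssm[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    ]
    return sum(squared_errors) / (n * n)


# Toy data: 4 frames, 12 pitch classes per frame.
empty_target = [[0.0] * 12 for _ in range(4)]      # no accompaniment at all
empty_prediction = [[0.0] * 12 for _ in range(4)]  # network predicts silence
busy_prediction = [[1.0] + [0.0] * 11 for _ in range(4)]  # same note each frame

score_empty = ssm_distance(empty_target, empty_prediction)
score_busy = ssm_distance(empty_target, busy_prediction)
```

With an empty target, both self-similarity matrices are all zeros when the prediction is also empty, so `score_empty` is 0.0, the best possible value. Any non-empty prediction, such as `busy_prediction`, produces non-zero similarities and therefore a worse score.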