Auto-Dynamics

Abstract

In the past decade, the field of music processing has advanced considerably through the application of neural networks to classic tasks such as music generation, transcription, and classification. This work instead concentrates on generating musical interpretation: the set of individual changes a performer may make to a piece of music during performance to convey personal expression. Specifically, we focus on auto-generating dynamics, or per-note volumes, for a given series of notes. We evaluate our dynamics model in five categories and demonstrate that the Transformer architecture is effective at generating artistically expressive performances. These findings suggest that the Transformer has the potential to excel in subjective areas such as artistic style.

Examples

Title of Piece	No Dynamics	Generated Dynamics
Chopin: Nocturne Op. 27 No. 1 in C-sharp minor
Liszt: Concert Etude No. 2, "Gnomenreigen," S. 145
Debussy: Estampes No. 3, "Jardins sous la Pluie," L. 100
Beethoven: Sonata No. 3 in C Major, Op. 2, I. Allegro con brio
Bach: Prelude and Fugue in B-flat Minor, BWV 891

Some examples of score-generated MIDI files from the internet, without expressive timing or pedal:

Title	Original	Generated Dynamics
Happy Birthday Variations (Arr. by Jonny May on Musescore)
Yiruma: River Flows in You

Method

This project uses an encoder-decoder Transformer model to generate musical dynamics for a given MIDI file. MIDI events are first converted into a time-by-pitch matrix, which serves as the encoder input; the decoder then generates a velocity (the MIDI measure of how loudly a note is played) for each note.
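As an illustration of the time-by-pitch representation described above, the sketch below rasterizes a list of notes into a matrix whose rows are time steps and whose columns are the 128 MIDI pitches. It is a minimal sketch under stated assumptions, not the project's actual preprocessing: the `Note` type, the `notes_to_matrix` function, the `ticks_per_step` resolution, and the binary on/off encoding are all hypothetical choices made for this example.

```python
from typing import List, NamedTuple

N_PITCHES = 128  # MIDI defines pitches 0-127


class Note(NamedTuple):
    """Hypothetical note record; times are integer MIDI ticks."""
    pitch: int      # MIDI pitch number, 0-127
    start: int      # onset tick
    end: int        # release tick (exclusive)
    velocity: int   # MIDI velocity, 1-127


def notes_to_matrix(notes: List[Note],
                    ticks_per_step: int = 120,
                    use_velocity: bool = False) -> List[List[int]]:
    """Rasterize notes into a time-by-pitch matrix.

    Each row is one time step of `ticks_per_step` ticks (an assumed
    resolution). A cell holds 1 while the pitch sounds, or the note's
    velocity when `use_velocity` is True (e.g. to build a training target).
    """
    if not notes:
        return []
    # Compute the [first, last) step span of every note; each note
    # occupies at least one step, even if it is shorter than a step.
    spans = []
    for n in notes:
        first = n.start // ticks_per_step
        last = max(first + 1, -(-n.end // ticks_per_step))  # ceil division
        spans.append((first, last))
    n_steps = max(last for _, last in spans)
    matrix = [[0] * N_PITCHES for _ in range(n_steps)]
    for n, (first, last) in zip(notes, spans):
        for t in range(first, last):
            matrix[t][n.pitch] = n.velocity if use_velocity else 1
    return matrix


notes = [Note(60, 0, 240, 64), Note(64, 120, 240, 80)]
encoder_input = notes_to_matrix(notes)                    # binary note-on grid
target = notes_to_matrix(notes, use_velocity=True)        # velocities per cell
```

In this toy example, the same grid is built twice: once without velocities, as a stand-in for the encoder input, and once with them, as a stand-in for what the decoder would be asked to predict.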

Melody Recognition

An excerpt from Chopin’s Nocturne Op. 27 No. 1 in C-sharp minor, with dynamics as generated by the base model.

In this excerpt, the model successfully differentiates between the melody line and the accompaniment. The melody can be seen as the set of horizontal lines highlighted in blue in the top third of the image, and the accompaniment consists of the notes below the melody. The row of vertical bars at the bottom of the image corresponds to the volumes of the notes, with each note having a matching vertical bar.

The generated velocities for this sample are mostly in the quiet 40-50 range, except for the melody line, which is brought out clearly and loudly (see the smattering of taller blue bars at the bottom of the page). This indicates a successful attempt at recognizing and prioritizing the melody.

Voicing

An excerpt from Rachmaninoff’s Elegie Op. 3 No. 1, with dynamics as generated by the base model.

In piano music, chords (three or more notes played at the same time) are generally “voiced,” meaning the most melodically important note within the chord is played louder than the others. Generally, the most important note is the top note.

The excerpt has three sections highlighted in different colors for convenience: the accompaniment is depicted in red, with a line of chords above it in green and blue. The blue depicts the most melodically important note, and the green depicts the rest of the chord which helps harmonize the melody.

The model distinguishes accurately between the three and treats them appropriately. The blue melody line has the tallest bars, indicating that it is correctly “voiced.” The accompaniment is represented by short red bars at the bottom, indicating that it is evenly quiet despite the large range of different pitches. The green bars are generally taller than the red and shorter than the blue.
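One rough way to quantify the voicing behavior described above is to count how often the top note of a chord is also its loudest note. The sketch below is an illustrative check only, not the evaluation used in this work: the `voiced_chord_fraction` function and its `(onset, pitch, velocity)` input format are hypothetical, and it makes the simplifying assumption that all notes sharing an onset form a single chord, which would also sweep in any accompaniment notes struck at the same moment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical note format for this example: (onset_tick, pitch, velocity)
NoteEvent = Tuple[int, int, int]


def voiced_chord_fraction(notes: List[NoteEvent]) -> float:
    """Fraction of chords whose highest note is strictly the loudest.

    A chord is taken to be three or more notes with the same onset,
    matching the definition in the text above.
    """
    by_onset: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for onset, pitch, velocity in notes:
        by_onset[onset].append((pitch, velocity))

    chords = [c for c in by_onset.values() if len(c) >= 3]
    if not chords:
        return 0.0

    voiced = 0
    for chord in chords:
        # Pick the note with the highest pitch in the chord.
        top_pitch, top_velocity = max(chord, key=lambda pv: pv[0])
        others = [v for p, v in chord if p != top_pitch]
        if all(top_velocity > v for v in others):
            voiced += 1
    return voiced / len(chords)


example = [
    (0, 48, 40), (0, 52, 42), (0, 55, 45), (0, 72, 80),   # top note loudest
    (480, 50, 60), (480, 53, 50), (480, 57, 45),          # top note softest
    (960, 60, 70),                                        # single note, ignored
]
fraction = voiced_chord_fraction(example)
```

A value near 1.0 would suggest that generated chords are consistently voiced toward the top note; because the top note is only "generally" the most important one, a perfect score is not necessarily the goal.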

Phrasing

An excerpt from Mozart’s Sonata No. 15 in F Major, K533, with dynamics as generated by the base model.

Musical “phrasing” refers to how the notes in a sequence are given different individual volumes to convey emotion and expression. A general rule of phrasing is that higher notes are played louder. In fact, the “Director Musices” project [15], mentioned in the Related Work section, had this rule as the first of its twenty-four. In this excerpt, the mountain-shaped run of notes in the middle is successfully accompanied by slightly taller bars at the bottom.

Another rule of phrasing is that the same sequence of notes should not be played the same way twice. The notes highlighted in blue correspond with the notes highlighted in green, and the model successfully differentiates them by making the second set softer (the green bars are shorter than the blue bars).
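To make the "higher notes are played louder" rule concrete, the sketch below applies it as a simple hand-written baseline: each note's velocity is raised or lowered in proportion to how far its pitch lies above or below the average pitch of the phrase. This is a toy linear mapping written for illustration, not the Director Musices formulation and not part of the model; the `high_loud_velocities` function and its `base` and `gain` parameters are hypothetical.

```python
from typing import List


def high_loud_velocities(pitches: List[int],
                         base: int = 60,
                         gain: float = 2.0) -> List[int]:
    """Assign velocities so that higher pitches are louder.

    `base` is the velocity given to a note at the phrase's mean pitch and
    `gain` is the velocity change per semitone; both are assumed values.
    Results are clamped to the valid MIDI velocity range 1-127.
    """
    if not pitches:
        return []
    center = sum(pitches) / len(pitches)
    return [max(1, min(127, round(base + gain * (p - center))))
            for p in pitches]


# A mountain-shaped phrase: C4, E4, G4, E4, C4
phrase = [60, 64, 67, 64, 60]
velocities = high_loud_velocities(phrase)
```

A fixed rule like this produces the same contour every time a phrase repeats, so on its own it cannot satisfy the second phrasing rule above, which asks that repeated material be played differently.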

Harmonic Understanding

An excerpt from Chopin’s Nocturne Op. 27 No. 2 in D-flat Major, with dynamics as generated by the base model.

As mentioned above, a general rule of phrasing is that higher notes are played louder. However, performers often choose to override this rule if it conflicts with other rules.

In this excerpt, the highest note of the melody (a Bb5) should theoretically be played the loudest. However, there is a conflicting harmonic rule which dictates that it should be played softer: everything before this note is firmly within the D-flat Major tonic chord, and this note not only ventures outside of the tonic, but is also in the minor key.

The model successfully overrides the “higher notes are louder” rule to play this note softly, which demonstrates that it has some level of harmonic understanding. It is likely that this is not a fluke, since the previous melody notes (all highlighted in blue) follow the “higher notes are louder” rule.

Polyphonic Music and Counterpoint

An excerpt from Bach’s Prelude and Fugue in B Major, WTC I, BWV 868, with dynamics as generated by the base model.

Polyphonic music is music with multiple independent melodic lines played at once. The model successfully distinguishes between the different melodic lines by making certain melodies softer, as seen in the excerpt above (the line highlighted in blue is notably softer than the others).

However, some polyphonic compositions such as the fugue are based on a repeated motif that the melodic lines take turns performing. It is common practice within these compositions to emphasize the motif whenever it appears.

In fact, the notes highlighted in blue above are a statement of the fugue’s motif. Unfortunately, the model did not recognize this passage as a motif, possibly due to either a lack of context or a lack of training data. This is a possible area for improvement.

If useful, please cite with the following:

BibTeX


@article{grace4x2024autodynamics,
  title={Auto-Dynamics: Transformer-Generated Interpretations of Piano Music},
  author={Grace Xu},
  year={2024}
}