Multitrack Contrapuntal Music Archive (MCMA)¶
About¶
We present Multitrack Contrapuntal Music Archive (MCMA), a symbolic dataset of pieces specifically collated and edited to comprise, for any given polyphonic work, independent parts.
MCMA is geared towards musicological tasks such as (computational) analysis or education, as it brings to the fore contrapuntal interactions by explicit and independent representation. Empirically analyzing counterpoint music is nearly impossible without voice separation (e.g., by means of “exploding” a part into its constituent voices). Using MCMA, each part can be analyzed separately, which offers a unique perspective for investigating voice leading in counterpoint. A musicologist would be able to quickly find examples of retrograde inversions or augmentations in counterpoint melodic lines, or find statistically valid answers for their questions: for example, what is the most common interval to use for parallel contrapuntal motion? Is contrary motion preferred to other types of motion?
Below, an example of multitrack expansion: the first three bars of J.S. Bach’s Fugue No.3 in C# Major BWV 872, from the Well-Tempered Clavier, Book II, (top) against the same extract in MCMA’s corpus (bottom).
Original
Multitrack
Moreover, MCMA can be useful in the context of generative systems and machine learning. For example, the relation between two parts in a contrapuntal piece becomes paramount if one wants to employ natural language processing models, such as attention networks, and treats one part as the musical response to a “query”. Below, an example of this approach: the first two bars of J.S.Bach’s first two-part inventions.
Query-Answer
FAIR¶
MCMA is FAIR-compliant. It has an extensive metadata protocol (F2), which is described in detail in the Metadata section of the Guidelines for Contributors. The metadata is registered in a comma separated values file (F4, A1, A1.1): each music work is assigned a globally unique and persistent identifier (F1) and is richly described with a plurality of accurate and relevant attributes (R1).
Contribution Call¶
We open MCMA to contributions, in the hope that MCMA can continue to grow beyond our efforts. Please see the Guidelines for Contributors section.
If you wish to simply use MCMA without contributing, we provide instructions on how to download the current dataset as .zip file.
How to Cite¶
Aljanaki A., Kalonaris S., Micchi G. & Nichols E. (2021) MCMA: A Symbolic Multitrack Contrapuntal Music Archive. Empirical Musicology Review. Forthcoming.
Download MCMA¶
There are two options to download the dataset.
Zip File¶
Download the latest zip file here: mcma-master.zip
Clone Gitlab Repo¶
Clone the MCMA gitlab repo. This is recommended for those who want to contribute to the project.
git clone https://gitlab.com/skalo/mcma.git
Collections in MCMA¶
The following table gives the number of pieces from each collection that is currently included in MCMA.
Composer | Collection | Number of Pieces |
---|---|---|
Albinoni | ||
Sonate da chiesa Op. 4 | 24 | |
Trattenimenti armonici Op. 6 | 48 | |
Subtotal | 72 | |
Bach | ||
Goldberg Variations | 31 | |
Inventions | 15 | |
Kunst der Fuge | 21 | |
Sinfonias | 15 | |
The Well-Tempered Clavier, Book I | 48 | |
The Well-Tempered Clavier, Book II | 48 | |
Subtotal | 178 | |
Becker | ||
Sonatas and Suites | 45 | |
Subtotal | 45 | |
Buxtehude | ||
Trio Sonatas Op.1 | 45 | |
Trio Sonatas Op.2 | 43 | |
Subtotal | 88 | |
Lully | ||
Persée | 92 | |
Subtotal | 92 | |
Grand Total | 475 |
Guidelines for Contributors¶
Please read the following guidelines carefully.
- Pieces/works/collections MUST be contrapuntal music. Conversely, homophonic pieces are not to be considered.
- Preference must be given to complete works and/or collections.
- When choosing works/collections, you are encouraged to use already existing open data as a base for your submissions but check the copyright notice of the files you start with to make sure they are freely shareable.
- While priority is given to works that are not currently represented in MCMA, we do accept different versions of pieces already collected in the database. These versions should differ either for the reference text or, more importantly, for a different choice in multi-track voice separation.
- Add a metadata.csv for each work/collection. Please see the Metadata section, below, for a detailed description.
- Add one metadata record for every file.
- Every file should be saved with the same name as the ID of the metadata.
- The name of the work/collection folder should directly based on the content of your metadata’s column “Collection”. For example, if “Collection” is “Sonate da chiesa Op. 4”, then your folder will be named “sonate_da_chiesa_op_4”. Use the following convention: all lower case, all delimiters and punctuation marks become an underscore, except for the hyphen (hyphen it is).
- All files must be saved in compressed musicXML format (mxl). Unlike MIDI, mxl supports correct pitch spelling (i.e. G# is not the same as Ab) and that’s important information to have.
- Always reference to an edition that you consider reliable, without edits. This is true for obvious matters such as notes and time signatures but also for more subtle ones.
- Inside each file, every part should have its separate track. Typically, every part would be monophonic, but occasional polyphony might happen and should be kept. For example, in the specific case of JS Bach’s Well Tempered Clavier, if a Fugue is considered to be in 4 voices keep only 4 parts (tracks). Intrapart polyphony can be added as voice expansion or as voice addition, depending on your subjective judgement, but always foregrounding musicological concerns.
- All ornaments should be written in a symbolic way. It’s preferable to omit an ornament than to have it written explicitly.
- All repeats should be written with repeat bars. No repeat should be expanded.
- Performance cues, such as rapidly varying metronome marks as used to indicate ritardando, should be removed.
- At the moment, we do not aim to faithfully represent Figured Bass notations, dynamics, slurs, or tempos. But certainly you shouldn’t throw this information away if you have it already, as our scope might expand in the future.
Metadata¶
The fields of MCMA’s metadata protocol are the following:
- ID: this should univocally describe the work, using some of the subsequent keys, in the following format: Last Name-First Name (capital initials only)-Catalog Number and a selection/combination of compositional style (if explicitly stated in the Title, but abbreviated if possible, e.g., P for Prelude) and/or Movement Number that best describes the work. For example, BachJS-BWV870-1. Since the ID to every record of MCMA should be unique, the string “-An” should be appended to the ID of alternative versions of a piece already represented in the dataset. Here, n is an integer taking the first available value from 1 upwards, example “-A2” if the piece is already represented in MCMA by a reference and a first alternative version.
- Last Name: the composer’s surname.
- First Name: the composer’s forename and middle name(s).
- Title: the full work’s name, e.g., “Allemande from Cello Suite No. 1 in G Major, BWV 1007”.
- Collection: the name of the collection/work the piece belongs to, if it exists. For example, “The Well-Tempered Clavier”.
- Catalog Number: scholarly catalog number, e.g., BWV 1007. In case of conflicting naming, please add a clarification in the Notes entry.
- Movement Number: an integer. In case of a theme/aria in a theme & variation_, or for the prologue in an opera, use 0.
- Number of Tracks: an integer. This will correspond to the number of tracks in the mxl file.
- Instruments: series of instruments names (top to bottom) using a semicolon as delimiter (e.g., violin;viola;cello). If all parts are the same instrument, one instance of it is sufficient.
- Year: the year the work was composed, if known. Note: if only the parent Collection’s dates are known, and if they spanned several years, this will be the year of completion.
- Provenance: name or URL of the dataset/database containing the original file used as basis for the contrapuntal multitrack expansion.
- Reference Edition: self-explanatory, e.g., Universal Edition.
- Link to Reference Edition: the URL of an accessible version of the reference edition.
- Comments: anything else.
MCMA Utilities¶
Metadata Creation¶
The python utility metadata_writer
can be used to create the metadata file. To do so, update the
basic_info
inside metadata_writer.py
with the required information, and follow the instructions
at the top of that file to generate a metadata.csv
file. This requires basic knowledge of python and
running python scripts.
File Cleanup/Normalization Utility¶
Once a set of files is prepared in a directory along with its corresponding metadata.csv
file,
another script, normalize_instruments.py
, should be run to convert the MusicXML files
into a standard format used by the MCMA corpus. This updates the MIDI sounds used in the files,
adjusts volume and panning, and cleans up the title/subtitle/composer/etc displayed on the first page
of each piece.
To run this utility:
- Open a terminal window.
- Change to the
mcma/utilities
directory; e.g.,cd mcma/utilities
python3 normalize_instruments.py
Note that python version 3.6 or higher is required.
After running, the resulting new versions of the files are written to
subdirectories titled normalized
. It is recommended to verify the output of the utility
and then to submit these cleaned up versions to MCMA via a merge request.
N.B. This utility depends on accurate metadata in the metadata.csv
file.
In particular, for each part in each track, the names of the instruments
must be listed:
- If all instruments are the same for a piece, just write the instrument name in the Instrument column in the metadata; e.g.,
Harpsichord
- If each track has different instruments, list the instruments, separated by
;
marks. E.g.,Violin;Gamba;Harpsichord
If an instrument is not known by the script, it will be replaced by the instrument
with the closest name. If this results in the wrong instrument chosen, the file utilities/midi_instruments.csv
can be updated to add the new instrument name to an existing
general MIDI sound. For instance, Gamba
has been added to the MIDI cello
sound in this file so that
the sound is approximated in MIDI playback.