Guidelines for Contributors

Please read the following guidelines carefully.

  1. Pieces/works/collections MUST be contrapuntal music. Conversely, homophonic pieces are not to be considered.
  2. Preference must be given to complete works and/or collections.
  3. When choosing works/collections, you are encouraged to use existing open data as a basis for your submissions, but check the copyright notice of the files you start with to make sure they are freely shareable.
  4. While priority is given to works that are not currently represented in MCMA, we do accept different versions of pieces already collected in the database. These versions should differ either in the reference text or, more importantly, in the choice of multi-track voice separation.
  5. Add a metadata.csv for each work/collection. Please see the Metadata section, below, for a detailed description.
  6. Add one metadata record for every file.
  7. Every file should be saved under a name that matches the ID of its metadata record.
  8. The name of the work/collection folder should be directly based on the content of your metadata’s “Collection” column. For example, if “Collection” is “Sonate da chiesa Op. 4”, then your folder will be named “sonate_da_chiesa_op_4”. Use the following convention: all lower case, with every delimiter and punctuation mark replaced by an underscore, except for the hyphen, which stays a hyphen.
  9. All files must be saved in compressed MusicXML format (mxl). Unlike MIDI, mxl supports correct pitch spelling (i.e., G# is not the same as Ab), which is important information to preserve.
  10. Always refer to an edition that you consider reliable, without edits. This is true for obvious matters such as notes and time signatures but also for more subtle ones.
  11. Inside each file, every part should have its own separate track. Typically, every part would be monophonic, but occasional polyphony might occur and should be kept. For example, in the specific case of J.S. Bach’s Well-Tempered Clavier, if a Fugue is considered to be in 4 voices, keep only 4 parts (tracks). Intrapart polyphony can be added as voice expansion or as voice addition, depending on your subjective judgement, but always foregrounding musicological concerns.
  12. All ornaments should be written in a symbolic way. It is preferable to omit an ornament rather than to write it out explicitly.
  13. All repeats should be written with repeat bars. No repeat should be expanded.
  14. Performance cues, such as rapidly varying metronome marks as used to indicate ritardando, should be removed.
  15. At the moment, we do not aim to faithfully represent Figured Bass notations, dynamics, slurs, or tempos. But certainly you shouldn’t throw this information away if you have it already, as our scope might expand in the future.
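
The folder naming convention in guideline 8 can be sketched in a few lines of Python. This is only an illustration, not part of the MCMA utilities: the function name collection_to_folder_name is hypothetical, and the sketch assumes that a run of several delimiters (such as the ". " in "Op. 4") collapses into a single underscore, as the example in guideline 8 suggests, and that hyphens are left untouched.

```python
import re


def collection_to_folder_name(collection):
    """Derive a folder name from a Collection value (hypothetical helper)."""
    name = collection.strip().lower()
    # Assumption: every run of characters other than letters, digits, and
    # hyphens becomes one underscore ("op. 4" -> "op_4").
    name = re.sub(r"(?:[^\w-]|_)+", "_", name)
    # Drop underscores left over from leading or trailing punctuation.
    return name.strip("_")


print(collection_to_folder_name("Sonate da chiesa Op. 4"))
```

With the example from guideline 8, the call above yields sonate_da_chiesa_op_4.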

Metadata

The fields of MCMA’s metadata protocol are the following:

  • ID: this should uniquely identify the work, using some of the subsequent keys, in the following format: Last Name-First Name (capital initials only)-Catalog Number and a selection/combination of compositional style (if explicitly stated in the Title, but abbreviated if possible, e.g., P for Prelude) and/or Movement Number that best describes the work. For example, BachJS-BWV870-1. Since the ID of every record in MCMA should be unique, the string “-An” should be appended to the ID of alternative versions of a piece already represented in the dataset. Here, n is an integer taking the first available value from 1 upwards; for example, “-A2” if the piece is already represented in MCMA by a reference and a first alternative version.
  • Last Name: the composer’s surname.
  • First Name: the composer’s forename and middle name(s).
  • Title: the full work’s name, e.g., “Allemande from Cello Suite No. 1 in G Major, BWV 1007”.
  • Collection: the name of the collection/work the piece belongs to, if it exists. For example, “The Well-Tempered Clavier”.
  • Catalog Number: scholarly catalog number, e.g., BWV 1007. In case of conflicting naming, please add a clarification in the Comments entry.
  • Movement Number: an integer. For the theme/aria of a theme and variations, or for the prologue of an opera, use 0.
  • Number of Tracks: an integer. This will correspond to the number of tracks in the mxl file.
  • Instruments: a series of instrument names (top to bottom) using a semicolon as delimiter (e.g., violin;viola;cello). If all parts are the same instrument, one instance of it is sufficient.
  • Year: the year the work was composed, if known. Note: if only the parent Collection’s dates are known, and if they spanned several years, this will be the year of completion.
  • Provenance: name or URL of the dataset/database containing the original file used as basis for the contrapuntal multitrack expansion.
  • Reference Edition: self-explanatory, e.g., Universal Edition.
  • Link to Reference Edition: the URL of an accessible version of the reference edition.
  • Comments: anything else.
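
The “-An” rule for alternative versions described under ID can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name next_available_id is hypothetical, and existing_ids stands for whatever collection of IDs is already present in the dataset (how you gather them is up to you).

```python
def next_available_id(base_id, existing_ids):
    """Return base_id if it is unused, else base_id plus the first free "-An".

    Hypothetical helper: existing_ids is any iterable of IDs already in use.
    """
    existing = set(existing_ids)
    if base_id not in existing:
        return base_id
    n = 1
    # Take the first available integer from 1 upwards.
    while f"{base_id}-A{n}" in existing:
        n += 1
    return f"{base_id}-A{n}"


already_in_mcma = {"BachJS-BWV870-1", "BachJS-BWV870-1-A1"}
print(next_available_id("BachJS-BWV870-1", already_in_mcma))
```

Here the reference version and a first alternative already exist, so the new version receives the suffix -A2, matching the example given for the ID field.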

MCMA Utilities

Metadata Creation

The Python utility metadata_writer can be used to create the metadata file. To do so, update the basic_info inside metadata_writer.py with the required information, and follow the instructions at the top of that file to generate a metadata.csv file. This requires basic knowledge of Python and of running Python scripts.
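
If you prefer to see what such a file might contain before running the utility, the sketch below renders metadata records as CSV text with Python's standard csv module. It is not the metadata_writer implementation: the names FIELDS, EXAMPLE_RECORD, and metadata_csv_text are hypothetical, the column headers are assumed to match the field names in the Metadata section exactly, and the example record is illustrative and deliberately leaves unknown fields blank.

```python
import csv
import io

# Assumption: column headers match the field names of the Metadata section.
FIELDS = [
    "ID", "Last Name", "First Name", "Title", "Collection",
    "Catalog Number", "Movement Number", "Number of Tracks", "Instruments",
    "Year", "Provenance", "Reference Edition", "Link to Reference Edition",
    "Comments",
]

# Illustrative record only; fields not given here are written as empty cells.
EXAMPLE_RECORD = {
    "ID": "BachJS-BWV870-1",
    "Last Name": "Bach",
    "First Name": "Johann Sebastian",
    "Collection": "The Well-Tempered Clavier",
    "Catalog Number": "BWV 870",
    "Movement Number": "1",
}


def metadata_csv_text(records):
    """Render a list of record dicts as CSV text, one row per file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Keep only the known fields and fill the missing ones with "".
        writer.writerow({field: record.get(field, "") for field in FIELDS})
    return buffer.getvalue()


print(metadata_csv_text([EXAMPLE_RECORD]))
```

To save the result, the returned text could be written to a file opened with open("metadata.csv", "w", newline="", encoding="utf-8").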

File Cleanup/Normalization Utility

Once a set of files is prepared in a directory along with its corresponding metadata.csv file, another script, normalize_instruments.py, should be run to convert the MusicXML files into the standard format used by the MCMA corpus. This updates the MIDI sounds used in the files, adjusts volume and panning, and cleans up the title, subtitle, composer, etc. displayed on the first page of each piece.

To run this utility:

  • Open a terminal window.
  • Change to the mcma/utilities directory; e.g., cd mcma/utilities
  • Run the script: python3 normalize_instruments.py

Note that Python version 3.6 or higher is required.

After the script has run, the new versions of the files are written to subdirectories titled normalized. It is recommended to verify the output of the utility and then to submit these cleaned-up versions to MCMA via a merge request.

N.B. This utility depends on accurate metadata in the metadata.csv file. In particular, the instrument of every track in each piece must be listed:

  • If all instruments are the same for a piece, just write the instrument name in the Instruments column in the metadata; e.g., Harpsichord
  • If the tracks have different instruments, list the instruments, separated by semicolons; e.g., Violin;Gamba;Harpsichord
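
The two cases above can be read as a rule for expanding the Instruments value into one name per track. The following sketch shows one way to express that rule; the function name instruments_per_track is hypothetical and this is not taken from normalize_instruments.py, whose actual parsing may differ.

```python
def instruments_per_track(instruments_field, number_of_tracks):
    """Expand an Instruments value into one instrument name per track.

    Hypothetical helper: a single name applies to every track; otherwise
    the number of names must equal the number of tracks.
    """
    names = [n.strip() for n in instruments_field.split(";") if n.strip()]
    if len(names) == 1:
        return names * number_of_tracks
    if len(names) != number_of_tracks:
        raise ValueError(
            f"{len(names)} instruments listed for {number_of_tracks} tracks"
        )
    return names


print(instruments_per_track("Harpsichord", 3))
print(instruments_per_track("Violin;Gamba;Harpsichord", 3))
```

Raising an error on a mismatch is a design choice of this sketch; it surfaces metadata mistakes before any file is processed.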

If an instrument is not known by the script, it will be replaced by the instrument with the closest name. If this results in the wrong instrument being chosen, the file utilities/midi_instruments.csv can be updated to map the new instrument name to an existing General MIDI sound. For instance, Gamba has been added to the MIDI cello sound in this file so that the sound is approximated in MIDI playback.
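
To give an idea of how closest-name matching can behave, here is a small sketch based on difflib from the Python standard library. It is an assumption that the script works along these lines; the actual matching method and the layout of utilities/midi_instruments.csv are not described here, so the KNOWN_INSTRUMENTS table and the function closest_known_instrument are purely hypothetical.

```python
import difflib

# Hypothetical alias table: lower-case instrument name -> General MIDI sound.
# The real mapping lives in utilities/midi_instruments.csv.
KNOWN_INSTRUMENTS = {
    "violin": "Violin",
    "viola": "Viola",
    "cello": "Cello",
    "gamba": "Cello",  # alias added so playback approximates the gamba
    "harpsichord": "Harpsichord",
}


def closest_known_instrument(name, known=KNOWN_INSTRUMENTS):
    """Return the sound for name, falling back to the most similar known name."""
    key = name.strip().lower()
    if key in known:
        return known[key]
    # cutoff=0.0 always returns the best candidate, however poor the match.
    match = difflib.get_close_matches(key, list(known), n=1, cutoff=0.0)[0]
    return known[match]


print(closest_known_instrument("Gamba"))
print(closest_known_instrument("Harpsichrd"))
```

Because a fallback of this kind always returns something, an unusual name may silently resolve to an unsuitable sound, which is why adding an explicit alias is the more reliable fix.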