Guidelines for Contributors
Please read the following guidelines carefully.
- Pieces/works/collections MUST be contrapuntal music. Homophonic pieces will not be considered.
- Preference must be given to complete works and/or collections.
- When choosing works/collections, you are encouraged to use existing open data as a base for your submissions, but check the copyright notice of the files you start with to make sure they are freely shareable.
- While priority is given to works that are not currently represented in MCMA, we do accept different versions of pieces already collected in the database. These versions should differ either in the reference text or, more importantly, in the choice of multi-track voice separation.
- Add a metadata.csv for each work/collection. Please see the Metadata section, below, for a detailed description.
- Add one metadata record for every file.
- Every file should be saved with the same name as the ID of the metadata.
- The name of the work/collection folder should be based directly on the content of your metadata’s “Collection” column. For example, if “Collection” is “Sonate da chiesa Op. 4”, then your folder will be named “sonate_da_chiesa_op_4”. Use the following convention: all lower case, with every delimiter and punctuation mark replaced by an underscore, except for the hyphen, which is kept as a hyphen.
- All files must be saved in compressed MusicXML format (mxl). Unlike MIDI, mxl supports correct pitch spelling (i.e., G# is not the same as Ab), and that is important information to have.
- Always refer to an edition that you consider reliable, without edits. This is true for obvious matters such as notes and time signatures but also for more subtle ones.
- Inside each file, every part should have its separate track. Typically, every part would be monophonic, but occasional polyphony might happen and should be kept. For example, in the specific case of J.S. Bach’s Well-Tempered Clavier, if a Fugue is considered to be in 4 voices, keep only 4 parts (tracks). Intra-part polyphony can be added as voice expansion or as voice addition, depending on your subjective judgement, but always foregrounding musicological concerns.
- All ornaments should be written as symbols. It is preferable to omit an ornament rather than write it out explicitly.
- All repeats should be written with repeat bars. No repeat should be expanded.
- Performance cues, such as rapidly varying metronome marks as used to indicate ritardando, should be removed.
- At the moment, we do not aim to faithfully represent Figured Bass notations, dynamics, slurs, or tempos. However, do not discard this information if you already have it, as our scope might expand in the future.
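The folder naming convention above can be sketched in a few lines of Python. This is a minimal illustration, not part of the MCMA utilities: the function name collection_to_folder_name is hypothetical, and collapsing a run of delimiters into a single underscore is inferred from the “Op. 4” to “op_4” example rather than stated explicitly.

```python
import re


def collection_to_folder_name(collection):
    """Turn a "Collection" value into a folder name (hypothetical helper)."""
    lowered = collection.strip().lower()
    # Replace every run of characters that are neither letters, digits, nor
    # hyphens with ONE underscore. Collapsing runs is an assumption inferred
    # from the example "Op. 4" -> "op_4". Accented letters are kept as-is.
    slug = re.sub(r"(?:[^\w-]|_)+", "_", lowered)
    # Assumption: leading and trailing underscores are not wanted.
    return slug.strip("_")


print(collection_to_folder_name("Sonate da chiesa Op. 4"))
# sonate_da_chiesa_op_4
```

Hyphens pass through untouched, so “The Well-Tempered Clavier” would become “the_well-tempered_clavier” under this reading of the convention.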
Metadata
The fields of MCMA’s metadata protocol are the following:
- ID: this should uniquely identify the work, using some of the subsequent keys, in the following format: Last Name-First Name (capital initials only)-Catalog Number and a selection/combination of compositional style (if explicitly stated in the Title, but abbreviated if possible, e.g., P for Prelude) and/or Movement Number that best describes the work. For example, BachJS-BWV870-1. Since the ID of every record in MCMA must be unique, the string “-An” should be appended to the ID of alternative versions of a piece already represented in the dataset. Here, n is an integer taking the first available value from 1 upwards; for example, “-A2” if the piece is already represented in MCMA by a reference and a first alternative version.
- Last Name: the composer’s surname.
- First Name: the composer’s forename and middle name(s).
- Title: the work’s full name, e.g., “Allemande from Cello Suite No. 1 in G Major, BWV 1007”.
- Collection: the name of the collection/work the piece belongs to, if it exists. For example, “The Well-Tempered Clavier”.
- Catalog Number: scholarly catalog number, e.g., BWV 1007. In case of conflicting naming, please add a clarification in the Comments entry.
- Movement Number: an integer. In case of a theme/aria in a theme & variations, or for the prologue in an opera, use 0.
- Number of Tracks: an integer. This will correspond to the number of tracks in the mxl file.
- Instruments: a series of instrument names (top to bottom) using a semicolon as delimiter (e.g., violin;viola;cello). If all parts are the same instrument, one instance of it is sufficient.
- Year: the year the work was composed, if known. Note: if only the parent Collection’s dates are known, and if they spanned several years, this will be the year of completion.
- Provenance: name or URL of the dataset/database containing the original file used as basis for the contrapuntal multitrack expansion.
- Reference Edition: self-explanatory, e.g., Universal Edition.
- Link to Reference Edition: the URL of an accessible version of the reference edition.
- Comments: anything else.
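The “-An” rule for alternative versions described under ID can be sketched as follows. The helper name next_alternative_id is hypothetical, and the sketch assumes you already have the list of IDs present in the dataset; it is an illustration of the rule, not code from the MCMA utilities.

```python
def next_alternative_id(base_id, existing_ids):
    """Return base_id if it is unused, else base_id-An with the first free n."""
    taken = set(existing_ids)
    if base_id not in taken:
        # The piece is not yet represented: this is the reference version.
        return base_id
    n = 1
    # Take the first available integer from 1 upwards.
    while "{}-A{}".format(base_id, n) in taken:
        n += 1
    return "{}-A{}".format(base_id, n)


existing = ["BachJS-BWV870-1", "BachJS-BWV870-1-A1"]
print(next_alternative_id("BachJS-BWV870-1", existing))
# BachJS-BWV870-1-A2
```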
MCMA Utilities
Metadata Creation
The Python utility metadata_writer can be used to create the metadata file. To do so, update the basic_info inside metadata_writer.py with the required information, and follow the instructions at the top of that file to generate a metadata.csv file. This requires basic knowledge of Python and running Python scripts.
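Once a metadata.csv exists, the guidelines “one metadata record for every file” and “file name equals ID” can be checked mechanically. The sketch below is not part of the MCMA utilities; the function name check_metadata is hypothetical, and it assumes that metadata.csv has a header row with a column named exactly “ID” and that the music files sit next to it with an .mxl extension.

```python
import csv
from pathlib import Path


def check_metadata(folder):
    """Compare the IDs in metadata.csv with the .mxl files in the same folder."""
    folder = Path(folder)
    with open(str(folder / "metadata.csv"), newline="", encoding="utf-8") as handle:
        # Assumption: the header row contains a column named "ID".
        ids = [row["ID"] for row in csv.DictReader(handle)]
    # File names without the .mxl extension should equal the metadata IDs.
    stems = {path.stem for path in folder.glob("*.mxl")}
    return {
        "missing_files": sorted(set(ids) - stems),    # records without a file
        "missing_records": sorted(stems - set(ids)),  # files without a record
        "duplicate_ids": sorted({i for i in ids if ids.count(i) > 1}),
    }
```

Calling check_metadata on a work/collection folder returns three lists; all three being empty suggests the records and files are consistent with each other.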
File Cleanup/Normalization Utility
Once a set of files is prepared in a directory along with its corresponding metadata.csv file, another script, normalize_instruments.py, should be run to convert the MusicXML files into a standard format used by the MCMA corpus. This updates the MIDI sounds used in the files, adjusts volume and panning, and cleans up the title, subtitle, composer, etc. displayed on the first page of each piece.
To run this utility:
- Open a terminal window.
- Change to the mcma/utilities directory; e.g., cd mcma/utilities
- Run the script: python3 normalize_instruments.py
Note that Python version 3.6 or higher is required.
After running, the resulting new versions of the files are written to subdirectories titled normalized. It is recommended to verify the output of the utility and then to submit these cleaned-up versions to MCMA via a merge request.
N.B. This utility depends on accurate metadata in the metadata.csv file. In particular, the instrument name for each track of each piece must be listed:
- If all instruments are the same for a piece, just write the instrument name in the Instruments column in the metadata; e.g., Harpsichord
- If the tracks have different instruments, list the instruments, separated by ; marks; e.g., Violin;Gamba;Harpsichord
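The two cases above amount to a small expansion rule: a single name stands for every track, while a semicolon-separated list must name each track in order. The following sketch illustrates that reading; expand_instruments is a hypothetical helper, not a function from normalize_instruments.py, and raising an error on a count mismatch is an assumption about what would be useful rather than documented behavior.

```python
def expand_instruments(instruments, number_of_tracks):
    """Return one instrument name per track from an Instruments value."""
    names = [name.strip() for name in instruments.split(";") if name.strip()]
    if len(names) == 1:
        # A single name applies to every track.
        return names * number_of_tracks
    if len(names) != number_of_tracks:
        # Assumption: a mismatch indicates a metadata error worth reporting.
        raise ValueError(
            "expected {} instruments, got {}".format(number_of_tracks, len(names))
        )
    return names


print(expand_instruments("Harpsichord", 3))
# ['Harpsichord', 'Harpsichord', 'Harpsichord']
print(expand_instruments("Violin;Gamba;Harpsichord", 3))
# ['Violin', 'Gamba', 'Harpsichord']
```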
If an instrument is not known by the script, it will be replaced by the instrument with the closest name. If this results in the wrong instrument being chosen, the file utilities/midi_instruments.csv can be updated to add the new instrument name to an existing general MIDI sound. For instance, Gamba has been added to the MIDI cello sound in this file so that the sound is approximated in MIDI playback.
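To see why adding an alias helps, here is a rough sketch of closest-name matching. How normalize_instruments.py actually measures closeness is not described here, so this uses difflib from the Python standard library as a stand-in; the function resolve_instrument and the in-memory table are hypothetical and do not reflect the real layout of utilities/midi_instruments.csv.

```python
import difflib


def resolve_instrument(name, known_sounds):
    """Map an instrument name to a MIDI sound, falling back to the closest name.

    known_sounds: dict of lowercase instrument name -> MIDI sound name.
    """
    key = name.strip().lower()
    if key in known_sounds:
        # Exact entries, including aliases such as "gamba", win outright.
        return known_sounds[key]
    # Stand-in for "closest name": best difflib similarity ratio.
    # cutoff=0.0 means some entry is always returned for a non-empty table.
    best = difflib.get_close_matches(key, list(known_sounds), n=1, cutoff=0.0)
    return known_sounds[best[0]]


# Hypothetical table; "gamba" is an alias pointing at the cello sound.
sounds = {
    "violin": "Violin",
    "cello": "Cello",
    "harpsichord": "Harpsichord",
    "gamba": "Cello",
}
print(resolve_instrument("Gamba", sounds))
# Cello
```

With the alias present, the lookup never reaches the fuzzy step. Without it, the result depends entirely on string similarity, which has no notion of how instruments actually sound, so an explicit entry is the more reliable fix.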