Organ Tablature OCR Data Set

Scans of old organ tablatures.
Photo: Public Domain

Organ tablature music notation differs considerably in structure and form from the music notation used today. The manual transcription of organ tablature compositions to modern music notation is time-consuming and often prone to errors.

For the paper "Automatic Transcription of Organ Tablature Music Notation with Deep Neural Networks", we created a data set for training a neural network to perform optical character recognition on scans of organ tablatures.

We used two scanned organ tablature books as sources for the data set and manually annotated 1,200 staves from each book with label sequences. To increase the amount of available data, we employed data augmentation and a synthetic data generator that randomly arranges images of single characters into tablature rows.
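
The idea behind such a synthetic generator can be illustrated with a minimal Python sketch. It is not the code published with the data set: the function name make_synthetic_row, its parameters, and the representation of a character image as a nested list of gray values are all assumptions made for this illustration. The sketch picks character images at random, places them left to right with random spacing and vertical offset, and returns the row image together with its label sequence.

```python
import random


def make_synthetic_row(glyphs, length, height, rng, max_gap=4, background=255):
    """Arrange randomly chosen character images into one row image.

    Hypothetical helper, not part of the published tools.
    glyphs: dict mapping a label to a grayscale image given as a list of
            pixel rows (0 = ink, 255 = paper), no taller than `height`.
    Returns (row_image, labels); labels is the ground-truth sequence.
    """
    choices = sorted(glyphs)  # fixed order keeps a seeded rng reproducible
    row = [[] for _ in range(height)]
    labels = []
    for _ in range(length):
        label = rng.choice(choices)
        glyph = glyphs[label]
        width = len(glyph[0])
        gap = rng.randint(1, max_gap)              # random horizontal spacing
        top = rng.randint(0, height - len(glyph))  # random vertical offset
        for y in range(height):
            row[y].extend([background] * gap)
            if top <= y < top + len(glyph):
                row[y].extend(glyph[y - top])      # copy one glyph pixel row
            else:
                row[y].extend([background] * width)
        labels.append(label)
    return row, labels


# Toy character images; real inputs would be crops from the scanned books.
toy_glyphs = {
    "a": [[0, 0], [0, 255]],
    "b": [[0], [0], [0]],
}
row_image, label_sequence = make_synthetic_row(
    toy_glyphs, length=5, height=4, rng=random.Random(0)
)
print(label_sequence, len(row_image), len(row_image[0]))
```

Because the label sequence is known by construction, every generated row is an annotated training sample at no manual cost. A realistic generator would additionally vary scale, rotation, and noise, which this sketch leaves out.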

The annotated staves and the source code of our data generation and augmentation tools are openly available on GitHub. More information can be found in the paper, published in Transactions of the International Society for Music Information Retrieval in 2021.

