EVALUATION:

Here is a benchmark comparison with Audiveris.

These results were produced with the mir-evaluation software. Here is the list of pieces used, along with the script used to compare the MusicXML files:

- The Entertainer
- Prelude in D-Major / Scriabin
- Six Dances in Bulgarian Rhythm
- Gnosienne No.1
- Traumerei / Schumann
- La Campanella
- Sonate No. 14, “Moonlight” 3rd Movement
- Clair de Lune
- Waltz No.7 in C sharp minor Op.64 No.2 / Frédéric Chopin
- Fugue No. 2 BWV 847 in C Minor

The Python script used for the comparisons and scoring is available here

A note on these results: they clearly show how good the model can be, but also the limitations it has. On scores whose layout does not cleanly follow the staves, the model suffers a lot! Because the model has been trained to recognize music stave by stave, it misses a lot when the score is not properly organized. That's why I picked a variety of scores for the benchmark: to show, without cherry-picking, that the model's performance can be very low.
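To make this kind of score more concrete, here is a minimal sketch of a note-level precision / recall / F1 computation. It is not the comparison script mentioned above and it does not use the mir-evaluation software: the function name `note_scores`, the `(onset, pitch)` note representation and the 50 ms onset tolerance are all assumptions made for illustration, and the matching is a simple greedy one rather than an optimal pairing.

```python
from typing import List, Tuple

# Hypothetical note representation: (onset in seconds, MIDI pitch number).
Note = Tuple[float, int]


def note_scores(reference: List[Note],
                estimate: List[Note],
                onset_tol: float = 0.05) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for estimated notes against a reference.

    A reference note is matched to the first unused estimated note with the
    same pitch whose onset lies within `onset_tol` seconds (greedy matching).
    """
    used = [False] * len(estimate)
    matched = 0
    for ref_onset, ref_pitch in sorted(reference):
        for i, (est_onset, est_pitch) in enumerate(estimate):
            if used[i]:
                continue
            if est_pitch == ref_pitch and abs(est_onset - ref_onset) <= onset_tol:
                used[i] = True  # each estimated note can be matched only once
                matched += 1
                break

    precision = matched / len(estimate) if estimate else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one wrong pitch and one extra note in the estimate.
reference_notes = [(0.0, 60), (0.5, 62), (1.0, 64)]
estimated_notes = [(0.01, 60), (0.5, 63), (1.0, 64), (1.5, 65)]
precision, recall, f1 = note_scores(reference_notes, estimated_notes)
```

With a measure like this, a page where the staves are read in the wrong order loses many matches at once, which is consistent with the low scores on badly organized scores described above.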

USING IT:

In this repo you will find a tool to use the OMR model, as well as information on how to install it. Since it's a model, how fast it runs depends a lot on your hardware. On my GPU in normal mode, it is about 3 times faster than Audiveris, with an average of about 10 seconds per page. These figures were measured on an RTX 3080 GPU, but it should run well on any GPU.

WHAT NEXT?

In this section I discuss how this approach (a DaViT base model with DoRA training for OMR purposes) could be improved, and how the model could get better right now.

First, a bit of context: I am not a scientist, I am a student, so I was limited in resources for the training itself. An individual or a proper team with more resources (like a GPU cluster) could probably produce a better model easily.

Second, the model is quite slow, partly because of the DoRA rank I chose for the model. Right now a rank of 64 is used, but 32 would be enough and could lead to a 30-40% improvement in inference speed. I also think the model could get a lot better with the help of vision where it struggles, or with a different approach than my stave-by-stave one.

All the training code I wrote for the model is in this repo, along with my architectural choices and the research papers I used. I hope that a better model can be produced and/or that someone implements the model x vision idea to make the accuracy even better!
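As a rough illustration of why the rank matters, the sketch below counts the parameters in the low-rank factors of an adapter for a single linear layer. The function name `low_rank_param_count` and the 1024 x 1024 layer size are hypothetical and not taken from this model; DoRA's extra magnitude vector is left out because its size does not depend on the rank.

```python
def low_rank_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x in) and B (out x rank)."""
    return rank * in_features + out_features * rank


# Hypothetical layer size, chosen only for illustration.
IN_FEATURES = 1024
OUT_FEATURES = 1024

params_rank_64 = low_rank_param_count(IN_FEATURES, OUT_FEATURES, 64)
params_rank_32 = low_rank_param_count(IN_FEATURES, OUT_FEATURES, 32)

# The adapter size grows linearly with the rank, so halving the rank halves it.
ratio = params_rank_32 / params_rank_64
```

This only shows that the adapter itself shrinks by half when going from rank 64 to rank 32. How much of that turns into faster inference depends on the rest of the model and on how the adapters are applied, so the 30-40% figure above remains an estimate.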


Model size: 0.2B params (Safetensors, tensor type F32)

Base model: timm/davit_base.msft_in1k (this model is a finetune)
