evaluate it on the OLMOCR benchmark

#1
by yeekal - opened

Are there plans to evaluate it on the olmOCR benchmark and compare it with LightOnOCR-2?

PaddlePaddle org

We found quite a few issues with this benchmark while evaluating on olmOCR-bench, as I mentioned in my response to this issue.

PaddlePaddle org

@yeekal Currently, the evaluation metrics used by olmOCR-bench have some limitations and cannot fairly or effectively assess a model's true document-parsing capability. For example, as shown in the figure, olmOCR-bench splits multi-line formulas into individual single-line formulas for evaluation, which does not follow common conventions for formula recognition. This leads to abnormally high accuracy for models whose outputs happen to match the benchmark's training data distribution.
Additionally, using pass rate as the sole metric is too strict. For a complex formula, if the model misrecognizes just one character, the score is 0; if it misrecognizes 100 characters, the score is still 0. This fails to reflect the actual performance differences between models.

(attached screenshots illustrating the formula-splitting issue)
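A toy sketch of the point above (this is not olmOCR-bench's actual scoring code, and the formula strings are made up for illustration): a binary pass rate gives the same score of 0 to a nearly correct output and a completely wrong one, while a graded metric such as normalized edit-distance similarity distinguishes them.

```python
# Compare a binary pass/fail metric with a graded edit-distance metric.
# Stdlib-only; the reference/prediction strings are illustrative examples.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def pass_rate(pred: str, ref: str) -> float:
    """Binary metric: 1 on exact match, otherwise 0."""
    return 1.0 if pred == ref else 0.0

def edit_similarity(pred: str, ref: str) -> float:
    """Graded metric: 1 - edit distance normalized by the longer string."""
    if not pred and not ref:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref))

ref     = r"\sum_{i=1}^{n} x_i^2 + y_i^2"
almost  = r"\sum_{i=1}^{n} x_i^2 + y_i^3"  # one wrong character
garbled = r"\int x dx"                      # mostly wrong

# Pass rate treats both predictions identically:
print(pass_rate(almost, ref), pass_rate(garbled, ref))   # 0.0 0.0
# Edit similarity separates "one typo" from "completely wrong":
print(round(edit_similarity(almost, ref), 2),
      round(edit_similarity(garbled, ref), 2))
```

Under a pass-rate metric both predictions score 0, while the graded metric reflects that the first output is nearly perfect and the second is not.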


Challenging documents should be used to compare their capabilities; documents that are too simple cannot tell the models apart. Also, can the model output bounding boxes (bbox) for images?

