Poor Russian language generation.

#41
by borisbek - opened

I am trying to use GLM-OCR to recognize Russian accounting and financial documents.
I am satisfied with table recognition, but the quality of the generated Russian text is quite poor, despite the good quality of the input documents.
What is the recommended way to fine-tune the model for language and domain adaptation?

@borisbek Hi, thank you for your interest in GLM-OCR.

The current version still has limitations in multilingual scenarios: recognition quality for text in languages other than Chinese and English is not yet fully optimized. Improving multilingual performance, including Russian, is one of our key priorities for the next release.

For language and domain adaptation, we recommend fine-tuning the model with your own Russian financial document dataset. You can find the step-by-step fine-tuning guide here:
GLM-OCR Fine-tuning Guide

If possible, please share some bad cases with us so that we can use them for evaluation.
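One way to pick out bad cases worth sharing is to score each page by character error rate (CER) against a hand-checked transcription. The snippet below is a minimal sketch in plain Python, not part of GLM-OCR: the function names, the 0.1 threshold, and the sample file names are illustrative assumptions, and the model output is assumed to be already available as a string.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFC-normalize so composed and decomposed Cyrillic forms (e.g. "й")
    # compare equal, then collapse runs of whitespace to single spaces.
    return " ".join(unicodedata.normalize("NFC", text).split())


def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance, keeping only one previous row in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance divided by reference length.
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def worst_cases(samples, threshold=0.1):
    # samples: iterable of (name, reference_text, model_output) tuples.
    # Returns (cer, name) pairs above the threshold, worst first.
    scored = [(cer(ref, hyp), name) for name, ref, hyp in samples]
    return sorted((s for s in scored if s[0] > threshold), reverse=True)


if __name__ == "__main__":
    # Hypothetical samples; replace with your own transcriptions and outputs.
    samples = [
        ("invoice_01.png", "Итого к оплате", "Итого к оплате"),
        ("balance_02.png", "баланс", "балакс"),
    ]
    for score, name in worst_cases(samples):
        print(f"{name}: CER={score:.3f}")
```

The same scores can be reused after fine-tuning to check whether the Russian text quality actually improved on a held-out set of pages.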

iyuge2 changed discussion status to closed
