**Observed Hallucinations and Arabic OCR Quality Issues in the Fine-Tuned GLM Model**

#1
by JamesGs - opened

In our tests, the model exhibits significant hallucination, most notably looping output with repeated text patterns such as:

B
نصر
M
DZ
B
M
نصر
B
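Looping output like this can be flagged automatically before the response is accepted. Below is a minimal Python sketch (my own, not part of this model's pipeline; the function name and the n-gram approach are illustrative assumptions) that scores how repetitive a generation is:

```python
def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are repeats of an earlier n-gram.

    Values near 0 mean mostly novel text; values well above ~0.3
    suggest the model is stuck in a loop (threshold is a guess,
    tune it on real outputs)."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


# A looping output similar to the one observed above scores high:
looping = "B M DZ B M B M DZ B M"
print(repetition_ratio(looping))
```

Responses scoring above the chosen threshold could be rejected and regenerated rather than passed downstream.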

We also observed issues in the JSON responses, including invalid JSON structures and duplicated keys, for example:

"arabic last name": "نصرالله",
"arabic first name": "نصرالله",
"arabic last name": "نصرالله"
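Duplicated keys like these are easy to miss, because Python's `json` module silently keeps only the last value for a repeated key. A small sketch (my own validation snippet, not from the model's code; `find_duplicate_keys` is a name I chose) that makes the duplication visible:

```python
import json


def find_duplicate_keys(pairs):
    """object_pairs_hook that raises on duplicate keys instead of
    silently keeping the last value (json.loads's default behavior)."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


raw = '{"arabic last name": "x", "arabic first name": "y", "arabic last name": "x"}'
try:
    json.loads(raw, object_pairs_hook=find_duplicate_keys)
except ValueError as err:
    print(err)  # duplicate key: 'arabic last name'
```

Running model outputs through a strict parser like this makes both failure modes (invalid JSON and duplicated keys) raise instead of passing corrupted records downstream.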

Additionally, there are many spelling errors in the Arabic text.

Could you clarify how the model was fine-tuned and which dataset was used for training?

I'm working on version 2 of the model, which brings substantial improvements over this one in vision, language, and attention settings. I'll release it in a few days, God willing.
