Encircled Letter/Number
Hello. Your project is truly impressive. I didn’t expect Korean character recognition to become this good. It produced results that I am completely satisfied with. However, there are issues with recognizing circled characters (enclosed alphanumerics) in both Korean and English.
Circled numbers are recognized almost perfectly. The problem is with circled Latin letters and circled Hangul, such as:
ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ
㉮ ㉯ ㉰ ㉱ ㉲ ㉳ ㉴ ㉵ ㉶ ㉷ ㉸ ㉹ ㉺ ㉻
㉠ ㉡ ㉢ ㉣ ㉤ ㉥ ㉦ ㉧ ㉨ ㉩ ㉪ ㉫ ㉬ ㉭
The recognition rate for these enclosed characters is poor. I tried fine-tuning with LoRA, but the model overfit, and changing the resolution and other parameters did not fix it.
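To make the failure easier to reproduce, here is a minimal sketch of how the three character sets above could be enumerated from their Unicode ranges and how OCR results could be bucketed. The function names (`enclosed_chars`, `classify`, `tally`) and the "circle dropped" category are my own assumptions for illustration, not part of the project; the OCR call itself is left out, and the script only expects `(expected, predicted)` string pairs.

```python
import unicodedata
from collections import Counter

# Inclusive Unicode ranges for the three character sets listed above.
ENCLOSED_RANGES = {
    "circled_latin_small": (0x24D0, 0x24E9),      # ⓐ .. ⓩ
    "circled_hangul_syllable": (0x326E, 0x327B),  # ㉮ .. ㉻
    "circled_hangul_jamo": (0x3260, 0x326D),      # ㉠ .. ㉭
}


def enclosed_chars():
    """Return each character set as a list of single-character strings."""
    return {
        name: [chr(cp) for cp in range(lo, hi + 1)]
        for name, (lo, hi) in ENCLOSED_RANGES.items()
    }


def classify(expected, predicted):
    """Bucket one OCR result (hypothetical categories).

    "correct"        : exact match
    "circle_dropped" : same base character after NFKC normalization,
                       e.g. expected "ⓐ" but the model returned "a"
    "wrong"          : anything else
    """
    if predicted == expected:
        return "correct"
    nfkc = lambda s: unicodedata.normalize("NFKC", s)
    if nfkc(predicted) == nfkc(expected):
        return "circle_dropped"
    return "wrong"


def tally(pairs):
    """Count outcomes over an iterable of (expected, predicted) pairs."""
    return Counter(classify(exp, pred) for exp, pred in pairs)


if __name__ == "__main__":
    for name, chars in enclosed_chars().items():
        print(name, len(chars), "".join(chars))
```

Separating "circle dropped" from "wrong" may help show whether the model reads the inner glyph correctly and only loses the enclosure, or misreads the character entirely.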
In addition, I think the following are needed for the project to become a fully production-ready OCR solution:
- Better recognition accuracy for enclosed (circled) characters
- A way to handle underlined text, text inside rectangular borders, bold text, and italic text