Add paper metadata and improve model card
#1
by nielsr HF Staff - opened

Files changed (1): README.md (+45 -8)
@@ -1,17 +1,54 @@
 ---
+license: apache-2.0
 base_model: OpenGVLab/InternVL3-8B-Instruct
+library_name: transformers
+pipeline_tag: image-text-to-text
+tags:
+- multimodal
+- ocr
+- vtr
+- text-rendering
+- ms-swift
 ---
-# Model Card for Model ID
 
+# TextPecker-8B-InternVL3
 
-<!-- Provide a quick summary of what the model is/does. -->
-This model is trained using ms-swift.
+TextPecker-8B-InternVL3 is an evaluator model presented in the paper [TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering](https://huggingface.co/papers/2602.20903).
+
+While standard Multimodal LLMs often fail to notice fine-grained text errors like distortion or misalignment in generated images, TextPecker is specifically designed to perceive and quantify these structural anomalies to provide reliable reward signals for RL-based optimization of text-to-image models.
+
+This checkpoint is based on the **InternVL3-8B-Instruct** architecture and was trained using the [ms-swift](https://github.com/modelscope/ms-swift) framework on the [TextPecker-1.5M](https://huggingface.co/datasets/CIawevy/TextPecker-1.5M) dataset.
 
 ## Model Details
-### Model Sources
-<!-- Provide the basic links for the model. -->
-- **Repository:** https://github.com/CIawevy/TextPecker/tree/main
-- **Paper:** https://www.arxiv.org/pdf/2602.20903
+- **Developed by:** Hanshen Zhu, Yuliang Liu, et al. (Huazhong University of Science & Technology and ByteDance)
+- **Model Type:** Multimodal Large Language Model (MLLM)
+- **Base Model:** [OpenGVLab/InternVL3-8B-Instruct](https://huggingface.co/OpenGVLab/InternVL3-8B-Instruct)
+- **Task:** Image-to-Text (Structural Anomaly Perception / OCR Evaluator)
+- **License:** Apache 2.0
+
+## Model Sources
+- **Repository:** [https://github.com/CIawevy/TextPecker](https://github.com/CIawevy/TextPecker)
+- **Paper:** [https://huggingface.co/papers/2602.20903](https://huggingface.co/papers/2602.20903)
+- **Dataset:** [CIawevy/TextPecker-1.5M](https://huggingface.co/datasets/CIawevy/TextPecker-1.5M)
 
 ## Uses
-To use our model, please following our official repo: [TextPecker_deploy](https://github.com/CIawevy/TextPecker/tree/main) and [TextPecker_demo](https://github.com/CIawevy/TextPecker/blob/main/eval/TextPecker_eval/demo.py)
+TextPecker can be used to evaluate text structural quality and semantic consistency for text generation or editing scenarios. It helps bridge the gap in Visual Text Rendering (VTR) optimization by providing reliable feedback on character-level structural fidelity.
+
+To use the model for deployment or evaluation, please follow the instructions in the official repository:
+- [TextPecker Deployment Guide](https://github.com/CIawevy/TextPecker/tree/main)
+- [TextPecker Evaluation Demo](https://github.com/CIawevy/TextPecker/blob/main/eval/TextPecker_eval/demo.py)
+
+## Citation
+If you find TextPecker useful in your research, please cite:
+
+```bibtex
+@article{zhu2026TextPecker,
+title = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
+author = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
+journal = {arXiv preprint arXiv:2602.20903},
+year = {2026}
+}
+```
+
+## Acknowledgement
+Training was conducted using the **ms-swift** framework. We thank the authors of InternVL and ms-swift for their excellent open-source contributions.
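Since the diff adds `library_name: transformers` and `pipeline_tag: image-text-to-text` to the metadata, the new Uses section could be paired with a minimal inference sketch. This is a hedged illustration, not part of the PR: the model repo id `CIawevy/TextPecker-8B-InternVL3`, the image path, and the scoring prompt are assumptions — the real prompt format should be taken from the official demo script linked above.

```python
# Hedged sketch: querying a TextPecker-style evaluator through the
# transformers "image-text-to-text" pipeline. The repo id and prompt
# below are assumptions; follow the official demo script for real usage.

def build_messages(image_path: str, prompt: str) -> list:
    """Build the chat-style input expected by image-text-to-text pipelines:
    one user turn containing an image entry followed by a text entry."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

if __name__ == "__main__":
    messages = build_messages(
        "rendered_text.png",  # hypothetical image produced by a T2I model
        "Rate the structural quality of the visual text in this image.",
    )
    # Heavy step (requires transformers and a GPU; downloads an 8B checkpoint):
    # from transformers import pipeline
    # evaluator = pipeline("image-text-to-text",
    #                      model="CIawevy/TextPecker-8B-InternVL3")
    # print(evaluator(text=messages, max_new_tokens=128))
```

The heavy pipeline call is left commented out so the message-building step can be inspected without pulling the checkpoint.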