TextPecker-8B-Qwen3VL

TextPecker is a structural anomaly perceptive model designed to enhance Visual Text Rendering (VTR). It addresses a critical bottleneck where standard MLLMs and OCR models fail to perceive structural anomalies such as distortion, blurriness, and misalignment in generated text. This model acts as a plug-and-play evaluator and reward signal for RL-based optimization (e.g., using Flow-GRPO), enabling the generation of structurally faithful visual text.

This checkpoint is built upon the Qwen3-VL-8B-Instruct architecture and was trained using ms-swift.

Model Details

Uses

TextPecker can be used to evaluate text structural quality and semantic consistency for text-to-image generation or editing tasks. It is particularly useful for:

  • Structural Anomaly Quantification: Identifying distortion, blurriness, and misalignment in rendered text.
  • Reward Modeling: Providing reward signals for Reinforcement Learning (RL) to improve text rendering in generators like Flux or SD3.5.

To use this model, please follow the official deployment and testing instructions:

Citation

If you find TextPecker useful in your research or work, please cite the paper:

@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Dingkang Yang and Chao Feng and Can Huang and Jingqun Tang and Xiang Bai},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
Downloads last month
63
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for CIawevy/TextPecker-8B-Qwen3VL

Finetuned
(214)
this model

Collection including CIawevy/TextPecker-8B-Qwen3VL

Paper for CIawevy/TextPecker-8B-Qwen3VL