Instructions for using CIawevy/TextPecker-8B-InternVL3 with libraries, inference providers, notebooks, and local apps. The sections below show how to get started.
- Libraries
- Transformers
How to use CIawevy/TextPecker-8B-InternVL3 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CIawevy/TextPecker-8B-InternVL3", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("CIawevy/TextPecker-8B-InternVL3", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use CIawevy/TextPecker-8B-InternVL3 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "CIawevy/TextPecker-8B-InternVL3"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CIawevy/TextPecker-8B-InternVL3",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
Use Docker:
```shell
docker model run hf.co/CIawevy/TextPecker-8B-InternVL3
```
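The curl call above can also be made from Python. The sketch below builds the same OpenAI-style chat payload with the standard library only; it assumes a vLLM server is already running on `localhost:8000` as shown above, and the helper names (`build_request`, `query`) are illustrative, not part of any official API.

```python
# Minimal sketch: call a vLLM OpenAI-compatible endpoint from Python.
# Assumes a server started with: vllm serve "CIawevy/TextPecker-8B-InternVL3"
import json
import urllib.request


def build_request(model: str, image_url: str, question: str) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def query(base_url: str, payload: dict) -> str:
    """POST the payload to /v1/chat/completions and return the first reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


payload = build_request(
    "CIawevy/TextPecker-8B-InternVL3",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "Describe this image in one sentence.",
)
# print(query("http://localhost:8000", payload))  # requires the running server
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `base_url="http://localhost:8000/v1"`) works the same way.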
- SGLang
How to use CIawevy/TextPecker-8B-InternVL3 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "CIawevy/TextPecker-8B-InternVL3" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CIawevy/TextPecker-8B-InternVL3",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "CIawevy/TextPecker-8B-InternVL3" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "CIawevy/TextPecker-8B-InternVL3",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use CIawevy/TextPecker-8B-InternVL3 with Docker Model Runner:
```shell
docker model run hf.co/CIawevy/TextPecker-8B-InternVL3
```
TextPecker-8B-InternVL3
TextPecker-8B-InternVL3 is an evaluator model presented in the paper TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering.
While standard Multimodal LLMs often fail to notice fine-grained text errors like distortion or misalignment in generated images, TextPecker is specifically designed to perceive and quantify these structural anomalies to provide reliable reward signals for RL-based optimization of text-to-image models.
This checkpoint is based on the InternVL3-8B-Instruct architecture and was trained using the ms-swift framework on the TextPecker-1.5M dataset.
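To turn an evaluator's textual verdict into a scalar reward for RL-based optimization, the generated reply must be parsed and normalized. The sketch below is a hypothetical illustration only: the prompt format, the 0–10 scale, and the helper names are assumptions, not the paper's protocol (the official repository defines the actual one).

```python
# Hypothetical reward extraction from an evaluator's free-text reply.
# The 0-10 scale is an illustrative assumption, not TextPecker's actual scheme.
import re


def parse_score(reply: str, lo: float = 0.0, hi: float = 10.0) -> float:
    """Extract the first number in the reply and clamp it to [lo, hi]."""
    m = re.search(r"-?\d+(?:\.\d+)?", reply)
    if m is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return max(lo, min(hi, float(m.group())))


def to_reward(score: float, hi: float = 10.0) -> float:
    """Normalize a clamped score into [0, 1] for use as an RL reward."""
    return score / hi
```

For example, a reply like `"Structural fidelity: 7.5 / 10"` would parse to 7.5 and normalize to a reward of 0.75 under these assumptions.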
Model Details
- Developed by: Hanshen Zhu, Yuliang Liu, et al. (Huazhong University of Science and Technology and ByteDance)
- Model Type: Multimodal Large Language Model (MLLM)
- Base Model: OpenGVLab/InternVL3-8B-Instruct
- Task: Image-to-Text (Structural Anomaly Perception / OCR Evaluator)
- License: Apache 2.0
Model Sources
- Repository: https://github.com/CIawevy/TextPecker
- Paper: https://huggingface.co/papers/2602.20903
- Dataset: CIawevy/TextPecker-1.5M
Uses
TextPecker can be used to evaluate text structural quality and semantic consistency for text generation or editing scenarios. It helps bridge the gap in Visual Text Rendering (VTR) optimization by providing reliable feedback on character-level structural fidelity.
To use the model for deployment or evaluation, please follow the instructions in the official repository: https://github.com/CIawevy/TextPecker
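For contrast with what the evaluator provides, semantic consistency alone is often approximated by comparing a target string against OCR output. The naive character-level sketch below (not the paper's method; all names are illustrative) shows such a baseline; TextPecker goes beyond it by also quantifying structural anomalies, like distortion or misalignment, that plain string matching cannot see.

```python
# Naive character-level consistency baseline (NOT TextPecker's method):
# normalized Levenshtein similarity between target text and recognized text.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def consistency(target: str, recognized: str) -> float:
    """1.0 for an exact match, approaching 0.0 as the strings diverge."""
    if not target and not recognized:
        return 1.0
    return 1.0 - levenshtein(target, recognized) / max(len(target), len(recognized))
```

Such a score treats a structurally mangled but still-recognizable character as fully correct, which is exactly the gap in character-level structural fidelity the model card describes.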
Citation
If you find TextPecker useful in your research, please cite:
```bibtex
@article{zhu2026TextPecker,
  title   = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
  author  = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
  journal = {arXiv preprint arXiv:2602.20903},
  year    = {2026}
}
```
Acknowledgement
Training was conducted using the ms-swift framework. We thank the authors of InternVL and ms-swift for their excellent open-source contributions.