Image-Text-to-Text
Safetensors
qwen2_5_vl
historical
conversational
churro-3B / README.md
s-jse's picture
Update README.md
ca2150e verified
metadata
license: other
license_name: qwen-research
license_link: LICENSE
datasets:
  - stanford-oval/churro-dataset
base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
  - historical

CHURRO Logo

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Model Dataset Paper GitHub Stars

Handwritten and printed text recognition across 22 centuries and 46 language clusters, including historical and dead languages.

Cost vs Performance comparison showing CHURRO's accuracy advantage at significantly lower cost
Cost vs. accuracy: CHURRO (3B) achieves higher accuracy than much larger commercial and open-weight VLMs while being substantially cheaper.

CHURRO is a 3B-parameter open-weight vision-language model (VLM) for historical document transcription. It is trained on CHURRO-DS, a curated dataset of ~100K pages from 155 historical collections spanning 22 centuries and 46 language clusters. On the CHURRO-DS test set, CHURRO delivers 15.5× lower cost than Gemini 2.5 Pro while exceeding its accuracy.

For more details and code see https://github.com/stanford-oval/Churro.