FireRed-OCR — CrispEmbed GGUF

GGUF conversion of FireRedTeam/FireRed-OCR for CrispEmbed.

FireRed-OCR is a fine-tune of Qwen3-VL 2B, trained with Format-Constrained GRPO for tables, formulas, and structured document OCR. It scores 92.94% on OmniDocBench v1.5.

Architecture

Vision: Qwen3-VL ViT (24 layers, 1024d, patch 16, deepstack [5,11,17])
LLM: Qwen3-2B (28 layers, 2048d, GQA 16/8, QK norms, mRoPE interleaved)
Training: Format-Constrained GRPO with rewards for formula syntax, table integrity, hierarchical closure
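
The figures above are enough for a rough memory estimate. The following is a minimal sketch, not CrispEmbed's actual accounting: it assumes the head dimension equals hidden size divided by query heads (2048 / 16 = 128), an F16 KV cache, and an input image already resized to a multiple of the patch size. The function names are hypothetical.

```python
# Values taken from the architecture list above.
N_LAYERS = 28      # LLM layers
HIDDEN = 2048      # LLM hidden size
N_Q_HEADS = 16     # query heads (GQA 16/8)
N_KV_HEADS = 8     # key/value heads (GQA 16/8)
PATCH = 16         # ViT patch size in pixels

# Assumption: head_dim = hidden size / query heads.
HEAD_DIM = HIDDEN // N_Q_HEADS


def kv_cache_bytes_per_token(n_kv_heads: int = N_KV_HEADS,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an F16 cache.
    """
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * bytes_per_elem


def patch_grid(width: int, height: int) -> tuple[int, int]:
    """Patch grid (columns, rows) for an image whose sides are
    already multiples of PATCH (an assumption of this sketch)."""
    return width // PATCH, height // PATCH


gqa = kv_cache_bytes_per_token()                      # 8 KV heads
mha = kv_cache_bytes_per_token(n_kv_heads=N_Q_HEADS)  # 16 KV heads, for comparison
print(gqa, mha)                # 114688 229376
print(patch_grid(1024, 768))   # (64, 48)
```

Under these assumptions, GQA 16/8 halves the cache relative to one KV head per query head: about 112 KiB per token, or roughly 448 MiB for a 4096-token context.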

Models

File	Quant	Size	Notes
`firered-ocr-f16.gguf`	F16	4.0 GB	Unquantized (half precision)
`firered-ocr-q8_0.gguf`	Q8_0	2.2 GB	Best quality/size
`firered-ocr-q4_k.gguf`	Q4_K	1.6 GB	Smallest
`firered-ocr-ref.gguf`	F32	17 MB	Reference activations for parity testing
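
To compare the quantized files, it can help to convert file size into effective bits per weight. The sketch below assumes a nominal parameter count of exactly 2e9 (the exact count is not stated here) and decimal gigabytes, so the results are approximate. `bits_per_weight` is a hypothetical helper.

```python
# File sizes in GB, taken from the table above.
SIZES_GB = {"F16": 4.0, "Q8_0": 2.2, "Q4_K": 1.6}

# Assumption: nominal "2B" parameter count; the exact figure may differ.
N_PARAMS = 2e9


def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Effective bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / n_params


for quant, size in SIZES_GB.items():
    print(f"{quant}: {bits_per_weight(size):.1f} bits/weight")
# F16: 16.0 bits/weight
# Q8_0: 8.8 bits/weight
# Q4_K: 6.4 bits/weight
```

For reference, Q8_0 blocks nominally cost 8.5 bits per weight and Q4_K blocks 4.5. The higher effective figures may mean that some tensors are stored at higher precision, though the parameter count and size rounding assumed here could also account for part of the gap.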

Usage

crispembed --ocr firered-ocr-q8_0.gguf document.png

The model runs on CrispEmbed's qwen2vl_ocr engine, which is selected automatically from the GGUF architecture field (qwen3vl).
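
To check which architecture a file declares before loading it, the GGUF metadata can be read directly. This is a minimal sketch of a GGUF (version 2 or later) header reader, independent of CrispEmbed and not its detection code. It assumes the architecture is stored under the conventional `general.architecture` key. `read_architecture` is a hypothetical name.

```python
import struct

GGUF_STRING, GGUF_ARRAY = 8, 9

# struct formats for the fixed-size GGUF value types (little-endian).
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    """Read a metadata value of the given GGUF type id."""
    if vtype == GGUF_STRING:
        return _read_string(f)
    if vtype == GGUF_ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    return _read(f, SCALARS[vtype])


def read_architecture(f):
    """Return the general.architecture string, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit lengths; not handled here")
    _read(f, "<Q")              # tensor count (unused)
    kv_count = _read(f, "<Q")   # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        value = _read_value(f, vtype)
        if key == "general.architecture":
            return value
    return None
```

Opening `firered-ocr-q8_0.gguf` in binary mode and passing the file object to `read_architecture` should return `qwen3vl` if the file matches the description above.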

License

Apache-2.0 (FireRedTeam/FireRed-OCR)

Model tree

Base model: Qwen/Qwen3-VL-2B-Instruct
Fine-tuned: FireRedTeam/FireRed-OCR
Quantized: cstr/firered-ocr-crispembed-GGUF (this model)