metadata
license: apache-2.0
base_model:
  - FireRedTeam/FireRed-OCR
language:
  - en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - text-generation-inference
  - llama.cpp

FireRed-OCR-GGUF

FireRed-OCR, from FireRedTeam, is a specialized framework that turns general Large Vision-Language Models into pixel-precise structural document parsers. It targets "Structural Hallucination" issues, such as disordered rows and invented formulas, by shifting to a "structural engineering" paradigm. It reports a state-of-the-art 92.94% on OmniDocBench v1.5, ahead of DeepSeek-OCR 2, OCRVerse, and much larger models such as Gemini-3.0 Pro and Qwen3-VL-235B.

Its key innovations include Format-Constrained GRPO (Group Relative Policy Optimization), which enforces syntactic validity (no unclosed tables or invalid LaTeX), and a "Geometry + Semantics" data factory that uses geometric clustering and multi-dimensional tagging to balance long-tail layouts. Training follows a progressive pipeline: multi-task pre-alignment for spatial grounding, specialized SFT for standardized full-image Markdown output, and GRPO self-correction via RL.

On the complex layouts of FireRedBench, it shows in-the-wild robustness beyond traditional systems such as PaddleOCR. It is aimed at high-fidelity parsing of tables, equations, forms, and multi-column documents for real-world automation.
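To make the idea of a format constraint concrete, here is a toy sketch of the kind of syntactic check such a reward could build on. It is not FireRed-OCR's actual reward: the function names, the assumption that tables are emitted as HTML `<table>` tags, and the binary 0/1 score are all illustrative choices.

```python
import re


def tables_closed(text: str) -> bool:
    """Return True if every <table> tag has a matching </table> (assumes HTML tables)."""
    depth = 0
    for match in re.finditer(r"<(/?)table\b[^>]*>", text, flags=re.IGNORECASE):
        if match.group(1):  # closing tag
            depth -= 1
            if depth < 0:  # a close with no open before it
                return False
        else:  # opening tag
            depth += 1
    return depth == 0


def latex_balanced(text: str) -> bool:
    """Return True if unescaped braces are balanced and $$ delimiters come in pairs."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":  # skip the backslash and the character it escapes
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0 and text.count("$$") % 2 == 0


def format_reward(text: str) -> float:
    """Hypothetical binary reward: 1.0 only when both syntactic checks pass."""
    return 1.0 if tables_closed(text) and latex_balanced(text) else 0.0
```

A real format-constrained reward would likely be richer (for example, actually parsing the LaTeX), but the shape is the same: output that is not syntactically well formed scores lower regardless of its content.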

Model Files

| File Name | Quant Type | File Size | File Link |
| --- | --- | --- | --- |
| FireRed-OCR.BF16.gguf | BF16 | 3.45 GB | Download |
| FireRed-OCR.F16.gguf | F16 | 3.45 GB | Download |
| FireRed-OCR.F32.gguf | F32 | 6.89 GB | Download |
| FireRed-OCR.Q8_0.gguf | Q8_0 | 1.83 GB | Download |
| FireRed-OCR.mmproj-bf16.gguf | mmproj-bf16 | 823 MB | Download |
| FireRed-OCR.mmproj-f16.gguf | mmproj-f16 | 823 MB | Download |
| FireRed-OCR.mmproj-f32.gguf | mmproj-f32 | 1.63 GB | Download |
| FireRed-OCR.mmproj-q8_0.gguf | mmproj-q8_0 | 445 MB | Download |
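Each language-model file above is meant to be paired with one of the `mmproj` (vision projector) files. The sketch below shows one way to assemble a llama.cpp multimodal command from such a pair. It assumes the files are already downloaded into the current directory and that your llama.cpp build ships `llama-mtmd-cli` with support for this model's architecture. The helper name, the image path, and the prompt text are hypothetical.

```python
import shlex

# File names copied from the Model Files table.
MODEL_FILES = {
    "BF16": "FireRed-OCR.BF16.gguf",
    "F16": "FireRed-OCR.F16.gguf",
    "F32": "FireRed-OCR.F32.gguf",
    "Q8_0": "FireRed-OCR.Q8_0.gguf",
}
MMPROJ_FILES = {
    "bf16": "FireRed-OCR.mmproj-bf16.gguf",
    "f16": "FireRed-OCR.mmproj-f16.gguf",
    "f32": "FireRed-OCR.mmproj-f32.gguf",
    "q8_0": "FireRed-OCR.mmproj-q8_0.gguf",
}


def build_command(quant="Q8_0", mmproj="f16", image="page.png",
                  prompt="Convert this page to Markdown."):
    """Build an argument list for llama-mtmd-cli (hypothetical helper).

    `image` and `prompt` are placeholders; replace them with your own input.
    """
    return [
        "llama-mtmd-cli",
        "-m", MODEL_FILES[quant],          # language-model weights
        "--mmproj", MMPROJ_FILES[mmproj],  # vision projector weights
        "--image", image,
        "-p", prompt,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; run it yourself or pass the list to subprocess.run.
    print(shlex.join(build_command()))
```

Pairing the Q8_0 model with an F16 projector is only one possible choice. Any model file can in principle be combined with any `mmproj` file from the table, trading download size against precision.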

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

image.png