	---
	license: apache-2.0
	base_model:
	- FireRedTeam/FireRed-OCR
	language:
	- en
	pipeline_tag: image-text-to-text
	library_name: transformers
	tags:
	- text-generation-inference
	- llama.cpp
	---

	# FireRed-OCR-GGUF

	> FireRed-OCR from FireRedTeam is a specialized framework that turns general-purpose Large Vision-Language Models into pixel-precise structural document parsers. It targets "Structural Hallucination" failures such as disordered rows and invented formulas by treating document parsing as a "structural engineering" problem, and it reports a state-of-the-art 92.94% on OmniDocBench v1.5, ahead of DeepSeek-OCR 2, OCRVerse, and much larger models such as Gemini-3.0 Pro and Qwen3-VL-235B. Its key innovations are: Format-Constrained GRPO (Group Relative Policy Optimization), which enforces syntactic validity (no unclosed tables or invalid LaTeX); a "Geometry + Semantics" data factory that uses geometric clustering and multi-dimensional tagging to balance long-tail layouts; and a progressive training pipeline of multi-task pre-alignment for spatial grounding, specialized SFT for standardized full-image Markdown output, and GRPO self-correction via RL. On the complex layouts of FireRedBench it shows stronger in-the-wild robustness than traditional systems like PaddleOCR, and it excels at high-fidelity parsing of tables, equations, forms, and multi-column documents for real-world automation.

	## Model Files

	| File Name | Quant Type | File Size | File Link |
	| --- | --- | --- | --- |
	| FireRed-OCR.BF16.gguf | BF16 | 3.45 GB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.BF16.gguf) |
	| FireRed-OCR.F16.gguf | F16 | 3.45 GB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.F16.gguf) |
	| FireRed-OCR.F32.gguf | F32 | 6.89 GB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.F32.gguf) |
	| FireRed-OCR.Q8_0.gguf | Q8_0 | 1.83 GB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.Q8_0.gguf) |
	| FireRed-OCR.mmproj-bf16.gguf | mmproj-bf16 | 823 MB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.mmproj-bf16.gguf) |
	| FireRed-OCR.mmproj-f16.gguf | mmproj-f16 | 823 MB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.mmproj-f16.gguf) |
	| FireRed-OCR.mmproj-f32.gguf | mmproj-f32 | 1.63 GB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.mmproj-f32.gguf) |
	| FireRed-OCR.mmproj-q8_0.gguf | mmproj-q8_0 | 445 MB | [Download](https://huggingface.co/prithivMLmods/FireRed-OCR-GGUF/blob/main/FireRed-OCR.mmproj-q8_0.gguf) |
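
The table lists language-model files and `mmproj` (vision projector) files separately. In llama.cpp-style runtimes a multimodal GGUF model is typically loaded as one file of each kind. The sketch below is not part of the original card. It pairs one model quant with one projector quant and adds up the download size, using the sizes from the table. It assumes the Hub's `resolve/main` direct-download URL form rather than the `blob/main` page links in the table. The names `resolve_url` and `pair` are hypothetical helpers.

```python
# Sizes in GB, copied from the Model Files table (decimal units assumed).
MODEL_SIZES_GB = {"BF16": 3.45, "F16": 3.45, "F32": 6.89, "Q8_0": 1.83}
MMPROJ_SIZES_GB = {"bf16": 0.823, "f16": 0.823, "f32": 1.63, "q8_0": 0.445}

REPO = "prithivMLmods/FireRed-OCR-GGUF"


def resolve_url(filename: str) -> str:
    """Build a direct-download URL (assumes the Hub's resolve/main convention)."""
    return f"https://huggingface.co/{REPO}/resolve/main/{filename}"


def pair(model_quant: str, mmproj_quant: str) -> tuple[str, str, float]:
    """Return (model URL, projector URL, combined size in GB) for one pairing."""
    model_file = f"FireRed-OCR.{model_quant}.gguf"
    mmproj_file = f"FireRed-OCR.mmproj-{mmproj_quant}.gguf"
    total_gb = MODEL_SIZES_GB[model_quant] + MMPROJ_SIZES_GB[mmproj_quant]
    return resolve_url(model_file), resolve_url(mmproj_file), round(total_gb, 3)


if __name__ == "__main__":
    model_url, mmproj_url, total_gb = pair("Q8_0", "q8_0")
    print(model_url)
    print(mmproj_url)
    print(f"combined download: {total_gb} GB")
```

The smallest pairing here is `Q8_0` with `mmproj-q8_0`. Whether a lower-precision projector affects OCR accuracy is not stated in this card, so treat the choice as something to verify on your own documents.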

	## Quants Usage

	(Sorted by size, not necessarily by quality. IQ quants are often preferable to similarly sized non-IQ quants.)

	Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

	![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)
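
As a rough sanity check on the size column of the file table, the effective bits per weight of each model quant can be estimated from the file sizes alone. This is a back-of-the-envelope sketch and not from the original card. It assumes the F32 file stores 4 bytes per weight, that sizes are in decimal GB, and that metadata overhead is negligible. The name `bits_per_weight` is a hypothetical helper.

```python
# Model file sizes in GB, copied from the Model Files table.
SIZES_GB = {"F32": 6.89, "BF16": 3.45, "F16": 3.45, "Q8_0": 1.83}


def bits_per_weight(sizes_gb: dict[str, float]) -> dict[str, float]:
    """Estimate bits per weight, using the F32 file to infer the weight count."""
    # Assumption: F32 stores exactly 4 bytes per weight, metadata ignored.
    n_weights = sizes_gb["F32"] * 1e9 / 4
    return {
        quant: size_gb * 1e9 * 8 / n_weights
        for quant, size_gb in sizes_gb.items()
    }


if __name__ == "__main__":
    for quant, bpw in bits_per_weight(SIZES_GB).items():
        print(f"{quant}: ~{bpw:.2f} bits per weight")
```

Under these assumptions the estimate for `Q8_0` comes out close to 8.5 bits per weight, which matches that format's layout of 32 one-byte weights plus a 2-byte scale per block, and the 16-bit files come out close to 16.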