SmolDocling-256M GGUF

GGUF conversions of ds4sd/SmolDocling-256M-preview for CrispEmbed inference.

Ultra-compact document conversion model (256M params). Generates DocTags structured markup from page images: OCR, layout, tables, formulas, code, charts.

Model variants

File                    Quant   Size     Notes
smoldocling-f16.gguf    F16     491 MB   Full precision
smoldocling-q8_0.gguf   Q8_0    261 MB   Recommended
smoldocling-q4_k.gguf   Q4_K    153 MB   Max compression

Architecture

  • Vision: SigLIP ViT (12L, 768d, 12 heads, patch=16, 512px)
  • Connector: Pixel shuffle (scale=4, 1024→64 tokens) + Linear(12288→576)
  • LLM: SmolLM2-135M (30L, 576d, GQA 9/3, SwiGLU, RoPE)
  • Parameters: 256M total (93M vision + 135M LLM + connector)
  • Output: DocTags (structured XML-like document markup)
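
The connector numbers above follow from the vision settings: a 512px image with patch=16 gives a 32×32 grid (1024 tokens), and merging each 4×4 block yields 64 tokens of width 768 × 16 = 12288. The NumPy sketch below illustrates that shape arithmetic only. The function name `pixel_shuffle` and the ordering of values inside each merged token are assumptions, not taken from the CrispEmbed source, and the zero-filled arrays are placeholders for real activations and weights.

```python
import numpy as np

def pixel_shuffle(tokens: np.ndarray, scale: int) -> np.ndarray:
    """Merge each scale x scale block of patch tokens into one wider token."""
    n, d = tokens.shape
    side = int(round(n ** 0.5))
    if side * side != n or side % scale != 0:
        raise ValueError("expected a square token grid divisible by scale")
    out = side // scale
    # Split the grid into (block row, row in block, block col, col in block, channels).
    x = tokens.reshape(out, scale, out, scale, d)
    # Bring the two block indices to the front so each block is contiguous.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(out * out, scale * scale * d)

patches_per_side = 512 // 16  # 32 patches along each image side
vision_tokens = np.zeros((patches_per_side ** 2, 768), dtype=np.float32)  # (1024, 768)
merged = pixel_shuffle(vision_tokens, scale=4)  # (64, 12288)

# Placeholder for the connector's Linear(12288 -> 576); real weights live in the GGUF file.
projection = np.zeros((merged.shape[1], 576), dtype=np.float32)
llm_inputs = merged @ projection  # (64, 576)
print(merged.shape, llm_inputs.shape)
```

The 64 resulting rows are what the LLM would consume as image tokens, each already projected to its 576-dimensional hidden size.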

Parity with the Hugging Face reference implementation (cosine similarity of outputs): vision encoder 0.9998, connector 0.9999.
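
For readers who want to reproduce a parity check of this kind, here is a minimal sketch of the cosine-similarity metric in plain Python. How the reported figures were actually computed (which layer outputs, how they were flattened) is not stated here, so the vectors below are hypothetical stand-ins for flattened activations from the two implementations.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical flattened activations; real ones would come from the
# reference model and from the GGUF runtime for the same input image.
reference = [0.12, -0.53, 0.98, 0.07]
candidate = [0.11, -0.54, 0.99, 0.07]
print(f"cos = {cosine_similarity(reference, candidate):.4f}")
```

A value close to 1.0 means the two outputs point in nearly the same direction; it does not by itself bound the per-element error.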

Usage

# CLI
./crispembed -m smoldocling-q8_0.gguf --ocr document.png

# Server
./crispembed-server --ocr smoldocling-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@document.png"

# Python
from crispembed import CrispMathOcr

ocr = CrispMathOcr("smoldocling-q8_0.gguf")
doctags = ocr.recognize("document.png")
print(doctags)  # <doctag><text>...</text>...</doctag>
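
Once the DocTags string is available, a small post-processing step can turn it into a list of elements. The sketch below uses only the standard library and assumes a flat sequence of `<tag>...</tag>` elements inside `<doctag>`, with optional `<loc_N>` location tokens at the start of each body. The helper name `parse_doctags` and the sample string are hypothetical, and nested structures such as tables are not handled.

```python
import re

# Matches one flat element of the form <tag>body</tag>; nesting is not handled.
ELEMENT_RE = re.compile(r"<(?P<tag>[a-z_0-9]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)
# Location tokens such as <loc_42> (assumed format) are stripped from the body.
LOC_RE = re.compile(r"<loc_\d+>")

def parse_doctags(doctags):
    """Return a list of (tag, text) pairs from a flat DocTags string."""
    wrapper = re.search(r"<doctag>(.*)</doctag>", doctags, re.DOTALL)
    content = wrapper.group(1) if wrapper else doctags
    elements = []
    for match in ELEMENT_RE.finditer(content):
        text = LOC_RE.sub("", match.group("body")).strip()
        elements.append((match.group("tag"), text))
    return elements

# Hypothetical model output, for illustration only.
sample = (
    "<doctag>"
    "<section_header_level_1><loc_10><loc_12><loc_200><loc_30>Title</section_header_level_1>"
    "<text><loc_10><loc_40><loc_200><loc_80>Hello world.</text>"
    "</doctag>"
)
for tag, text in parse_doctags(sample):
    print(f"{tag}: {text}")
```

For anything beyond quick inspection, a parser that understands the full DocTags vocabulary is the safer choice.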

License

Apache-2.0, same as the base model.

Credits

Original model by Docling Team, IBM Research. GGUF conversion and inference engine by CrispEmbed.
