How to use from the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

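# filename must name a file that exists in the repo; an empty string cannot resolve.
# This loads only the text GGUF. Image input also needs mmproj-batisee-BF16.gguf
# attached through a multimodal chat handler, plus a llama-cpp-python build whose
# bundled llama.cpp supports deepseek2ocr. This snippet does not set that up,
# so treat it as untested for batisee; the llama.cpp CLI path below is the verified one.
llm = Llama.from_pretrained(
	repo_id="batiai/batisee",
	filename="batisee-text-Q4_K_M.gguf",
)
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					# The card's recipe requires this exact prompt.
					"text": "document parsing."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	],
	temperature=0,        # matches --temp 0 in the CLI recipe
	repeat_penalty=1.05,  # matches --repeat-penalty 1.05 (guards against repeat loops)
)
print(response["choices"][0]["message"]["content"])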

batisee — On-device Korean Document OCR by BatiAI

ℹ️ Ollama: batisee uses the brand-new DeepSeek-OCR (deepseek2ocr) architecture, which the bundled Ollama engine does not load yet. Run it today with llama.cpp (below); Ollama support will follow once the engine merges this architecture.

batisee is BatiAI's on-device document-OCR model — part of the BatiAI perception family (batisay = speech-to-text, batispeak = diarization, batisee = document/OCR).

Built on baidu/Unlimited-OCR (DeepSeek-OCR architecture, MIT), converted to GGUF directly from the original weights by BatiAI (not a re-host of community quants), BatiAI-signed, and verified for Korean so you can run it on a Mac with confidence.


Why batisee?

  • On-device — runs locally on a Mac (no cloud, no upload). Q4_K_M is 1.9 GB.
  • Korean-verified — measured on rendered Korean documents (see results below): clean text CER 0%, hard document (small font + table + blur) 100% key-content recall with table structure preserved.
  • Document-native — outputs layout boxes (<|det|>) and converts tables to HTML <table>.
  • Our own conversion — GGUF built directly from baidu/Unlimited-OCR original safetensors, BatiAI-signed (general.author = BatiAI).
  • MIT — fully commercial-friendly.
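The document-native output mixes layout boxes and HTML tables into one text stream, so a consumer usually has to split them apart. Below is a minimal sketch that makes that split. It **assumes** each box appears as `<|det|>[[x1, y1, x2, y2], ...]<|/det|>`, optionally preceded by a `<|ref|>label<|/ref|>` tag. That assumption is based on DeepSeek-OCR-style grounding output and is not confirmed by this card, so check real batisee output first. The helper names are hypothetical.

```python
import re
from html.parser import HTMLParser

# ASSUMED box syntax: optional <|ref|>label<|/ref|>, then <|det|>[[x1, y1, x2, y2], ...]<|/det|>
DET_RE = re.compile(
    r"(?:<\|ref\|>(?P<label>[^<]*)<\|/ref\|>)?<\|det\|>(?P<boxes>.*?)<\|/det\|>",
    re.DOTALL,
)
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(text):
    """Return (label, (x1, y1, x2, y2)) pairs found in the model output."""
    found = []
    for m in DET_RE.finditer(text):
        label = (m.group("label") or "").strip()
        for box in BOX_RE.findall(m.group("boxes")):
            found.append((label, tuple(int(v) for v in box)))
    return found


class _TableParser(HTMLParser):
    """Collect the cell text of each <table> as a list of rows.

    Deliberately simple: ignores colspan/rowspan and does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # text outside table cells is ignored
            self._cell.append(data)


def parse_tables(text):
    """Return every HTML table in the output as a list of rows of cell strings."""
    parser = _TableParser()
    parser.feed(text)
    parser.close()
    return parser.tables
```

Python's `HTMLParser` treats `<|` as plain text, so the box markers do not confuse the table extraction.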

⭐ Korean OCR results

Korean documents with known ground truth were rendered as images, run through OCR, and the output was compared against the ground truth. Method and images: ocr-poc/gate-results.

Test | Difficulty | Hangul kept | Key recall | Table | CER
Gate 1 (clean) | clean text | 100% | - | - | 0.0%
Gate 2 (hard) | small font + table + blur | 100% | 100% | <table> | -

Both Q8_0 and Q4_K_M pass with no quality degradation and no decoding loops.
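The table reports character error rate (CER) and key-content recall. Below is a minimal sketch of the standard definitions. CER is the Levenshtein edit distance divided by the reference length. Key recall is taken here to mean the fraction of key strings that appear verbatim in the OCR output. The card does not say how the gates normalized whitespace or chose key strings, so treat these functions as an illustration, not the gate scripts themselves.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance / number of reference characters."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def key_recall(keys, hypothesis):
    """Fraction of key strings found verbatim in the OCR output (assumed definition)."""
    if not keys:
        return 1.0
    return sum(key in hypothesis for key in keys) / len(keys)
```

Precomposed Hangul syllables are single code points, so each syllable counts as one character in the CER.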

Available files

File | Size | Use
batisee-text-Q8_0.gguf | 3.0 GB | highest quality
batisee-text-Q4_K_M.gguf | 1.9 GB | recommended; sweet spot for a 16 GB Mac
mmproj-batisee-BF16.gguf | 826 MB | vision encoder (required)

How to run (llama.cpp)

⚠️ This is a multimodal model — you always need both the text GGUF and mmproj-batisee-BF16.gguf.

🍎 On a Mac: brew install llama.cpp (version ≥ 9430) provides llama-mtmd-cli and loads batisee directly — verified on M4 Max, no source build needed.

hf download batiai/batisee --include "batisee-text-Q4_K_M.gguf" --include "mmproj-batisee-BF16.gguf" --local-dir ./batisee

llama-mtmd-cli \
    -m ./batisee/batisee-text-Q4_K_M.gguf \
    --mmproj ./batisee/mmproj-batisee-BF16.gguf \
    --image your-document.png \
    -p "document parsing." \
    --jinja --temp 0 --repeat-penalty 1.05 -ngl 99
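To call the same recipe from Python, a thin wrapper can download the two files with `huggingface_hub.hf_hub_download` and invoke `llama-mtmd-cli` through `subprocess`. This is a sketch, not an official API. It assumes `llama-mtmd-cli` is on `PATH` (for example from `brew install llama.cpp`). It also assumes the generated text is written to stdout with logs on stderr. The function names are hypothetical.

```python
import subprocess

REPO_ID = "batiai/batisee"
TEXT_GGUF = "batisee-text-Q4_K_M.gguf"
MMPROJ_GGUF = "mmproj-batisee-BF16.gguf"


def build_command(model, mmproj, image):
    """argv for llama-mtmd-cli with the flags recommended on this card."""
    return [
        "llama-mtmd-cli",
        "-m", str(model),
        "--mmproj", str(mmproj),        # the vision encoder is always required
        "--image", str(image),
        "-p", "document parsing.",      # the prompt the recipe requires
        "--jinja",                      # needed for the chat-template step
        "--temp", "0",
        "--repeat-penalty", "1.05",     # guards against repeat loops
        "-ngl", "99",                   # offload all layers to the GPU
    ]


def ocr_page(image, local_dir="./batisee"):
    """Download both GGUFs into local_dir and OCR one image."""
    # Imported lazily so build_command() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    model = hf_hub_download(repo_id=REPO_ID, filename=TEXT_GGUF, local_dir=local_dir)
    mmproj = hf_hub_download(repo_id=REPO_ID, filename=MMPROJ_GGUF, local_dir=local_dir)
    # Assumption: generated text goes to stdout, logs to stderr.
    result = subprocess.run(
        build_command(model, mmproj, image),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage: `print(ocr_page("your-document.png"))`.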

⭐ Recipe matters (learned the hard way)

Flag | Why
-p "document parsing." | The prompt must be exactly this. "Free OCR." triggers a buggy reasoning mode that emits meta-commentary instead of the text.
--jinja | Without it, the chat-template step crashes.
--temp 0 --repeat-penalty 1.05 | Without the penalty, the decoder can fall into an infinite repeat loop.
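The repeat penalty reduces loops but does not prove an output is loop-free. A batch pipeline may therefore want a cheap post-check. The hypothetical heuristic below is not a llama.cpp feature. It flags output whose tail is one chunk repeated several times, so the page can be retried or reviewed. The thresholds are arbitrary assumptions to tune on real output.

```python
def has_repeat_loop(text, min_unit=4, max_unit=200, min_repeats=6):
    """Return True if text ends with the same chunk repeated min_repeats+ times.

    Hypothetical post-check: it only flags output that already looks like a
    runaway decode; it does not prevent loops.
    """
    text = text.rstrip()
    for size in range(min_unit, max_unit + 1):
        if size * min_repeats > len(text):
            break  # longer chunks cannot repeat enough times to qualify
        unit = text[-size:]
        if text.endswith(unit * min_repeats):
            return True
    return False
```

Short repeated units such as `---` or runs of spaces are skipped by `min_unit`, which keeps table separators from being flagged.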

Model details

  • Base: baidu/Unlimited-OCR (DeepSeek-OCR architecture)
    • Text: DeepSeek-3B-MoE (12 layers, 64 routed experts with top-6 routing, standard MHA, 32K context); GGUF architecture deepseek2ocr
    • Vision: DeepEncoder (CLIP-L-14 + SAM-ViT-B, 1024px) + linear projector
  • Conversion: built directly from original safetensors with llama.cpp (DeepSeek-OCR support). Image normalization mean = std = [0.5, 0.5, 0.5].
  • License: MIT (inherited)
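With mean = std = [0.5, 0.5, 0.5], each channel, once scaled to [0, 1], is mapped to [-1, 1] by (x - 0.5) / 0.5 = 2x - 1. As far as we understand, llama.cpp reads these values from the mmproj GGUF and applies them itself, so `llama-mtmd-cli` users do not preprocess anything. The sketch below only illustrates the arithmetic. It assumes the usual divide-by-255 scaling of 8-bit pixels and leaves out the 1024px resize.

```python
MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the encoder's input range: (x/255 - mean) / std."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))
```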

BatiAI signing

All GGUFs carry:

  • general.author = BatiAI
  • general.url = https://flow.bati.ai
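To confirm these fields on a downloaded file, read the GGUF metadata. The `gguf` Python package that ships with llama.cpp can do this. The dependency-free sketch below instead parses the header directly, following the GGUF layout: the magic `GGUF`, a version, the tensor count, the key/value count, then typed key/value pairs. It assumes a little-endian GGUF v2/v3 file. It reads every metadata value, including the tokenizer arrays, so it is simple rather than fast.

```python
import struct

# GGUF scalar value types -> struct format codes (little-endian).
_SCALARS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i", 6: "f",
            7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f):
    length = _read(f, "Q")  # uint64 byte length, then UTF-8 bytes
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "I")
        count = _read(f, "Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f):
    """Return the metadata key/value pairs from a GGUF file opened in binary mode."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _read(f, "I") < 2:
        raise ValueError("GGUF v1 is not supported by this sketch")
    _read(f, "Q")  # tensor count (unused here)
    kv_count = _read(f, "Q")
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        meta[key] = _read_value(f, _read(f, "I"))
    return meta
```

Usage: open `./batisee/batisee-text-Q4_K_M.gguf` with `open(path, "rb")`, pass the file to `read_gguf_metadata`, and check that `meta.get("general.author") == "BatiAI"` and `meta.get("general.url") == "https://flow.bati.ai"`.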

Attribution & License

This model is a GGUF distribution of baidu/Unlimited-OCR (MIT), which is built on the DeepSeek-OCR architecture. The original authors' work and license are retained; BatiAI's contribution is the GGUF conversion from the original weights, signing, Korean verification, and on-device packaging.


Roadmap

  • Korean-specialized fine-tuning (handwriting / camera photos / low-quality scans) — current verification covers clean digital-rendered text; real-world robustness is the next milestone.

About BatiFlow

BatiFlow — free, unlimited, on-device AI for Mac.

On-device benchmark — MacBook Pro M4 Max (Q4_K_M)

Measured with Homebrew llama-mtmd-cli 9430 on the same four stress documents used for the desktop-GPU run.

Metric | Value
Engine | Homebrew llama.cpp (llama-mtmd-cli) 9430; loads deepseek2ocr without a source build
Page latency (full pipeline) | ~3.0 s/page cold, ~3 s/page warm (comparable to the desktop GPU's 2.56 s/page)
Memory (max RSS) | 2.94 GB (peak 2.97 GB)
Quality | digital documents and tables near-perfect (numbers 100%, occasional single Korean-glyph slip); heavy degradation and skew are known limits, targeted in the v2 roadmap

Tokens/sec and standalone mmproj encode time are not reported by the 9430 Homebrew bottle (its performance summary is suppressed); a source build can provide them if needed. Page latency and max RSS are the user-facing numbers, and they place the M4 Max in the same class as the desktop GPU.
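The card does not say how page latency and max RSS were collected. One hypothetical way to gather comparable numbers from Python is to time a `llama-mtmd-cli` run (or any command) and read the children's peak resident set size through `resource.getrusage`. On macOS, `/usr/bin/time -l` reports a similar maximum resident set size. Run the same command twice: the first timing is the cold figure and the second is the warm one. The `resource` module is Unix-only.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once; return (wall-clock seconds, peak child RSS in bytes)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    # ru_maxrss covers the largest waited-for child so far, not only this run,
    # so measure each configuration in a fresh Python process.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform != "darwin":
        peak *= 1024  # Linux reports kilobytes; macOS already reports bytes
    return elapsed, peak
```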
