---
title: DeepSeek OCR 2 — Math Rendering Edition
emoji: 🧮
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: true
short_description: DeepSeek-OCR-2 with MathJax math rendering
license: mit
python_version: "3.12"
suggested_hardware: zero-a10g
---

# DeepSeek-OCR-2 — Math Rendering Edition

Built on top of the excellent [DeepSeek-OCR-2 Demo](https://huggingface.co/spaces/merterbak/DeepSeek-OCR-2) by Mert Erbak. Many thanks for the clean foundation: the OCR pipeline, PDF support, bounding box visualisation, and grounding features are all his work.

## What's new in this fork

- MathJax rendering: the Markdown Preview tab now renders LaTeX math notation (inline `$...$` and display `$$...$$`) with MathJax 3, so equations from scanned papers and textbooks appear as typeset math rather than raw LaTeX source.
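MathJax 3 recognises only `\(...\)` as inline delimiters by default, so `$...$` has to be switched on explicitly. A minimal sketch of a page-head snippet that does this follows; the constant name `MATHJAX_HEAD` and the choice of the jsDelivr CDN are our assumptions, not taken from this Space's `app.py`.

```python
# Hypothetical constant: HTML for the page <head> that loads MathJax 3 and
# enables $...$ as inline math (MathJax 3 only enables \(...\) by default).
# The configuration object must be defined before the MathJax script loads.
MATHJAX_HEAD = r"""
<script>
  window.MathJax = {
    tex: {
      inlineMath: [['$', '$'], ['\\(', '\\)']],
      displayMath: [['$$', '$$'], ['\\[', '\\]']]
    }
  };
</script>
<script id="MathJax-script" async
        src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
"""

if __name__ == "__main__":
    print(MATHJAX_HEAD)
```

Gradio exposes a `head` option for injecting HTML like this. Because the preview content changes after the page has loaded, the page would also need to call `MathJax.typesetPromise()` after each update so newly inserted equations get typeset.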

## Features (inherited + extended)

| Feature | Description |
|---|---|
| 📋 Markdown | Convert documents to structured Markdown with layout detection |
| 📝 Free OCR | Simple text extraction without layout analysis |
| 📍 Locate | Find and highlight specific text or elements with bounding boxes |
| 🔍 Describe | General image description |
| ✏️ Custom | Provide your own prompt |
| 🧮 Math Preview | Rendered MathJax output for equations and formulas (new) |

## Model

Uses `deepseek-ai/DeepSeek-OCR-2` with DeepEncoder v2. It scores 91.09% on OmniDocBench, 3.73% higher than the original DeepSeek-OCR.

Configuration: a 1024×1024 base (global) view plus 768×768 local patches from dynamic cropping (2–6 patches). The base view contributes 256 visual tokens and each patch contributes 144.
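The token counts add up linearly, so an image split into n patches costs 256 + 144n visual tokens. A quick sketch of that arithmetic (the constant and function names are ours):

```python
BASE_TOKENS = 256                # tokens contributed by the 1024 base view
TOKENS_PER_PATCH = 144           # tokens contributed by each 768 local patch
MIN_PATCHES, MAX_PATCHES = 2, 6  # dynamic-cropping range stated above


def visual_tokens(num_patches: int) -> int:
    """Visual tokens for one image that dynamic cropping split into `num_patches` patches."""
    if not MIN_PATCHES <= num_patches <= MAX_PATCHES:
        raise ValueError(f"num_patches must be in [{MIN_PATCHES}, {MAX_PATCHES}]")
    return BASE_TOKENS + TOKENS_PER_PATCH * num_patches


if __name__ == "__main__":
    for n in range(MIN_PATCHES, MAX_PATCHES + 1):
        print(f"{n} patches -> {visual_tokens(n)} tokens")
```

This gives 544 tokens at 2 patches, rising in steps of 144 to 1120 tokens at 6 patches.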

## How it works

The model processes images and PDFs through a prompt-based interface in which special tokens control its behaviour:

- `<image>`: replaced at inference time with visual patch embeddings from the input
- `<|grounding|>`: activates layout detection; the model then annotates every element it finds with a label and bounding box coordinates
- `<|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2]]<|/det|>`: the format the model uses to output detected regions
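To illustrate that token protocol, the sketch below builds a prompt and pulls `(label, boxes)` pairs back out of a grounded response. The helper names are hypothetical, the instruction string is modelled on the upstream DeepSeek-OCR prompt examples, and the sample response is made up; this Space's `app.py` may handle it differently.

```python
import json
import re

# One detected region: <|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2], ...]<|/det|>
REGION_RE = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|>\s*<\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def build_prompt(instruction: str, grounding: bool = True) -> str:
    """Hypothetical helper: <image> placeholder, optional grounding token, then the instruction."""
    grounding_token = "<|grounding|>" if grounding else ""
    return f"<image>\n{grounding_token}{instruction}"


def parse_regions(response: str) -> list[tuple[str, list[list[int]]]]:
    """Extract (label, boxes) pairs from a grounded response; each box is [x1, y1, x2, y2]."""
    regions = []
    for label, det in REGION_RE.findall(response):
        try:
            boxes = json.loads(det)
        except json.JSONDecodeError:
            continue  # skip a malformed coordinate list instead of failing the whole page
        regions.append((label.strip(), boxes))
    return regions


if __name__ == "__main__":
    print(build_prompt("Convert the document to markdown."))
    # Made-up sample response, for illustration only.
    sample = (
        "<|ref|>title<|/ref|><|det|>[[50, 40, 950, 90]]<|/det|>\n# Results\n"
        "<|ref|>image<|/ref|><|det|>[[100, 200, 500, 600]]<|/det|>\n"
    )
    print(parse_regions(sample))
```

A non-greedy regex is enough here because the tags never nest; the coordinate list is valid JSON, so `json.loads` avoids hand-parsing it.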

When grounding is active, the model self-labels regions as `title`, `text`, `image`, `table`, etc. Regions labelled `image` are automatically cropped out and appear in the Cropped Images tab. All regions get bounding boxes drawn in the Boxes tab.
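The split between cropped images and drawn boxes can be sketched as follows, starting from `(label, boxes)` pairs such as a grounding parser would produce. The function names are hypothetical, and so is the coordinate scale: we assume boxes are normalized to a 0 to 999 grid, which is an assumption on our part, so check TECHNICAL.md before relying on it.

```python
from collections import defaultdict

Box = tuple[int, int, int, int]


def to_pixels(box: list[int], width: int, height: int, grid: int = 999) -> Box:
    """Map a normalized [x1, y1, x2, y2] box to pixel coordinates.

    ASSUMPTION: the model emits coordinates on a 0..grid scale (grid=999 here).
    """
    x1, y1, x2, y2 = box
    return (
        round(x1 / grid * width),
        round(y1 / grid * height),
        round(x2 / grid * width),
        round(y2 / grid * height),
    )


def split_regions(
    regions: list[tuple[str, list[list[int]]]], width: int, height: int
) -> tuple[list[Box], dict[str, list[Box]]]:
    """Return (crops, boxes_by_label): pixel boxes of `image` regions, plus all boxes grouped by label."""
    boxes_by_label: dict[str, list[Box]] = defaultdict(list)
    for label, boxes in regions:
        for box in boxes:
            boxes_by_label[label].append(to_pixels(box, width, height))
    crops = list(boxes_by_label.get("image", []))  # only `image` regions are cropped
    return crops, dict(boxes_by_label)


if __name__ == "__main__":
    sample = [("title", [[0, 0, 999, 99]]), ("image", [[0, 0, 999, 999]])]
    print(split_regions(sample, width=2000, height=1000))
```

Each pixel box is `(left, upper, right, lower)`, the order Pillow's `Image.crop` expects, so `page.crop(box)` would yield one Cropped Images entry per `image` region.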

See [TECHNICAL.md](TECHNICAL.md) for a full breakdown of the pipeline, including some non-obvious implementation details.

## Running locally

```bash
python3 -m venv .venv
source .venv/bin/activate
# PyTorch wheels built for CUDA 12.4
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
# UI, ZeroGPU and Markdown-rendering dependencies
pip install gradio spaces markdown pymdown-extensions
python app.py
```

Requires a CUDA-capable GPU. The model is downloaded from the Hugging Face Hub on first run.

## Secrets (Spaces + local)

On Hugging Face Spaces, store tokens under the Space's Settings -> Variables and secrets, using `HF_TOKEN` as the secret name.

For local workflows:

```bash
cp .env.example .env
# edit .env and set HF_TOKEN=...
# set -a auto-exports every variable assigned while .env is sourced
set -a; source .env; set +a
```
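If you would rather not source the file in every shell, a similar effect can be had from Python. The sketch below is a deliberately minimal, hypothetical `KEY=VALUE` loader (it handles comments, an optional `export ` prefix and one pair of surrounding quotes, but none of the shell's expansion rules); python-dotenv is the usual full-featured choice.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines into os.environ and return what was parsed.

    Simplified on purpose: no variable expansion, escapes, or multi-line values.
    """
    parsed: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines without an assignment
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        # Unlike sourcing, variables already in the environment win unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

After `load_env_file()` runs, `os.environ["HF_TOKEN"]` is set, and `huggingface_hub` picks up the `HF_TOKEN` environment variable automatically.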

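The `.env` file that holds `HF_TOKEN` is listed in `.gitignore`, so the token stays out of version control.

To stream Space logs with token-based auth (`run` for the running app, `build` for the build step):

```bash
# runtime logs of the running Space
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math run
# build logs
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math build
```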

## TODO / Backlog

- Add a LaTeX lint/correction pipeline for OCR output:
  - Detect malformed math with `chktex` (or equivalent).
  - Normalize equivalent expressions (for example `^2` vs `^{2}`) before display/export.
  - Apply safe auto-fixes for common OCR-LaTeX artifacts.
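The `^2` vs `^{2}` case is mechanical enough to sketch. In TeX, `^` and `_` apply to the next single token, so bracing a bare letter or digit leaves the rendered result unchanged (`x^22` means `x^{2}2`). A minimal, hypothetical normalizer along those lines is below; it deliberately leaves control words such as `\alpha` alone, because wrapping a command that takes arguments (like `\frac`) would break it, and it skips escaped `\_` and `\^`.

```python
import re

# A bare ^ or _ (not escaped as \^ or \_) followed by one letter or digit.
BARE_SCRIPT_RE = re.compile(r"(?<!\\)([\^_])([A-Za-z0-9])")


def normalize_scripts(latex: str) -> str:
    """Hypothetical helper: rewrite x^2 as x^{2} and a_i as a_{i}; already-braced scripts are untouched."""
    return BARE_SCRIPT_RE.sub(r"\1{\2}", latex)


if __name__ == "__main__":
    for expr in ["x^2 + y_i", "x^{2}", "x^22", r"\sum_i a\_b", r"e^\alpha"]:
        print(f"{expr!r:>16} -> {normalize_scripts(expr)!r}")
```

Because the output contains no bare scripts, running the normalizer twice gives the same result as running it once, which makes it safe to apply at both display and export time.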