---
title: DeepSeek OCR 2 — Math Rendering Edition
emoji: 🧮
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: true
short_description: DeepSeek-OCR-2 with MathJax math rendering
license: mit
python_version: "3.12"
suggested_hardware: zero-a10g
---

# DeepSeek-OCR-2 — Math Rendering Edition

Built on top of the excellent [DeepSeek-OCR-2 Demo](https://huggingface.co/spaces/merterbak/DeepSeek-OCR-2) by **Mert Erbak**. Many thanks for the clean foundation: the OCR pipeline, PDF support, bounding box visualisation, and grounding features are all his work.

## What's new in this fork

- **MathJax rendering**: the Markdown Preview tab now renders LaTeX math notation (inline `$...$` and display `$$...$$`) using MathJax 3. Equations from scanned papers and textbooks therefore display as typeset math rather than as raw LaTeX source.

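The two delimiter styles matter before MathJax ever runs. Math spans have to survive the Markdown-to-HTML step untouched, because characters such as `*` inside `$a*b*c$` could otherwise be read as Markdown emphasis. The following is a minimal sketch of one common way to separate math from prose. `split_math` and its regex are hypothetical and are not taken from this Space's `app.py`, which may handle this differently.

```python
import re

# Hypothetical helper: split OCR Markdown into plain-text and math segments.
# Display math ($$...$$) is tried first so "$$" is never read as two inline delimiters.
# Inline math ($...$) may not span lines or contain "$"; escaped dollars (\$) are not handled.
MATH_RE = re.compile(r"\$\$(.+?)\$\$|\$([^$\n]+?)\$", re.DOTALL)


def split_math(text: str) -> list[tuple[str, str]]:
    """Return (kind, content) pairs, where kind is "text", "inline", or "display"."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for m in MATH_RE.finditer(text):
        if m.start() > pos:
            parts.append(("text", text[pos:m.start()]))
        if m.group(1) is not None:
            parts.append(("display", m.group(1)))
        else:
            parts.append(("inline", m.group(2)))
        pos = m.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


if __name__ == "__main__":
    sample = "Area $A = \\pi r^2$ and\n$$E = mc^2$$"
    for kind, content in split_math(sample):
        print(kind, repr(content))
```

Under this approach, only the `text` segments would go through the Markdown converter. The math segments would be re-emitted verbatim, with their delimiters, for MathJax to pick up in the browser.
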
## Features (inherited + extended)

| Feature | Description |
|---|---|
| 📋 Markdown | Convert documents to structured Markdown with layout detection |
| 📝 Free OCR | Simple text extraction without layout analysis |
| 📍 Locate | Find and highlight specific text or elements with bounding boxes |
| 🔍 Describe | General image description |
| ✏️ Custom | Provide your own prompt |
| 🧮 Math Preview | Rendered MathJax output for equations and formulas *(new)* |

## Model

Uses `deepseek-ai/DeepSeek-OCR-2` with DeepEncoder v2. Achieves **91.09% on OmniDocBench** (+3.73% over v1).

Configuration: a 1024×1024 base (global) view plus 768×768 local patches, with dynamic cropping producing 2–6 patches. Each patch contributes 144 visual tokens and the base view contributes 256.

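These figures imply a visual-token budget per image that grows linearly with the number of crops. The helper below works through that arithmetic. It assumes every crop costs exactly 144 tokens and the base view 256, and it ignores any separator or newline tokens the model may insert. `visual_tokens` is a hypothetical name.

```python
BASE_TOKENS = 256                # global (base) view
TOKENS_PER_PATCH = 144           # each local crop
MIN_PATCHES, MAX_PATCHES = 2, 6  # dynamic cropping range stated in this README


def visual_tokens(num_patches: int) -> int:
    """Visual tokens for one image: the base view plus num_patches local crops."""
    if not MIN_PATCHES <= num_patches <= MAX_PATCHES:
        raise ValueError(f"num_patches must be in [{MIN_PATCHES}, {MAX_PATCHES}], got {num_patches}")
    return BASE_TOKENS + TOKENS_PER_PATCH * num_patches


if __name__ == "__main__":
    for n in range(MIN_PATCHES, MAX_PATCHES + 1):
        print(n, visual_tokens(n))
```

Under these assumptions, a page costs between 544 visual tokens (two patches) and 1120 (six patches).
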
## How it works

The model processes images and PDFs through a prompt-based interface in which special tokens control its behaviour:

- **`<image>`**: replaced at inference time with visual patch embeddings from the input
- **`<|grounding|>`**: activates layout detection; the model then annotates every element it finds with a label and bounding box coordinates
- **`<|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2]]<|/det|>`**: the format the model uses to output detected regions

When grounding is active, the model labels each region itself as `title`, `text`, `image`, `table`, etc. Regions labelled `image` are automatically cropped out and appear in the **Cropped Images** tab, and every region gets a bounding box drawn in the **Boxes** tab.

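The post-processing described above can be sketched with the standard library alone. The sketch finds each `<|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2]]<|/det|>` pair, converts its boxes to pixel coordinates, and keeps the `image` regions as crop candidates. `parse_regions`, `route_regions`, and the `GRID = 999` normalization are assumptions for illustration. The coordinate range in particular should be checked against `app.py` or TECHNICAL.md. The sketch also assumes that a `<|det|>` block holds one or more boxes.

```python
import ast
import re

# Matches one grounded region: a label followed by its list of boxes.
REGION_RE = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)

# ASSUMPTION: the model emits coordinates on a 0-999 grid; adjust if app.py differs.
GRID = 999


def parse_regions(text: str, width: int, height: int) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Return (label, pixel_box) pairs for every box in the grounded output."""
    regions = []
    for label, det in REGION_RE.findall(text):
        try:
            boxes = ast.literal_eval(det.strip())  # e.g. [[x1, y1, x2, y2], ...]
        except (ValueError, SyntaxError):
            continue  # skip a malformed detection instead of failing the whole page
        if not isinstance(boxes, list):
            continue
        for box in boxes:
            if not isinstance(box, (list, tuple)) or len(box) != 4:
                continue
            x1, y1, x2, y2 = box
            regions.append((
                label.strip(),
                (int(x1 / GRID * width), int(y1 / GRID * height),
                 int(x2 / GRID * width), int(y2 / GRID * height)),
            ))
    return regions


def route_regions(regions):
    """Every region gets a drawn box; only 'image' regions are also cropped."""
    to_crop = [box for label, box in regions if label == "image"]
    return regions, to_crop
```

The returned pixel boxes use `(left, upper, right, lower)` order. That is the order Pillow's `Image.crop` expects, so they could be passed straight to it to produce the cropped images.
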
See [TECHNICAL.md](TECHNICAL.md) for a full breakdown of the pipeline, including some non-obvious implementation details.

## Running locally

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install gradio spaces markdown pymdown-extensions
python app.py
```

Requires a CUDA-capable GPU. The model is downloaded from Hugging Face on first run.

## Secrets (Spaces + local)

For Hugging Face Spaces, store tokens in **Space Settings -> Variables and secrets**, using `HF_TOKEN` as the secret name.

For local workflows:

```bash
cp .env.example .env
# edit .env and set HF_TOKEN=...
# `set -a` auto-exports every variable that `source .env` defines; `set +a` turns that off again
set -a; source .env; set +a
```

`.env` is listed in `.gitignore`, so the `HF_TOKEN` stored in it is never committed.

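For a local run where `.env` has not been sourced into the shell, a small loader can read it from Python instead. This is a minimal sketch that assumes `.env` contains only simple `KEY=VALUE` lines, optionally prefixed with `export` or wrapped in quotes. `load_env_file` is a hypothetical helper, not part of this repository, and it ignores shell features such as variable expansion. Values already present in the environment take precedence, so a secret injected by Spaces is never overwritten.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Load simple KEY=VALUE lines into os.environ without overriding existing values.

    Returns the effective value of every key found in the file.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # on Spaces, secrets arrive as real environment variables instead
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)  # an existing environment value wins
        loaded[key] = os.environ[key]
    return loaded


if __name__ == "__main__":
    load_env_file()
    print("HF_TOKEN set:", bool(os.environ.get("HF_TOKEN")))
```
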
To stream Space logs with token-based auth:

```bash
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math run
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math build
```

## TODO / Backlog

- Add a LaTeX lint/correction pipeline for OCR output:
  - Detect malformed math with `chktex` (or equivalent).
  - Normalize equivalent expressions (for example `^2` vs `^{2}`) before display/export.
  - Apply safe auto-fixes for common OCR-LaTeX artifacts.
