---
title: DeepSeek OCR 2 — Math Rendering Edition
emoji: 🧮
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: true
short_description: DeepSeek-OCR-2 with MathJax math rendering
license: mit
python_version: "3.12"
suggested_hardware: zero-a10g
---

# DeepSeek-OCR-2 — Math Rendering Edition

Built on top of the excellent [DeepSeek-OCR-2 Demo](https://huggingface.co/spaces/merterbak/DeepSeek-OCR-2) by **Mert Erbak**. Many thanks for the clean foundation — the OCR pipeline, PDF support, bounding box visualisation, and grounding features are all his work.

## What's new in this fork

- **MathJax rendering** — the Markdown Preview tab now renders LaTeX math notation (inline `$...$` and display `$$...$$`) using MathJax 3, so equations from scanned papers and textbooks display as proper math rather than raw LaTeX source.

## Features (inherited + extended)

| Feature | Description |
|---|---|
| 📋 Markdown | Convert documents to structured markdown with layout detection |
| 📝 Free OCR | Simple text extraction without layout analysis |
| 📍 Locate | Find and highlight specific text or elements with bounding boxes |
| 🔍 Describe | General image description |
| ✏️ Custom | Provide your own prompt |
| 🧮 Math Preview | Rendered MathJax output for equations and formulas *(new)* |

## Model

Uses `deepseek-ai/DeepSeek-OCR-2` with DeepEncoder v2. Achieves **91.09% on OmniDocBench** (+3.73% over v1).

Configuration: 1024 base + 768 patches with dynamic cropping (2–6 patches). 144 tokens per patch + 256 base tokens.
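A rough sketch of the vision-token budget these numbers imply (this simply combines the figures stated above — 256 base tokens plus 144 tokens per patch over 2–6 patches; the model's internal accounting may differ):

```python
# Vision-token budget under the stated configuration (assumed arithmetic):
# 256 tokens for the base view, plus 144 tokens per dynamically cropped patch.
BASE_TOKENS = 256
TOKENS_PER_PATCH = 144

def vision_tokens(num_patches: int) -> int:
    """Total visual tokens for a page cropped into `num_patches` patches."""
    if not 2 <= num_patches <= 6:
        raise ValueError("dynamic cropping uses 2-6 patches")
    return BASE_TOKENS + num_patches * TOKENS_PER_PATCH

for n in range(2, 7):
    print(n, vision_tokens(n))  # ranges from 544 (2 patches) to 1120 (6 patches)
```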
## How it works

The model processes images and PDFs using a prompt-based interface with special tokens that control its behaviour:

- **`<image>`** — replaced at inference time with visual patch embeddings from the input
- **`<|grounding|>`** — activates layout detection; the model then annotates every element it finds with a label and bounding box coordinates
- **`<|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2]]<|/det|>`** — the format the model uses to output detected regions

When grounding is active, the model self-labels regions as `title`, `text`, `image`, `table`, etc. Regions labelled `image` are automatically cropped out and appear in the **Cropped Images** tab. All regions get bounding boxes drawn in the **Boxes** tab.

See [TECHNICAL.md](TECHNICAL.md) for a full breakdown of the pipeline, including some non-obvious implementation details.

## Running locally

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install gradio spaces markdown pymdown-extensions
python app.py
```

Requires a CUDA-capable GPU. The model is downloaded from Hugging Face on first run.

## Secrets (Spaces + local)

For Hugging Face Spaces, store tokens in **Space Settings -> Variables and secrets**. Use `HF_TOKEN` as the secret name.

For local workflows:

```bash
cp .env.example .env
# edit .env and set HF_TOKEN=...
set -a; source .env; set +a
```

`.env` is ignored by git via `.gitignore`, so the token is never committed.

To stream Space logs with token-based auth:

```bash
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math run
./scripts/fetch_space_logs.sh ricklon/DeepSeek-OCR-2-Math build
```

## TODO / Backlog

- Add a LaTeX lint/correction pipeline for OCR output:
  - Detect malformed math with `chktex` (or equivalent).
  - Normalize equivalent expressions (for example `^2` vs `^{2}`) before display/export.
  - Apply safe auto-fixes for common OCR-LaTeX artifacts.
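## Example: parsing grounding output

The `<|ref|>…<|det|>` format described in "How it works" can be parsed with a short regex sketch. This is illustrative only — the demo's actual parsing lives in `app.py` and may differ:

```python
import re
from ast import literal_eval

# Matches the grounding output format:
#   <|ref|>label<|/ref|><|det|>[[x1,y1,x2,y2]]<|/det|>
REGION_RE = re.compile(
    r"<\|ref\|>(?P<label>.+?)<\|/ref\|>"
    r"<\|det\|>(?P<boxes>\[\[.+?\]\])<\|/det\|>"
)

def parse_regions(text: str) -> list[tuple[str, list[list[int]]]]:
    """Extract (label, boxes) pairs from grounded model output."""
    regions = []
    for m in REGION_RE.finditer(text):
        # The box list is plain Python-literal syntax, so literal_eval is safe here.
        boxes = literal_eval(m.group("boxes"))
        regions.append((m.group("label"), boxes))
    return regions

sample = "<|ref|>title<|/ref|><|det|>[[120, 80, 640, 210]]<|/det|>"
print(parse_regions(sample))  # [('title', [[120, 80, 640, 210]])]
```

Labels such as `title` or `image` come back as plain strings, so routing regions to the **Cropped Images** or **Boxes** tabs is a simple dispatch on the label.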