| language: | |
| - de | |
| - en | |
| - fr | |
| - es | |
| - ar | |
| - fa | |
| - it | |
| - sv | |
| - ru | |
| - zh | |
| license: apache-2.0 | |
| library_name: transformers | |
| pipeline_tag: image-text-to-text | |
| tags: | |
| - german | |
| - deutsch | |
| - ocr | |
| - vision | |
| - document-ai | |
| - invoice | |
| - rechnung | |
| - structured-extraction | |
| - json-extraction | |
| - kie | |
| - ollama | |
| - vllm | |
| - llama-cpp | |
| - apache-2.0 | |
| inference: true | |
| datasets: | |
| - neuralabs/german-synth-ocr | |
| - Aoschu/German_invoices_dataset_for_donut | |
| base_model: | |
| - Qwen/Qwen3.5-2B | |
| new_version: Keyven/german-ocr-3 | |
| <p align="center"> | |
| <img src="https://app.german-ocr.de/icon.png" alt="German-OCR-3" width="140" height="140" /> | |
| </p> | |
| <h1 align="center">German-OCR-3</h1> | |
| <p align="center"><strong>German vision OCR. Compact. Local. Open source.</strong><br/> | |
| <sub>German document image → strictly validated JSON. Up and running locally in under 60 seconds.</sub></p> | |
| <p align="center"> | |
| <a href="https://german-ocr.de"><img alt="Site" src="https://img.shields.io/badge/site-german--ocr.de-3B82F6?style=flat-square&labelColor=0B1220"/></a> | |
| <a href="https://ollama.com/Keyvan/german-ocr-3"><img alt="Ollama" src="https://img.shields.io/badge/Ollama-Keyvan%2Fgerman--ocr--3-F59E0B?style=flat-square&labelColor=0B1220"/></a> | |
| <a href="https://github.com/Keyvanhardani/German-OCR-3-Dev"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-source-181717?style=flat-square&labelColor=0B1220"/></a> | |
| <a href="#license"><img alt="License: Apache 2.0" src="https://img.shields.io/badge/License-Apache_2.0-22C55E?style=flat-square&labelColor=0B1220"/></a> | |
| <img alt="Language" src="https://img.shields.io/badge/lang-Deutsch-3B82F6?style=flat-square&labelColor=0B1220"/> | |
| <img alt="Hallucination" src="https://img.shields.io/badge/Halluzination-0%25-22C55E?style=flat-square&labelColor=0B1220"/> | |
| </p> | |
| --- | |
| ## ⚡ At a glance | |
| <table align="center"> | |
| <tr> | |
| <td align="center" width="180"><h2>100 %</h2><sub>Valid JSON</sub></td> | |
| <td align="center" width="180"><h2>95 %</h2><sub>Sender correct</sub></td> | |
| <td align="center" width="180"><h2>0 %</h2><sub>Hallucination</sub></td> | |
| <td align="center" width="180"><h2>5.0 s</h2><sub>Latency / doc</sub></td> | |
| </tr> | |
| </table> | |
| <p align="center"><sub>Measured on <strong>200+ real, anonymized German invoices</strong> (Default edition, 2.7 GB)</sub></p> | |
| --- | |
| ## What is German-OCR-3? | |
| **German-OCR-3** is a compact, fast, fully local **vision OCR distribution for German business documents**: invoices, letters, forms, receipts, and official notices. An image goes in, and **strictly validated JSON** conforming to our German extraction schema comes out. No cloud requirement, no vendor lock-in. | |
| Two editions, both Apache 2.0, both under 3 GB: | |
| | Edition | Ollama | Size | Target hardware | Strength | | |
| |---|---|---:|---|---| | |
| | **Nano** | `Keyvan/german-ocr-nano` | **1.0 GB** | CPU · Edge · Mobile | "runs anywhere" | | |
| | **Default** ⭐ | `Keyvan/german-ocr-3` | **2.7 GB** | 4–6 GB VRAM | best field recognition | | |
| > **Fine-tuned adapter** for German business-document extraction. Apache 2.0. | |
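The edition table can be turned into a small selection helper. This is only a sketch: the tags and sizes are taken from the table, while the function name `pick_edition` and the 4 GB cut-off (the lower end of the "4–6 GB VRAM" target) are assumptions, not part of the distribution.

```python
# Hypothetical helper: tags and sizes come from the edition table,
# the 4 GB threshold is an assumption derived from the "4-6 GB VRAM" target.
EDITIONS = {
    "nano": {"tag": "Keyvan/german-ocr-nano", "size_gb": 1.0},
    "default": {"tag": "Keyvan/german-ocr-3", "size_gb": 2.7},
}

MIN_VRAM_DEFAULT_GB = 4.0  # assumed cut-off, not an official requirement


def pick_edition(vram_gb: float) -> str:
    """Return the Ollama tag for a machine with `vram_gb` of GPU memory.

    Pass 0 for CPU-only or edge devices.
    """
    if vram_gb >= MIN_VRAM_DEFAULT_GB:
        return EDITIONS["default"]["tag"]
    return EDITIONS["nano"]["tag"]


if __name__ == "__main__":
    for vram in (0, 2, 8):
        print(f"{vram} GB VRAM -> {pick_edition(vram)}")
```

The returned tag can be passed straight to `ollama pull` or used as the `model` field of an API request.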
| --- | |
| ## 📊 Field test: 200+ real German invoices (anonymized) | |
| <p align="center"> | |
| <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/02_ionos_validity.png" alt="Field test" width="820"/> | |
| </p> | |
| | Edition | Valid JSON | Sender correct | **Hallucination** | Latency | | |
| |---|---:|---:|---:|---:| | |
| | `Keyvan/german-ocr-nano` | 84 % | 79 % | **0 %** | 6.6 s | | |
| | **`Keyvan/german-ocr-3`** ⭐ | **100 %** | **95 %** | **0 %** | **5.0 s** | | |
| **No "Mustermann" defaults.** ("Mustermann" is the German placeholder name, comparable to "John Doe".) German-OCR-3 reads the actual company, customer address, products, and amounts instead of guessing. | |
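For readers who want to run a comparable check on their own documents, the first two columns of the table can be computed roughly as follows. This is a sketch under assumptions: the evaluation code used for the field test is not published here, so the names `Sample` and `evaluate`, the case-insensitive sender match, and the choice of all samples as the denominator are illustrative. The hallucination metric is not covered because its definition is not given.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    raw_output: str       # text returned by the model
    expected_sender: str  # ground-truth sender name for this document


def parse_or_none(raw: str):
    """Return the parsed JSON object, or None if `raw` is not a JSON object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def evaluate(samples):
    """Compute valid-JSON rate and sender accuracy over all samples."""
    if not samples:
        return {"valid_json": 0.0, "sender_correct": 0.0}
    valid = 0
    correct = 0
    for sample in samples:
        doc = parse_or_none(sample.raw_output)
        if doc is None:
            continue  # invalid JSON counts against both metrics
        valid += 1
        sender = doc.get("sender")
        name = sender.get("name") if isinstance(sender, dict) else None
        if isinstance(name, str) and (
            name.strip().casefold() == sample.expected_sender.strip().casefold()
        ):
            correct += 1
    total = len(samples)
    return {"valid_json": valid / total, "sender_correct": correct / total}


if __name__ == "__main__":
    demo = [
        Sample('{"sender": {"name": "IONOS SE"}}', "IONOS SE"),
        Sample("not json at all", "IONOS SE"),
    ]
    print(evaluate(demo))
```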
| --- | |
| ## 📐 Size comparison | |
| <p align="center"> | |
| <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/01_size_vs_competitors.png" alt="Model sizes" width="820"/> | |
| </p> | |
| `german-ocr-3` (2.7 GB) is **6× smaller** than a typical 7B OCR VLM. It runs on an **8 GB gaming GPU** or on the CPU of an ordinary laptop. | |
| <p align="center"> | |
| <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/04_latency.png" alt="Latency" width="620"/> | |
| </p> | |
| --- | |
| ## 🚀 Quickstart | |
| ### Ollama (recommended, one line) | |
| ```bash | |
| ollama pull Keyvan/german-ocr-3 | |
| ollama run Keyvan/german-ocr-3 "Extrahiere die Rechnung im Bild als JSON." ./meine_rechnung.png | |
| ``` | |
| <details> | |
| <summary><b>Example output (anonymized, from the field test)</b> (click to expand)</summary> | |
| ```json | |
| { | |
| "document_type": "invoice", | |
| "language": "de", | |
| "invoice_number": "100137xXXXXX", | |
| "invoice_date": "2024-01-22", | |
| "due_date": "2024-01-27", | |
| "sender": { | |
| "name": "IONOS SE", | |
| "address": "Elgendorfer Str. 57, 56410 Montabaur", | |
| "vat_id": "DE81556XXX", | |
| "iban": null | |
| }, | |
| "recipient": { | |
| "name": "Firma e.K.", | |
| "address": "Muster Straße 32, 80335 München", | |
| "customer_id": "5835XXX" | |
| }, | |
| "line_items": [ | |
| {"position": 1, "description": "Mail Business 1 Liz.", "quantity": 1, | |
| "unit": "Monat", "unit_price_net": 4.20, "amount_net": 4.20, "vat_rate": 19} | |
| ], | |
| "amount_net": 4.20, | |
| "amount_vat": 0.80, | |
| "amount_total": 5.00, | |
| "currency": "EUR", | |
| "notes": ["Entsprechend Ihrem SEPA-Lastschriftmandat ..."] | |
| } | |
| ``` | |
| </details> | |
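Output like the example above can be sanity-checked arithmetically before it enters a downstream system. The following is a minimal sketch, not the validation shipped with the distribution: `check_totals` and the one-cent tolerance are assumptions, and it expects every line item to carry an `amount_net` field as in the example.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed rounding tolerance of one cent


def _dec(value):
    """Convert a JSON number to Decimal via str to avoid float artifacts."""
    return Decimal(str(value))


def check_totals(doc):
    """Return a list of problems; an empty list means the amounts are consistent."""
    keys = ("amount_net", "amount_vat", "amount_total")
    if any(doc.get(key) is None for key in keys):
        return ["missing amount field"]
    net, vat, total = (_dec(doc[key]) for key in keys)
    problems = []
    # Net plus VAT should equal the invoice total.
    if abs(net + vat - total) > TOLERANCE:
        problems.append(f"net {net} + vat {vat} != total {total}")
    # The line items' net amounts should add up to the document net amount.
    items = doc.get("line_items") or []
    if items:
        items_net = sum((_dec(item["amount_net"]) for item in items), Decimal("0"))
        if abs(items_net - net) > TOLERANCE:
            problems.append(f"line items sum to {items_net}, expected {net}")
    return problems


example = {
    "line_items": [{"position": 1, "amount_net": 4.20}],
    "amount_net": 4.20,
    "amount_vat": 0.80,
    "amount_total": 5.00,
}

if __name__ == "__main__":
    print(check_totals(example))  # []
```

`Decimal` is used instead of `float` so that sums such as 4.20 + 0.80 compare exactly against 5.00.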
| ### Python (via Ollama HTTP API) | |
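| ```python | |
| import base64 | |
| import json | |
| from pathlib import Path | |
| import requests | |
| # Ollama expects images as base64-encoded strings. | |
| b64 = base64.b64encode(Path("rechnung.png").read_bytes()).decode() | |
| resp = requests.post("http://localhost:11434/api/generate", json={ | |
|     "model": "Keyvan/german-ocr-3", | |
|     "prompt": "Extrahiere die Rechnung im Bild als JSON.", | |
|     "images": [b64], | |
|     "stream": False,  # return one complete response object | |
|     "options": {"temperature": 0, "num_ctx": 32768}, | |
| }, timeout=300) | |
| resp.raise_for_status()  # fail early on HTTP errors | |
| # The generated text is in "response"; json.loads raises if it is not valid JSON. | |
| data = json.loads(resp.json()["response"]) | |
| print(json.dumps(data, indent=2, ensure_ascii=False)) | |
| ``` | |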
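Calling `json.loads` directly on the response raises an exception whenever the output is not valid JSON, which the field-test table suggests can happen with the Nano edition. A more tolerant parsing step might look like the sketch below. The helper name `extract_json` is hypothetical, and the Markdown-fence handling is a precaution in case a model wraps its answer in a code fence, not documented behavior of German-OCR-3.

```python
import json
import re

TICKS = "`" * 3  # Markdown code-fence marker, built here to keep this block readable

# Matches text that consists of a single fenced block, optionally tagged as json.
_FENCED = re.compile(
    r"^" + TICKS + r"(?:json)?\s*\n(.*?)\n" + TICKS + r"\s*$",
    re.DOTALL,
)


def extract_json(text: str):
    """Return the JSON object contained in `text`, or None if parsing fails."""
    candidate = text.strip()
    match = _FENCED.match(candidate)
    if match:
        candidate = match.group(1)  # drop the surrounding fence
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    # Only a JSON object is a usable extraction result.
    return obj if isinstance(obj, dict) else None


if __name__ == "__main__":
    print(extract_json('{"document_type": "invoice"}'))
    print(extract_json("Sorry, I cannot read this image."))
```

A `None` result can then be retried or routed to manual handling instead of crashing the pipeline.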
| ### Download the bundle | |
| ```bash | |
| huggingface-cli download Keyven/german-ocr-3 --local-dir ./german-ocr-3 | |
| # Contains: Modelfile · JSON schemas · system prompt · GGUF quants · charts | |
| ``` | |
| ### llama.cpp (GGUF directly) | |
| ```bash | |
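| # Note: this call passes no image. llama.cpp takes image input via a multimodal projector (--mmproj) plus --image, for example with llama-mtmd-cli. | |
| llama-cli -m ./german-ocr-3/german-ocr-3-Q4_K_M.gguf \ | |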
| --system-prompt-file ./german-ocr-3/system_prompt.txt \ | |
| -p "Extrahiere die Rechnung als JSON:" --temp 0 | |
| ``` | |
| --- | |
| ## 📚 Training and evaluation datasets | |
| | Dataset | Size | Type | | |
| |---|---|---| | |
| | [`neuralabs/german-synth-ocr`](https://huggingface.co/datasets/neuralabs/german-synth-ocr) | 4,500+ | German OCR samples (synthetic, Apache-2.0) | | |
| | [`Aoschu/German_invoices_dataset_for_donut`](https://huggingface.co/datasets/Aoschu/German_invoices_dataset_for_donut) | 129 | Real German invoices (Donut format) | | |
| | In-house synthetic German invoice set | 100 | Invoices with golden JSON, deterministically generated | | |
| | Anonymized DACH field test | 200+ | Real invoices from various DACH vendors (internal, GDPR) | | |
| --- | |
| ## 🎯 Target audiences | |
| - **Solo builders & indies**: extract German documents locally, without cloud OCR costs. | |
| - **DACH SMEs with strict data-protection requirements**: host locally or on-prem. | |
| - **Agencies & studios**: an open-source foundation underneath your own pipeline. | |
| For those who want it **managed** and with even larger models: | |
| > 🌐 **[german-ocr.de](https://german-ocr.de)**: hosted German OCR API with premium models and higher accuracy, no hardware of your own required. Data stays in the EU. | |
| --- | |
| ## ⚠️ Limitations | |
| - Optimized for **German** documents; no guarantees for other languages. | |
| - Best quality with clear, high-resolution scans or photos. | |
| - Handwritten documents: limited support only. | |
| - For critical workflows (accounting, legal): **always keep a human in the loop**. | |
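One way to support the human-in-the-loop step is to attach review reasons to each extracted document so that reviewers see the weakest results first. This is a sketch only: `review_reasons` and the choice of required fields are assumptions based on the example output, and an empty list does not mean a document may skip review in critical workflows.

```python
from datetime import date

# Assumed set of fields a reviewer would want populated; adjust to your schema.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "amount_total")


def review_reasons(doc):
    """Return reasons why a document deserves closer human attention."""
    reasons = []
    for field in REQUIRED_FIELDS:
        if doc.get(field) in (None, ""):
            reasons.append(f"missing {field}")
    sender = doc.get("sender")
    if not isinstance(sender, dict) or not sender.get("name"):
        reasons.append("missing sender.name")
    raw_date = doc.get("invoice_date")
    if isinstance(raw_date, str) and raw_date:
        try:
            date.fromisoformat(raw_date)  # the example output uses YYYY-MM-DD
        except ValueError:
            reasons.append("invoice_date is not an ISO date")
    return reasons


if __name__ == "__main__":
    extracted = {
        "invoice_number": "A-1",
        "invoice_date": "22.01.2024",
        "amount_total": 5.00,
        "sender": {"name": None},
    }
    for reason in review_reasons(extracted):
        print("review:", reason)
```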
| --- | |
| ## 🙏 Credit & Attribution | |
| German-OCR-3 builds on the excellent work of the **Qwen team at Alibaba Group**. The underlying vision-language architecture comes from the **Qwen 3.5 Small Series**, released under the Apache License 2.0. Without the open research and the clean release of the Qwen weights, this project would not be possible. | |
| - **Qwen 3.5**: https://huggingface.co/Qwen · https://qwen.ai | |
| - **Apache License 2.0** (weights): © 2025–2026 Qwen Team, Alibaba Group | |
| - **Qwen2.5-VL Technical Report**: [arXiv:2502.13923](https://arxiv.org/abs/2502.13923) | |
| Full attribution text in [`NOTICE`](NOTICE). | |
| --- | |
| ## <a id="license"></a>📄 License | |
| **Apache License 2.0** for the entire German-OCR-3 distribution (Modelfiles, system prompt, schemas, docs, GGUFs). | |
| --- | |
| ## 📑 Citation | |
| If you use German-OCR-3 in research or production, please cite **both** our distribution and the underlying Qwen work: | |
| ```bibtex | |
| @misc{german_ocr_3_2026, | |
| title = {German-OCR-3: A compact German document-OCR distribution}, | |
| author = {Hardani, Keyvan}, | |
| year = {2026}, | |
| url = {https://github.com/Keyvanhardani/German-OCR} | |
| } | |
| @misc{qwen35_2026, | |
| title = {Qwen 3.5 Small Series}, | |
| author = {{Qwen Team, Alibaba Group}}, | |
| year = {2026}, | |
| howpublished = {\url{https://huggingface.co/Qwen}}, | |
| note = {Apache License 2.0} | |
| } | |
| @article{qwen25vl_2025, | |
| title = {Qwen2.5-VL Technical Report}, | |
| author = {{Qwen Team, Alibaba Group}}, | |
| journal = {arXiv preprint arXiv:2502.13923}, | |
| year = {2025} | |
| } | |
| ``` | |
| --- | |
| ## 👤 Author | |
| **Keyvan Hardani** | |
| · Website: [keyvan.ai](https://keyvan.ai) | |
| · LinkedIn: [linkedin.com/in/keyvanhardani](https://linkedin.com/in/keyvanhardani) | |
| · GitHub: [@Keyvanhardani](https://github.com/Keyvanhardani) | |
| · Hosted Premium: [german-ocr.de](https://german-ocr.de) | |