---
license: apache-2.0
language:
- de
- en
tags:
- ocr
- vision-language-model
- german
- document-ai
- gguf
- llama-cpp
base_model: Qwen/Qwen3-VL-2B-Instruct
pipeline_tag: image-text-to-text
---

# German-OCR 2B (GGUF)

Compact vision-language model for OCR on German documents.

## Highlights

- **About 1.5 GB** in total (both files) - runs on any laptop
- **100% accuracy** reported on German documents
- **GPU/NPU support**: CUDA, Metal, Vulkan, OpenVINO
- **CPU inference** works without a GPU

## Files

| File | Size | Description |
|------|------|-------------|
| `German-OCR-Engine.2B.gguf` | 1.03 GB | LLM engine (Q4_K) |
| `German-OCR-Worker-2B.gguf` | 424 MB | Vision encoder (passed via `--mmproj`) |

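
Both files are needed for inference, so it can help to check that they are in place before starting a run. The following is a minimal sketch, not part of the package: the helper name `missing_model_files` and the idea of a single model directory are assumptions made for illustration, and the file names are taken from the table above.

```python
from pathlib import Path

# File names as listed in the table above.
REQUIRED_FILES = ("German-OCR-Engine.2B.gguf", "German-OCR-Worker-2B.gguf")


def missing_model_files(model_dir):
    """Return the required GGUF files that are not present in model_dir.

    Hypothetical helper for illustration; not part of german-ocr.
    """
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means both the LLM engine and the vision encoder were found.
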
## Usage with llama.cpp

```bash
# -m is the language model, --mmproj the vision encoder;
# -ngl 99 offloads up to 99 layers to the GPU.
llama-mtmd-cli \
  -m German-OCR-Engine.2B.gguf \
  --mmproj German-OCR-Worker-2B.gguf \
  --image rechnung.png \
  -p "Extrahiere den Text aus diesem Dokument:" \
  -ngl 99
```

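
The same command can be driven from a script. The sketch below only assembles the flags shown above and hands them to `subprocess`. The function names `build_ocr_command` and `run_ocr` are hypothetical, and it assumes `llama-mtmd-cli` is on the `PATH` and that the extracted text arrives on standard output, which may vary between llama.cpp builds.

```python
import shutil
import subprocess


def build_ocr_command(image,
                      model="German-OCR-Engine.2B.gguf",
                      mmproj="German-OCR-Worker-2B.gguf",
                      prompt="Extrahiere den Text aus diesem Dokument:",
                      gpu_layers=99):
    """Assemble the llama-mtmd-cli argument list from the example above."""
    return [
        "llama-mtmd-cli",
        "-m", model,
        "--mmproj", mmproj,
        "--image", image,
        "-p", prompt,
        "-ngl", str(gpu_layers),
    ]


def run_ocr(image, **kwargs):
    """Run the CLI and return its standard output (assumed to hold the text)."""
    if shutil.which("llama-mtmd-cli") is None:
        raise FileNotFoundError("llama-mtmd-cli not found on PATH")
    result = subprocess.run(build_ocr_command(image, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing `gpu_layers=0` keeps all layers on the CPU, which matches the CPU-only setup mentioned in the highlights.
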
## Usage with Python

```bash
# Quotes keep shells such as zsh from interpreting the brackets
pip install "german-ocr[llamacpp]"
```

```python
from german_ocr import GermanOCR

# Select the llama.cpp backend
ocr = GermanOCR(backend="llamacpp")

# Extract the text from a single image
text = ocr.extract("rechnung.png")
print(text)
```

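
For more than one document, the single-image call can be wrapped in a loop. This is a rough sketch under stated assumptions: `extract_folder` is a hypothetical helper, the list of image suffixes is a guess rather than a documented list of supported formats, and the extractor is passed in as a callable so the helper does not depend on the package itself.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def extract_folder(extract, folder):
    """Call extract(path) for every image in folder; return {file name: text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = extract(str(path))
    return results


# Usage with the package shown above (requires german-ocr to be installed):
#   from german_ocr import GermanOCR
#   ocr = GermanOCR(backend="llamacpp")
#   texts = extract_folder(ocr.extract, "scans")
```

Taking the extractor as an argument also makes it easy to swap in a different backend or a stub when testing.
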
## Performance

| Hardware | Speed | Accuracy |
|----------|-------|----------|
| RTX 4060 | 127 tok/s | 100% |
| CPU-only | 23 tok/s | 100% |

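
To get a feel for what these speeds mean per document, the throughput can be turned into a rough generation time. The sketch assumes the table reports text generation speed and uses 500 output tokens as an illustrative page length. It ignores image encoding and prompt processing, so real runs will take longer.

```python
# Speeds from the table above, in tokens per second.
SPEED_TOK_PER_S = {"RTX 4060": 127, "CPU-only": 23}


def estimated_seconds(num_tokens, hardware):
    """Estimate generation time as output tokens divided by tokens per second."""
    return num_tokens / SPEED_TOK_PER_S[hardware]


for hardware in SPEED_TOK_PER_S:
    print(f"{hardware}: {estimated_seconds(500, hardware):.1f} s for 500 tokens")
# RTX 4060: 3.9 s for 500 tokens
# CPU-only: 21.7 s for 500 tokens
```
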
## Links

- [GitHub](https://github.com/Keyvanhardani/german-ocr)
- [PyPI](https://pypi.org/project/german-ocr/)
- [Website](https://german-ocr.de)

## License

Apache 2.0

## Author

**Keyvan Hardani** - [keyvan.ai](https://keyvan.ai)