---
language:
- en
license: mit
library_name: gguf
tags:
- gguf
- deepseek
- ocr
- document
- vision
- affectively
- edgework
- aether
- distributed-inference
- edge-deployment
base_model: deepseek-ai/deepseek-vl2-tiny
base_model_relation: quantized
pipeline_tag: image-text-to-text
---

# DeepSeek OCR 2 (GGUF, Q4_K_M)

> **Production-ready** GGUF quantization of [deepseek-ai/deepseek-vl2-tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) for distributed optical character recognition — powered by the [Aether](https://github.com/affectively-ai/aether) edge inference runtime.

## Highlights

- **~2B parameters** — second-generation OCR model based on DeepSeek VL2, with improved text-extraction accuracy
- **~2 GB** Q4_K_M quantized — optimized for distributed edge inference
- **LLaMA architecture** — proven, stable, well-tested
- **Aether runtime compatible** — layer-sharded across distributed nodes via [Edgework.ai](https://edgework.ai)
32
+
33
+ ## Model Details
34
+
35
+ | Property | Value |
36
+ |----------|-------|
37
+ | Base model | [deepseek-ai/deepseek-vl2-tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) |
38
+ | Parameters | ~2B |
39
+ | Architecture | LLaMA |
40
+ | Quantization | Q4_K_M |
41
+ | Format | GGUF |
42
+ | Size | ~2 GB |
43
+ | License | mit |
44
+
## Usage

### With llama.cpp

```bash
./llama-cli -m deepseek-ocr-2-q4_k_m.gguf -p "Your prompt here" -n 256
```
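
Before wiring the file into a serving stack, it can help to confirm the download is actually a valid GGUF container. The sketch below is illustrative only — `is_gguf` and `gguf_version` are hypothetical helpers, not part of llama.cpp — and relies on one documented fact of the GGUF format: every file opens with the 4-byte ASCII magic `GGUF` followed by a little-endian `uint32` version.

```python
# Illustrative helpers (not part of llama.cpp): sanity-check a downloaded
# file against the GGUF header layout (4-byte magic, then uint32 version).
import struct

GGUF_MAGIC = b"GGUF"  # every GGUF file begins with these 4 bytes


def is_gguf(path: str) -> bool:
    """Return True if the file at `path` starts with the GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def gguf_version(path: str) -> int:
    """Read the little-endian uint32 version that follows the magic."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

A truncated or mislabeled download fails the magic check immediately, which is cheaper than discovering the problem at model-load time.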

### With Aether (Distributed Inference)

This model is deployed across the [Aether](https://github.com/affectively-ai/aether) distributed inference network: weights are layer-sharded across multiple edge nodes for parallel inference.

## Deployment Architecture

This model runs on the **Aether distributed inference runtime** — our custom engine that shards model layers across multiple nodes for parallel execution:

1. **Coordinator** receives requests and manages token generation
2. **Layer nodes** each hold a subset of model layers
3. **Hidden states** flow between nodes via gRPC
4. **Zero cold start** via warm-pool scheduling

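The flow above can be sketched as a toy pipeline. All names here are hypothetical and plain function calls stand in for the gRPC hops; this is a sketch of the sharding idea, not the Aether implementation.

```python
# Toy sketch of layer-sharded inference: a coordinator streams a hidden
# state through nodes that each hold a contiguous shard of "layers".
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


class LayerNode:
    """Holds a shard of model layers and applies them in order."""

    def __init__(self, layers: List[Layer]):
        self.layers = layers

    def forward(self, hidden: List[float]) -> List[float]:
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


class Coordinator:
    """Receives a request and pipes the hidden state node to node."""

    def __init__(self, nodes: List[LayerNode]):
        self.nodes = nodes

    def run(self, hidden: List[float]) -> List[float]:
        for node in self.nodes:  # one RPC hop per node in the real runtime
            hidden = node.forward(hidden)
        return hidden


# Example: four stand-in "layers" (each adds 1), sharded 2 per node.
add_one: Layer = lambda h: [x + 1.0 for x in h]
coordinator = Coordinator([LayerNode([add_one, add_one]),
                           LayerNode([add_one, add_one])])
print(coordinator.run([0.0, 0.0]))  # → [4.0, 4.0]
```

The key property is that only the (small) hidden state crosses the network, while the (large) weights stay resident on their nodes — which is what makes warm-pool scheduling cheap.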
Deployed via [Edgework.ai](https://edgework.ai) — bringing fast, cheap, and private inference as close to the user as possible.

## About

Published by [AFFECTIVELY](https://huggingface.co/affectively-ai) · Managed by [@buley](https://huggingface.co/buley)

We quantize and publish **production-ready models** for distributed edge inference via the [Aether](https://github.com/affectively-ai/aether) runtime. Every release is tested for correctness and stability before publication.

- [All models](https://huggingface.co/affectively-ai) · [GitHub](https://github.com/affectively-ai) · [Edgework.ai](https://edgework.ai)