---
language:
- en
license: mit
library_name: gguf
tags:
- gguf
- deepseek
- ocr
- document
- vision
- affectively
- edgework
- aether
- distributed-inference
- edge-deployment
base_model: deepseek-ai/deepseek-vl2-tiny
base_model_relation: quantized
pipeline_tag: image-text-to-text
---

# DeepSeek OCR 2 (GGUF, Q4_K_M)

> **Production-ready** GGUF quantization of [deepseek-ai/deepseek-vl2-tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) for distributed optical character recognition — powered by the [Aether](https://github.com/affectively-ai/aether) edge inference runtime.

## Highlights

- **~2B parameters** — second-generation OCR model based on DeepSeek VL2, with improved text-extraction accuracy
- **~2 GB** Q4_K_M quantized — optimized for distributed edge inference
- **LLaMA architecture** — proven, stable, well-tested
- **Aether runtime compatible** — layer-sharded across distributed nodes via [Edgework.ai](https://edgework.ai)
32
+
33
+ ## Model Details
34
+
35
+ | Property | Value |
36
+ |----------|-------|
37
+ | Base model | [deepseek-ai/deepseek-vl2-tiny](https://huggingface.co/deepseek-ai/deepseek-vl2-tiny) |
38
+ | Parameters | ~2B |
39
+ | Architecture | LLaMA |
40
+ | Quantization | Q4_K_M |
41
+ | Format | GGUF |
42
+ | Size | ~2 GB |
43
+ | License | mit |
44
+
## Usage

### With llama.cpp

```bash
./llama-cli -m deepseek-ocr-2-q4_k_m.gguf -p "Your prompt here" -n 256
```
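
Before wiring the file into a serving stack, it can help to confirm the download is actually a valid GGUF container. The sketch below is illustrative only — `is_gguf` and `gguf_version` are hypothetical helpers, not part of llama.cpp — and relies on one documented fact of the GGUF format: every file opens with the 4-byte ASCII magic `GGUF` followed by a little-endian `uint32` version.

```python
# Illustrative helpers (not part of llama.cpp): sanity-check a downloaded
# file against the GGUF header layout (4-byte magic, then uint32 version).
import struct

GGUF_MAGIC = b"GGUF"  # every GGUF file begins with these 4 bytes


def is_gguf(path: str) -> bool:
    """Return True if the file at `path` starts with the GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def gguf_version(path: str) -> int:
    """Read the little-endian uint32 version that follows the magic."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

A truncated or mislabeled download fails the magic check immediately, which is cheaper than discovering the problem at model-load time.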

### With Aether (Distributed Inference)

This model is deployed across the [Aether](https://github.com/affectively-ai/aether) distributed inference network: weights are layer-sharded across multiple edge nodes for parallel inference.

## Deployment Architecture

This model runs on the **Aether distributed inference runtime** — our custom engine that shards model layers across multiple nodes for parallel execution:

1. **Coordinator** receives requests and manages token generation
2. **Layer nodes** each hold a subset of model layers
3. **Hidden states** flow between nodes via gRPC
4. **Zero cold start** via warm-pool scheduling

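The flow above can be sketched as a toy pipeline. All names here are hypothetical and plain function calls stand in for the gRPC hops; this is a sketch of the sharding idea, not the Aether implementation.

```python
# Toy sketch of layer-sharded inference: a coordinator streams a hidden
# state through nodes that each hold a contiguous shard of "layers".
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


class LayerNode:
    """Holds a shard of model layers and applies them in order."""

    def __init__(self, layers: List[Layer]):
        self.layers = layers

    def forward(self, hidden: List[float]) -> List[float]:
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


class Coordinator:
    """Receives a request and pipes the hidden state node to node."""

    def __init__(self, nodes: List[LayerNode]):
        self.nodes = nodes

    def run(self, hidden: List[float]) -> List[float]:
        for node in self.nodes:  # one RPC hop per node in the real runtime
            hidden = node.forward(hidden)
        return hidden


# Example: four stand-in "layers" (each adds 1), sharded 2 per node.
add_one: Layer = lambda h: [x + 1.0 for x in h]
coordinator = Coordinator([LayerNode([add_one, add_one]),
                           LayerNode([add_one, add_one])])
print(coordinator.run([0.0, 0.0]))  # → [4.0, 4.0]
```

The key property is that only the (small) hidden state crosses the network, while the (large) weights stay resident on their nodes — which is what makes warm-pool scheduling cheap.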
Deployed via [Edgework.ai](https://edgework.ai) — bringing fast, cheap, and private inference as close to the user as possible.

## About

Published by [AFFECTIVELY](https://huggingface.co/affectively-ai) · Managed by [@buley](https://huggingface.co/buley)

We quantize and publish **production-ready models** for distributed edge inference via the [Aether](https://github.com/affectively-ai/aether) runtime. Every release is tested for correctness and stability before publication.

- [All models](https://huggingface.co/affectively-ai) · [GitHub](https://github.com/affectively-ai) · [Edgework.ai](https://edgework.ai)