---
library_name: transformers
tags: []
---

Moondream 3 (Preview) is a vision language model with a mixture-of-experts architecture (9B total parameters, 2B active).
Architecture details:
1. 24 layers; the first four are dense, the rest have MoE FFNs with 64 experts, 8 activated per token
2. MoE FFNs have GeGLU architecture, with inner/gate dim of 1024. The model's hidden dim is 2048.
3. Usable context length increased to 32K, with [a custom efficient SuperBPE tokenizer](https://huggingface.co/moondream/starmie-v1)
4. Multi-headed attention with learned position- and data-dependent temperature scaling
5. Vision encoder initialized from SigLIP-SO-400M, with multi-crop channel concatenation for token-efficient high resolution image processing
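To make points 1 and 2 concrete, here is a minimal NumPy sketch of a top-8 GeGLU mixture-of-experts FFN with the expert counts given above. The routing scheme (softmax over the top-k router logits) and all names and signatures are illustrative assumptions, not the released implementation.

```python
import numpy as np

N_EXPERTS = 64  # experts per MoE layer (from the card)
TOP_K = 8       # experts activated per token (from the card)
# In the card, the model's hidden dim is 2048 and the GeGLU inner/gate
# dim is 1024; the sketch below works for any sizes.

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def geglu_expert(x, w_in, w_gate, w_out):
    # GeGLU FFN: elementwise product of a GELU-gated branch and a linear branch
    return (gelu(x @ w_gate) * (x @ w_in)) @ w_out

def moe_ffn(x, router_w, experts):
    # x: (hidden,) features for one token. Route to the top-k experts and
    # mix their outputs with softmax weights renormalized over the top-k.
    logits = x @ router_w                       # (n_experts,)
    top = np.argsort(logits)[-TOP_K:]           # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w = w / w.sum()
    out = np.zeros_like(x)
    for weight, idx in zip(w, top):
        w_in, w_gate, w_out = experts[idx]
        out += weight * geglu_expert(x, w_in, w_gate, w_out)
    return out
```

Only the selected 8 of 64 experts run per token, which is how the model keeps roughly 2B of its 9B parameters active.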
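Point 4 describes attention whose softmax sharpness varies per head. One plausible form is sketched below, assuming a learned per-head readout of the query (the data-dependent part) plus a log-position term (the position-dependent part) that together set a temperature dividing the logits; the exact parameterization is an assumption, not the model's.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_with_temperature(q, k, v, w_temp, b_temp, pos):
    # q, k, v: (heads, seq, d_head); w_temp: (heads, d_head); b_temp: (heads,)
    d = q.shape[-1]
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (heads, seq, seq)
    # per-head, per-query log-temperature: data term + position term
    t = np.einsum("hsd,hd->hs", q, w_temp) + b_temp[:, None] * np.log1p(pos)[None, :]
    temp = np.exp(t)                                  # positive temperature
    probs = softmax(logits / temp[:, :, None], axis=-1)
    return probs @ v
```

A higher temperature flattens a head's attention distribution; letting it depend on position is one way to keep long-context attention from becoming too diffuse or too peaked.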
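Point 5's multi-crop channel concatenation can be illustrated in a few lines: each crop is encoded to the same token grid, and crop features are stacked along the channel axis, so the token count (and hence attention cost) does not grow with image resolution. The crop count and feature width below are illustrative, not taken from the model.

```python
import numpy as np

def multicrop_channel_concat(crop_feats):
    # crop_feats: list of (tokens, channels) arrays, one per image crop,
    # all encoded to the same token grid by the vision encoder.
    # Stacking along channels keeps the token count fixed as resolution grows.
    return np.concatenate(crop_feats, axis=-1)

# e.g. a global view plus 4 high-res crops (sizes are illustrative)
crops = [np.zeros((729, 1152)) for _ in range(5)]
feats = multicrop_channel_concat(crops)
assert feats.shape == (729, 5 * 1152)   # tokens unchanged, channels grow
```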
For more details, please refer to our release blog post (coming soon).