# DeepSeek-3B-MoE-Decoder

This is the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model that generates text from the encoder's vision embeddings.

## Architecture

- **Model**: DeepSeek 3B MoE
- **Active Parameters**: ~570M per token
- **Total Parameters**: ~3B
- **Architecture**: Mixture-of-Experts with token-level routing (see the sketch below)
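
To make the active-vs-total parameter distinction concrete, here is a minimal sketch of a top-k routed MoE feed-forward layer in PyTorch. Every size here (hidden width, expert count, `top_k`) is an illustrative assumption, not DeepSeek's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy top-k routed MoE feed-forward layer (all sizes are assumptions)."""

    def __init__(self, hidden_size: int = 1024, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        weights, indices = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the selected experts
        out = torch.zeros_like(x)
        # Each token runs through only its top_k experts; the rest stay idle,
        # which is why active parameters are far fewer than total parameters.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() > 0:
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Example: 10 tokens through the layer.
layer = TopKMoELayer()
tokens = torch.randn(10, 1024)
print(layer(tokens).shape)  # torch.Size([10, 1024])
```

Because each token is dispatched to only `top_k` experts, only that fraction of the expert weights participates in any one forward pass. This is how a ~3B total-parameter model can run with only ~570M active parameters per token.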

## Usage

This decoder is designed to consume vision embeddings produced by the encoder component (DeepEncoder), passed in via `inputs_embeds` rather than token ids.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the decoder and its tokenizer.
# Depending on the packaged config, trust_remote_code=True may be required.
model = AutoModelForCausalLM.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
tokenizer = AutoTokenizer.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")

# Feed vision embeddings from the encoder in place of token embeddings:
# vision_embeddings = ... (from DeepEncoder)
# outputs = model(inputs_embeds=vision_embeddings, ...)
```
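
Continuing from the snippet above, an end-to-end call might look like the following sketch. The embedding shape and the use of `generate` with `inputs_embeds` are assumptions for illustration; consult the upstream DeepSeek-OCR repository for the encoder's actual output interface.

```python
import torch

# Hypothetical placeholder: in real use these come from DeepEncoder.
# (batch_size, num_vision_tokens, hidden_size) is an assumed shape.
vision_embeddings = torch.randn(1, 256, model.config.hidden_size)

with torch.no_grad():
    # With inputs_embeds, generate() returns only the newly generated token ids.
    output_ids = model.generate(
        inputs_embeds=vision_embeddings,
        max_new_tokens=128,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```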

## Source

Extracted from [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).