---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# DeepSeek-R1-Channel-INT8_4L (4 Layers)

⚠️ **For Testing Purposes Only**

This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.

## Key Modifications

- Reduced to **4 layers**
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)
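As a quick sanity check of the reduced depth, the layer count can be read from the checkpoint's `config.json`. The snippet below is a minimal sketch against a hand-written stand-in config; the field name `num_hidden_layers` follows the usual `transformers` convention and is an assumption here, not a value read from the actual file.

```python
import json

# Stand-in for the checkpoint's config.json (values assumed for
# illustration; the real file ships with the model repository).
config_text = '{"model_type": "deepseek", "num_hidden_layers": 4}'
config = json.loads(config_text)

# The card states the model was reduced to 4 layers.
print(config["num_hidden_layers"])  # 4
```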

## Usage

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
```