---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---
# DeepSeek-R1-Channel-INT8_4L (4 Layers)
⚠️ **For Testing Purposes Only**
This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.
## Key Modifications
- Reduced to **4 layers**
- Contains:
  - Layers 1–3: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)
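The truncation above can be sketched in plain Python. This is a minimal illustration, not the script used to build this checkpoint: the config keys (`num_hidden_layers`, `first_k_dense_replace`) follow the standard DeepSeek-style configuration, and the full-depth value of 61 layers is an assumption about the upstream meituan/DeepSeek-R1-Channel-INT8 config.

```python
# Sketch: shrinking a DeepSeek-style config to the 4-layer test setup.
# Config keys and the full-depth value (61) are assumptions, not taken
# from this repository's files.
full_config = {
    "num_hidden_layers": 61,     # assumed full DeepSeek-R1 depth
    "first_k_dense_replace": 3,  # first 3 layers use dense MLPs; MoE after
}

def truncate(cfg: dict, n_layers: int) -> dict:
    """Return a copy of cfg reduced to n_layers total layers."""
    small = dict(cfg)
    small["num_hidden_layers"] = n_layers
    return small

# Keeping first_k_dense_replace=3 means layers 1-3 stay dense (MLA + MLP)
# and only layer 4 is an MoE layer, matching the layout described above.
tiny = truncate(full_config, 4)
print(tiny["num_hidden_layers"])  # 4
```

With `first_k_dense_replace` left at 3, a 4-layer model gets exactly one MoE layer, which is the smallest configuration that still exercises both the MLA and MoE code paths.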
## Usage
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L"
)
# Note: if loading fails, the custom architecture may require
# passing trust_remote_code=True to from_pretrained.
```

Because all weights are randomly initialized, model outputs are meaningless; use this checkpoint only to exercise the architecture (shapes, layer wiring, memory footprint), not to evaluate quality.