---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# DeepSeek-R1-Channel-INT8_4L (4 Layers)

⚠️ **For Testing Purposes Only**  
This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.

## Key Modifications
- Reduced to **4 layers** 
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)
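The layer layout above can be written out as a small sketch for quick reference (the names here are purely illustrative, not the checkpoint's actual module names):

```python
# Illustrative sketch of the 4-layer layout described above;
# keys are 1-based layer indices, values are the layer type.
layer_plan = {
    1: "MLA",  # Multi-head Latent Attention
    2: "MLA",
    3: "MLA",
    4: "MoE",  # Mixture of Experts
}

assert len(layer_plan) == 4
print(layer_plan[4])  # MoE
```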

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
```