MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L

MollyHexapotato committed on Jul 24, 2025

Commit 107a524 · verified · 1 Parent(s): bc26923

Create README.md

Files changed (1)

README.md +25 -0

README.md ADDED

	@@ -0,0 +1,25 @@

+---
+language: en
+license: apache-2.0
+tags:
+- test
+- custom-architecture
+- deepseek
+---
+# DeepSeek-R1-Channel-INT8_4L (4 Layers)
+⚠️ **For Testing Purposes Only**
+This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.
+## Key Modifications
+- Reduced to **4 layers**
+- Contains:
+  - First 3 layers: **MLA** (Multi-head Latent Attention)
+  - Layer 4: **MoE** (Mixture of Experts)
+- All weights randomly initialized (not performance-optimized)
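The layer layout listed above can be written out as a small sketch. This is only an illustration of the stated 3 + 1 split. The function name `layer_layout` and its parameters are hypothetical and are not taken from the model's config or code.

```python
from typing import List


def layer_layout(num_layers: int = 4, num_mla_layers: int = 3) -> List[str]:
    """Return one label per layer, following the split described in the README.

    Hypothetical helper: the names and defaults here are assumptions made for
    illustration, not fields from the model's actual configuration.
    """
    if not 0 <= num_mla_layers <= num_layers:
        raise ValueError("num_mla_layers must be between 0 and num_layers")
    # The first `num_mla_layers` layers are labelled MLA, the rest MoE.
    return ["MLA" if i < num_mla_layers else "MoE" for i in range(num_layers)]


if __name__ == "__main__":
    # Print the layout using 1-based layer numbers, as in the list above.
    for index, kind in enumerate(layer_layout(), start=1):
        print(f"Layer {index}: {kind}")
```

With the defaults, the first three entries are labelled MLA and the fourth MoE, which matches the list above.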
+## Usage
+```python
+from transformers import AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")