---
language: en
license: apache-2.0
tags:
- test
- custom-architecture
- deepseek
---

# DeepSeek-R1-Channel-INT8_4L (4 Layers)

⚠️ **For Testing Purposes Only**

This is a modified version of meituan/DeepSeek-R1-Channel-INT8 with **random weights**, used for architecture experiments.

## Key Modifications

- Reduced to **4 layers**
- Contains:
  - First 3 layers: **MLA** (Multi-head Latent Attention)
  - Layer 4: **MoE** (Mixture of Experts)
- All weights randomly initialized (not performance-optimized)
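As a quick sanity check of the reduced depth, the layer count can be read from the checkpoint's `config.json`. The snippet below is a minimal sketch against a hand-written stand-in config; the field name `num_hidden_layers` follows the usual `transformers` convention and is an assumption here, not a value read from the actual file.

```python
import json

# Stand-in for the checkpoint's config.json (values assumed for
# illustration; the real file ships with the model repository).
config_text = '{"model_type": "deepseek", "num_hidden_layers": 4}'
config = json.loads(config_text)

# The card states the model was reduced to 4 layers.
print(config["num_hidden_layers"])  # 4
```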

## Usage

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MollyHexapotato/custom_DeepSeek-R1-Channel-INT8_4L")
```