Small-scale faithful replicas of the DeepSeek-V4 architecture for ablation and weight-transfer research.
- kshitijthakkar/deepseek-v4-mini-300M-init
  Text Generation • 0.3B • Updated • 19
- kshitijthakkar/deepseek-v4-mini-1B-init
  Text Generation • 1B • Updated • 17
- kshitijthakkar/deepseek-v4-mini-3B-init
  Text Generation • 3B • Updated • 9
- kshitijthakkar/deepseek-v4-mini-6B-init
  Text Generation • 8B • Updated • 42