# DeepSeek-V3 Mini 50M Parameters

A compact version of DeepSeek-V3 with exactly **58,283,136 parameters** (reduced from ~181M).
## Model Specifications

- **Parameters**: 58,283,136
- **Hidden Size**: 448
- **Layers**: 6
- **Attention Heads**: 8
- **Intermediate Size**: 1200
- **Memory (FP16)**: ~111.2 MB
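The FP16 memory figure follows directly from the parameter count, at 2 bytes per parameter:

```python
# Back-of-the-envelope check of the FP16 memory figure above:
# 58,283,136 parameters x 2 bytes each, converted to MiB.
params = 58_283_136
fp16_bytes = params * 2            # 2 bytes per parameter in FP16
mib = fp16_bytes / (1024 ** 2)     # bytes -> MiB
print(f"{mib:.1f} MB")             # → 111.2 MB
```

Actual peak memory during inference will be somewhat higher due to activations and the KV cache.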
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the local checkpoint directory
model = AutoModelForCausalLM.from_pretrained("./deepseek_v3_mini_50m")
tokenizer = AutoTokenizer.from_pretrained("./deepseek_v3_mini_50m")

# Quick test: generate up to 50 new tokens from a short prompt
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Reductions Applied

- Hidden Size: 448
- Layers: 6
- Attention Heads: 8
- Intermediate Size: 1200
- KV LoRA Rank: 96
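The reductions above correspond to entries in the checkpoint's `config.json`. A minimal sketch of that fragment, expressed as a Python dict (the key names follow Hugging Face DeepSeek-V3 config conventions and are assumptions, not copied from this checkpoint):

```python
# Hypothetical config fragment reflecting the reductions listed above.
# Key names are assumed to match the Hugging Face-style config.json;
# verify against the actual file shipped with the checkpoint.
mini_config = {
    "hidden_size": 448,          # reduced model width
    "num_hidden_layers": 6,      # reduced transformer depth
    "num_attention_heads": 8,    # reduced head count
    "intermediate_size": 1200,   # reduced MLP width
    "kv_lora_rank": 96,          # reduced KV compression rank (MLA)
}
```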