HuggingFaceH4/ultrachat_200k
Nutral v2.1 Tiny is a lightweight decoder-only Transformer language model built on the custom Nutral v2.1 architecture. It is intended for educational research, experimentation, chatbot development, and low-resource deployment.
The architecture combines RMSNorm, multi-head causal self-attention, and GELU feed-forward networks, and is trained with a causal language modeling objective.
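To make the three named building blocks concrete, here is a minimal pure-Python sketch of RMSNorm, GELU, and a causal attention mask. It is illustrative only and is not taken from the Nutral code base: the `eps` value, the use of the exact (erf-based) GELU rather than the tanh approximation, and the function names are all assumptions.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a learned per-feature gain.

    eps is an assumed stabilizer; the model card does not state its value.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]

def gelu(v):
    """Exact GELU: v * Phi(v), where Phi is the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(gelu(1.0))
    for row in causal_mask(4):
        print(row)
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, so it needs only one gain vector per normalization layer.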
| Parameter | Value |
|---|---|
| Model Name | Nutral v2.1 Tiny |
| Total Parameters | ~15.2 Million |
| Vocabulary Size | 50,257 |
| Tokenizer | GPT-2 |
| Hidden Size | 256 |
| Transformer Layers | 4 |
| Attention Heads | 4 |
| Context Length | 256 Tokens |
| Activation Function | GELU |
| Normalization | RMSNorm |
| Attention Type | Causal Self-Attention |
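As a rough sanity check on the table above, the parameter count can be estimated from the listed dimensions. The sketch below is a back-of-the-envelope calculation, not the author's accounting. It assumes a 4x feed-forward expansion, tied input and output embeddings, no bias terms, two RMSNorm gains per block plus one final norm, and no learned position embeddings. None of these details are stated in the table, and `estimate_params` is a hypothetical helper.

```python
def estimate_params(vocab=50_257, d_model=256, n_layers=4,
                    ffn_mult=4, tied_embeddings=True):
    """Estimate the parameter count of a decoder-only Transformer.

    ffn_mult and tied_embeddings are assumptions; the model card
    does not specify them.
    """
    embed = vocab * d_model                       # token embedding table
    attn = 4 * d_model * d_model                  # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)      # up and down projections
    norms = 2 * d_model                           # two RMSNorm gain vectors per block
    per_layer = attn + ffn + norms
    final_norm = d_model
    head = 0 if tied_embeddings else vocab * d_model
    return embed + n_layers * per_layer + final_norm + head

if __name__ == "__main__":
    print(f"embedding only: {50_257 * 256:,}")
    print(f"estimated total: {estimate_params():,}")
```

Under these assumptions the estimate is about 16.0 million, of which roughly 12.9 million sit in the embedding table alone. That is somewhat above the stated ~15.2 million, which suggests the actual configuration differs from the assumed one, for example in feed-forward width.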
The model was trained on:
| Setting | Value |
|---|---|
| Target Training Tokens | ~100 Million |
| Learning Rate | 8e-4 |
| Weight Decay | 0.01 |
| Optimizer | AdamW (fused PyTorch implementation) |
| Batch Size | 8 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 32 |
| Precision | FP16 |
| Max Training Steps | 6100 |
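The batch settings in the table determine how many tokens each optimizer step consumes. The sketch below works through that arithmetic. It assumes every sequence is packed to the full 256-token context, and `num_devices` is a hypothetical parameter, since the table does not say how many devices were used.

```python
def tokens_seen(micro_batch=8, grad_accum=4, context_len=256,
                steps=6100, num_devices=1):
    """Upper bound on tokens processed, assuming full-length sequences.

    num_devices is an assumption; the model card does not state it.
    """
    effective_batch = micro_batch * grad_accum * num_devices
    return effective_batch * context_len * steps

if __name__ == "__main__":
    print(f"effective batch: {8 * 4}")
    print(f"tokens per step: {8 * 4 * 256:,}")
    print(f"tokens over 6100 steps (1 device): {tokens_seen():,}")
    print(f"tokens over 6100 steps (2 devices): {tokens_seen(num_devices=2):,}")
```

With the listed values, one device processes 8,192 tokens per step and about 50 million tokens over 6,100 steps. Reaching the ~100 million token target would take, for instance, two data-parallel devices or roughly twice as many steps. The table does not indicate which applies.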
Each Nutral Block contains:
Nutral v2.1 Tiny is suitable for:
Built using:
Apache 2.0
Nutral v2.1 Tiny is released as an open-source project for research, education, and community-driven development.