Upload README.md with huggingface_hub

# pruned_olmo3_5120_16_29

> **WARNING: This model is PRUNED ONLY, NOT retrained or distilled!**
>
> Performance will be degraded compared to the original model. This is a structural pruning checkpoint intended as a starting point for knowledge distillation or fine-tuning.

Structurally pruned version of [allenai/OLMo-3-7B-Instruct](https://huggingface.co/allenai/OLMo-3-7B-Instruct).

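The structural pruning described above can be sketched on a toy MLP: whole intermediate channels are removed, so both weight matrices shrink along the same axis and the layer still composes correctly. The magnitude-based channel selection below is a stand-in assumption for illustration, not necessarily the criterion used for this checkpoint.

```python
import numpy as np

# Toy sketch of structurally pruning an MLP's intermediate dimension
# (here 16 -> 6; for this model, 11008 -> 5120). Dropping a channel means
# deleting a row of the up-projection and the matching column of the
# down-projection. The L2-norm ranking is an illustrative assumption.
rng = np.random.default_rng(0)
hidden, inter, kept = 8, 16, 6

w_up = rng.normal(size=(inter, hidden))    # maps hidden -> intermediate
w_down = rng.normal(size=(hidden, inter))  # maps intermediate -> hidden

scores = np.linalg.norm(w_up, axis=1)       # score each intermediate channel
keep = np.sort(np.argsort(scores)[-kept:])  # indices of channels to keep
w_up_pruned, w_down_pruned = w_up[keep, :], w_down[:, keep]

print(w_up_pruned.shape, w_down_pruned.shape)  # (6, 8) (8, 6)
```

The same row/column slicing idea extends to attention (removing whole heads) and to depth (removing whole layers), which is how all three axes in the table below can change while the hidden size stays fixed.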
## Pruning Configuration

| Parameter | Original | Pruned |
|-----------|----------|--------|
| Intermediate size (MLP) | 11008 | 5120 |
| Attention heads | 32 | 16 |
| Layers | 32 | 29 |
| Hidden size | 4096 | 4096 (unchanged) |
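
For a rough sense of scale, the table's numbers can be plugged into a back-of-the-envelope count of decoder-block parameters. The formulas below assume full multi-head attention with a fixed head dimension of 128 (4096 / 32) and a three-matrix gated MLP; these are assumptions for illustration, not a statement of OLMo 3's exact architecture, and embeddings, norms, and the LM head are ignored.

```python
# Hypothetical per-block parameter count under the assumptions above.
def block_params(hidden, intermediate, heads, layers, head_dim=128):
    attn = 4 * hidden * heads * head_dim  # Q, K, V, O projections
    mlp = 3 * hidden * intermediate       # gate, up, down projections
    return layers * (attn + mlp)

orig = block_params(4096, 11008, 32, 32)   # 6,476,005,376 (~6.5B)
pruned = block_params(4096, 5120, 16, 29)  # 2,797,600,768 (~2.8B)
print(f"~{100 * (1 - pruned / orig):.0f}% fewer block parameters")  # ~57%
```

Under these assumptions the pruning removes a bit over half of the transformer-block parameters, which is consistent with the degraded performance warned about above.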

## Important Notes

1. **This model has NOT been retrained** after pruning
2. **Performance will be significantly degraded** compared to the original
3. **Intended use**: Starting checkpoint for distillation/fine-tuning
4. For the distillation training data, see [hbfreed/dolci-distill-packed](https://huggingface.co/datasets/hbfreed/dolci-distill-packed)