hbfreed committed
Commit 07fedeb · verified · 1 Parent(s): 57e3fca

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +8 -14
README.md CHANGED
@@ -10,7 +10,7 @@ pipeline_tag: text-generation
 
 # pruned_olmo3_5120_16_29
 
- > ⚠️ **WARNING: This model is PRUNED ONLY, NOT retrained or distilled!**
 >
 > Performance will be degraded compared to the original model. This is a structural pruning checkpoint intended as a starting point for knowledge distillation or fine-tuning.
 
@@ -20,22 +20,16 @@ Structurally pruned version of [allenai/OLMo-3-7B-Instruct](https://huggingface.
 
 ## Pruning Configuration
 
- - **Hidden size**: 5120
- - **Num attention heads**: 16
- - **Num layers**: 29
- - **Original model**: allenai/OLMo-3-7B-Instruct (hidden=4096, heads=32, layers=32)
 
- ## Usage
-
-
-
- ## ⚠️ Important Notes
 
 1. **This model has NOT been retrained** after pruning
 2. **Performance will be significantly degraded** compared to the original
 3. **Intended use**: Starting checkpoint for distillation/fine-tuning
 4. For the distillation training data, see [hbfreed/dolci-distill-packed](https://huggingface.co/datasets/hbfreed/dolci-distill-packed)
-
- ## Citation
-
- If you use this model, please cite the original OLMo work.
 
 
 # pruned_olmo3_5120_16_29
 
+ > **WARNING: This model is PRUNED ONLY, NOT retrained or distilled!**
 >
 > Performance will be degraded compared to the original model. This is a structural pruning checkpoint intended as a starting point for knowledge distillation or fine-tuning.
 
 
 
 ## Pruning Configuration
 
+ | Parameter | Original | Pruned |
+ |-----------|----------|--------|
+ | Intermediate size (MLP) | 11008 | 5120 |
+ | Attention heads | 32 | 16 |
+ | Layers | 32 | 29 |
+ | Hidden size | 4096 | 4096 (unchanged) |
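The pruned shapes in the table above imply most of the size reduction comes from the MLPs. A back-of-envelope sketch (an illustration only, not from the commit: it assumes a SwiGLU-style MLP with gate/up/down projections as in OLMo-family models, and ignores attention, embeddings, biases, and norms):

```python
# Rough per-layer MLP parameter estimate from the pruned config table.
# Assumes a SwiGLU-style MLP (gate, up, down projections); exact totals
# depend on attention, embeddings, and norms, which are not counted here.

hidden = 4096            # unchanged by this pruning
inter_orig = 11008       # original intermediate (MLP) size
inter_pruned = 5120      # pruned intermediate size
layers_orig, layers_pruned = 32, 29

def mlp_params(hidden: int, inter: int) -> int:
    # gate_proj and up_proj map hidden -> inter; down_proj maps inter -> hidden
    return 3 * hidden * inter

orig_total = layers_orig * mlp_params(hidden, inter_orig)
pruned_total = layers_pruned * mlp_params(hidden, inter_pruned)

print(f"original MLP params: {orig_total / 1e9:.2f}B")   # 4.33B
print(f"pruned MLP params:   {pruned_total / 1e9:.2f}B") # 1.82B
print(f"MLP reduction:       {1 - pruned_total / orig_total:.1%}")  # 57.8%
```

Under these assumptions the MLP stack alone shrinks by well over half, which is consistent with the warning that the checkpoint needs distillation or fine-tuning to recover quality.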
 
 
+ ## Important Notes
 
 
 1. **This model has NOT been retrained** after pruning
 2. **Performance will be significantly degraded** compared to the original
 3. **Intended use**: Starting checkpoint for distillation/fine-tuning
 4. For the distillation training data, see [hbfreed/dolci-distill-packed](https://huggingface.co/datasets/hbfreed/dolci-distill-packed)
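To make "structural pruning" concrete: reducing 32 attention heads to 16 means physically slicing head-sized blocks out of the projection weights, which is why the checkpoint loads at the smaller shapes but loses quality until retrained. The sketch below is NOT this checkpoint's actual recipe; it uses a stand-in L2-norm importance score on a single randomly initialized output projection purely to show how the shapes change (a real pruner must also slice Q/K/V consistently with O):

```python
import numpy as np

# Simplified illustration of structural head pruning (32 -> 16 heads).
# Head selection via L2 norm is a stand-in criterion, not the method
# used to produce this checkpoint.

rng = np.random.default_rng(0)
hidden, n_heads, keep_heads = 4096, 32, 16
head_dim = hidden // n_heads  # 128

# Output projection weight, viewed per head: (hidden, n_heads, head_dim)
w_o = rng.standard_normal((hidden, n_heads * head_dim), dtype=np.float32)
w_o_heads = w_o.reshape(hidden, n_heads, head_dim)

# Score each head by the L2 norm of its weight slice; keep the top half,
# preserving the original head order.
scores = np.linalg.norm(w_o_heads, axis=(0, 2))
keep = np.sort(np.argsort(scores)[-keep_heads:])

w_o_pruned = w_o_heads[:, keep, :].reshape(hidden, keep_heads * head_dim)
print(w_o.shape, "->", w_o_pruned.shape)  # (4096, 4096) -> (4096, 2048)
```

The same slicing idea applies to the MLP: dropping intermediate channels (11008 down to 5120) removes matching rows of the up/gate projections and columns of the down projection.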