lapp0 committed (verified) · Commit f1d0d6e · Parent: 2ca7fb4

End of training
README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
- - eval_enwikippl: 201.4308
- - eval_frwikippl: 134811.7969
- - eval_zhwikippl: 2802169.0
- - eval_tinystoriesppl: 8.5017
- - eval_loss: 1.1632
- - eval_runtime: 13.1959
- - eval_samples_per_second: 75.781
- - eval_steps_per_second: 9.473
+ - eval_enwikippl: 177.7982
+ - eval_frwikippl: 71457.8281
+ - eval_zhwikippl: 1401097.25
+ - eval_tinystoriesppl: 9.7578
+ - eval_loss: 1.1698
+ - eval_runtime: 13.1698
+ - eval_samples_per_second: 75.932
+ - eval_steps_per_second: 9.491

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
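
Note: the `eval_*ppl` values above are corpus perplexities, i.e. exp of the mean per-token cross-entropy on the named corpus (`eval_loss` is the distillation objective, not a language-model loss). A minimal sketch of that computation with the standard `transformers` API; the model id is a stand-in, and the actual Distily eval harness may batch and mask differently:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model id for illustration; substitute this repo's student model.
MODEL_ID = "gpt2"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model shifts labels internally and
        # returns the mean per-token cross-entropy.
        loss = model(input_ids=ids, labels=ids).loss
    return math.exp(loss.item())
```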
@@ -47,7 +47,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
- - learning_rate: 0.04
+ - learning_rate: 0.01
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
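
As configured above, the `distillation_objective` reduces to a single KL term on the logits (`weight=1`, `loss_fn=kl`) with the hidden-state and attention components switched off (`weight=0`). A minimal PyTorch sketch of such a term, not Distily's actual implementation; the temperature parameter is an assumption (none is listed in the card):

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    # Forward KL(teacher || student) over the vocabulary, averaged per token.
    vocab = student_logits.size(-1)
    s = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    t = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    # T^2 scaling (Hinton et al.) keeps gradient magnitudes comparable
    # across temperatures; with temperature=1.0 it is a no-op.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```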
@@ -57,38 +57,38 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0

 ### Resource Usage
- Peak GPU Memory: 8.0557 GB
+ Peak GPU Memory: 8.0568 GB

 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 106439.8672 | 83269.1172 | 6.7670 | 13.084 | 76.429 | 9.554 | 124919.8828 | 108523.1641 |
- | 500 | 0.0404 | 540.4661 | 2634702.25 | 4.5202 | 13.2137 | 75.679 | 9.46 | 8.4174 | 83522400.0 |
- | 1000 | 0.0808 | 1497.5195 | 15870873.0 | 2.2889 | 13.1749 | 75.902 | 9.488 | 14.0065 | 464464704.0 |
- | 1500 | 0.1212 | 1866.7501 | 54564276.0 | 1.7749 | 13.1457 | 76.071 | 9.509 | 12.2171 | 3062959872.0 |
- | 2000 | 0.1616 | 354.3223 | 675337.5625 | 1.3000 | 13.1662 | 75.952 | 9.494 | 9.5618 | 7180616.0 |
- | 2500 | 0.2020 | 237.2996 | 200161.4531 | 1.2027 | 13.1313 | 76.154 | 9.519 | 9.2355 | 2578359.25 |
- | 3000 | 0.2424 | 209.1669 | 144669.0625 | 1.1789 | 13.1334 | 76.142 | 9.518 | 8.7617 | 2323565.0 |
- | 3500 | 0.2828 | 199.7487 | 140412.8906 | 1.1786 | 13.1764 | 75.893 | 9.487 | 8.3659 | 2391488.5 |
- | 4000 | 0.3232 | 194.1800 | 130293.7578 | 1.1813 | 13.1424 | 76.089 | 9.511 | 8.1468 | 2006979.125 |
- | 4500 | 0.3636 | 192.8458 | 132104.8594 | 1.1847 | 13.2475 | 75.486 | 9.436 | 8.0278 | 1976689.25 |
- | 5000 | 0.4040 | 204.8733 | 171334.5781 | 1.1910 | 13.2362 | 75.55 | 9.444 | 7.8049 | 7161488.0 |
- | 5500 | 0.4444 | 195.0279 | 158004.8125 | 1.1950 | 13.2423 | 75.516 | 9.439 | 7.6020 | 5050465.0 |
- | 6000 | 0.4848 | 190.4297 | 152376.5 | 1.1980 | 13.2444 | 75.504 | 9.438 | 7.4381 | 3569331.0 |
- | 6500 | 0.5253 | 188.9310 | 144618.1562 | 1.1982 | 13.2202 | 75.642 | 9.455 | 7.4149 | 3188412.75 |
- | 7000 | 0.5657 | 194.4358 | 148488.4219 | 1.1898 | 13.2162 | 75.664 | 9.458 | 7.6775 | 3383867.0 |
- | 7500 | 0.6061 | 197.3188 | 155367.4844 | 1.1877 | 13.216 | 75.666 | 9.458 | 7.7428 | 3572188.0 |
- | 8000 | 0.6465 | 201.1151 | 143734.7344 | 1.1764 | 13.2338 | 75.564 | 9.446 | 8.2883 | 3349733.0 |
- | 8500 | 0.6869 | 200.3220 | 141594.625 | 1.1751 | 13.1923 | 75.802 | 9.475 | 8.2352 | 3220043.0 |
- | 9000 | 0.7273 | 200.8854 | 134460.8906 | 1.1695 | 13.2139 | 75.678 | 9.46 | 8.4530 | 2981093.25 |
- | 9500 | 0.7677 | 201.4308 | 134811.7969 | 1.1632 | 13.1959 | 75.781 | 9.473 | 8.5017 | 2802169.0 |
- | 10000 | 0.8081 | 199.1423 | 119070.3125 | 1.1540 | 13.2527 | 75.456 | 9.432 | 8.8006 | 2319847.5 |
- | 10500 | 0.8485 | 196.0125 | 108148.8438 | 1.1479 | 13.2351 | 75.556 | 9.445 | 8.9736 | 2071720.5 |
- | 11000 | 0.8889 | 184.3195 | 84100.1484 | 1.1397 | 13.2039 | 75.735 | 9.467 | 9.2554 | 1337898.0 |
- | 11500 | 0.9293 | 187.4005 | 83163.6484 | 1.1355 | 13.2213 | 75.635 | 9.454 | 9.7272 | 1300591.875 |
- | 12000 | 0.9697 | 192.9990 | 87212.625 | 1.1357 | 13.295 | 75.216 | 9.402 | 10.3592 | 1611324.75 |
- | 12375 | 1.0 | 193.3956 | 88350.2109 | 1.1344 | 13.1951 | 75.786 | 9.473 | 10.2582 | 1620378.125 |
+ | 0 | 0 | 88697.0156 | 150478.2188 | 6.9925 | 13.279 | 75.307 | 9.413 | 69390.6016 | 113346.8047 |
+ | 500 | 0.0404 | 4772.3877 | 41781.0312 | 4.4675 | 13.261 | 75.409 | 9.426 | 1755.5269 | 85471.3125 |
+ | 1000 | 0.0808 | 159.9700 | 23484.4004 | 2.7209 | 13.208 | 75.712 | 9.464 | 14.2542 | 117503.6719 |
+ | 1500 | 0.1212 | 339.4117 | 183525.9688 | 2.5581 | 13.2723 | 75.345 | 9.418 | 11.8515 | 4105987.25 |
+ | 2000 | 0.1616 | 331.9490 | 187405.2031 | 1.7211 | 13.2071 | 75.717 | 9.465 | 12.0039 | 1849146.875 |
+ | 2500 | 0.2020 | 495.6664 | 785470.0 | 1.6396 | 13.2577 | 75.428 | 9.428 | 11.9702 | 16244978.0 |
+ | 3000 | 0.2424 | 303.5385 | 246071.1094 | 1.3174 | 13.2799 | 75.302 | 9.413 | 11.2371 | 5965317.5 |
+ | 3500 | 0.2828 | 203.5562 | 98305.3203 | 1.2166 | 13.2691 | 75.363 | 9.42 | 10.0871 | 1967744.0 |
+ | 4000 | 0.3232 | 173.5457 | 63847.1445 | 1.1965 | 13.2141 | 75.677 | 9.46 | 9.5835 | 1061893.875 |
+ | 4500 | 0.3636 | 166.6617 | 57248.3125 | 1.1922 | 13.2557 | 75.439 | 9.43 | 9.4960 | 928536.9375 |
+ | 5000 | 0.4040 | 196.6893 | 119945.6172 | 1.2002 | 13.2052 | 75.728 | 9.466 | 9.5117 | 3237270.25 |
+ | 5500 | 0.4444 | 180.5183 | 85077.1328 | 1.1802 | 13.2023 | 75.744 | 9.468 | 9.3612 | 2035016.125 |
+ | 6000 | 0.4848 | 173.1228 | 71347.1719 | 1.1743 | 13.2273 | 75.601 | 9.45 | 9.3863 | 1387704.875 |
+ | 6500 | 0.5253 | 173.5524 | 73148.3516 | 1.1740 | 13.2503 | 75.47 | 9.434 | 9.2676 | 1480652.625 |
+ | 7000 | 0.5657 | 172.1599 | 69698.2734 | 1.1730 | 13.2236 | 75.623 | 9.453 | 9.3360 | 1310343.875 |
+ | 7500 | 0.6061 | 174.2867 | 74426.7188 | 1.1723 | 13.2073 | 75.716 | 9.464 | 9.3767 | 1511788.0 |
+ | 8000 | 0.6465 | 173.0625 | 69953.9844 | 1.1716 | 13.2511 | 75.465 | 9.433 | 9.4510 | 1417641.375 |
+ | 8500 | 0.6869 | 175.9622 | 72553.1562 | 1.1703 | 13.2273 | 75.601 | 9.45 | 9.5974 | 1453256.875 |
+ | 9000 | 0.7273 | 177.5917 | 71942.5625 | 1.1698 | 13.2247 | 75.616 | 9.452 | 9.7756 | 1375906.75 |
+ | 9500 | 0.7677 | 177.7982 | 71457.8281 | 1.1698 | 13.1698 | 75.932 | 9.491 | 9.7578 | 1401097.25 |
+ | 10000 | 0.8081 | 182.3172 | 70652.1328 | 1.1690 | 13.2521 | 75.46 | 9.432 | 10.2168 | 1272107.75 |
+ | 10500 | 0.8485 | 184.0270 | 70617.3047 | 1.1694 | 13.2146 | 75.674 | 9.459 | 10.4603 | 1281646.125 |
+ | 11000 | 0.8889 | 181.9786 | 70831.5156 | 1.1687 | 13.2442 | 75.505 | 9.438 | 10.2134 | 1352613.125 |
+ | 11500 | 0.9293 | 182.2607 | 71593.8438 | 1.1688 | 13.2727 | 75.343 | 9.418 | 10.2172 | 1358399.375 |
+ | 12000 | 0.9697 | 181.1417 | 70522.8828 | 1.1687 | 13.2373 | 75.544 | 9.443 | 10.2155 | 1319816.5 |
+ | 12375 | 1.0 | 181.2119 | 70612.3203 | 1.1688 | 13.2662 | 75.38 | 9.422 | 10.2206 | 1326170.375 |

 ### Framework versions
 - Distily 0.2.0
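
The log directory added at the end of this commit names the schedule for this run: `learning_rate=0.01`, `lr_scheduler_type=linear`, `warmup_ratio=0.5`, i.e. a linear ramp to 0.01 over the first half of the 12375 steps in the table, then a linear decay back toward 0. A sketch using the standard `transformers` helper; the optimizer choice and dummy parameters are illustrative assumptions:

```python
import torch
from transformers import get_linear_schedule_with_warmup

total_steps = 12375                    # final step in the table above
warmup_steps = int(0.5 * total_steps)  # warmup_ratio=0.5

params = [torch.nn.Parameter(torch.zeros(1))]   # dummy parameters
optimizer = torch.optim.AdamW(params, lr=0.01)  # optimizer is an assumption
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

for _ in range(total_steps):
    optimizer.step()
    scheduler.step()  # LR peaks at 0.01 mid-run, then decays linearly
```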
 
logs/learning_rate=0.01, lr_scheduler_type=linear, warmup_ratio=0.5/events.out.tfevents.1723847774.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0170d1cb74de8a18089ef197819bc686153124057a621bb3a611e10437aa43c3
+ size 307
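
The three added lines are a Git LFS pointer, not the TensorBoard event data itself: the 307-byte `events.out.tfevents` file lives in LFS storage and is addressed by its SHA-256. A pointer file is plain `key value` lines; a small parsing sketch:

```python
# Parse a Git LFS pointer: each line is "key value", split on the first space.
def parse_lfs_pointer(text: str) -> dict:
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0170d1cb74de8a18089ef197819bc686153124057a621bb3a611e10437aa43c3\n"
    "size 307\n"
)
assert pointer["size"] == "307"  # stored object's size in bytes
```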