lapp0 committed (verified)
Commit 73de5fd · 1 Parent(s): b2fd20f

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 3650.0444
- - eval_frwikippl: 29470.7617
- - eval_zhwikippl: 52791.2461
- - eval_tinystoriesppl: 1183.5695
- - eval_loss: 5.1097
- - eval_runtime: 6.5331
- - eval_samples_per_second: 76.533
- - eval_steps_per_second: 9.643
+ - eval_enwikippl: 4891.7905
+ - eval_frwikippl: 35673.2305
+ - eval_zhwikippl: 32045.9043
+ - eval_tinystoriesppl: 1523.1017
+ - eval_loss: 4.8703
+ - eval_runtime: 6.5675
+ - eval_samples_per_second: 76.132
+ - eval_steps_per_second: 9.593
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -64,9 +64,9 @@ Peak GPU Memory: 8.0568 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 21321.3555 | 56774.5312 | 6.6010 | 6.5152 | 76.744 | 9.67 | 11289.9248 | 60744.7383 |
- | 500 | 0.6464 | 3754.7207 | 29462.4434 | 5.1110 | 6.5528 | 76.303 | 9.614 | 1235.8627 | 53887.0117 |
- | 773 | 0.9994 | 3650.0444 | 29470.7617 | 5.1097 | 6.5331 | 76.533 | 9.643 | 1183.5695 | 52791.2461 |
+ | 0 | 0 | 28486.4824 | 70517.9062 | 6.3282 | 6.5373 | 76.484 | 9.637 | 14258.5928 | 39903.9922 |
+ | 500 | 0.6464 | 4952.0312 | 35693.3398 | 4.8720 | 6.4924 | 77.014 | 9.704 | 1551.0551 | 32286.1875 |
+ | 773 | 0.9994 | 4891.7905 | 35673.2305 | 4.8703 | 6.5675 | 76.132 | 9.593 | 1523.1017 | 32045.9043 |
 
  ### Framework versions
  - Distily 0.2.0
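
The eval_*ppl values in the diff above are perplexities of the student model on the corresponding corpora (English, French, and Chinese Wikipedia, and TinyStories). For readers unfamiliar with the metric, the sketch below shows one minimal way to compute perplexity for a causal language model with Hugging Face Transformers: it is the exponential of the mean next-token cross-entropy. The model id is a hypothetical placeholder, and this is not Distily's actual evaluation pipeline (its corpora, windowing, and batching are not shown in this diff).

```python
# Minimal perplexity sketch (illustrative only; not Distily's eval code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/<student-model>"  # hypothetical placeholder for the distilled checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Once upon a time, there was a little robot who loved stories."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels makes the model return the mean
    # next-token cross-entropy over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is exp(mean cross-entropy).
print(f"perplexity: {torch.exp(loss).item():.4f}")
```
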
logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=8, hs_loss_fn=0, hs_weight=0, learning_rate=0.0004, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, num_warmup_steps=1000, opti/events.out.tfevents.1723842504.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4254fab6471061d593ca49d85eaffa7f8552664efd2c0f446979a2702e9362b1
+ size 307