End of training

Files changed (2)

README.md +11 -11
logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=8, hs_loss_fn=0, hs_weight=0, learning_rate=0.0004, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, num_warmup_steps=1000, opti/events.out.tfevents.1723842504.5f530b1cf724 +3 -0

README.md CHANGED

@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 It achieves the following results on the evaluation set:
-- eval_enwikippl: 3650.0444
-- eval_frwikippl: 29470.7617
-- eval_zhwikippl: 52791.2461
-- eval_tinystoriesppl: 1183.5695
-- eval_loss: 5.1097
-- eval_runtime: 6.5331
-- eval_samples_per_second: 76.533
-- eval_steps_per_second: 9.643
+- eval_enwikippl: 4891.7905
+- eval_frwikippl: 35673.2305
+- eval_zhwikippl: 32045.9043
+- eval_tinystoriesppl: 1523.1017
+- eval_loss: 4.8703
+- eval_runtime: 6.5675
+- eval_samples_per_second: 76.132
+- eval_steps_per_second: 9.593
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -64,9 +64,9 @@ Peak GPU Memory: 8.0568 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** |  | 169.9865 | 47377.9414 |  |  |  |  | 3.9789 | 4998.1294 |
-| 0 | 0 | 21321.3555 | 56774.5312 | 6.6010 | 6.5152 | 76.744 | 9.67 | 11289.9248 | 60744.7383 |
-| 500 | 0.6464 | 3754.7207 | 29462.4434 | 5.1110 | 6.5528 | 76.303 | 9.614 | 1235.8627 | 53887.0117 |
-| 773 | 0.9994 | 3650.0444 | 29470.7617 | 5.1097 | 6.5331 | 76.533 | 9.643 | 1183.5695 | 52791.2461 |
+| 0 | 0 | 28486.4824 | 70517.9062 | 6.3282 | 6.5373 | 76.484 | 9.637 | 14258.5928 | 39903.9922 |
+| 500 | 0.6464 | 4952.0312 | 35693.3398 | 4.8720 | 6.4924 | 77.014 | 9.704 | 1551.0551 | 32286.1875 |
+| 773 | 0.9994 | 4891.7905 | 35673.2305 | 4.8703 | 6.5675 | 76.132 | 9.593 | 1523.1017 | 32045.9043 |
 ### Framework versions
 - Distily 0.2.0
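
The README diff replaces one set of final-step (step 773) evaluation numbers with another. To make the size of each change easier to read, the relative change per metric can be computed as in the minimal sketch below. The values are copied from the removed and added lines; the dictionary names and the `relative_change` helper are illustrative and are not part of Distily.

```python
# Final-step (step 773) metrics copied from the removed (-) and added (+) README lines.
old = {
    "enwikippl": 3650.0444,
    "frwikippl": 29470.7617,
    "zhwikippl": 52791.2461,
    "tinystoriesppl": 1183.5695,
    "loss": 5.1097,
}
new = {
    "enwikippl": 4891.7905,
    "frwikippl": 35673.2305,
    "zhwikippl": 32045.9043,
    "tinystoriesppl": 1523.1017,
    "loss": 4.8703,
}


def relative_change(before: float, after: float) -> float:
    """Return the signed fractional change from `before` to `after`."""
    return (after - before) / before


# Hypothetical helper output: one signed percentage per metric.
changes = {name: relative_change(old[name], new[name]) for name in old}

for name, change in changes.items():
    print(f"{name:>15}: {old[name]:>12.4f} -> {new[name]:>12.4f} ({change:+.1%})")
```

A negative value means the metric went down between the two versions of the README. For perplexity and loss, lower is generally better, but the diff alone does not say which configuration change produced the difference.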

logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=8, hs_loss_fn=0, hs_weight=0, learning_rate=0.0004, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, num_warmup_steps=1000, opti/events.out.tfevents.1723842504.5f530b1cf724 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4254fab6471061d593ca49d85eaffa7f8552664efd2c0f446979a2702e9362b1
+size 307
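
The three added lines are a Git LFS pointer, not the TensorBoard event file itself: they record the pointer spec version, the SHA-256 object ID, and the byte size of the real file. As a rough sketch, such a pointer can be parsed as space-separated key/value lines. The function name `parse_lfs_pointer` is hypothetical, and the parser skips the validation a real Git LFS client performs.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid is written as "<algorithm>:<hex digest>".
    algorithm, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version"),
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the added lines of the diff (without the leading "+").
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4254fab6471061d593ca49d85eaffa7f8552664efd2c0f446979a2702e9362b1
size 307
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

The `size` field is the size of the stored log file in bytes, so the event file referenced by this commit is 307 bytes.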