s1lv3rj1nx/ch2

s1lv3rj1nx committed on Feb 19, 2025

Commit

88345c9

verified ·

1 Parent(s): 9e03249

Create README.md

Files changed (1)

README.md ADDED

+---
+license: apache-2.0
+datasets:
+- sgoel9/paul_graham_essays
+---
+This is the trained model file for Ch2 - LLMs are MultiTask Learners.
+This chapter builds a GPT2-124M model from scratch for text generation. Please use the `best_model.pt` checkpoint for inference.
+Because we pre-trained on only a small amount of data, the model has overfitted, but it can still generate sensible text.
+## Plots
+Loss (Train):
+![ch2_05_train_epoch_loss.png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ht1Tfjuoqywbf5GF06jMx.png)
+Perplexity (Train):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/psCddxI08z64FKzPH3ADk.png)
+Loss (Val):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ul5sRV2g0HT2CTCU1FQBT.png)
+Perplexity (Val):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/TmZ6cn7g48q3sAjgsECI5.png)
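
Note on reading the plots: the README shows loss and perplexity side by side but does not say how they relate. The sketch below assumes the loss is the usual mean per-token cross-entropy in nats, in which case perplexity is simply its exponential. The function name `perplexity` and the sample values are hypothetical and are not taken from the training code or the plots.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats.

    Assumption: the plotted loss is natural-log cross-entropy averaged over tokens.
    """
    return math.exp(mean_cross_entropy)


# Hypothetical loss values, for illustration only (not read from the plots).
for loss in (0.0, 1.0, 3.0):
    print(f"loss={loss:.1f} -> perplexity={perplexity(loss):.2f}")
```

Under this assumption, a model that guesses uniformly over a vocabulary of V tokens has loss ln(V) and perplexity V. A train perplexity that keeps falling while the validation perplexity flattens or rises is the overfitting pattern the README describes.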