| title: GPT From Scratch | |
| emoji: ⚡ | |
| colorFrom: indigo | |
| colorTo: pink | |
| sdk: gradio | |
| sdk_version: 4.4.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # GPT from scratch | |
| This repo contains code to train a GPT from scratch. The training data comes from [RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample), a sample of the RedPajama 1T dataset, and only a subset of that sample is used for training. The transformer implementation is similar to [LitGPT](https://github.com/Lightning-AI/lit-gpt). | |
| The trained model has about 160M parameters. The final training loss was 3.2154. | |
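To make the "about 160M" figure concrete, here is a rough parameter-count sketch. The configuration below (12 layers, width 768, padded vocabulary of 50304, untied output head, GPT-NeoX-style blocks with biases) is an assumption chosen for illustration because it lands near 160M; the README does not state the actual model configuration, so check the notebooks for the real values.

```python
# Hypothetical configuration (an assumption, not taken from this repo).
n_layer = 12
n_embd = 768
vocab_size = 50304  # padded vocabulary size


def count_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Count weights and biases of a GPT-NeoX-style decoder with an untied head."""
    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x width, then project back, both with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * 2 * n_embd
    per_block = attn + mlp + norms

    embedding = vocab_size * n_embd   # token embedding table
    lm_head = vocab_size * n_embd     # untied output projection, no bias
    final_norm = 2 * n_embd

    return n_layer * per_block + embedding + lm_head + final_norm


total = count_params(n_layer, n_embd, vocab_size)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumed settings, roughly half of the parameters sit in the embedding table and output head, which is typical for models of this size.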
|  | |
| The training details can be found in the attached notebooks. The initial training was stopped when the loss was around 4. | |
|  | |
| Training was then resumed from the saved checkpoint and stopped once the loss dropped below 3.5. | |
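The two-phase schedule described above (train until the loss is around 4, save a checkpoint, then resume and stop below 3.5) can be sketched as follows. This is a minimal illustration, not the code from the notebooks: `train_until` and `toy_step` are hypothetical names, the JSON file stands in for a real model and optimizer checkpoint, and the toy loss curve is invented purely so the loop terminates.

```python
import json
import os
import tempfile


def toy_step(state: dict) -> float:
    # Hypothetical stand-in for one real forward/backward pass:
    # the loss decays geometrically toward a floor of 3.0.
    return 3.0 + (state["loss"] - 3.0) * 0.99


def train_until(target_loss: float, state: dict, ckpt_path: str,
                max_steps: int = 10_000) -> dict:
    """Step until the loss drops below target_loss, then write a checkpoint."""
    while state["loss"] >= target_loss and state["step"] < max_steps:
        state["loss"] = toy_step(state)
        state["step"] += 1
    with open(ckpt_path, "w") as f:
        json.dump(state, f)  # stand-in for saving model and optimizer state
    return state


ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

# Phase 1: initial run, stopped once the loss falls below 4.
first = train_until(4.0, {"step": 0, "loss": 10.0}, ckpt_path)
first_step, first_loss = first["step"], first["loss"]

# Phase 2: reload the checkpoint and continue until the loss is below 3.5.
with open(ckpt_path) as f:
    resumed = json.load(f)
second = train_until(3.5, resumed, ckpt_path)

print(f"phase 1 stopped at step {first_step}, loss {first_loss:.4f}")
print(f"phase 2 stopped at step {second['step']}, loss {second['loss']:.4f}")
```

Storing the step counter alongside the loss in the checkpoint lets the resumed run continue its step numbering instead of starting from zero.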
| GitHub link: https://github.com/mkthoma/gpt_from_scratch |