Update README.md
README.md CHANGED

@@ -15,7 +15,7 @@ model-index:
 
 # NanoGPT Personal Experiment
 
-This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model
+This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.
 
 ## Model Description
 
@@ -24,16 +24,14 @@ This model is based on the nanoGPT implementation, which is a minimal, clean imp
 ### Technical Details
 
 - Base Architecture: GPT-2
-- Implementation: nanoGPT
 - Training Infrastructure: 8x A100 80GB GPUs
 - Parameters: ~124M (similar to GPT-2 small)
 
 ### Training Process
 
 The model underwent a multi-stage training process:
-
-
-3. Experimentation with different hyperparameters and optimization techniques
+- Initial training on a subset of the OpenWebText dataset
+- Experimentation with different hyperparameters and optimization techniques
 
 ### Features
 
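For context on the setup the README describes: nanoGPT drives training from a small Python config file passed to its train.py, and an 8-GPU run of this size is typically launched with `torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py`. The sketch below mirrors the upstream config/train_gpt2.py defaults for a ~124M-parameter GPT-2 run on OpenWebText; the hyperparameters actually used in this experiment are not stated in the diff, so treat every value as illustrative.

```python
# Illustrative nanoGPT-style training config for a ~124M-parameter GPT-2 run.
# Values mirror the upstream config/train_gpt2.py defaults, NOT the (unstated)
# settings of this experiment. Typical multi-GPU launch:
#   torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py

dataset = 'openwebtext'            # prepared with data/openwebtext/prepare.py

# 12 sequences x 1024 tokens x 40 accumulation micro-steps ~= 0.5M tokens per update
batch_size = 12
block_size = 1024
gradient_accumulation_steps = 5 * 8

max_iters = 600000                 # total optimizer steps
learning_rate = 6e-4               # peak LR, cosine-decayed
lr_decay_iters = 600000
weight_decay = 1e-1

eval_interval = 1000               # run validation every N steps
eval_iters = 200
log_interval = 10

wandb_log = False                  # set True to log metrics to Weights & Biases
```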