Update README.md
README.md CHANGED
@@ -11,7 +11,6 @@ tags:
 model_name: Genesis-100M
 ---
 
-
 ## Architecture
 - Decoder-only Transformer (GPT-style)
 - 12 layers
@@ -28,6 +27,12 @@ model_name: Genesis-100M
 - Training steps: 2000
 - Optimizations: Gradient checkpointing, gradient accumulation
 
+## Training Loss Curve
+
+(training loss curve image)
+
+The training loss decreased steadily from approximately **9.1 to 5.3** over **2000 training steps**, indicating stable convergence during from-scratch training of the 100M-parameter language model.
+
 ## Intended Use
 - Research
 - Educational purposes
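The "gradient accumulation" optimization listed in the README trades update frequency for memory: gradients from several small micro-batches are summed and applied in a single optimizer step, emulating a larger effective batch size. A minimal, framework-free sketch of the idea (the toy model, data, and learning rate here are hypothetical illustrations, not details of the Genesis-100M training run):

```python
# Toy gradient accumulation on a one-parameter model w,
# with per-micro-batch loss = mean((w*x - y)^2).
# NOTE: the model, data, and hyperparameters are made up for illustration.

def grad(w, batch):
    # d/dw of mean((w*x - y)^2) is mean(2*x*(w*x - y))
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

def train(w, micro_batches, lr=0.01, accum_steps=4):
    acc = 0.0
    for i, batch in enumerate(micro_batches, 1):
        # Scale by 1/accum_steps so the accumulated sum equals the
        # average gradient over the whole effective batch.
        acc += grad(w, batch) / accum_steps
        if i % accum_steps == 0:
            w -= lr * acc  # one optimizer step per accum_steps micro-batches
            acc = 0.0      # reset accumulator (the zero_grad analogue)
    return w

data = [(x, 3.0 * x) for x in range(1, 9)]             # true slope is 3
micro_batches = [data[i:i + 2] for i in range(0, 8, 2)]  # 4 micro-batches of 2
w = train(0.0, micro_batches * 50)                     # repeat for 50 epochs
```

Each update here uses the same averaged gradient a single batch of 8 would produce, but only 2 examples ever need to be resident at once; gradient checkpointing (the other listed optimization) attacks activation memory instead and is not shown.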