sagar118 committed on
Commit 099e95e · verified · 1 Parent(s): a650d70

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -11,7 +11,6 @@ tags:
 model_name: Genesis-100M
 ---
 
-
 ## Architecture
 - Decoder-only Transformer (GPT-style)
 - 12 layers
@@ -28,6 +27,12 @@ model_name: Genesis-100M
 - Training steps: 2000
 - Optimizations: Gradient checkpointing, gradient accumulation
 
+## Training Loss Curve
+
+![Training Loss Curve](training_loss.png)
+
+The training loss decreased steadily from approximately **9.1 to 5.3** over **2000 training steps**, indicating stable convergence during from-scratch training of the 100M-parameter language model.
+
 ## Intended Use
 - Research
 - Educational purposes
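The README names gradient accumulation among the training optimizations. As a hypothetical illustration (not the model's actual training code), accumulation amounts to averaging the gradients from several micro-batches before taking a single optimizer step, emulating a larger effective batch size:

```python
# Hypothetical sketch of gradient accumulation; the function name and the
# plain-list gradient representation are illustrative assumptions, not
# Genesis-100M's actual training code.

def accumulate_gradients(micro_grads):
    """Average per-parameter gradients across micro-batches.

    micro_grads: list of gradient lists, one inner list per micro-batch.
    Returns the averaged gradients, as if computed on one combined batch.
    """
    n = len(micro_grads)
    return [sum(per_param) / n for per_param in zip(*micro_grads)]

# Example: 3 micro-batches, 2 parameters each.
print(accumulate_gradients([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # → [3.0, 4.0]
```

In a real training loop the same effect is obtained by summing scaled losses over N micro-batches and calling the optimizer once every N steps, which keeps peak memory at single-micro-batch size.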