| Training Loss | ~6.0 | ~2.0 | 1.98 |
| Perplexity | ~400+ | ~7-10 | 7.29 |

> [!NOTE]
> I don't know why the logging starts at step 4.6k.

How do **i3-22m** and **i3-80m** compare?

The model shows strong convergence with stable training dynamics and efficient GPU utilization.
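As a quick sanity check, the reported perplexity should be roughly the exponential of the final cross-entropy loss (assuming the loss is per-token cross-entropy in nats); a minimal sketch:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp of the per-token cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

# Final training loss from the table above.
final_loss = 1.98
print(round(perplexity(final_loss), 2))  # ~7.24, consistent with the reported 7.29
```

The small gap between exp(1.98) ≈ 7.24 and the reported 7.29 is expected, since the logged loss and perplexity are typically averaged over different steps.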

## Usage