Tags: Text Generation · Transformers · PyTorch · Safetensors · English · i3 · i3-architecture · hybrid-model · rwkv-mamba · custom_code
FlameF0X committed · verified
Commit 15fc360 · 1 Parent(s): 71ef85c

Update README.md

Files changed (1):
  1. README.md (+9 −0)
README.md CHANGED
@@ -132,6 +132,15 @@ Layers 11-16: Full Attention Blocks
 | Training Loss | ~6.0 | ~2.0 | 1.98 |
 | Perplexity | ~400+ | ~7-10 | 7.29 |
 
+
+![image](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/ugtJGyEkQfbGieURP2W78.png)
+> [!NOTE]
+> I don't know why the logging starts at step 4.6k.
+
+How do **i3-22m** and **i3-80m** compare?
+
+![image](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/utj6B7AE_gMMI9jnHc37Z.png)
+
 The model shows strong convergence with stable training dynamics and efficient GPU utilization.
 
 ## Usage
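For readers of the metrics table in this diff: the loss and perplexity columns are related by the conventional definition perplexity = exp(cross-entropy loss). A minimal sketch checking the table's final values (the small gap to the reported 7.29 is presumably rounding or a separately computed eval loss; the commit does not say):

```python
import math

# Values taken from the README's metrics table in this commit.
training_loss = 1.98
reported_perplexity = 7.29

# Perplexity is conventionally exp(cross-entropy loss).
derived_perplexity = math.exp(training_loss)
print(f"exp({training_loss}) = {derived_perplexity:.2f}")  # ≈ 7.24, close to the reported 7.29
```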