Text Generation
Transformers
PyTorch
Safetensors
English
i3
i3-architecture
hybrid-model
rwkv-mamba
custom_code
Commit bab8a37 (verified) · committed by FlameF0X · 1 Parent(s): b72e7dc

Update README.md

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -53,7 +53,7 @@ Layers 11-16: Full Attention Blocks
  | **Hidden Dimension** | 512 | 512 |
  | **Vocabulary Size** | 4,466 | 35,560 |
  | **Training Dataset** | TinyChat only | TinyStories + TinyChat + HQ Sentences |
- | **Total Tokens** | ~1M conversations | 3M+ tokens |
+ | **Total Tokens** | ~1M conversations | ~3M+ tokens |
  | **Final Loss** | ~2.0 | ~2.0 |
  | **Final Perplexity** | 7.29-9.70 | 7.29-10.0 |
  | **Training Time** | ~17 hours | ~2-4 hours |
@@ -127,11 +127,10 @@ Layers 11-16: Full Attention Blocks
 
  ### Performance Metrics
 
- | Metric | Initial | Final | Best |
- |--------|---------|-------|------|
- | Training Loss | ~6.0 | ~2.0 | 1.98 |
- | Perplexity | ~400+ | ~7-10 | 7.29 |
-
+ | Metric | Initial | Final |
+ |--------|---------|-------|
+ | Training Loss | ~10.0 | ~1.7 |
+ | Perplexity | ~4000+ | ~6 |
 
  ![image](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/ugtJGyEkQfbGieURP2W78.png)
  > [!NOTE]
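
The tables in this diff report training loss and perplexity side by side. As a rough sanity check, the two can be related by `perplexity = exp(loss)`. This is a minimal sketch that assumes the loss is a mean per-token cross-entropy in nats, which the README does not state. The helper names are hypothetical and not part of the model's code. The reported figures are approximate (`~`), so only a loose match should be expected.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss (in nats).

    Assumption: natural-log cross-entropy; the README does not say so.
    """
    return math.exp(loss_nats)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the loss (in nats) implied by a perplexity value."""
    return math.log(perplexity)


if __name__ == "__main__":
    # Loss values quoted in the tables of this diff.
    for loss in (2.0, 1.7):
        print(f"loss {loss:.2f} -> perplexity {perplexity_from_loss(loss):.2f}")

    # Perplexity value quoted in the comparison table.
    print(f"perplexity 7.29 -> loss {loss_from_perplexity(7.29):.3f}")
```

Under that assumption, a loss of 2.0 corresponds to a perplexity of about 7.39, and a perplexity of 7.29 corresponds to a loss of about 1.99, which lines up with the "Best" loss of 1.98 in the removed table.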