Text Generation
Transformers
PyTorch
Safetensors
English
i3
i3-architecture
hybrid-model
rwkv-mamba
custom_code
FlameF0X committed on
Commit b72e7dc · verified · 1 Parent(s): 725812c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -53,7 +53,7 @@ Layers 11-16: Full Attention Blocks
  | **Hidden Dimension** | 512 | 512 |
  | **Vocabulary Size** | 4,466 | 35,560 |
  | **Training Dataset** | TinyChat only | TinyStories + TinyChat + HQ Sentences |
- | **Total Tokens** | ~1M conversations | 3,000,000+ tokens |
+ | **Total Tokens** | ~1M conversations | 3M+ tokens |
  | **Final Loss** | ~2.0 | ~2.0 |
  | **Final Perplexity** | 7.29-9.70 | 7.29-10.0 |
  | **Training Time** | ~17 hours | ~2-4 hours |
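
Review note: the **Final Loss** and **Final Perplexity** rows can be cross-checked against each other. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, in which case perplexity is `exp(loss)`. The README does not state how either figure was computed, so this is an assumption, and the function names are hypothetical.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


def loss_from_perplexity(perplexity: float) -> float:
    """Mean per-token cross-entropy (nats) implied by a perplexity value."""
    return math.log(perplexity)


if __name__ == "__main__":
    # A loss of ~2.0 (from the table) implies a perplexity of about 7.39.
    print(f"loss 2.0 -> perplexity {perplexity_from_loss(2.0):.2f}")

    # The perplexity bounds quoted in the table, mapped back to a loss.
    for ppl in (7.29, 9.70, 10.0):
        print(f"perplexity {ppl} -> loss {loss_from_perplexity(ppl):.2f}")
```

Under that assumption, a perplexity range of 7.29-10.0 corresponds to a loss of roughly 1.99-2.30, which is consistent with the table's "~2.0" only at the lower end of the range.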