FlameF0X committed on
Commit 860a775 · verified · 1 Parent(s): 2209baa

Update README.md

Files changed (1):
  1. README.md +70 -12

README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
 # i3-tiny
 
- **i3-tiny** is a compact, efficient character-level language model designed for experimentation and exploration in text generation. Despite its small size, it can generate sequences that are quirky, unpredictable, and full of human-like character-level errors.
 
 ---
 
@@ -19,19 +19,44 @@ tags:
 
 i3-tiny is trained to predict the next character in a sequence, making it ideal for **character-level language modeling**, **creative text generation**, and **research on lightweight, efficient models**. Its small footprint allows rapid experimentation, even on modest hardware, and it provides a playground for studying how models learn patterns in sequences of characters.
 
- The model is **intentionally experimental** — its not aligned, fact-checked, or polished. Outputs may be coherent, partially readable, or amusingly garbled.
 
 ---
 
 ## Training Details
 
- * **Dataset:** ~45,830 characters (a curated text corpus repeated for exposure)
- * **Vocabulary:** 34 characters (all lowercased)
- * **Sequence length:** 128
- * **Training iterations:** 2,000
- * **Batch size:** 2
- * **Optimizer:** AdamW, learning rate 3e-4
- * **Model parameters:** 711,106
 * **Performance notes:** Each iteration takes roughly 400–500 ms; 100 iterations take ~45 s on average. Loss steadily decreased from 3.53 to 2.15 over training.
 
 ### Training Analysis
@@ -45,10 +70,43 @@ The **Training Loss Over Iterations** plot shows a clear learning trend, with th
 
 **Example generation (iteration 1200):**
 
 ```
-
 Prompt: "The quick"
 Generated: the quick efehn. dethe cans the fice the fpeens antary of eathetint, an thadat hitimes the and cow thig, and
-
 ```
 
- These outputs capture the **chaotic creativity** of a character-level model: a mixture of readable words, invented forms, and surprising sequences.
 
 # i3-tiny
 
+ **i3-tiny** is a compact, efficient character-level language model designed for experimentation and exploration in text generation. Despite its small size, it can generate sequences that are quirky, unpredictable, and full of "human-like" character-level errors.
 
 ---
 
 i3-tiny is trained to predict the next character in a sequence, making it ideal for **character-level language modeling**, **creative text generation**, and **research on lightweight, efficient models**. Its small footprint allows rapid experimentation, even on modest hardware, and it provides a playground for studying how models learn patterns in sequences of characters.
 
+ The model is **intentionally experimental** — it's not aligned, fact-checked, or polished. Outputs may be coherent, partially readable, or amusingly garbled.
+
+ ---
+
+ ## Architecture: i3
+
+ The **i3 architecture** (pronounced "i-three") is a novel hybrid design optimized for extreme efficiency on resource-constrained hardware. The name reflects its design goal: to enable language model training on modest consumer CPUs, including Intel Core i3 processors.
+
+ ### Key Design Principles
+
+ i3 combines multiple efficiency techniques to achieve sub-1GB memory usage during training:
+
+ - **Hybrid sequence modeling**: Blends different approaches to long-range dependency capture, balancing expressiveness with computational efficiency
+ - **Low-rank parameterization**: Strategic use of matrix factorization reduces memory footprint while maintaining model capacity
+ - **Factorized attention mechanisms**: Efficient approximations that preserve attention's ability to model relationships without quadratic memory costs
+ - **Linear-time operations**: Emphasis on operations that scale linearly with sequence length rather than quadratically
+
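As a rough illustration of the low-rank parameterization principle named above, a square weight matrix can be replaced by two skinny factors. The dimensions below are illustrative assumptions for the sketch, not the actual i3 layer shapes, which this README does not publish:

```python
import numpy as np

# Hypothetical dimensions chosen for illustration only.
d_model, rank = 256, 16

rng = np.random.default_rng(0)

# Full linear layer would store d_model * d_model weights.
full_params = d_model * d_model

# Low-rank factorization W ~= A @ B with A: (d_model, rank), B: (rank, d_model).
A = rng.standard_normal((d_model, rank)) / np.sqrt(d_model)
B = rng.standard_normal((rank, d_model)) / np.sqrt(rank)
low_rank_params = A.size + B.size

x = rng.standard_normal((8, d_model))  # a batch of 8 activation vectors
y = x @ A @ B                          # two skinny matmuls instead of one square one

print(full_params, low_rank_params)    # 65536 vs 8192 -> 8x fewer weights
print(y.shape)                         # (8, 256)
```

At rank 16 the factorized layer stores 8× fewer weights than the full matrix while keeping the same input/output shape, which is where this kind of memory saving comes from.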
+ ### Efficiency Characteristics
+
+ - **Training memory**: < 1 GB RAM total (including model, gradients, and optimizer state)
+ - **Model size**: 711,106 parameters (~2.7 MB in FP32)
+ - **Training speed**: ~450 ms per iteration on modest CPU hardware
+ - **Sequence processing**: Linear complexity enables longer context windows on limited hardware
+
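The model-size figure above can be checked with back-of-envelope arithmetic. This sketch assumes plain FP32 AdamW with two moment buffers; the real training loop also holds activations, which are not counted here:

```python
# Rough parameter-memory estimate for the figures quoted above.
n_params = 711_106
bytes_per_value = 4  # FP32

weights_mb = n_params * bytes_per_value / 2**20
# weights + gradients + AdamW first and second moments = 4 copies
train_state_mb = 4 * weights_mb

print(round(weights_mb, 1))      # 2.7  -> matches the ~2.7 MB model size
print(round(train_state_mb, 1))  # 10.9 -> parameter-related training state
```

Even with optimizer state included, the parameter-related footprint stays in the tens of megabytes, consistent with the sub-1 GB total claimed above.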
+ The architecture is designed from the ground up for CPU-friendly training, making it accessible for experimentation and research without requiring specialized hardware.
 
 ---
 
 ## Training Details
 
+ * **Dataset:** ~45,830 characters (a curated text corpus repeated for exposure)
+ * **Vocabulary:** 34 characters (all lowercased)
+ * **Sequence length:** 128
+ * **Training iterations:** 2,000
+ * **Batch size:** 2
+ * **Optimizer:** AdamW, learning rate 3e-4
+ * **Model parameters:** 711,106
+ * **Hardware:** Trained on free-tier CPU compute (Kaggle)
 * **Performance notes:** Each iteration takes roughly 400–500 ms; 100 iterations take ~45 s on average. Loss steadily decreased from 3.53 to 2.15 over training.
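The lowercased vocabulary and next-character objective described above can be sketched with a toy corpus. The corpus and the shortened sequence length below are stand-ins; the actual i3-tiny data and its 34-character vocabulary are not included here:

```python
# Minimal sketch of the character-level setup: lowercase the corpus,
# build a small vocabulary, and slice (input, target) training pairs.
corpus = ("the quick brown fox jumps over the lazy dog. " * 40).lower()

vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

encoded = [stoi[ch] for ch in corpus]

seq_len = 8  # i3-tiny uses 128; shortened here for readability
# Next-character prediction: the target is the input shifted by one position.
x = encoded[:seq_len]
y = encoded[1:seq_len + 1]

print(len(vocab))                   # 28 -- small, like i3-tiny's 34 chars
print("".join(itos[i] for i in x))  # "the quic"
print("".join(itos[i] for i in y))  # "he quick"
```

Repeating a small corpus for exposure, as the dataset bullet describes, just means iterating over such slices many times during the 2,000 training iterations.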
 
 ### Training Analysis
 
 **Example generation (iteration 1200):**
 
 ```
 Prompt: "The quick"
 Generated: the quick efehn. dethe cans the fice the fpeens antary of eathetint, an thadat hitimes the and cow thig, and
 ```
 
+ These outputs capture the **chaotic creativity** of a character-level model: a mixture of readable words, invented forms, and surprising sequences.
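Generations like the example above come from sampling one character at a time. A minimal sketch of that loop, using a random stand-in for the trained network (the real model's forward pass is not shown in this README):

```python
import numpy as np

# Illustrative 29-character vocabulary; i3-tiny's actual vocabulary has 34.
vocab = list("abcdefghijklmnopqrstuvwxyz .,")
rng = np.random.default_rng(42)

def fake_next_char_logits(context: str) -> np.ndarray:
    # Placeholder for the trained model's forward pass: arbitrary logits.
    return rng.standard_normal(len(vocab))

def generate(prompt: str, n_chars: int, temperature: float = 1.0) -> str:
    out = prompt.lower()  # the model only knows lowercased characters
    for _ in range(n_chars):
        logits = fake_next_char_logits(out) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()  # softmax over the vocabulary
        out += vocab[rng.choice(len(vocab), p=probs)]
    return out

text = generate("The quick", 40)
print(len(text))  # 9 prompt chars + 40 sampled = 49
```

Note how the prompt is lowercased before generation, which is why the sample above begins "the quick" rather than "The quick".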
+
+ ---
+
+ ## Use Cases
+
+ - **Educational research**: Study how tiny models learn language patterns
+ - **Creative text generation**: Experiment with character-level generation
+ - **Efficiency benchmarking**: Test memory-constrained training scenarios
+ - **Architecture research**: Explore novel approaches to efficient language modeling
+
+ ---
+
+ ## Limitations
+
+ - Character-level modeling only (no tokenization)
+ - Small vocabulary (34 characters)
+ - Limited training data and iterations
+ - Not suitable for production use or factual tasks
+ - Outputs are experimental and unfiltered
+
+ ---
+
+ ## Citation
+
+ If you use this model or the i3 architecture in your research, please cite:
+
+ ```bibtex
+ @misc{i3tiny2024,
+   author       = {FlameF0X},
+   title        = {i3-tiny: Ultra-Efficient Character-Level Language Model},
+   year         = {2024},
+   publisher    = {HuggingFace},
+   howpublished = {\url{https://huggingface.co/FlameF0X/i3-tiny}}
+ }
+ ```