Update README.md
README.md
## Training Details

* **Dataset:** ~45,830 characters (a curated text corpus repeated for exposure)
* **Vocabulary:** 34 characters (all lowercased)
* **Sequence length:** 128
* **Training iterations:** 2,000
* **Batch size:** 2
* **Optimizer:** AdamW, learning rate 3e-4
* **Model parameters:** 711,106
* **Performance notes:** Each iteration takes roughly 400–500 ms; 100 iterations take ~45 s on average. Loss steadily decreased from 3.53 to 2.15 over training.

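
Wired to the hyperparameters listed above, the snippet below is a minimal training-loop sketch. The stand-in corpus, vocabulary handling, and `CharLM` class are illustrative assumptions only; the actual i3-tiny architecture lives in `modeling_i3.py`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hyperparameters from the list above; everything else is illustrative.
SEQ_LEN, BATCH_SIZE, ITERS, LR = 128, 2, 2_000, 3e-4

# Stand-in corpus (the real one is ~45,830 lowercased characters, 34 symbols).
text = ("the quick brown fox jumps over the lazy dog. " * 300).lower()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharLM(nn.Module):
    """Tiny stand-in model; the real architecture is defined in modeling_i3.py."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        hidden, _ = self.rnn(self.embed(idx))
        return self.head(hidden)

model = CharLM(len(chars))
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
loss_fn = nn.CrossEntropyLoss()

for step in range(ITERS):
    # Sample BATCH_SIZE random windows of SEQ_LEN characters and their shifted targets.
    starts = torch.randint(0, len(data) - SEQ_LEN - 1, (BATCH_SIZE,))
    x = torch.stack([data[s : s + SEQ_LEN] for s in starts])
    y = torch.stack([data[s + 1 : s + SEQ_LEN + 1] for s in starts])

    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 100 == 0:
        print(f"iter {step:4d}  loss {loss.item():.2f}")
```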
### Training Analysis

The charts below illustrate the model's performance over the 2,000 training iterations.

The **Training Loss Over Iterations** plot shows a clear learning trend, with the 50-iteration moving average (red line) confirming a steady decrease in cross-entropy loss from ~3.5 to ~2.1. The **Training Time Performance** plot shows a consistent block time per 100 iterations, resulting in a nearly linear increase in cumulative training time, demonstrating stable and predictable training execution.



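
As a rough illustration of how such charts can be produced, the sketch below computes a 50-iteration moving average and a cumulative-time curve with NumPy and Matplotlib. The `losses` and `block_times` arrays are placeholders standing in for the values actually logged during training.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder logs: the real values come from the training run itself.
losses = np.linspace(3.5, 2.1, 2000) + np.random.normal(0, 0.1, 2000)
block_times = np.full(20, 45.0)  # ~45 s for each block of 100 iterations

# 50-iteration moving average of the loss (the red line in the first chart).
window = 50
moving_avg = np.convolve(losses, np.ones(window) / window, mode="valid")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(losses, alpha=0.4, label="per-iteration loss")
ax1.plot(range(window - 1, len(losses)), moving_avg, color="red", label="50-iteration moving average")
ax1.set(title="Training Loss Over Iterations", xlabel="iteration", ylabel="cross-entropy loss")
ax1.legend()

ax2.plot(np.arange(1, len(block_times) + 1) * 100, np.cumsum(block_times))
ax2.set(title="Training Time Performance", xlabel="iteration", ylabel="cumulative seconds")

plt.tight_layout()
plt.show()
```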
**Example generation (iteration 1200):**

```
Prompt: "The quick"
Generated: the quick efehn. dethe cans the fice the fpeens antary of eathetint, an thadat hitimes the and cow thig, and
```

These outputs capture the **chaotic creativity** of a character-level model: a mixture of readable words, invented forms, and surprising sequences.
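
As a generic illustration of how a prompt such as "The quick" is extended one character at a time, the loop below sketches temperature-based multinomial sampling. The vocabulary and `DummyModel` are placeholders, not the repository's actual generation code.

```python
import torch

# Placeholder vocabulary; the real 34-character vocabulary ships with the model.
chars = sorted(set("abcdefghijklmnopqrstuvwxyz .,'\""))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class DummyModel(torch.nn.Module):
    """Stand-in next-character model that returns random logits."""
    def forward(self, idx):
        return torch.randn(idx.shape[0], idx.shape[1], len(chars))

@torch.no_grad()
def generate(model, prompt, max_new_chars=100, temperature=1.0):
    # Encode the prompt, then repeatedly sample the next character.
    idx = torch.tensor([[stoi[c] for c in prompt.lower()]])
    for _ in range(max_new_chars):
        logits = model(idx)[:, -1, :] / temperature
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return "".join(itos[i] for i in idx[0].tolist())

print(generate(DummyModel(), "The quick"))
```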
---

## Intended Uses

* **Character-level text generation experiments**
* **Research and education:** studying lightweight language models and sequence learning
* **Creative exploration:** generating quirky text or procedural content for games, demos, or artistic projects

> ⚠️ i3-tiny is experimental and **not intended for production or high-stakes applications**. Text may be repetitive, nonsensical, or inconsistent.

## Limitations

* Outputs are **highly experimental** and not fact-checked
* Generated sequences can be repetitive, garbled, or unpredictable
* Not aligned or safety-checked

---
* Stored in `pytorch_model.bin` (or `model.safetensors`)
* Compatible with PyTorch and Hugging Face Transformers
* Requires `modeling_i3.py` and `config.json` to instantiate

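
One way to instantiate the model from these files is sketched below. `I3Model` and its constructor signature are assumptions standing in for whatever `modeling_i3.py` actually exports, so check that file for the real class name.

```python
import json
import torch

# Hypothetical loading sketch: the class name and constructor are assumptions,
# not a documented API. See modeling_i3.py and config.json in this repository.
from modeling_i3 import I3Model

with open("config.json") as f:
    config = json.load(f)

model = I3Model(config)  # assumed to accept the parsed config
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# For the safetensors variant:
#   from safetensors.torch import load_file
#   state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()
```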
---
```
# not available
```