---
license: apache-2.0
tags:
- conversational
- efficient
- i3-architecture
- custom_code
datasets:
- starhopp3r/TinyChat
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# i3 Model - Ultra-Efficient Pretraining Language Model
## Model Description
The **i3 Model** is designed to optimize **pretraining efficiency** while retaining core language modeling capabilities.
Its architecture allows training on **memory-constrained hardware**, including CPU-only setups, without sacrificing sequence modeling performance.
> [!NOTE]
> The i3 architecture is what makes highly efficient pretraining possible: it is designed to **reduce memory usage**, **speed up training**, and allow pretraining from scratch on tiny hardware. Internal details are abstracted for simplicity.
---
## Use
```python
from transformers import pipeline

# The repo is tagged custom_code, so loading it needs trust_remote_code=True
pipe = pipeline("text-generation", model="FlameF0X/i3-12m", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
---
## Model Statistics
* **Vocabulary Size:** 4,466 (variable-length chunks)
* **Hidden Dimension:** 512
* **Number of Layers:** 12
* **Max Sequence Length:** 256
* **Total Parameters:** 12,691,186
* **Tokenization:** Memory-efficient variable-length chunking (2–3 characters)
* **Total training tokens:** 334,524,736
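
The chunking scheme itself lives in the repo's custom code; purely as an illustration of greedy 2–3 character chunking, here is a minimal sketch (the `chunk` helper and the tiny vocabulary are hypothetical):

```python
# Hypothetical sketch of greedy variable-length chunking (2-3 characters).
# The actual i3 tokenizer is part of the repo's custom code; this only
# illustrates the idea of preferring longer known chunks.
def chunk(text: str, vocab: set[str]) -> list[str]:
    """Greedily emit 3-char chunks when known, else 2-char, else 1 char."""
    chunks, i = [], 0
    while i < len(text):
        for size in (3, 2):                  # prefer the longest known chunk
            piece = text[i : i + size]
            if len(piece) == size and piece in vocab:
                chunks.append(piece)
                i += size
                break
        else:                                 # fall back to a single character
            chunks.append(text[i])
            i += 1
    return chunks

vocab = {"he", "llo", " w", "ld"}
print(chunk("hello world", vocab))  # ['he', 'llo', ' w', 'o', 'r', 'ld']
```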
---
## Key Features
1. **Memory-Optimized:** Designed to train on tiny hardware with minimal RAM usage
2. **Pretraining-Focused Architecture:** i3 layers provide efficient sequence modeling, low-rank linear updates, and factorized attention
3. **Variable-Length Tokenization:** 2–3 character chunks for compact embeddings
4. **Conversational Readiness:** Optimized for dialogue and text generation
---
## i3 Architecture (Abstract Overview)
### Design Philosophy
The i3 model targets **CPU-friendly, memory-constrained pretraining**, emphasizing:
* Long-range sequence modeling
* Low-rank weight updates for memory savings
* Efficient factorized attention
* 4-bit weights and microbatching for minimal memory footprint
### Technologies used in the i3 architecture that I have open-sourced
* [Low-Rank Pre-training](https://github.com/FlameF0X/Low-Rank-Pretraining) - LoRA-style low-rank factorization applied to pretraining from scratch.
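
As a rough sketch of the low-rank idea (the `LowRankLinear` class, rank, and dimensions below are illustrative, not the LoRPt implementation): factorizing a 512×512 weight at rank 32 shrinks it from ~262K to ~33K parameters.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Factorizes an (out x in) weight into (out x r) @ (r x in), cutting
    parameters from out*in down to r*(out+in). Illustrative sketch only."""
    def __init__(self, in_features: int, out_features: int, rank: int = 32):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)  # (r x in)
        self.up = nn.Linear(rank, out_features, bias=True)    # (out x r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

# Hidden dim 512 at rank 32: 512*512 = 262,144 params -> 32*(512+512) = 32,768
layer = LowRankLinear(512, 512, rank=32)
y = layer(torch.randn(1, 256, 512))  # (batch, seq_len, hidden)
```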
### Conceptual Layout
```
Input Tokens
β”‚
+-----------------+
| Embedding Layer |
+-----------------+
β”‚
+-----------------+
| i3 Architecture |
+-----------------+
β”‚
+------------------------+
| KQV Low-Rank Attention |
+------------------------+
β”‚
+-----------------------+
| LayerNorm + Residuals |
+-----------------------+
β”‚
+-------------------+
| Output Projection |
+-------------------+
β”‚
Predicted Tokens
```
> Key idea: Every component is optimized for **memory efficiency** and **pretraining speed** on small hardware, while preserving essential transformer dynamics.
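
One way to read the **KQV Low-Rank Attention** box is that the Q/K/V projections are themselves rank-factorized. A conceptual PyTorch sketch (the `LowRankAttention` class, single head, and rank are assumptions, not the repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankAttention(nn.Module):
    """Single-head attention whose Q/K/V projections are rank-r factorized.
    A conceptual sketch of the 'KQV Low-Rank Attention' block, not the repo code."""
    def __init__(self, dim: int = 512, rank: int = 64):
        super().__init__()
        # Each projection is two thin matrices instead of one dim x dim matrix.
        self.q = nn.Sequential(nn.Linear(dim, rank, bias=False), nn.Linear(rank, dim, bias=False))
        self.k = nn.Sequential(nn.Linear(dim, rank, bias=False), nn.Linear(rank, dim, bias=False))
        self.v = nn.Sequential(nn.Linear(dim, rank, bias=False), nn.Linear(rank, dim, bias=False))
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        # scaled_dot_product_attention computes softmax(QK^T / sqrt(d)) V
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn)

block = LowRankAttention()
h = block(torch.randn(1, 256, 512))  # (batch, seq_len, hidden)
```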
---
## Training Details
* **Sequence length:** 128–512 tokens
* **Model size:** ~12M parameters (CPU-friendly)
* **Optimizer:** AdamW or Lion (4-bit / mixed precision)
* **Dataset:** TinyChat (~50–200 MB)
* **Training loop:** gradient checkpointing + recomputation
* **Objective:** token prediction / text generation
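
The training script is not reproduced here; a minimal sketch of the microbatching / gradient-accumulation pattern implied above (the model, optimizer settings, and `data_loader` are all placeholders):

```python
import torch
import torch.nn as nn

# Placeholder stand-ins; the real model and tokenizer live in the repo's custom code.
model = nn.Sequential(nn.Embedding(4466, 512), nn.Linear(512, 4466))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8                      # microbatching: 8 small batches per update
for step, (inputs, targets) in enumerate(data_loader):  # data_loader: assumed iterable
    # In the full loop, gradient checkpointing (torch.utils.checkpoint) would
    # wrap the blocks to trade recomputation for activation memory.
    logits = model(inputs)           # (batch, seq, vocab)
    loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
    (loss / accum_steps).backward()  # scale so gradients average over microbatches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```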
---
## Citation
```bibtex
@software{lorpt2025,
  title={LoRPt: Low-Rank Pretraining for Resource-Efficient Language Models},
  author={FlameF0X},
  year={2025},
  url={https://github.com/FlameF0X/Low-Rank-Pretraining}
}
```