---
title: README
emoji: 
colorFrom: indigo
colorTo: pink
sdk: static
pinned: false
---

# Welcome to i3-lab

**"Chase the SOTA pipeline, not the MMLU slop."**

i3-lab is dedicated to extreme efficiency in LLM architecture. We develop the **i3** model family: state-of-the-art architectures designed to reach, in hours on accessible hardware (like the NVIDIA Tesla P100 offered free on Kaggle), performance levels that typically take days on massive GPU clusters.

## Why?
<details>
  <summary>Click to expand</summary>
  
1. Why?
> Well, I’m determined to make this model or architecture as efficient and fast as possible, knowing that not everyone can afford a decent GPU. In some countries, weak economies or import bans make it even harder, and sometimes all you have is a laptop with an i3-6006U, relying on free cloud computing services like Colab or Kaggle—which is exactly my situation :D
>
> — Daniel

2. Why use RWKV-Attention when you could just use standard attention like LLaMA, Qwen, and many others?
> RWKV is great because it’s fast, lightweight, and doesn’t require much RAM, though it struggles with long contexts. Adding a bit of attention to the model architecture makes it more stable and smarter, but at the cost of quadratic memory usage. From my tests on a Kaggle P100 GPU, you can train SLMs (Small Language Models) within its 16GB VRAM, though it takes time and patience. Once you hit around 500 million parameters, training speed drops from about 300–400 tokens per second to 200–300, which may not sound huge, but it’s definitely noticeable. Of course, with something like an RTX 2060 or better, you wouldn’t experience this issue of *feeling slow*.
>
> — Daniel

</details>

---

## i3: High-Efficiency Training
We specialize in hybrid architectures, specifically **RWKV-Attention**, to bypass the quadratic scaling bottlenecks of traditional Transformers.

* **Fast Iteration:** Trainable in hours, not weeks.
* **Accessible SOTA:** High performance on legacy/mid-range hardware.
* **Open Research:** Push the boundaries of what is possible with limited compute.
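To make the hybrid idea concrete, here is a toy numpy sketch of interleaving a linear-time RWKV-style recurrence with a single causal attention pass. This is an illustration of the general technique, not the open-i3 implementation: the names `rwkv_mix`, `attention`, and `hybrid_block`, and the scalar `decay`, are all hypothetical, and the recurrence is a simplified stand-in for the full WKV kernel.

```python
import numpy as np

def rwkv_mix(x, decay):
    """RWKV-style token mixing: a linear-time recurrence over the sequence.
    Each position sees an exponentially decayed sum of past tokens, so
    compute and memory are O(T) rather than attention's O(T^2).
    `decay` in (0, 1) controls how fast the past fades (toy sketch)."""
    T, d = x.shape
    out = np.zeros_like(x)
    state = np.zeros(d)
    for t in range(T):
        state = decay * state + x[t]
        out[t] = state
    return out

def attention(x):
    """Plain causal self-attention over the same sequence: O(T^2) in both
    time and memory, which is the cost the RWKV layers help amortize."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def hybrid_block(x, decay=0.9):
    """One hybrid layer: cheap RWKV mixing first, then one attention pass
    for global stability, each with a residual connection."""
    x = x + rwkv_mix(x, decay)
    x = x + attention(x)
    return x
```

Because both sub-layers are causal, the block can be stacked and trained with the usual next-token objective; the ratio of RWKV layers to attention layers is the knob that trades quality against the quadratic memory cost.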

### Quick Links
* **Source Code:** [FlameF0X/open-i3](https://github.com/FlameF0X/open-i3)
* **Community:** [Join our Discord](https://discord.gg/qtXApjpaJF)

---

## Roadmap / TODO
We are currently scaling our architecture through the following milestones:

- [ ] **i3-500m** — Our 500M-parameter text generator.
- [ ] **i3-Ethan-it** — Specialized instruction-tuned variant.
- [ ] **i3-1B** — Our first major scale-up.
- [ ] **i3-7B-A1.6B** — Mixture of Experts / Sparsity testing.

---

## Usage & Attribution
The `open-i3` codebase is licensed under **Apache 2.0**. We believe in open-source, but we value attribution. 

If you use our architecture (RWKV-Attention) or our weights, you are required per **Section 4(b)** and **4(d)** to:
1.  Carry prominent notices of any modifications.
2.  Include a readable copy of the attribution notices from our **NOTICE** file.

> [!IMPORTANT]
> You **must** include the attribution link found in the [open-i3 GitHub](https://github.com/FlameF0X/open-i3) in your documentation or model card.

---
<p align="center">
  Made with ❤️ and <b>DETERMINATION</b> by Daniel.
</p>