Update index.html
index.html +13 -13
index.html
CHANGED
|
@@ -208,7 +208,7 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
|
|
| 208 |
<div class="container">
|
| 209 |
<div class="hero-eyebrow">AxionLab Research</div>
|
| 210 |
<h1 class="hero-title">Scaling <em>Intelligence</em><br>from Zero</h1>
|
| 211 |
-
<p class="hero-sub">Building
|
| 212 |
<div class="hero-ctas">
|
| 213 |
<a href="#models" class="btn-primary">→ Explore Models</a>
|
| 214 |
<a href="https://huggingface.co/AxionLab-Co" target="_blank" class="btn-ghost">View on HuggingFace →</a>
|
|
@@ -242,8 +242,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
|
|
| 242 |
<a href="https://huggingface.co/AxionLab-Co/Axion1-350k-A250k" target="_blank" class="model-link">View on HuggingFace →</a>
|
| 243 |
</div>
|
| 244 |
<div class="model-card upcoming reveal">
|
| 245 |
-
<div class="model-version">
|
| 246 |
-
<div class="model-name">
|
| 247 |
<div class="model-desc">Same architecture, 4× the capacity. Expanded vocabulary and noticeably more coherent language generation.</div>
|
| 248 |
<div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 128</span><span class="chip">6 layers</span></div>
|
| 249 |
<div class="model-meta">
|
|
@@ -252,8 +252,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
|
|
| 252 |
</div>
|
| 253 |
</div>
|
| 254 |
<div class="model-card upcoming reveal">
|
| 255 |
-
<div class="model-version">
|
| 256 |
-
<div class="model-name">
|
| 257 |
<div class="model-desc">First model expected to produce grammatically coherent multi-sentence responses. Scaling laws in action.</div>
|
| 258 |
<div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 256</span></div>
|
| 259 |
<div class="model-meta">
|
|
@@ -262,8 +262,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
|
|
| 262 |
</div>
|
| 263 |
</div>
|
| 264 |
<div class="model-card upcoming reveal">
|
| 265 |
-
<div class="model-version">
|
| 266 |
-
<div class="model-name">
|
| 267 |
<div class="model-desc">Scaling to 24M and 100M parameters. Instruction tuning and multi-language support planned.</div>
|
| 268 |
<div class="model-chips"><span class="chip">24M → 100M</span><span class="chip amber">Multilingual</span></div>
|
| 269 |
<div class="model-meta">
|
|
@@ -337,7 +337,7 @@ Quanto é 5 + 3?
|
|
| 337 |
<div class="blog-read-more">Read more →</div>
|
| 338 |
</div>
|
| 339 |
<div class="blog-visual">
|
| 340 |
-
<div class="blog-visual-inner">val_loss:
|
| 341 |
val_loss: 4.59 – epoch 2
|
| 342 |
val_loss: 4.30 – epoch 3
|
| 343 |
val_loss: 3.88 – epoch 5
|
|
@@ -488,31 +488,31 @@ python train.py --resume --epochs 20</div>
|
|
| 488 |
<p class="section-sub">Every Axion release is a scaling experiment. Same architecture, increasing capacity.</p>
|
| 489 |
<div class="roadmap-track">
|
| 490 |
<div class="roadmap-item done reveal">
|
| 491 |
-
<div class="roadmap-meta"><span class="roadmap-version">Axion1 – 344k params</span><span class="roadmap-date">March 2025</span><span class="roadmap-badge done">Released</span></div>
|
| 492 |
<div class="roadmap-title">Proof of Architecture</div>
|
| 493 |
<div class="roadmap-desc">Full DeepSeek-V3 pipeline from scratch. MLA + MoE + BPE tokenizer + HuggingFace integration. Trained on GSM8K in 115 minutes on CPU.</div>
|
| 494 |
<div class="roadmap-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">GSM8K</span><span class="chip green">HuggingFace</span></div>
|
| 495 |
</div>
|
| 496 |
<div class="roadmap-item next reveal">
|
| 497 |
-
<div class="roadmap-meta"><span class="roadmap-version">
|
| 498 |
<div class="roadmap-title">First Coherent Sentences</div>
|
| 499 |
<div class="roadmap-desc">d_model 128, 6 layers, expanded vocab. Expected to produce grammatically structured responses. Full training log will be published.</div>
|
| 500 |
<div class="roadmap-chips"><span class="chip">d_model 128</span><span class="chip">6 layers</span><span class="chip amber">Larger vocab</span></div>
|
| 501 |
</div>
|
| 502 |
<div class="roadmap-item reveal">
|
| 503 |
-
<div class="roadmap-meta"><span class="roadmap-version">
|
| 504 |
<div class="roadmap-title">Reliable Math Reasoning</div>
|
| 505 |
<div class="roadmap-desc">d_model 256. Consistent step-by-step reasoning on arithmetic. Broader dataset planned.</div>
|
| 506 |
<div class="roadmap-chips"><span class="chip">d_model 256</span><span class="chip amber">Multi-dataset</span></div>
|
| 507 |
</div>
|
| 508 |
<div class="roadmap-item reveal">
|
| 509 |
-
<div class="roadmap-meta"><span class="roadmap-version">
|
| 510 |
<div class="roadmap-title">Instruction Following</div>
|
| 511 |
<div class="roadmap-desc">First Axion with instruction tuning. Target: answer general questions in Portuguese and English.</div>
|
| 512 |
<div class="roadmap-chips"><span class="chip">Instruction SFT</span><span class="chip amber">PT + EN</span></div>
|
| 513 |
</div>
|
| 514 |
<div class="roadmap-item reveal">
|
| 515 |
-
<div class="roadmap-meta"><span class="roadmap-version">
|
| 516 |
<div class="roadmap-title">General Purpose</div>
|
| 517 |
<div class="roadmap-desc">The flagship. Real conversation, multi-turn context, and a full evaluation suite.</div>
|
| 518 |
<div class="roadmap-chips"><span class="chip">100M</span><span class="chip cyan">Multi-turn</span><span class="chip amber">Eval suite</span></div>
|
|
|
|
| 208 |
<div class="container">
|
| 209 |
<div class="hero-eyebrow">AxionLab Research</div>
|
| 210 |
<h1 class="hero-title">Scaling <em>Intelligence</em><br>from Zero</h1>
|
| 211 |
+
<p class="hero-sub">Building architectures from scratch – MLA, MoE, auxiliary-loss-free load balancing – scaled progressively from 344k to 100M+ parameters. All weights open. All code open.</p>
|
| 212 |
<div class="hero-ctas">
|
| 213 |
<a href="#models" class="btn-primary">→ Explore Models</a>
|
| 214 |
<a href="https://huggingface.co/AxionLab-Co" target="_blank" class="btn-ghost">View on HuggingFace →</a>
|
|
|
|
| 242 |
<a href="https://huggingface.co/AxionLab-Co/Axion1-350k-A250k" target="_blank" class="model-link">View on HuggingFace →</a>
|
| 243 |
</div>
|
| 244 |
<div class="model-card upcoming reveal">
|
| 245 |
+
<div class="model-version">v0.2 – Coming Soon</div>
|
| 246 |
+
<div class="model-name">Axion1-v0.2</div>
|
| 247 |
<div class="model-desc">Same architecture, 4× the capacity. Expanded vocabulary and noticeably more coherent language generation.</div>
|
| 248 |
<div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 128</span><span class="chip">6 layers</span></div>
|
| 249 |
<div class="model-meta">
|
|
|
|
| 252 |
</div>
|
| 253 |
</div>
|
| 254 |
<div class="model-card upcoming reveal">
|
| 255 |
+
<div class="model-version">v0.3 – Planned</div>
|
| 256 |
+
<div class="model-name">Axion1-v0.3</div>
|
| 257 |
<div class="model-desc">First model expected to produce grammatically coherent multi-sentence responses. Scaling laws in action.</div>
|
| 258 |
<div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 256</span></div>
|
| 259 |
<div class="model-meta">
|
|
|
|
| 262 |
</div>
|
| 263 |
</div>
|
| 264 |
<div class="model-card upcoming reveal">
|
| 265 |
+
<div class="model-version">v0.4-0.5 – Future</div>
|
| 266 |
+
<div class="model-name">Axion1-v0.4 / Axion1-v0.5</div>
|
| 267 |
<div class="model-desc">Scaling to 24M and 100M parameters. Instruction tuning and multi-language support planned.</div>
|
| 268 |
<div class="model-chips"><span class="chip">24M → 100M</span><span class="chip amber">Multilingual</span></div>
|
| 269 |
<div class="model-meta">
|
|
|
|
| 337 |
<div class="blog-read-more">Read more →</div>
|
| 338 |
</div>
|
| 339 |
<div class="blog-visual">
|
| 340 |
+
<div class="blog-visual-inner">val_loss: 6.49 – epoch 1
|
| 341 |
val_loss: 4.59 – epoch 2
|
| 342 |
val_loss: 4.30 – epoch 3
|
| 343 |
val_loss: 3.88 – epoch 5
|
|
|
|
| 488 |
<p class="section-sub">Every Axion release is a scaling experiment. Same architecture, increasing capacity.</p>
|
| 489 |
<div class="roadmap-track">
|
| 490 |
<div class="roadmap-item done reveal">
|
| 491 |
+
<div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.1 – 344k params</span><span class="roadmap-date">March 2025</span><span class="roadmap-badge done">Released</span></div>
|
| 492 |
<div class="roadmap-title">Proof of Architecture</div>
|
| 493 |
<div class="roadmap-desc">Full DeepSeek-V3 pipeline from scratch. MLA + MoE + BPE tokenizer + HuggingFace integration. Trained on GSM8K in 115 minutes on CPU.</div>
|
| 494 |
<div class="roadmap-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">GSM8K</span><span class="chip green">HuggingFace</span></div>
|
| 495 |
</div>
|
| 496 |
<div class="roadmap-item next reveal">
|
| 497 |
+
<div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.2 – ~1.5M params</span><span class="roadmap-date">Coming Soon</span><span class="roadmap-badge next">In Progress</span></div>
|
| 498 |
<div class="roadmap-title">First Coherent Sentences</div>
|
| 499 |
<div class="roadmap-desc">d_model 128, 6 layers, expanded vocab. Expected to produce grammatically structured responses. Full training log will be published.</div>
|
| 500 |
<div class="roadmap-chips"><span class="chip">d_model 128</span><span class="chip">6 layers</span><span class="chip amber">Larger vocab</span></div>
|
| 501 |
</div>
|
| 502 |
<div class="roadmap-item reveal">
|
| 503 |
+
<div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.3 – ~6M params</span><span class="roadmap-badge planned">Planned</span></div>
|
| 504 |
<div class="roadmap-title">Reliable Math Reasoning</div>
|
| 505 |
<div class="roadmap-desc">d_model 256. Consistent step-by-step reasoning on arithmetic. Broader dataset planned.</div>
|
| 506 |
<div class="roadmap-chips"><span class="chip">d_model 256</span><span class="chip amber">Multi-dataset</span></div>
|
| 507 |
</div>
|
| 508 |
<div class="roadmap-item reveal">
|
| 509 |
+
<div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.4 – ~24M params</span><span class="roadmap-badge planned">Planned</span></div>
|
| 510 |
<div class="roadmap-title">Instruction Following</div>
|
| 511 |
<div class="roadmap-desc">First Axion with instruction tuning. Target: answer general questions in Portuguese and English.</div>
|
| 512 |
<div class="roadmap-chips"><span class="chip">Instruction SFT</span><span class="chip amber">PT + EN</span></div>
|
| 513 |
</div>
|
| 514 |
<div class="roadmap-item reveal">
|
| 515 |
+
<div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.5 – ~100M params</span><span class="roadmap-badge planned">Planned</span></div>
|
| 516 |
<div class="roadmap-title">General Purpose</div>
|
| 517 |
<div class="roadmap-desc">The flagship. Real conversation, multi-turn context, and a full evaluation suite.</div>
|
| 518 |
<div class="roadmap-chips"><span class="chip">100M</span><span class="chip cyan">Multi-turn</span><span class="chip amber">Eval suite</span></div>
|