AxionLab-official committed on
Commit 6ce7fbd · verified · 1 Parent(s): 34e9264

Update index.html

Files changed (1)
  1. index.html +13 -13
index.html CHANGED
@@ -208,7 +208,7 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
  <div class="container">
  <div class="hero-eyebrow">AxionLab Research</div>
  <h1 class="hero-title">Scaling <em>Intelligence</em><br>from Zero</h1>
- <p class="hero-sub">Building DeepSeek-V3 architecture from scratch — MLA, MoE, auxiliary-loss-free load balancing — scaled progressively from 344k to 100M+ parameters. All weights open. All code open.</p>
  <div class="hero-ctas">
  <a href="#models" class="btn-primary">↓ Explore Models</a>
  <a href="https://huggingface.co/AxionLab-Co" target="_blank" class="btn-ghost">View on HuggingFace →</a>
@@ -242,8 +242,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
  <a href="https://huggingface.co/AxionLab-Co/Axion1-350k-A250k" target="_blank" class="model-link">View on HuggingFace →</a>
  </div>
  <div class="model-card upcoming reveal">
- <div class="model-version">v2.0 — Coming Soon</div>
- <div class="model-name">Axion2</div>
  <div class="model-desc">Same architecture, 4× the capacity. Expanded vocabulary and noticeably more coherent language generation.</div>
  <div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 128</span><span class="chip">6 layers</span></div>
  <div class="model-meta">
@@ -252,8 +252,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
  </div>
  </div>
  <div class="model-card upcoming reveal">
- <div class="model-version">v3.0 — Planned</div>
- <div class="model-name">Axion3</div>
  <div class="model-desc">First model expected to produce grammatically coherent multi-sentence responses. Scaling laws in action.</div>
  <div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 256</span></div>
  <div class="model-meta">
@@ -262,8 +262,8 @@ footer{background:var(--bg3);border-top:1px solid var(--border);padding:60px 0 4
  </div>
  </div>
  <div class="model-card upcoming reveal">
- <div class="model-version">v4–5 — Future</div>
- <div class="model-name">Axion4 / Axion5</div>
  <div class="model-desc">Scaling to 24M and 100M parameters. Instruction tuning and multi-language support planned.</div>
  <div class="model-chips"><span class="chip">24M → 100M</span><span class="chip amber">Multilingual</span></div>
  <div class="model-meta">
@@ -337,7 +337,7 @@ Quanto é 5 + 3?
  <div class="blog-read-more">Read more →</div>
  </div>
  <div class="blog-visual">
- <div class="blog-visual-inner">val_loss: 5.49 → epoch 1
  val_loss: 4.59 → epoch 2
  val_loss: 4.30 → epoch 3
  val_loss: 3.88 → epoch 5
@@ -488,31 +488,31 @@ python train.py --resume --epochs 20</div>
  <p class="section-sub">Every Axion release is a scaling experiment. Same architecture, increasing capacity.</p>
  <div class="roadmap-track">
  <div class="roadmap-item done reveal">
- <div class="roadmap-meta"><span class="roadmap-version">Axion1 — 344k params</span><span class="roadmap-date">March 2025</span><span class="roadmap-badge done">Released</span></div>
  <div class="roadmap-title">Proof of Architecture</div>
  <div class="roadmap-desc">Full DeepSeek-V3 pipeline from scratch. MLA + MoE + BPE tokenizer + HuggingFace integration. Trained on GSM8K in 115 minutes on CPU.</div>
  <div class="roadmap-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">GSM8K</span><span class="chip green">HuggingFace</span></div>
  </div>
  <div class="roadmap-item next reveal">
- <div class="roadmap-meta"><span class="roadmap-version">Axion2 — ~1.5M params</span><span class="roadmap-date">Coming Soon</span><span class="roadmap-badge next">In Progress</span></div>
  <div class="roadmap-title">First Coherent Sentences</div>
  <div class="roadmap-desc">d_model 128, 6 layers, expanded vocab. Expected to produce grammatically structured responses. Full training log will be published.</div>
  <div class="roadmap-chips"><span class="chip">d_model 128</span><span class="chip">6 layers</span><span class="chip amber">Larger vocab</span></div>
  </div>
  <div class="roadmap-item reveal">
- <div class="roadmap-meta"><span class="roadmap-version">Axion3 — ~6M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">Reliable Math Reasoning</div>
  <div class="roadmap-desc">d_model 256. Consistent step-by-step reasoning on arithmetic. Broader dataset planned.</div>
  <div class="roadmap-chips"><span class="chip">d_model 256</span><span class="chip amber">Multi-dataset</span></div>
  </div>
  <div class="roadmap-item reveal">
- <div class="roadmap-meta"><span class="roadmap-version">Axion4 — ~24M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">Instruction Following</div>
  <div class="roadmap-desc">First Axion with instruction tuning. Target: answer general questions in Portuguese and English.</div>
  <div class="roadmap-chips"><span class="chip">Instruction SFT</span><span class="chip amber">PT + EN</span></div>
  </div>
  <div class="roadmap-item reveal">
- <div class="roadmap-meta"><span class="roadmap-version">Axion5 — ~100M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">General Purpose</div>
  <div class="roadmap-desc">The flagship. Real conversation, multi-turn context, and a full evaluation suite.</div>
  <div class="roadmap-chips"><span class="chip">100M</span><span class="chip cyan">Multi-turn</span><span class="chip amber">Eval suite</span></div>
 
@@ -208,7 +208,7 @@
  <div class="container">
  <div class="hero-eyebrow">AxionLab Research</div>
  <h1 class="hero-title">Scaling <em>Intelligence</em><br>from Zero</h1>
+ <p class="hero-sub">Building architectures from scratch — MLA, MoE, auxiliary-loss-free load balancing — scaled progressively from 344k to 100M+ parameters. All weights open. All code open.</p>
  <div class="hero-ctas">
  <a href="#models" class="btn-primary">↓ Explore Models</a>
  <a href="https://huggingface.co/AxionLab-Co" target="_blank" class="btn-ghost">View on HuggingFace →</a>
 
@@ -242,8 +242,8 @@
  <a href="https://huggingface.co/AxionLab-Co/Axion1-350k-A250k" target="_blank" class="model-link">View on HuggingFace →</a>
  </div>
  <div class="model-card upcoming reveal">
+ <div class="model-version">v0.2 — Coming Soon</div>
+ <div class="model-name">Axion1-v0.2</div>
  <div class="model-desc">Same architecture, 4× the capacity. Expanded vocabulary and noticeably more coherent language generation.</div>
  <div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 128</span><span class="chip">6 layers</span></div>
  <div class="model-meta">
 
@@ -252,8 +252,8 @@
  </div>
  </div>
  <div class="model-card upcoming reveal">
+ <div class="model-version">v0.3 — Planned</div>
+ <div class="model-name">Axion1-v0.3</div>
  <div class="model-desc">First model expected to produce grammatically coherent multi-sentence responses. Scaling laws in action.</div>
  <div class="model-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">d_model 256</span></div>
  <div class="model-meta">
 
@@ -262,8 +262,8 @@
  </div>
  </div>
  <div class="model-card upcoming reveal">
+ <div class="model-version">v0.4-0.5 — Future</div>
+ <div class="model-name">Axion1-v0.4 / Axion1-v0.5</div>
  <div class="model-desc">Scaling to 24M and 100M parameters. Instruction tuning and multi-language support planned.</div>
  <div class="model-chips"><span class="chip">24M → 100M</span><span class="chip amber">Multilingual</span></div>
  <div class="model-meta">
 
@@ -337,7 +337,7 @@
  <div class="blog-read-more">Read more →</div>
  </div>
  <div class="blog-visual">
+ <div class="blog-visual-inner">val_loss: 6.49 → epoch 1
  val_loss: 4.59 → epoch 2
  val_loss: 4.30 → epoch 3
  val_loss: 3.88 → epoch 5
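
The validation losses in this hunk are easier to compare as perplexities. The sketch below assumes the values are mean cross-entropy in nats (natural log), which the page does not state; if the training script reports a different base or reduction, the conversion would differ. The `perplexity` helper is a hypothetical name used only for illustration.

```python
import math

# Validation losses as shown in the updated blog visual (epoch -> val_loss).
val_losses = {1: 6.49, 2: 4.59, 3: 4.30, 5: 3.88}


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity.

    Assumption: the loss uses the natural logarithm.
    """
    return math.exp(loss_nats)


for epoch, loss in val_losses.items():
    print(f"epoch {epoch}: val_loss={loss:.2f} perplexity={perplexity(loss):.1f}")
```

Because the conversion is exponential, the one-unit change to the epoch 1 value in this commit (5.49 to 6.49) corresponds to a perplexity about e (roughly 2.7) times larger under the same assumption.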
 
@@ -488,31 +488,31 @@
  <p class="section-sub">Every Axion release is a scaling experiment. Same architecture, increasing capacity.</p>
  <div class="roadmap-track">
  <div class="roadmap-item done reveal">
+ <div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.1 — 344k params</span><span class="roadmap-date">March 2025</span><span class="roadmap-badge done">Released</span></div>
  <div class="roadmap-title">Proof of Architecture</div>
  <div class="roadmap-desc">Full DeepSeek-V3 pipeline from scratch. MLA + MoE + BPE tokenizer + HuggingFace integration. Trained on GSM8K in 115 minutes on CPU.</div>
  <div class="roadmap-chips"><span class="chip cyan">MLA</span><span class="chip cyan">MoE</span><span class="chip">GSM8K</span><span class="chip green">HuggingFace</span></div>
  </div>
  <div class="roadmap-item next reveal">
+ <div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.2 — ~1.5M params</span><span class="roadmap-date">Coming Soon</span><span class="roadmap-badge next">In Progress</span></div>
  <div class="roadmap-title">First Coherent Sentences</div>
  <div class="roadmap-desc">d_model 128, 6 layers, expanded vocab. Expected to produce grammatically structured responses. Full training log will be published.</div>
  <div class="roadmap-chips"><span class="chip">d_model 128</span><span class="chip">6 layers</span><span class="chip amber">Larger vocab</span></div>
  </div>
  <div class="roadmap-item reveal">
+ <div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.3 — ~6M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">Reliable Math Reasoning</div>
  <div class="roadmap-desc">d_model 256. Consistent step-by-step reasoning on arithmetic. Broader dataset planned.</div>
  <div class="roadmap-chips"><span class="chip">d_model 256</span><span class="chip amber">Multi-dataset</span></div>
  </div>
  <div class="roadmap-item reveal">
+ <div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.4 — ~24M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">Instruction Following</div>
  <div class="roadmap-desc">First Axion with instruction tuning. Target: answer general questions in Portuguese and English.</div>
  <div class="roadmap-chips"><span class="chip">Instruction SFT</span><span class="chip amber">PT + EN</span></div>
  </div>
  <div class="roadmap-item reveal">
+ <div class="roadmap-meta"><span class="roadmap-version">Axion1-v0.5 — ~100M params</span><span class="roadmap-badge planned">Planned</span></div>
  <div class="roadmap-title">General Purpose</div>
  <div class="roadmap-desc">The flagship. Real conversation, multi-turn context, and a full evaluation suite.</div>
  <div class="roadmap-chips"><span class="chip">100M</span><span class="chip cyan">Multi-turn</span><span class="chip amber">Eval suite</span></div>
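
The roadmap's "same architecture, increasing capacity" framing can be sanity-checked by computing the growth factor between consecutive releases. This is a minimal sketch using the parameter counts listed in the roadmap entries; the values marked "~" on the page are approximate, so the ratios are approximate too. The `growth_factors` helper is a hypothetical name, not part of the project's code.

```python
# Parameter counts as listed on the roadmap; the "~" values are approximate.
roadmap = [
    ("Axion1-v0.1", 344_000),
    ("Axion1-v0.2", 1_500_000),
    ("Axion1-v0.3", 6_000_000),
    ("Axion1-v0.4", 24_000_000),
    ("Axion1-v0.5", 100_000_000),
]


def growth_factors(entries):
    """Return (from_name, to_name, ratio) for each consecutive pair."""
    return [
        (a_name, b_name, b_params / a_params)
        for (a_name, a_params), (b_name, b_params) in zip(entries, entries[1:])
    ]


for src, dst, ratio in growth_factors(roadmap):
    print(f"{src} -> {dst}: {ratio:.2f}x")
```

Each step works out to roughly a fourfold increase, which is in line with the "4× the capacity" wording on the v0.2 model card.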