CompactAI committed on
Commit
c118fa1
verified ·
1 Parent(s): 13abe07

Upload 2 files

Files changed (2)
  1. index.html +67 -11
  2. status.html +39 -11
index.html CHANGED
@@ -612,22 +612,22 @@
612
  <div class="feature-card">
613
  <div class="feature-icon">M</div>
614
  <h3>External Memory Module</h3>
615
- <p>384 memory slots that the model can read from and write to. Think of it as the model's diary. It writes stuff down so it doesn't have to remember. Revolutionary concept.</p>
616
  </div>
617
  <div class="feature-card">
618
  <div class="feature-icon">C</div>
619
  <h3>Precision Codebook</h3>
620
- <p>A 32-dimensional codebook at the output head. Instead of predicting directly into a ~2.1K token vocabulary, the model projects down to learnable codes that get mapped to the full vocabulary. Think of it as a compression layer that helps with efficiency while still producing readable output.</p>
621
  </div>
622
  <div class="feature-card">
623
  <div class="feature-icon">T</div>
624
  <h3>Makeshift MTP</h3>
625
- <p>Multi-token prediction with 2 horizons. The model predicts further ahead during training, then branches during inference. It's like looking both ways before crossing, then deciding to jaywalk anyway.</p>
626
  </div>
627
  <div class="feature-card">
628
  <div class="feature-icon">R</div>
629
  <h3>RTX 5090 Optimized</h3>
630
- <p>Fully optimized for the 5090 with torch.compile, gradient checkpointing, and chunked attention. We use every trick in the book. And some from the sequel.</p>
631
  </div>
632
  </div>
633
  </div>
@@ -649,22 +649,22 @@
649
  <div class="arch-arrow">↓</div>
650
  <div class="arch-layer">
651
  <div class="arch-box main">
652
- <span>Transformer Block x2</span>
653
- <small>With External Memory</small>
654
  </div>
655
  </div>
656
  <div class="arch-arrow">↓</div>
657
  <div class="arch-layer">
658
  <div class="arch-box">
659
- <span>Codebook Head</span>
660
- <small>16-dim Quantized</small>
661
  </div>
662
  </div>
663
  <div class="arch-arrow">↓</div>
664
  <div class="arch-layer">
665
  <div class="arch-box">
666
  <span>Output</span>
667
- <small>~2.1K vocab</small>
668
  </div>
669
  </div>
670
  <div class="arch-details">
@@ -681,8 +681,8 @@
681
  <span class="arch-detail-value">256</span>
682
  </div>
683
  <div class="arch-detail">
684
- <span class="arch-detail-label">memory_slots</span>
685
- <span class="arch-detail-value">384</span>
686
  </div>
687
  <div class="arch-detail">
688
  <span class="arch-detail-label">code_dim</span>
@@ -697,6 +697,62 @@
697
  </div>
698
  </section>
699
700
  <section class="demo">
701
  <div class="container">
702
  <div class="section-header">
 
612
  <div class="feature-card">
613
  <div class="feature-icon">M</div>
614
  <h3>External Memory Module</h3>
615
+ <p>A recurrent memory module with 32-dimensional memory vectors is baked into the architecture but is currently disabled during SFT training due to AOT autograd compatibility issues. The architecture supports it; the training pipeline just isn't cooperating yet.</p>
616
  </div>
617
  <div class="feature-card">
618
  <div class="feature-icon">C</div>
619
  <h3>Precision Codebook</h3>
620
+ <p>Tied weight embeddings with a learnable per-token output bias. Instead of a separate codebook projection, the model ties input embeddings to output weights and learns a bias vector to compensate for word-token suppression. Simple, parameter-efficient, and surprisingly effective.</p>
621
  </div>
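The tied-head idea can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions: the matrix `E`, the bias vector, and the sizes (borrowed from the Haiku specs) are illustrative, not the project's actual code.

```python
import numpy as np

# Weight tying: one matrix serves as both the input embedding table and the
# output projection; a per-token logit bias is the only extra parameter at
# the head, replacing a separate codebook projection.
vocab, dim = 2100, 160                  # sizes assumed from the Haiku tier
rng = np.random.default_rng(0)
E = rng.normal(0, 0.02, (vocab, dim))   # shared embedding matrix
logit_bias = np.zeros(vocab)            # learnable per-token output bias

hidden = rng.normal(0, 1, (8, dim))     # final hidden states for 8 positions
logits = hidden @ E.T + logit_bias      # output head reuses E (no new matrix)
print(logits.shape)                     # (8, 2100)
```

The bias costs only `vocab` extra parameters, which is why the card calls the scheme parameter-efficient.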
622
  <div class="feature-card">
623
  <div class="feature-icon">T</div>
624
  <h3>Makeshift MTP</h3>
625
+ <p>Multi-token prediction adapters with horizon 2 are wired into the architecture but currently run with weight 0.0. They're there for future experiments. Think of them as emergency exits that nobody's allowed to use yet.</p>
626
  </div>
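The "weight 0.0" arrangement amounts to mixing the adapter losses into the training objective with a zero coefficient. A hypothetical sketch; the function name and loss values are illustrative.

```python
# Multi-token prediction loss mixing with horizon-2 adapters. At
# mtp_weight=0.0 the adapter losses are computed but contribute nothing,
# matching the "wired in but disabled" state described above.
def total_loss(next_token_loss, mtp_losses, mtp_weight=0.0):
    return next_token_loss + mtp_weight * sum(mtp_losses)

print(total_loss(2.5, [2.9, 3.1]))                  # 2.5 (MTP off)
print(total_loss(2.5, [2.9, 3.1], mtp_weight=0.3))  # ~4.3 (MTP on)
```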
627
  <div class="feature-card">
628
  <div class="feature-icon">R</div>
629
  <h3>RTX 5090 Optimized</h3>
630
+ <p>Tuned for RTX 5090 with chunked sliding-window attention (1024 window, 256 chunk), bf16 mixed precision, and batch size 64. torch.compile and gradient checkpointing are available but disabled for the Haiku tier: stability over speed.</p>
631
  </div>
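The chunked sliding-window scheme can be illustrated by the key range each query chunk is allowed to see. A sketch assuming the window/chunk sizes quoted above; the indexing function is illustrative, not the actual attention kernel.

```python
# Chunked sliding-window attention: queries are processed in chunks of 256,
# and each chunk attends only to keys inside a trailing 1024-token window,
# so memory scales with chunk * window rather than seq_len ** 2.
WINDOW, CHUNK = 1024, 256

def key_range(chunk_idx):
    """Half-open [start, end) range of key positions a query chunk can see."""
    q_end = (chunk_idx + 1) * CHUNK      # one past the chunk's last query
    return max(0, q_end - WINDOW), q_end

for c in range(6):
    print(c, key_range(c))
# chunk 0 sees keys [0, 256); by chunk 4 the window slides: [256, 1280)
```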
632
  </div>
633
  </div>
 
649
  <div class="arch-arrow">↓</div>
650
  <div class="arch-layer">
651
  <div class="arch-box main">
652
+ <span>Transformer Block ×6</span>
653
+ <small>Memory Module (Disabled)</small>
654
  </div>
655
  </div>
656
  <div class="arch-arrow">↓</div>
657
  <div class="arch-layer">
658
  <div class="arch-box">
659
+ <span>Tied Output Head</span>
660
+ <small>Learnable Bias</small>
661
  </div>
662
  </div>
663
  <div class="arch-arrow">↓</div>
664
  <div class="arch-layer">
665
  <div class="arch-box">
666
  <span>Output</span>
667
+ <small>~2.1K Hybrid Vocab</small>
668
  </div>
669
  </div>
670
  <div class="arch-details">
 
681
  <span class="arch-detail-value">256</span>
682
  </div>
683
  <div class="arch-detail">
684
+ <span class="arch-detail-label">memory_dim</span>
685
+ <span class="arch-detail-value">32</span>
686
  </div>
687
  <div class="arch-detail">
688
  <span class="arch-detail-label">code_dim</span>
 
697
  </div>
698
  </section>
699
 
700
+ <section class="model-series" style="padding: 100px 0; background: var(--black); border-top: 1px solid var(--gray-2);">
701
+ <div class="container">
702
+ <div class="section-header">
703
+ <h2>Model Series</h2>
704
+ <p>Three tiers following Chinchilla scaling. Yes, we borrowed the naming scheme. No, we're not sorry.</p>
705
+ </div>
706
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(300px, 1fr)); gap: 24px; margin-top: 48px;">
707
+ <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
708
+ <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
709
+ <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Haiku</span>
710
+ <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~1M params</span>
711
+ </div>
712
+ <p style="color: var(--gray-6); font-size: 15px; margin-bottom: 20px;">Lightweight and experimental. Updated frequently. The scrappy underdog.</p>
713
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 8px; font-size: 13px; font-family: var(--font-mono);">
714
+ <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">160</span>
715
+ <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">6</span>
716
+ <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">4</span>
717
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">256</span>
718
+ <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
719
+ <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~1B</span>
720
+ </div>
721
+ </div>
722
+ <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
723
+ <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
724
+ <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Sonnet</span>
725
+ <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~300M params</span>
726
+ </div>
727
+ <p style="color: var(--gray-6); font-size: 15px; margin-bottom: 20px;">Balanced and stable. Updated less often. The responsible middle child.</p>
728
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 8px; font-size: 13px; font-family: var(--font-mono);">
729
+ <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">768</span>
730
+ <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">36</span>
731
+ <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">12</span>
732
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">2,560</span>
733
+ <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
734
+ <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~300B</span>
735
+ </div>
736
+ </div>
737
+ <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
738
+ <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
739
+ <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Opus</span>
740
+ <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~600M params</span>
741
+ </div>
742
+ <p style="color: var(--gray-6); font-size: 15px; margin-bottom: 20px;">Maximum quality. Heavy and most stable. The overachiever who never sleeps.</p>
743
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 8px; font-size: 13px; font-family: var(--font-mono);">
744
+ <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">1,024</span>
745
+ <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">39</span>
746
+ <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">16</span>
747
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">3,584</span>
748
+ <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
749
+ <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~600B</span>
750
+ </div>
751
+ </div>
752
+ </div>
753
+ </div>
754
+ </section>
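As a sanity check on the tier sizes, parameter counts can be roughly estimated from the specs above. A back-of-the-envelope sketch: the formula and the ~2.1K vocab are assumptions, and it ignores biases, norms' affine terms, gated FFN variants, and the memory/MTP extras, so real counts will differ.

```python
# Rough transformer parameter count from the spec-table fields above.
def estimate_params(dim, layers, ffn_dim, vocab=2100):
    emb = vocab * dim         # tied embeddings counted once (shared head)
    attn = 4 * dim * dim      # Q, K, V, O projections
    ffn = 2 * dim * ffn_dim   # up + down projections
    norms = 2 * dim           # two norm weight vectors per layer
    return emb + layers * (attn + ffn + norms)

print(f"Haiku: ~{estimate_params(160, 6, 256):,} params")  # ~1.4M
```

For the Haiku config this lands in the low millions, consistent with the "~1M params" badge given the simplifications.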
755
+
756
  <section class="demo">
757
  <div class="container">
758
  <div class="section-header">
status.html CHANGED
@@ -506,15 +506,15 @@
506
  <div class="features-grid">
507
  <div class="feature-item">
508
  <span class="feature-name">External Memory</span>
509
- <span class="feature-status enabled">Enabled</span>
510
  </div>
511
  <div class="feature-item">
512
  <span class="feature-name">Precision Codebook</span>
513
- <span class="feature-status enabled">Enabled</span>
514
  </div>
515
  <div class="feature-item">
516
  <span class="feature-name">Makeshift MTP</span>
517
- <span class="feature-status enabled">Enabled</span>
518
  </div>
519
  <div class="feature-item">
520
  <span class="feature-name">Gradient Checkpointing</span>
@@ -522,7 +522,7 @@
522
  </div>
523
  <div class="feature-item">
524
  <span class="feature-name">Torch Compile</span>
525
- <span class="feature-status enabled">Enabled</span>
526
  </div>
527
  <div class="feature-item">
528
  <span class="feature-name">Chunked Attention</span>
@@ -530,27 +530,47 @@
530
  </div>
531
  <div class="feature-item">
532
  <span class="feature-name">Flash Attention</span>
533
- <span class="feature-status enabled">Enabled</span>
534
  </div>
535
  <div class="feature-item">
536
  <span class="feature-name">Repetition Penalty</span>
537
  <span class="feature-status disabled">Disabled (1.0)</span>
538
  </div>
539
  </div>
540
  </div>
541
 
542
  <!-- Memory Configuration -->
543
  <div class="status-card">
544
  <div class="status-header">
545
- <h3>Memory Module Config</h3>
546
  </div>
547
  <div class="specs-grid">
548
  <div class="spec-item">
549
- <div class="spec-value">384</div>
550
  <div class="spec-label">Memory Slots</div>
551
  </div>
552
  <div class="spec-item">
553
- <div class="spec-value">64</div>
554
  <div class="spec-label">Memory Dim</div>
555
  </div>
556
  <div class="spec-item">
@@ -571,11 +591,19 @@
571
  </div>
572
  <div class="dataset-list">
573
  <div class="dataset-item">
574
- <span class="dataset-name">shuyuej/English-Pretraining-Dataset</span>
575
  <span class="dataset-info">Pretraining</span>
576
  </div>
577
  <div class="dataset-item">
578
- <span class="dataset-name">imdatta0/openthink_chat</span>
579
  <span class="dataset-info">Instruction Tuning</span>
580
  </div>
581
  <div class="dataset-item">
@@ -598,7 +626,7 @@
598
  <div class="log-entry">
599
  <span class="log-date">2026-03-01</span>
600
  <span class="log-status active">Running</span>
601
- <span class="log-message">Training on RTX 5090 with torch.compile enabled</span>
602
  </div>
603
  <div class="log-entry">
604
  <span class="log-date">2026-02-28</span>
 
506
  <div class="features-grid">
507
  <div class="feature-item">
508
  <span class="feature-name">External Memory</span>
509
+ <span class="feature-status disabled">Disabled</span>
510
  </div>
511
  <div class="feature-item">
512
  <span class="feature-name">Precision Codebook</span>
513
+ <span class="feature-status disabled">Disabled</span>
514
  </div>
515
  <div class="feature-item">
516
  <span class="feature-name">Makeshift MTP</span>
517
+ <span class="feature-status disabled">Disabled (weight=0.0)</span>
518
  </div>
519
  <div class="feature-item">
520
  <span class="feature-name">Gradient Checkpointing</span>
 
522
  </div>
523
  <div class="feature-item">
524
  <span class="feature-name">Torch Compile</span>
525
+ <span class="feature-status disabled">Disabled</span>
526
  </div>
527
  <div class="feature-item">
528
  <span class="feature-name">Chunked Attention</span>
 
530
  </div>
531
  <div class="feature-item">
532
  <span class="feature-name">Flash Attention</span>
533
+ <span class="feature-status disabled">Not Used</span>
534
  </div>
535
  <div class="feature-item">
536
  <span class="feature-name">Repetition Penalty</span>
537
  <span class="feature-status disabled">Disabled (1.0)</span>
538
  </div>
539
+ <div class="feature-item">
540
+ <span class="feature-name">Tied Embeddings</span>
541
+ <span class="feature-status enabled">Enabled</span>
542
+ </div>
543
+ <div class="feature-item">
544
+ <span class="feature-name">Output Logit Bias</span>
545
+ <span class="feature-status enabled">Enabled</span>
546
+ </div>
547
+ <div class="feature-item">
548
+ <span class="feature-name">Word Token Loss Boost</span>
549
+ <span class="feature-status enabled">Enabled (3x)</span>
550
+ </div>
551
+ <div class="feature-item">
552
+ <span class="feature-name">Response-Start Boost</span>
553
+ <span class="feature-status enabled">Enabled (3x, 20 tokens)</span>
554
+ </div>
555
+ <div class="feature-item">
556
+ <span class="feature-name">Entropy Regularization</span>
557
+ <span class="feature-status disabled">Disabled</span>
558
+ </div>
559
  </div>
560
  </div>
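The two 3× boosts in the feature table can be read as per-token loss weights. A hypothetical sketch: the function and argument names are illustrative, and whether overlapping boosts multiply (as here) or take the max is an assumption.

```python
# Per-token loss weights for the boosts listed above: 3x on word tokens and
# 3x on the first 20 tokens of the response; overlapping boosts multiply.
def token_weights(is_word_token, response_start,
                  word_boost=3.0, start_boost=3.0, start_len=20):
    weights = []
    for i, is_word in enumerate(is_word_token):
        w = 1.0
        if is_word:
            w *= word_boost            # word-token loss boost
        if response_start <= i < response_start + start_len:
            w *= start_boost           # response-start boost
        weights.append(w)
    return weights

print(token_weights([True, False, True], response_start=1, start_len=2))
# [3.0, 3.0, 9.0]
```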
561
 
562
  <!-- Memory Configuration -->
563
  <div class="status-card">
564
  <div class="status-header">
565
+ <h3>Memory Module Config (Disabled)</h3>
566
  </div>
567
  <div class="specs-grid">
568
  <div class="spec-item">
569
+ <div class="spec-value">&mdash;</div>
570
  <div class="spec-label">Memory Slots</div>
571
  </div>
572
  <div class="spec-item">
573
+ <div class="spec-value">32</div>
574
  <div class="spec-label">Memory Dim</div>
575
  </div>
576
  <div class="spec-item">
 
591
  </div>
592
  <div class="dataset-list">
593
  <div class="dataset-item">
594
+ <span class="dataset-name">HuggingFaceFW/fineweb_100BT</span>
595
  <span class="dataset-info">Pretraining</span>
596
  </div>
597
  <div class="dataset-item">
598
+ <span class="dataset-name">mattwesney/General_Inquiry_Thinking-Chain-Of-Thought</span>
599
+ <span class="dataset-info">Instruction Tuning</span>
600
+ </div>
601
+ <div class="dataset-item">
602
+ <span class="dataset-name">tatsu-lab/alpaca</span>
603
+ <span class="dataset-info">Instruction Tuning</span>
604
+ </div>
605
+ <div class="dataset-item">
606
+ <span class="dataset-name">databricks/databricks-dolly-15k</span>
607
  <span class="dataset-info">Instruction Tuning</span>
608
  </div>
609
  <div class="dataset-item">
 
626
  <div class="log-entry">
627
  <span class="log-date">2026-03-01</span>
628
  <span class="log-status active">Running</span>
629
+ <span class="log-message">Training on RTX 5090 with torch.compile disabled, chunked attention active</span>
630
  </div>
631
  <div class="log-entry">
632
  <span class="log-date">2026-02-28</span>