CompactAI commited on
Commit
8f8a9b1
·
verified ·
1 Parent(s): cdf8bcd

Upload 2 files

Browse files
Files changed (2) hide show
  1. index.html +285 -30
  2. status.html +80 -14
index.html CHANGED
@@ -91,10 +91,20 @@
91
  display: flex;
92
  align-items: center;
93
  gap: 8px;
 
 
 
 
 
94
  }
95
 
96
  .nav-brand span {
97
  color: var(--accent);
 
 
 
 
 
98
  }
99
 
100
  .nav-links {
@@ -106,12 +116,28 @@
106
  font-size: 14px;
107
  font-weight: 500;
108
  color: var(--gray-6);
 
 
 
 
 
 
 
 
 
 
 
 
109
  }
110
 
111
  .nav-links a:hover {
112
  color: var(--white);
113
  }
114
 
 
 
 
 
115
  /* Hero */
116
  .hero {
117
  min-height: 100vh;
@@ -133,6 +159,12 @@
133
  bottom: 0;
134
  background: radial-gradient(ellipse 80% 50% at 50% -20%, rgba(255, 77, 0, 0.08), transparent);
135
  pointer-events: none;
 
 
 
 
 
 
136
  }
137
 
138
  .hero-badge {
@@ -157,8 +189,44 @@
157
  }
158
 
159
  @keyframes pulse {
160
- 0%, 100% { opacity: 1; }
161
- 50% { opacity: 0.4; }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
162
  }
163
 
164
  .hero h1 {
@@ -205,22 +273,59 @@
205
  background: var(--white);
206
  color: var(--black);
207
  border: none;
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
208
  }
209
 
210
  .btn-primary:hover {
211
  background: var(--gray-7);
212
  color: var(--black);
 
 
213
  }
214
 
215
  .btn-secondary {
216
  background: transparent;
217
  color: var(--gray-7);
218
  border: 1px solid var(--gray-3);
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
219
  }
220
 
221
  .btn-secondary:hover {
222
- border-color: var(--gray-5);
223
  color: var(--white);
 
224
  }
225
 
226
  /* Specs */
@@ -243,6 +348,26 @@
243
  .spec-card {
244
  background: var(--black-soft);
245
  padding: 32px;
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
246
  }
247
 
248
  .spec-value {
@@ -268,6 +393,8 @@
268
 
269
  .section-header {
270
  margin-bottom: 64px;
 
 
271
  }
272
 
273
  .section-header h2 {
@@ -295,11 +422,21 @@
295
  border: 1px solid var(--gray-2);
296
  border-radius: 12px;
297
  padding: 32px;
298
- transition: border-color 0.2s ease;
 
 
299
  }
300
 
 
 
 
 
 
 
301
  .feature-card:hover {
302
- border-color: var(--gray-3);
 
 
303
  }
304
 
305
  .feature-icon {
@@ -313,6 +450,12 @@
313
  justify-content: center;
314
  margin-bottom: 20px;
315
  font-size: 24px;
 
 
 
 
 
 
316
  }
317
 
318
  .feature-card h3 {
@@ -428,6 +571,8 @@
428
  border: 1px solid var(--gray-2);
429
  border-radius: 12px;
430
  overflow: hidden;
 
 
431
  }
432
 
433
  .demo-header {
@@ -503,6 +648,21 @@
503
  color: var(--accent);
504
  }
505
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
506
  /* Responsive */
507
  @media (max-width: 768px) {
508
  .hero h1 {
@@ -533,7 +693,6 @@
533
  <a href="downloads/index.html">Download</a>
534
  <a href="blog.html">Blog</a>
535
  <a href="status.html">Status</a>
536
- <a href="#">GitHub</a>
537
  </div>
538
  </div>
539
  </nav>
@@ -546,7 +705,7 @@
546
  Training on RTX 5090
547
  </div>
548
  <h1>A ~1M Parameter Model<br>with <span class="highlight">2K Context</span></h1>
549
- <p>TinyMemoryLM is a character-level transformer that learns to remember things. Not because it smart, but because we gave it external memory. And a codebook. And MTP. It still forgets where it put its keys though.</p>
550
  <div class="hero-cta">
551
  <a href="status.html" class="btn btn-primary">View Training Status</a>
552
  <a href="blog.html" class="btn btn-secondary">Read the Blog</a>
@@ -554,6 +713,24 @@
554
  </div>
555
  </section>
556
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
557
  <section class="download-section" style="padding: 80px 0; background: linear-gradient(180deg, var(--black-soft) 0%, var(--black) 100%); border-bottom: 1px solid var(--gray-2);">
558
  <div class="container" style="text-align: center;">
559
  <h2 style="font-size: 32px; font-weight: 600; color: var(--white); margin-bottom: 16px;">Download CompactAI Studio</h2>
@@ -595,7 +772,7 @@
595
  <div class="spec-label">Model Dimension</div>
596
  </div>
597
  <div class="spec-card">
598
- <div class="spec-value">256</div>
599
  <div class="spec-label">FFN Dimension</div>
600
  </div>
601
  </div>
@@ -611,23 +788,28 @@
611
  <div class="feature-grid">
612
  <div class="feature-card">
613
  <div class="feature-icon">M</div>
614
- <h3>External Memory Module</h3>
615
- <p>A recurrent memory module with 32-dimensional memory vectors is baked into the architecture, but currently disabled during SFT training due to AOT autograd compatibility. The architecture supports it the training pipeline just isn't cooperating yet.</p>
616
  </div>
617
  <div class="feature-card">
618
  <div class="feature-icon">C</div>
619
- <h3>Precision Codebook</h3>
620
- <p>Tied weight embeddings with a learnable per-token output bias. Instead of a separate codebook projection, the model ties input embeddings to output weights and learns a bias vector to compensate for word-token suppression. Simple, parameter-efficient, and surprisingly effective.</p>
621
  </div>
622
  <div class="feature-card">
623
  <div class="feature-icon">T</div>
624
  <h3>Makeshift MTP</h3>
625
- <p>Multi-token prediction adapters with horizon 2 are wired into the architecture but currently run with weight 0.0. They're there for future experiments. Think of them as emergency exits that nobody's allowed to use yet.</p>
626
  </div>
627
  <div class="feature-card">
628
  <div class="feature-icon">R</div>
629
  <h3>RTX 5090 Optimized</h3>
630
- <p>Tuned for RTX 5090 with chunked sliding-window attention (1024 window, 256 chunk), bf16 mixed precision, and batch size 64. torch.compile and gradient checkpointing are available but disabled for the Haiku tier — stability over speed.</p>
 
 
 
 
 
631
  </div>
632
  </div>
633
  </div>
@@ -650,14 +832,21 @@
650
  <div class="arch-layer">
651
  <div class="arch-box main">
652
  <span>Transformer Block ×6</span>
653
- <small>Memory Module (Disabled)</small>
 
 
 
 
 
 
 
654
  </div>
655
  </div>
656
  <div class="arch-arrow">↓</div>
657
  <div class="arch-layer">
658
  <div class="arch-box">
659
  <span>Tied Output Head</span>
660
- <small>Learnable Bias</small>
661
  </div>
662
  </div>
663
  <div class="arch-arrow">↓</div>
@@ -678,15 +867,15 @@
678
  </div>
679
  <div class="arch-detail">
680
  <span class="arch-detail-label">ffn_dim</span>
681
- <span class="arch-detail-value">256</span>
682
  </div>
683
  <div class="arch-detail">
684
- <span class="arch-detail-label">memory_dim</span>
685
- <span class="arch-detail-value">32</span>
686
  </div>
687
  <div class="arch-detail">
688
- <span class="arch-detail-label">code_dim</span>
689
- <span class="arch-detail-value">32</span>
690
  </div>
691
  <div class="arch-detail">
692
  <span class="arch-detail-label">seq_len</span>
@@ -704,7 +893,7 @@
704
  <p>Three tiers following Chinchilla scaling. Yes, we borrowed the naming scheme. No, we're not sorry.</p>
705
  </div>
706
  <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(300px, 1fr)); gap: 24px; margin-top: 48px;">
707
- <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
708
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
709
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Haiku</span>
710
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~1M params</span>
@@ -714,12 +903,12 @@
714
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">160</span>
715
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">6</span>
716
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">4</span>
717
- <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">256</span>
718
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
719
- <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~1B</span>
720
  </div>
721
  </div>
722
- <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
723
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
724
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Sonnet</span>
725
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~300M params</span>
@@ -729,12 +918,12 @@
729
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">768</span>
730
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">36</span>
731
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">12</span>
732
- <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">2,560</span>
733
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
734
- <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~300B</span>
735
  </div>
736
  </div>
737
- <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px;">
738
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
739
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Opus</span>
740
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~600M params</span>
@@ -744,9 +933,9 @@
744
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">1,024</span>
745
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">39</span>
746
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">16</span>
747
- <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">3,584</span>
748
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
749
- <span style="color: var(--gray-5);">pretrain tokens</span><span style="color: var(--gray-7);">~600B</span>
750
  </div>
751
  </div>
752
  </div>
@@ -777,6 +966,72 @@
777
  </div>
778
  </div>
779
  </section>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
780
  </main>
781
 
782
  <footer>
 
91
  display: flex;
92
  align-items: center;
93
  gap: 8px;
94
+ transition: all 0.3s ease;
95
+ }
96
+
97
+ .nav-brand:hover {
98
+ text-shadow: 0 0 20px rgba(255, 77, 0, 0.5);
99
  }
100
 
101
  .nav-brand span {
102
  color: var(--accent);
103
+ transition: all 0.3s ease;
104
+ }
105
+
106
+ .nav-brand:hover span {
107
+ text-shadow: 0 0 20px rgba(255, 77, 0, 0.8);
108
  }
109
 
110
  .nav-links {
 
116
  font-size: 14px;
117
  font-weight: 500;
118
  color: var(--gray-6);
119
+ position: relative;
120
+ }
121
+
122
+ .nav-links a::after {
123
+ content: '';
124
+ position: absolute;
125
+ bottom: -4px;
126
+ left: 0;
127
+ width: 0;
128
+ height: 2px;
129
+ background: var(--accent);
130
+ transition: width 0.3s ease;
131
  }
132
 
133
  .nav-links a:hover {
134
  color: var(--white);
135
  }
136
 
137
+ .nav-links a:hover::after {
138
+ width: 100%;
139
+ }
140
+
141
  /* Hero */
142
  .hero {
143
  min-height: 100vh;
 
159
  bottom: 0;
160
  background: radial-gradient(ellipse 80% 50% at 50% -20%, rgba(255, 77, 0, 0.08), transparent);
161
  pointer-events: none;
162
+ animation: glowPulse 8s ease-in-out infinite;
163
+ }
164
+
165
+ @keyframes glowPulse {
166
+ 0%, 100% { opacity: 1; }
167
+ 50% { opacity: 0.6; }
168
  }
169
 
170
  .hero-badge {
 
189
  }
190
 
191
  @keyframes pulse {
192
+ 0%, 100% { opacity: 1; transform: scale(1); }
193
+ 50% { opacity: 0.4; transform: scale(0.9); }
194
+ }
195
+
196
+ .hero-content {
197
+ opacity: 0;
198
+ animation: fadeSlideUp 0.8s ease forwards;
199
+ }
200
+
201
+ .hero-badge {
202
+ opacity: 0;
203
+ animation: fadeSlideUp 0.8s ease 0.1s forwards;
204
+ }
205
+
206
+ .hero h1 {
207
+ opacity: 0;
208
+ animation: fadeSlideUp 0.8s ease 0.2s forwards;
209
+ }
210
+
211
+ .hero p {
212
+ opacity: 0;
213
+ animation: fadeSlideUp 0.8s ease 0.3s forwards;
214
+ }
215
+
216
+ .hero-cta {
217
+ opacity: 0;
218
+ animation: fadeSlideUp 0.8s ease 0.4s forwards;
219
+ }
220
+
221
+ @keyframes fadeSlideUp {
222
+ from {
223
+ opacity: 0;
224
+ transform: translateY(20px);
225
+ }
226
+ to {
227
+ opacity: 1;
228
+ transform: translateY(0);
229
+ }
230
  }
231
 
232
  .hero h1 {
 
273
  background: var(--white);
274
  color: var(--black);
275
  border: none;
276
+ position: relative;
277
+ overflow: hidden;
278
+ }
279
+
280
+ .btn-primary::before {
281
+ content: '';
282
+ position: absolute;
283
+ top: 0;
284
+ left: -100%;
285
+ width: 100%;
286
+ height: 100%;
287
+ background: linear-gradient(90deg, transparent, rgba(255, 255, 255, 0.2), transparent);
288
+ transition: left 0.5s ease;
289
+ }
290
+
291
+ .btn-primary:hover::before {
292
+ left: 100%;
293
  }
294
 
295
  .btn-primary:hover {
296
  background: var(--gray-7);
297
  color: var(--black);
298
+ transform: translateY(-2px);
299
+ box-shadow: 0 8px 20px rgba(255, 255, 255, 0.15);
300
  }
301
 
302
  .btn-secondary {
303
  background: transparent;
304
  color: var(--gray-7);
305
  border: 1px solid var(--gray-3);
306
+ position: relative;
307
+ overflow: hidden;
308
+ }
309
+
310
+ .btn-secondary::before {
311
+ content: '';
312
+ position: absolute;
313
+ top: 0;
314
+ left: -100%;
315
+ width: 100%;
316
+ height: 100%;
317
+ background: linear-gradient(90deg, transparent, rgba(255, 77, 0, 0.1), transparent);
318
+ transition: left 0.5s ease;
319
+ }
320
+
321
+ .btn-secondary:hover::before {
322
+ left: 100%;
323
  }
324
 
325
  .btn-secondary:hover {
326
+ border-color: var(--accent);
327
  color: var(--white);
328
+ transform: translateY(-2px);
329
  }
330
 
331
  /* Specs */
 
348
  .spec-card {
349
  background: var(--black-soft);
350
  padding: 32px;
351
+ opacity: 0;
352
+ animation: fadeScale 0.6s ease forwards;
353
+ }
354
+
355
+ .spec-card:nth-child(1) { animation-delay: 0.1s; }
356
+ .spec-card:nth-child(2) { animation-delay: 0.15s; }
357
+ .spec-card:nth-child(3) { animation-delay: 0.2s; }
358
+ .spec-card:nth-child(4) { animation-delay: 0.25s; }
359
+ .spec-card:nth-child(5) { animation-delay: 0.3s; }
360
+ .spec-card:nth-child(6) { animation-delay: 0.35s; }
361
+
362
+ @keyframes fadeScale {
363
+ from {
364
+ opacity: 0;
365
+ transform: scale(0.95);
366
+ }
367
+ to {
368
+ opacity: 1;
369
+ transform: scale(1);
370
+ }
371
  }
372
 
373
  .spec-value {
 
393
 
394
  .section-header {
395
  margin-bottom: 64px;
396
+ opacity: 0;
397
+ animation: fadeSlideUp 0.6s ease forwards;
398
  }
399
 
400
  .section-header h2 {
 
422
  border: 1px solid var(--gray-2);
423
  border-radius: 12px;
424
  padding: 32px;
425
+ transition: all 0.3s ease;
426
+ opacity: 0;
427
+ animation: fadeSlideUp 0.6s ease forwards;
428
  }
429
 
430
+ .feature-card:nth-child(1) { animation-delay: 0.1s; }
431
+ .feature-card:nth-child(2) { animation-delay: 0.2s; }
432
+ .feature-card:nth-child(3) { animation-delay: 0.3s; }
433
+ .feature-card:nth-child(4) { animation-delay: 0.4s; }
434
+ .feature-card:nth-child(5) { animation-delay: 0.5s; }
435
+
436
  .feature-card:hover {
437
+ border-color: var(--accent);
438
+ transform: translateY(-4px);
439
+ box-shadow: 0 20px 40px rgba(255, 77, 0, 0.1);
440
  }
441
 
442
  .feature-icon {
 
450
  justify-content: center;
451
  margin-bottom: 20px;
452
  font-size: 24px;
453
+ transition: all 0.3s ease;
454
+ }
455
+
456
+ .feature-card:hover .feature-icon {
457
+ border-color: var(--accent);
458
+ box-shadow: 0 0 20px rgba(255, 77, 0, 0.2);
459
  }
460
 
461
  .feature-card h3 {
 
571
  border: 1px solid var(--gray-2);
572
  border-radius: 12px;
573
  overflow: hidden;
574
+ opacity: 0;
575
+ animation: fadeSlideUp 0.8s ease 0.2s forwards;
576
  }
577
 
578
  .demo-header {
 
648
  color: var(--accent);
649
  }
650
 
651
+ .model-card {
652
+ opacity: 0;
653
+ animation: fadeSlideUp 0.6s ease forwards;
654
+ }
655
+
656
+ .model-card:nth-child(1) { animation-delay: 0.1s; }
657
+ .model-card:nth-child(2) { animation-delay: 0.2s; }
658
+ .model-card:nth-child(3) { animation-delay: 0.3s; }
659
+
660
+ .model-card:hover {
661
+ border-color: var(--accent);
662
+ transform: translateY(-4px);
663
+ box-shadow: 0 20px 40px rgba(255, 77, 0, 0.1);
664
+ }
665
+
666
  /* Responsive */
667
  @media (max-width: 768px) {
668
  .hero h1 {
 
693
  <a href="downloads/index.html">Download</a>
694
  <a href="blog.html">Blog</a>
695
  <a href="status.html">Status</a>
 
696
  </div>
697
  </div>
698
  </nav>
 
705
  Training on RTX 5090
706
  </div>
707
  <h1>A ~1M Parameter Model<br>with <span class="highlight">2K Context</span></h1>
708
+ <p>TinyMemoryLM is a hybrid word-character transformer trained on RTX 5090. Features recurrent memory, precision codebook output head, and DeepSeek-V3 style MTP. It learns to remember things not because it's smart, but because we gave it external memory. And a codebook. And multi-token prediction. It still forgets where it put its keys though.</p>
709
  <div class="hero-cta">
710
  <a href="status.html" class="btn btn-primary">View Training Status</a>
711
  <a href="blog.html" class="btn btn-secondary">Read the Blog</a>
 
713
  </div>
714
  </section>
715
 
716
+ <!-- Model Series Banner -->
717
+ <section style="background: linear-gradient(90deg, var(--accent) 0%, #ff6a2a 100%); padding: 24px 0;">
718
+ <div class="container" style="text-align: center;">
719
+ <p style="color: var(--white); font-size: 16px; font-weight: 600; margin-bottom: 8px;">TRAINING THREE MODEL TIERS</p>
720
+ <div style="display: flex; gap: 24px; justify-content: center; flex-wrap: wrap; align-items: center;">
721
+ <span style="background: rgba(0,0,0,0.3); padding: 8px 20px; border-radius: 8px; color: var(--white); font-weight: 500;">
722
+ <strong style="font-size: 18px;">Haiku</strong> <span style="opacity: 0.9;">~1M params — Live</span>
723
+ </span>
724
+ <span style="background: rgba(0,0,0,0.3); padding: 8px 20px; border-radius: 8px; color: var(--white); font-weight: 500;">
725
+ <strong style="font-size: 18px;">Sonnet</strong> <span style="opacity: 0.9;">~300M params — In Training</span>
726
+ </span>
727
+ <span style="background: rgba(0,0,0,0.3); padding: 8px 20px; border-radius: 8px; color: var(--white); font-weight: 500;">
728
+ <strong style="font-size: 18px;">Opus</strong> <span style="opacity: 0.9;">~600M params — In Training</span>
729
+ </span>
730
+ </div>
731
+ </div>
732
+ </section>
733
+
734
  <section class="download-section" style="padding: 80px 0; background: linear-gradient(180deg, var(--black-soft) 0%, var(--black) 100%); border-bottom: 1px solid var(--gray-2);">
735
  <div class="container" style="text-align: center;">
736
  <h2 style="font-size: 32px; font-weight: 600; color: var(--white); margin-bottom: 16px;">Download CompactAI Studio</h2>
 
772
  <div class="spec-label">Model Dimension</div>
773
  </div>
774
  <div class="spec-card">
775
+ <div class="spec-value">229</div>
776
  <div class="spec-label">FFN Dimension</div>
777
  </div>
778
  </div>
 
788
  <div class="feature-grid">
789
  <div class="feature-card">
790
  <div class="feature-icon">M</div>
791
+ <h3>Recurrent Memory (Chunk-GRU)</h3>
792
+ <p>A recurrent memory module with chunk-level GRU processing is integrated into the architecture. Processes sequential chunks to maintain memory across the context window, giving the model external memory capabilities beyond what attention can handle.</p>
793
  </div>
794
  <div class="feature-card">
795
  <div class="feature-icon">C</div>
796
+ <h3>Precision Codebook Output Head</h3>
797
+ <p>Tied weight embeddings with a learnable per-token output bias. Instead of a separate codebook projection, the model ties input embeddings to output weights and learns a 2111-parameter bias vector to compensate for word-token suppression. Simple, parameter-efficient, and surprisingly effective.</p>
798
  </div>
799
  <div class="feature-card">
800
  <div class="feature-icon">T</div>
801
  <h3>Makeshift MTP</h3>
802
+ <p>DeepSeek-V3 style Multi-Token Prediction with horizons (2, 3, 4). MTP adapters learn to predict multiple future tokens simultaneously, improving sample quality through branch selection during generation. Pretrain weight: 0.3, SFT weight: 0.3.</p>
803
  </div>
804
  <div class="feature-card">
805
  <div class="feature-icon">R</div>
806
  <h3>RTX 5090 Optimized</h3>
807
+ <p>Tuned for RTX 5090 with flash attention, bf16 mixed precision, and batch size 64. Uses PyTorch Inductor with coordinate_descent_tuning enabled. Gradient checkpointing and torch.compile are available but disabled for Haiku tier — stability over speed.</p>
808
+ </div>
809
+ <div class="feature-card">
810
+ <div class="feature-icon">H</div>
811
+ <h3>Hybrid Word-Character Tokenizer</h3>
812
+ <p>Word-level tokenizer with ~2111 tokens. Scans datasets for top 2000 frequent words to achieve 3-4x compression vs pure character-level. Supports special format tokens for instruction tuning: &lt;|user|&gt;, &lt;|assistant|&gt;, &lt;|system|&gt;, &lt;|begin_of_thought|&gt;, &lt;|end_of_thought|&gt;.</p>
813
  </div>
814
  </div>
815
  </div>
 
832
  <div class="arch-layer">
833
  <div class="arch-box main">
834
  <span>Transformer Block ×6</span>
835
+ <small>RMSNorm, QK-Norm, SwiGLU FFN</small>
836
+ </div>
837
+ </div>
838
+ <div class="arch-arrow">↓</div>
839
+ <div class="arch-layer">
840
+ <div class="arch-box main">
841
+ <span>MTP Adapters ×3</span>
842
+ <small>Horizons 2, 3, 4</small>
843
  </div>
844
  </div>
845
  <div class="arch-arrow">↓</div>
846
  <div class="arch-layer">
847
  <div class="arch-box">
848
  <span>Tied Output Head</span>
849
+ <small>Learnable Bias (2111 params)</small>
850
  </div>
851
  </div>
852
  <div class="arch-arrow">↓</div>
 
867
  </div>
868
  <div class="arch-detail">
869
  <span class="arch-detail-label">ffn_dim</span>
870
+ <span class="arch-detail-value">229</span>
871
  </div>
872
  <div class="arch-detail">
873
+ <span class="arch-detail-label">mtp_horizons</span>
874
+ <span class="arch-detail-value">[2, 3, 4]</span>
875
  </div>
876
  <div class="arch-detail">
877
+ <span class="arch-detail-label">vocab_size</span>
878
+ <span class="arch-detail-value">~2111</span>
879
  </div>
880
  <div class="arch-detail">
881
  <span class="arch-detail-label">seq_len</span>
 
893
  <p>Three tiers following Chinchilla scaling. Yes, we borrowed the naming scheme. No, we're not sorry.</p>
894
  </div>
895
  <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(300px, 1fr)); gap: 24px; margin-top: 48px;">
896
+ <div class="model-card" style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px; transition: all 0.3s ease;">
897
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
898
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Haiku</span>
899
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~1M params</span>
 
903
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">160</span>
904
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">6</span>
905
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">4</span>
906
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">229</span>
907
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
908
+ <span style="color: var(--gray-5);">lr</span><span style="color: var(--gray-7);">8e-4</span>
909
  </div>
910
  </div>
911
+ <div class="model-card" style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px; transition: all 0.3s ease;">
912
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
913
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Sonnet</span>
914
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~300M params</span>
 
918
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">768</span>
919
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">36</span>
920
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">12</span>
921
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">2,538</span>
922
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
923
+ <span style="color: var(--gray-5);">lr</span><span style="color: var(--gray-7);">2e-4</span>
924
  </div>
925
  </div>
926
+ <div class="model-card" style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px; transition: all 0.3s ease;">
927
  <div style="display: flex; align-items: center; gap: 12px; margin-bottom: 16px;">
928
  <span style="font-size: 32px; font-weight: 700; color: var(--accent);">Opus</span>
929
  <span style="font-size: 13px; color: var(--gray-5); background: var(--gray-2); padding: 4px 10px; border-radius: 6px;">~600M params</span>
 
933
  <span style="color: var(--gray-5);">dim</span><span style="color: var(--gray-7);">1,024</span>
934
  <span style="color: var(--gray-5);">layers</span><span style="color: var(--gray-7);">39</span>
935
  <span style="color: var(--gray-5);">heads</span><span style="color: var(--gray-7);">16</span>
936
+ <span style="color: var(--gray-5);">ffn_dim</span><span style="color: var(--gray-7);">3,557</span>
937
  <span style="color: var(--gray-5);">context</span><span style="color: var(--gray-7);">2,048</span>
938
+ <span style="color: var(--gray-5);">lr</span><span style="color: var(--gray-7);">1.6e-4</span>
939
  </div>
940
  </div>
941
  </div>
 
966
  </div>
967
  </div>
968
  </section>
969
+
970
+ <!-- AIFinder Tool -->
971
+ <section style="padding: 100px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2);">
972
+ <div class="container">
973
+ <div class="section-header">
974
+ <h2>AIFinder</h2>
975
+ <p>A tool that snitches on AI models. Every AI has a writing accent — AIFinder detects it.</p>
976
+ </div>
977
+ <div style="max-width: 700px; margin: 48px auto 0;">
978
+ <div style="background: var(--gray-1); border: 1px solid var(--gray-2); border-radius: 12px; padding: 32px; text-align: center;">
979
+ <div style="font-size: 48px; margin-bottom: 16px;">🔍</div>
980
+ <h3 style="font-size: 24px; color: var(--white); margin-bottom: 16px;">Which AI Wrote This?</h3>
981
+ <p style="color: var(--gray-5); margin-bottom: 24px; line-height: 1.7;">Paste any AI-generated text and AIFinder will guess which lab made it. Google, Anthropic, OpenAI, DeepSeek, xAI, and more. It learns from corrections. The more you use it, the smarter it gets.</p>
982
+
983
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(120px, 1fr)); gap: 12px; margin-bottom: 32px;">
984
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">Anthropic</span>
985
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">DeepSeek</span>
986
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">Google</span>
987
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">OpenAI</span>
988
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">xAI</span>
989
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">Mistral</span>
990
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">MiniMax</span>
991
+ <span style="background: var(--gray-2); padding: 8px 12px; border-radius: 6px; font-size: 13px; color: var(--gray-6);">+4 more</span>
992
+ </div>
993
+
994
+ <div style="display: flex; gap: 16px; justify-content: center; flex-wrap: wrap;">
995
+ <a href="https://huggingface.co/spaces/CompactAI/AIFinder" target="_blank" class="btn btn-primary" style="padding: 14px 28px; font-size: 15px;">
996
+ Try AIFinder Free
997
+ </a>
998
+ <a href="blog-AIFinder.html" class="btn btn-secondary" style="padding: 14px 28px; font-size: 15px;">
999
+ Read the Blog
1000
+ </a>
1001
+ </div>
1002
+
1003
+ <p style="margin-top: 20px; font-size: 13px; color: var(--gray-5);">
1004
+ Free API available · 60 requests/min · No API key required
1005
+ </p>
1006
+
1007
+ <!-- Yes We Know It Sucks -->
1008
+ <div style="margin-top: 32px; padding-top: 24px; border-top: 1px solid var(--gray-2);">
1009
+ <h4 style="font-size: 18px; font-weight: 700; color: var(--accent); margin-bottom: 12px;">YES WE KNOW IT SUCKS</h4>
1010
+ <p style="font-size: 14px; color: var(--gray-5); margin-bottom: 16px; line-height: 1.6;">
1011
+ The tool guesses wrong sometimes. It confuses Anthropic with OpenAI.
1012
+ It confidently identifies Google as DeepSeek. It's basically a parrot with an opinion.
1013
+ </p>
1014
+ <p style="font-size: 14px; color: var(--gray-5); margin-bottom: 16px; line-height: 1.6;">
1015
+ <strong style="color: var(--gray-7);">Pro tip:</strong> Ask it math and reasoning questions. That's what we trained it on —
1016
+ huge amounts of TeichAI datasets (check them out at <a href="https://huggingface.co/TeichAI" target="_blank" style="color: var(--accent);">huggingface.co/TeichAI</a>).
1017
+ It is noticeably better at detecting which math-happy lab produced the output.
1018
+ </p>
1019
+ <div style="background: var(--gray-2); border-radius: 8px; padding: 16px;">
1020
+ <p style="font-size: 13px; color: var(--gray-6); margin-bottom: 8px;">
1021
+ That said, I have an AI working on fixing it. I couldn't be bothered to do it manually.
1022
+ </p>
1023
+ <p style="font-size: 24px; font-weight: 700; color: var(--white); font-family: var(--font-mono);">
1024
+ 7+ hours
1025
+ </p>
1026
+ <p style="font-size: 12px; color: var(--gray-5); margin-top: 8px;">
1027
+ The AI is trying its best. Poor thing.
1028
+ </p>
1029
+ </div>
1030
+ </div>
1031
+ </div>
1032
+ </div>
1033
+ </div>
1034
+ </section>
1035
  </main>
1036
 
1037
  <footer>
status.html CHANGED
@@ -93,10 +93,20 @@
93
  display: flex;
94
  align-items: center;
95
  gap: 8px;
 
 
 
 
 
96
  }
97
 
98
  .nav-brand span {
99
  color: var(--accent);
 
 
 
 
 
100
  }
101
 
102
  .nav-links {
@@ -108,12 +118,28 @@
108
  font-size: 14px;
109
  font-weight: 500;
110
  color: var(--gray-6);
 
 
 
 
 
 
 
 
 
 
 
 
111
  }
112
 
113
  .nav-links a:hover {
114
  color: var(--white);
115
  }
116
 
 
 
 
 
117
  /* Page Header */
118
  .page-header {
119
  padding: 140px 0 60px;
@@ -127,12 +153,27 @@
127
  color: var(--white);
128
  margin-bottom: 16px;
129
  letter-spacing: -0.02em;
 
 
130
  }
131
 
132
  .page-header p {
133
  font-size: 18px;
134
  color: var(--gray-5);
135
  max-width: 500px;
 
 
 
 
 
 
 
 
 
 
 
 
 
136
  }
137
 
138
  /* Status Section */
@@ -151,8 +192,22 @@
151
  border: 1px solid var(--gray-2);
152
  border-radius: 12px;
153
  padding: 32px;
 
 
 
 
 
 
 
 
154
  }
155
 
 
 
 
 
 
 
156
  .status-header {
157
  display: flex;
158
  align-items: center;
@@ -388,11 +443,14 @@
388
  background: var(--white);
389
  color: var(--black);
390
  border: none;
 
391
  }
392
 
393
  .btn-primary:hover {
394
  background: var(--gray-7);
395
  color: var(--black);
 
 
396
  }
397
 
398
  /* Footer */
@@ -505,16 +563,16 @@
505
  </div>
506
  <div class="features-grid">
507
  <div class="feature-item">
508
- <span class="feature-name">External Memory</span>
509
- <span class="feature-status disabled">Disabled</span>
510
  </div>
511
  <div class="feature-item">
512
- <span class="feature-name">Precision Codebook</span>
513
- <span class="feature-status disabled">Disabled</span>
514
  </div>
515
  <div class="feature-item">
516
  <span class="feature-name">Makeshift MTP</span>
517
- <span class="feature-status disabled">Disabled (weight=0.0)</span>
518
  </div>
519
  <div class="feature-item">
520
  <span class="feature-name">Gradient Checkpointing</span>
@@ -530,7 +588,7 @@
530
  </div>
531
  <div class="feature-item">
532
  <span class="feature-name">Flash Attention</span>
533
- <span class="feature-status disabled">Not Used</span>
534
  </div>
535
  <div class="feature-item">
536
  <span class="feature-name">Repetition Penalty</span>
@@ -556,30 +614,38 @@
556
  <span class="feature-name">Entropy Regularization</span>
557
  <span class="feature-status disabled">Disabled</span>
558
  </div>
 
 
 
 
 
 
 
 
559
  </div>
560
  </div>
561
 
562
- <!-- Memory Configuration -->
563
  <div class="status-card">
564
  <div class="status-header">
565
- <h3>Memory Module Config (Disabled)</h3>
566
  </div>
567
  <div class="specs-grid">
568
  <div class="spec-item">
569
- <div class="spec-value">&mdash;</div>
570
- <div class="spec-label">Memory Slots</div>
571
  </div>
572
  <div class="spec-item">
573
  <div class="spec-value">32</div>
574
  <div class="spec-label">Memory Dim</div>
575
  </div>
576
  <div class="spec-item">
577
- <div class="spec-value">4</div>
578
- <div class="spec-label">Top K</div>
579
  </div>
580
  <div class="spec-item">
581
- <div class="spec-value">8</div>
582
- <div class="spec-label">Writes/Step</div>
583
  </div>
584
  </div>
585
  </div>
 
93
  display: flex;
94
  align-items: center;
95
  gap: 8px;
96
+ transition: all 0.3s ease;
97
+ }
98
+
99
+ .nav-brand:hover {
100
+ text-shadow: 0 0 20px rgba(255, 77, 0, 0.5);
101
  }
102
 
103
  .nav-brand span {
104
  color: var(--accent);
105
+ transition: all 0.3s ease;
106
+ }
107
+
108
+ .nav-brand:hover span {
109
+ text-shadow: 0 0 20px rgba(255, 77, 0, 0.8);
110
  }
111
 
112
  .nav-links {
 
118
  font-size: 14px;
119
  font-weight: 500;
120
  color: var(--gray-6);
121
+ position: relative;
122
+ }
123
+
124
+ .nav-links a::after {
125
+ content: '';
126
+ position: absolute;
127
+ bottom: -4px;
128
+ left: 0;
129
+ width: 0;
130
+ height: 2px;
131
+ background: var(--accent);
132
+ transition: width 0.3s ease;
133
  }
134
 
135
  .nav-links a:hover {
136
  color: var(--white);
137
  }
138
 
139
+ .nav-links a:hover::after {
140
+ width: 100%;
141
+ }
142
+
143
  /* Page Header */
144
  .page-header {
145
  padding: 140px 0 60px;
 
153
  color: var(--white);
154
  margin-bottom: 16px;
155
  letter-spacing: -0.02em;
156
+ opacity: 0;
157
+ animation: fadeSlideUp 0.8s ease 0.1s forwards;
158
  }
159
 
160
  .page-header p {
161
  font-size: 18px;
162
  color: var(--gray-5);
163
  max-width: 500px;
164
+ opacity: 0;
165
+ animation: fadeSlideUp 0.8s ease 0.2s forwards;
166
+ }
167
+
168
+ @keyframes fadeSlideUp {
169
+ from {
170
+ opacity: 0;
171
+ transform: translateY(20px);
172
+ }
173
+ to {
174
+ opacity: 1;
175
+ transform: translateY(0);
176
+ }
177
  }
178
 
179
  /* Status Section */
 
192
  border: 1px solid var(--gray-2);
193
  border-radius: 12px;
194
  padding: 32px;
195
+ opacity: 0;
196
+ animation: fadeSlideUp 0.6s ease forwards;
197
+ transition: all 0.3s ease;
198
+ }
199
+
200
+ .status-card:hover {
201
+ border-color: var(--accent);
202
+ transform: translateY(-2px);
203
  }
204
 
205
+ .status-card:nth-child(1) { animation-delay: 0.1s; }
206
+ .status-card:nth-child(2) { animation-delay: 0.2s; }
207
+ .status-card:nth-child(3) { animation-delay: 0.3s; }
208
+ .status-card:nth-child(4) { animation-delay: 0.4s; }
209
+ .status-card:nth-child(5) { animation-delay: 0.5s; }
210
+
211
  .status-header {
212
  display: flex;
213
  align-items: center;
 
443
  background: var(--white);
444
  color: var(--black);
445
  border: none;
446
+ transition: all 0.3s ease;
447
  }
448
 
449
  .btn-primary:hover {
450
  background: var(--gray-7);
451
  color: var(--black);
452
+ transform: translateY(-2px);
453
+ box-shadow: 0 8px 20px rgba(255, 255, 255, 0.15);
454
  }
455
 
456
  /* Footer */
 
563
  </div>
564
  <div class="features-grid">
565
  <div class="feature-item">
566
+ <span class="feature-name">Recurrent Memory (Chunk-GRU)</span>
567
+ <span class="feature-status enabled">Enabled</span>
568
  </div>
569
  <div class="feature-item">
570
+ <span class="feature-name">Precision Codebook (Output Bias)</span>
571
+ <span class="feature-status enabled">Enabled (2111 params)</span>
572
  </div>
573
  <div class="feature-item">
574
  <span class="feature-name">Makeshift MTP</span>
575
+ <span class="feature-status enabled">Enabled (horizons: 2,3,4, weight: 0.3)</span>
576
  </div>
577
  <div class="feature-item">
578
  <span class="feature-name">Gradient Checkpointing</span>
 
588
  </div>
589
  <div class="feature-item">
590
  <span class="feature-name">Flash Attention</span>
591
+ <span class="feature-status enabled">Enabled</span>
592
  </div>
593
  <div class="feature-item">
594
  <span class="feature-name">Repetition Penalty</span>
 
614
  <span class="feature-name">Entropy Regularization</span>
615
  <span class="feature-status disabled">Disabled</span>
616
  </div>
617
+ <div class="feature-item">
618
+ <span class="feature-name">QK-Norm (RMSNorm)</span>
619
+ <span class="feature-status enabled">Enabled</span>
620
+ </div>
621
+ <div class="feature-item">
622
+ <span class="feature-name">SwiGLU FFN</span>
623
+ <span class="feature-status enabled">Enabled</span>
624
+ </div>
625
  </div>
626
  </div>
627
 
628
+ <!-- Recurrent Memory Configuration -->
629
  <div class="status-card">
630
  <div class="status-header">
631
+ <h3>Recurrent Memory (Chunk-GRU)</h3>
632
  </div>
633
  <div class="specs-grid">
634
  <div class="spec-item">
635
+ <div class="spec-value">8</div>
636
+ <div class="spec-label">Chunk Size</div>
637
  </div>
638
  <div class="spec-item">
639
  <div class="spec-value">32</div>
640
  <div class="spec-label">Memory Dim</div>
641
  </div>
642
  <div class="spec-item">
643
+ <div class="spec-value">GRU</div>
644
+ <div class="spec-label">Cell Type</div>
645
  </div>
646
  <div class="spec-item">
647
+ <div class="spec-value">4</div>
648
+ <div class="spec-label">Layers</div>
649
  </div>
650
  </div>
651
  </div>