Delete index.html
index.html +0 -531
index.html
DELETED
|
@@ -1,531 +0,0 @@
|
|
| 1 |
-
<!DOCTYPE html>
|
| 2 |
-
<html lang="en">
|
| 3 |
-
<head>
|
| 4 |
-
<meta charset="UTF-8">
|
| 5 |
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
-
<title>FMN-GPT | CompactAI</title>
|
| 7 |
-
<link rel="stylesheet" href="styles.css">
|
| 8 |
-
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
|
| 9 |
-
</head>
|
| 10 |
-
<body>
|
| 11 |
-
<nav class="main-nav">
|
| 12 |
-
<div class="container">
|
| 13 |
-
<a href="index.html" class="nav-brand">FMN-GPT</a>
|
| 14 |
-
<div class="nav-links">
|
| 15 |
-
<a href="blog.html">Blog</a>
|
| 16 |
-
<a href="status.html">Model Status</a>
|
| 17 |
-
<a href="https://huggingface.co/CompactAI" target="_blank">HuggingFace</a>
|
| 18 |
-
</div>
|
| 19 |
-
</div>
|
| 20 |
-
</nav>
|
| 21 |
-
|
| 22 |
-
<main>
|
| 23 |
-
<!-- Hero Section -->
|
| 24 |
-
<section class="hero">
|
| 25 |
-
<div class="hero-content">
|
| 26 |
-
<div class="apology-badge">A New Approach to Small Models</div>
|
| 27 |
-
<h1 class="hero-title">Why I Stopped<br><span class="highlight">Compressing Models</span></h1>
|
| 28 |
-
<p class="hero-subtitle">And started building something that doesn't need 8 H100s to think</p>
|
| 29 |
-
<div class="scroll-indicator">
|
| 30 |
-
<span>Scroll for the truth</span>
|
| 31 |
-
<div class="scroll-arrow"></div>
|
| 32 |
-
</div>
|
| 33 |
-
</div>
|
| 34 |
-
<div class="hero-visual">
|
| 35 |
-
<canvas id="neuron-canvas"></canvas>
|
| 36 |
-
</div>
|
| 37 |
-
</section>
|
| 38 |
-
|
| 39 |
-
<!-- A Note Before We Begin -->
|
| 40 |
-
<section class="preface" id="preface">
|
| 41 |
-
<div class="container">
|
| 42 |
-
<h2 class="section-title">A Note Before We Begin</h2>
|
| 43 |
-
<div class="preface-content">
|
| 44 |
-
<p class="drop-cap">You might have noticed that everything on my profile got deleted. That was intentional. Let me explain why.</p>
|
| 45 |
-
|
| 46 |
-
<p>I used to fill my profile with compressed models. Dozens of them. But I realized that quantity was masking the real problem. I wasn't building anything new. Just cloning and shrinking other people's work.</p>
|
| 47 |
-
|
| 48 |
-
<p>So I wiped the slate clean. A fresh start. This time, I'm here to <strong>build from scratch</strong>.</p>
|
| 49 |
-
</div>
|
| 50 |
-
</div>
|
| 51 |
-
</section>
|
| 52 |
-
|
| 53 |
-
<!-- The Confession -->
|
| 54 |
-
<section class="confession" id="confession">
|
| 55 |
-
<div class="container">
|
| 56 |
-
<h2 class="section-title">The Confession</h2>
|
| 57 |
-
<div class="confession-content">
|
| 58 |
-
<div class="confession-text">
|
| 59 |
-
<p class="drop-cap">Let me be honest with you. For months, I was that person. You know the one. Leaving my computer on overnight, running distillation scripts, trying to squeeze GPT-4 into something that could run on a potato.</p>
|
| 60 |
-
|
| 61 |
-
<p>And you know what? <strong>It was boring.</strong></p>
|
| 62 |
-
|
| 63 |
-
<p>Who actually enjoys watching loss curves descend at 3 AM? Who gets excited about shaving off 2% of parameters while the model forgets how to count?</p>
|
| 64 |
-
|
| 65 |
-
<blockquote>
|
| 66 |
-
"I was cloning someone else's work and compressing it. The process lacked real creation. Just digital photocopying with extra steps."
|
| 67 |
-
</blockquote>
|
| 68 |
-
|
| 69 |
-
<p>So I stopped. And I started asking a different question:</p>
|
| 70 |
-
|
| 71 |
-
<div class="big-question">
|
| 72 |
-
<span class="question-mark">?</span>
|
| 73 |
-
<p>What if a model could be small <em>by design</em>, avoiding compression entirely?</p>
|
| 74 |
-
</div>
|
| 75 |
-
</div>
|
| 76 |
-
</div>
|
| 77 |
-
</div>
|
| 78 |
-
</section>
|
| 79 |
-
|
| 80 |
-
<!-- What I'm Building -->
|
| 81 |
-
<section class="what-building" id="what">
|
| 82 |
-
<div class="container">
|
| 83 |
-
<h2 class="section-title">What I'm Building Instead</h2>
|
| 84 |
-
|
| 85 |
-
<div class="feature-grid">
|
| 86 |
-
<div class="feature-card main-feature">
|
| 87 |
-
<div class="feature-icon">
|
| 88 |
-
<svg viewBox="0 0 100 100" class="neuron-icon">
|
| 89 |
-
<circle cx="50" cy="30" r="15" class="neuron-node"/>
|
| 90 |
-
<circle cx="25" cy="70" r="12" class="neuron-node"/>
|
| 91 |
-
<circle cx="75" cy="70" r="12" class="neuron-node"/>
|
| 92 |
-
<path d="M50 45 L30 58" class="neuron-connection"/>
|
| 93 |
-
<path d="M50 45 L70 58" class="neuron-connection"/>
|
| 94 |
-
<path d="M37 70 L63 70" class="neuron-connection" stroke-dasharray="5,3"/>
|
| 95 |
-
</svg>
|
| 96 |
-
</div>
|
| 97 |
-
<h3>FMN-GPT</h3>
|
| 98 |
-
<p class="feature-subtitle">Factored Multiplicative Neuron Transformer</p>
|
| 99 |
-
<p>A transformer architecture where each neuron can route its output back into any layer of the network. A fundamentally different way to think, designed from scratch.</p>
|
| 100 |
-
<div class="feature-stats">
|
| 101 |
-
<div class="stat">
|
| 102 |
-
<span class="stat-value">491</span>
|
| 103 |
-
<span class="stat-label">Vocab Size</span>
|
| 104 |
-
</div>
|
| 105 |
-
<div class="stat">
|
| 106 |
-
<span class="stat-value">65K</span>
|
| 107 |
-
<span class="stat-label">Context Length</span>
|
| 108 |
-
</div>
|
| 109 |
-
<div class="stat">
|
| 110 |
-
<span class="stat-value">120</span>
|
| 111 |
-
<span class="stat-label">Max Loops/Neuron</span>
|
| 112 |
-
</div>
|
| 113 |
-
</div>
|
| 114 |
-
<p class="disclaimer-text">Everything is subject to change.</p>
|
| 115 |
-
</div>
|
| 116 |
-
</div>
|
| 117 |
-
|
| 118 |
-
<!-- Architecture Diagram -->
|
| 119 |
-
<div class="architecture-section">
|
| 120 |
-
<h3>The Architecture</h3>
|
| 121 |
-
<div class="architecture-diagram" id="arch-diagram">
|
| 122 |
-
<div class="arch-layer" data-layer="input">
|
| 123 |
-
<div class="layer-box input-layer">
|
| 124 |
-
<span>Input Embeddings</span>
|
| 125 |
-
</div>
|
| 126 |
-
</div>
|
| 127 |
-
|
| 128 |
-
<div class="arch-arrow">v</div>
|
| 129 |
-
|
| 130 |
-
<div class="arch-layer" data-layer="recursive">
|
| 131 |
-
<div class="layer-box recursive-layer">
|
| 132 |
-
<span>Shared Transformer Block x6</span>
|
| 133 |
-
<div class="layer-details">
|
| 134 |
-
<div class="detail-item">
|
| 135 |
-
<span class="detail-label">Multi-Query Attention</span>
|
| 136 |
-
<span class="detail-value">4 heads</span>
|
| 137 |
-
</div>
|
| 138 |
-
<div class="detail-item">
|
| 139 |
-
<span class="detail-label">FMN Feedforward</span>
|
| 140 |
-
<span class="detail-value">Rank 40</span>
|
| 141 |
-
</div>
|
| 142 |
-
<div class="detail-item">
|
| 143 |
-
<span class="detail-label">Dynamic Routing</span>
|
| 144 |
-
<span class="detail-value">REINFORCE based</span>
|
| 145 |
-
</div>
|
| 146 |
-
</div>
|
| 147 |
-
</div>
|
| 148 |
-
</div>
|
| 149 |
-
|
| 150 |
-
<div class="arch-arrow loop-arrow">
|
| 151 |
-
<span>Recurrent State</span>
|
| 152 |
-
</div>
|
| 153 |
-
|
| 154 |
-
<div class="arch-layer" data-layer="output">
|
| 155 |
-
<div class="layer-box output-layer">
|
| 156 |
-
<span>Output (Weight-Tied)</span>
|
| 157 |
-
</div>
|
| 158 |
-
</div>
|
| 159 |
-
</div>
|
| 160 |
-
</div>
|
| 161 |
-
</div>
|
| 162 |
-
</section>
|
| 163 |
-
|
| 164 |
-
<!-- How It Works -->
|
| 165 |
-
<section class="how-it-works" id="how">
|
| 166 |
-
<div class="container">
|
| 167 |
-
<h2 class="section-title">How It Actually Works</h2>
|
| 168 |
-
|
| 169 |
-
<div class="mechanism-tabs">
|
| 170 |
-
<button class="tab-btn active" data-tab="fmn">FMN Neurons</button>
|
| 171 |
-
<button class="tab-btn" data-tab="routing">Dynamic Routing</button>
|
| 172 |
-
<button class="tab-btn" data-tab="recurrent">Recurrent Mixer</button>
|
| 173 |
-
<button class="tab-btn" data-tab="loops">Loop Counter</button>
|
| 174 |
-
</div>
|
| 175 |
-
|
| 176 |
-
<div class="tab-content">
|
| 177 |
-
<!-- FMN Tab -->
|
| 178 |
-
<div class="tab-pane active" id="fmn-pane">
|
| 179 |
-
<div class="pane-content">
|
| 180 |
-
<div class="pane-text">
|
| 181 |
-
<h3>Factored Multiplicative Neurons</h3>
|
| 182 |
-
<p>Traditional neurons compute <code>y = σ(Wx + b)</code>. Simple, but limited.</p>
|
| 183 |
-
<p>FMN neurons compute:</p>
|
| 184 |
-
<div class="equation">
|
| 185 |
-
<code>gate = tanh(W1(x)) * sigmoid(W2(x))</code>
|
| 186 |
-
</div>
|
| 187 |
-
<div class="equation">
|
| 188 |
-
<code>output = V(gate)</code>
|
| 189 |
-
</div>
|
| 190 |
-
<p>Each FMN projects the input through two weight matrices, W1 and W2, into a rank 40 latent space. It then multiplies the tanh of the first projection by the sigmoid of the second, and projects the gated result back through V.</p>
|
| 191 |
-
<ul class="feature-list">
|
| 192 |
-
<li>Rank 40 latent dimension for efficient computation</li>
|
| 193 |
-
<li>Multiplicative gating enables complex interactions</li>
|
| 194 |
-
<li>Optional SwiGLU variant available</li>
|
| 195 |
-
<li>Initialized with Xavier uniform for stability</li>
|
| 196 |
-
</ul>
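<p>The two equations above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes the <code>*</code> in the gate is an elementwise product, omits biases, and uses a hypothetical <code>D_MODEL = 16</code> (the page does not state the model width). The helper names <code>xavier_uniform</code>, <code>matvec</code> and <code>fmn_forward</code> are made up for this example.</p>

```python
import math
import random


def xavier_uniform(rows, cols, rng):
    """Xavier/Glorot uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fmn_forward(x, w1, w2, v):
    """gate = tanh(W1 x) * sigmoid(W2 x); output = V gate."""
    a = matvec(w1, x)  # first projection into the rank-sized latent space
    b = matvec(w2, x)  # second projection into the same latent space
    gate = [math.tanh(ai) * sigmoid(bi) for ai, bi in zip(a, b)]
    return matvec(v, gate)  # project back to the model width


D_MODEL = 16  # hypothetical width, chosen only for this sketch
RANK = 40     # latent rank quoted on the page

rng = random.Random(0)
w1 = xavier_uniform(RANK, D_MODEL, rng)
w2 = xavier_uniform(RANK, D_MODEL, rng)
v = xavier_uniform(D_MODEL, RANK, rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)]
y = fmn_forward(x, w1, w2, v)
zero_out = fmn_forward([0.0] * D_MODEL, w1, w2, v)
```

<p>Under these assumptions a zero input yields a zero output, because tanh(0) = 0 closes every gate regardless of the sigmoid branch.</p>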
|
| 197 |
-
</div>
|
| 198 |
-
<div class="pane-visual">
|
| 199 |
-
<canvas id="fmn-canvas" width="400" height="300"></canvas>
|
| 200 |
-
</div>
|
| 201 |
-
</div>
|
| 202 |
-
</div>
|
| 203 |
-
|
| 204 |
-
<!-- Routing Tab -->
|
| 205 |
-
<div class="tab-pane" id="routing-pane">
|
| 206 |
-
<div class="pane-content">
|
| 207 |
-
<div class="pane-text">
|
| 208 |
-
<h3>Dynamic Neuron Routing</h3>
|
| 209 |
-
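<p>Each neuron can route its output to any layer and any neuron in the network. Because these routing decisions are hard (sampled, not differentiable), the router is trained with <strong>REINFORCE policy gradients</strong>, which estimate a gradient where backpropagation alone cannot.</p>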
|
| 210 |
-
<div class="equation">
|
| 211 |
-
<code>should_route ~ Bernoulli(sigmoid(logits))</code>
|
| 212 |
-
</div>
|
| 213 |
-
<div class="equation">
|
| 214 |
-
<code>target_layer ~ Categorical(softmax(layer_logits))</code>
|
| 215 |
-
</div>
|
| 216 |
-
<p>The router learns:</p>
|
| 217 |
-
<ul class="feature-list">
|
| 218 |
-
<li>Whether to route via learned sigmoid gate</li>
|
| 219 |
-
<li>Which layer to target (0 to n_layers)</li>
|
| 220 |
-
<li>Which neuron to target (0 to d_model)</li>
|
| 221 |
-
<li>Routing strength via learned parameter</li>
|
| 222 |
-
</ul>
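<p>As a rough sketch of the sampling step and the REINFORCE estimate, assuming a scalar reward and an optional baseline (neither is specified on this page), the routing decision for one neuron might look like the following. The function names are hypothetical, and only the route/no-route gate and the target layer are shown; a target neuron could be drawn from a second categorical distribution in the same way.</p>

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_route(route_logit, layer_logits, rng):
    """Sample the hard decisions and return them with their joint log-probability."""
    p_route = sigmoid(route_logit)
    should_route = rng.random() < p_route  # Bernoulli(sigmoid(logit))
    log_prob = math.log(p_route if should_route else 1.0 - p_route)
    target_layer = None
    if should_route:
        probs = softmax(layer_logits)      # Categorical(softmax(layer_logits))
        target_layer = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[target_layer])
    return should_route, target_layer, log_prob


def reinforce_grad_route_logit(route_logit, should_route, reward, baseline=0.0):
    """Score-function estimate: (reward - baseline) * d log p(action) / d logit."""
    p = sigmoid(route_logit)
    dlogp = (1.0 - p) if should_route else -p
    return (reward - baseline) * dlogp


rng = random.Random(0)
N_LAYERS = 6  # matches the six shared-block passes described on the page
should_route, target_layer, log_prob = sample_route(0.3, [0.0] * N_LAYERS, rng)
grad = reinforce_grad_route_logit(0.0, True, reward=1.0)
```

<p>The gradient never passes through the sample itself; it only scales the derivative of the log-probability of the action that was taken, which is why a reward signal is needed.</p>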
|
| 223 |
-
</div>
|
| 224 |
-
<div class="pane-visual">
|
| 225 |
-
<canvas id="routing-canvas" width="400" height="300"></canvas>
|
| 226 |
-
</div>
|
| 227 |
-
</div>
|
| 228 |
-
</div>
|
| 229 |
-
|
| 230 |
-
<!-- Recurrent Tab -->
|
| 231 |
-
<div class="tab-pane" id="recurrent-pane">
|
| 232 |
-
<div class="pane-content">
|
| 233 |
-
<div class="pane-text">
|
| 234 |
-
<h3>Recurrent Mixer</h3>
|
| 235 |
-
<p>A per-channel gated recurrent state that persists across layer iterations. Think of it as a tiny LSTM for each dimension.</p>
|
| 236 |
-
<div class="equation">
|
| 237 |
-
<code>g = sigmoid(x * w_x + s * w_s + b)</code>
|
| 238 |
-
</div>
|
| 239 |
-
<div class="equation">
|
| 240 |
-
<code>s_new = s * (1 - g) + x * g</code>
|
| 241 |
-
</div>
|
| 242 |
-
<p>This allows the model to accumulate information across the 6 layer passes, creating a form of <strong>internal memory</strong>. The state scale parameter controls how much the recurrent state influences each layer.</p>
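<p>The update above is a per-channel interpolation between the old state and the new input. A minimal sketch, assuming one scalar weight per channel for <code>w_x</code>, <code>w_s</code> and <code>b</code> (the function name <code>mixer_step</code> and the example values are hypothetical, and the state scale parameter is left out):</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mixer_step(x, s, w_x, w_s, b):
    """One recurrent-mixer update, applied independently to each channel."""
    new_state = []
    for xi, si, wxi, wsi, bi in zip(x, s, w_x, w_s, b):
        g = sigmoid(xi * wxi + si * wsi + bi)       # g = sigmoid(x * w_x + s * w_s + b)
        new_state.append(si * (1.0 - g) + xi * g)   # s_new = s * (1 - g) + x * g
    return new_state


# With zero weights and bias the gate is exactly 0.5, so the new state is the
# average of the old state and the input.
x = [1.0, -2.0]
state = [0.0, 0.0]
zeros = [0.0, 0.0]
first = mixer_step(x, state, zeros, zeros, zeros)

# Carry the state across six passes, feeding the same input each time.
for _ in range(6):
    state = mixer_step(x, state, zeros, zeros, zeros)
```

<p>Since the gate lies strictly between 0 and 1, each channel's new state always falls between its previous state and the current input.</p>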
|
| 243 |
-
</div>
|
| 244 |
-
<div class="pane-visual">
|
| 245 |
-
<canvas id="recurrent-canvas" width="400" height="300"></canvas>
|
| 246 |
-
</div>
|
| 247 |
-
</div>
|
| 248 |
-
</div>
|
| 249 |
-
|
| 250 |
-
<!-- Loops Tab -->
|
| 251 |
-
<div class="tab-pane" id="loops-pane">
|
| 252 |
-
<div class="pane-content">
|
| 253 |
-
<div class="pane-text">
|
| 254 |
-
<h3>Loop Counter</h3>
|
| 255 |
-
<p>Each neuron has a hard limit on how many times it can participate in routing loops. This prevents infinite cycles and forces the model to be efficient.</p>
|
| 256 |
-
<div class="equation">
|
| 257 |
-
<code>loop_exhausted = loop_counts >= max_loops</code>
|
| 258 |
-
</div>
|
| 259 |
-
<p>With <code>max_loops = 120</code>, each neuron can participate in at most 120 routing events before being silenced. This creates a form of <strong>computational budget</strong>.</p>
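<p>One way to picture the budget is a per-neuron counter that is checked before each routing event. This sketch assumes the counter increases by one per granted event and is never reset within a forward pass; the name <code>apply_loop_budget</code> is hypothetical.</p>

```python
MAX_LOOPS = 120  # value quoted on the page


def apply_loop_budget(route_requests, loop_counts, max_loops=MAX_LOOPS):
    """Grant routing only to neurons with budget left; update counts in place."""
    allowed = []
    for i, wants_to_route in enumerate(route_requests):
        loop_exhausted = loop_counts[i] >= max_loops
        granted = wants_to_route and not loop_exhausted
        if granted:
            loop_counts[i] += 1  # spend one unit of this neuron's budget
        allowed.append(granted)
    return allowed


# Three neurons: fresh, one event left, and already exhausted.
counts = [0, 119, 120]
first_round = apply_loop_budget([True, True, True], counts)
counts_after_first = list(counts)
second_round = apply_loop_budget([True, True, True], counts)
```

<p>After the first round the second neuron reaches 120 and is silenced from then on, while the first neuron keeps routing.</p>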
|
| 260 |
-
<div class="loop-demo">
|
| 261 |
-
<label>Max Loops: <span id="loop-value">120</span></label>
|
| 262 |
-
<input type="range" id="loop-slider" min="1" max="200" value="120">
|
| 263 |
-
<div class="loop-indicator" id="loop-indicator"></div>
|
| 264 |
-
</div>
|
| 265 |
-
</div>
|
| 266 |
-
<div class="pane-visual">
|
| 267 |
-
<canvas id="loops-canvas" width="400" height="300"></canvas>
|
| 268 |
-
</div>
|
| 269 |
-
</div>
|
| 270 |
-
</div>
|
| 271 |
-
</div>
|
| 272 |
-
</div>
|
| 273 |
-
</section>
|
| 274 |
-
|
| 275 |
-
<!-- Explain Like I'm Five -->
|
| 276 |
-
<section class="elief-section" id="elief">
|
| 277 |
-
<div class="container">
|
| 278 |
-
<h2 class="section-title">Explain Like I'm Five</h2>
|
| 279 |
-
<p class="section-subtitle">How does this tiny model think?</p>
|
| 280 |
-
|
| 281 |
-
<div class="elief-content">
|
| 282 |
-
<div class="elief-card">
|
| 283 |
-
<h3>What is FMN-GPT?</h3>
|
| 284 |
-
<p>Imagine a really tiny brain. Most AI brains today are huge, like a library with millions of books. FMN-GPT is more like a small notebook. But here's the trick. It can read that notebook over and over, each time understanding a little more.</p>
|
| 285 |
-
</div>
|
| 286 |
-
|
| 287 |
-
<div class="elief-card">
|
| 288 |
-
<h3>Why Character-Level?</h3>
|
| 289 |
-
<p>Most AI models learn whole words at a time. We taught this one to read letter by letter, like a child learning to read. This keeps it small. It only needs to know 491 characters instead of thousands of words. Every letter matters.</p>
|
| 290 |
-
</div>
|
| 291 |
-
|
| 292 |
-
<div class="elief-card">
|
| 293 |
-
<h3>How Does It Think?</h3>
|
| 294 |
-
<p>When you ask a question, the model passes it through the same brain circuit 6 times. Each pass, it thinks a little deeper. Like asking yourself "what's 2+2?" and then checking, "Wait, let me think again. 2+2... that's adding two groups of two... so that makes 4!"</p>
|
| 295 |
-
</div>
|
| 296 |
-
|
| 297 |
-
<div class="elief-card">
|
| 298 |
-
<h3>The Magic Number: 491 Characters</h3>
|
| 299 |
-
<p>The model uses a character-level vocabulary of exactly 491 tokens. This includes ASCII characters, special symbols, and custom thinking tokens (the thinking emoji and light bulb emoji). Every character matters.</p>
|
| 300 |
-
</div>
|
| 301 |
-
</div>
|
| 302 |
-
</div>
|
| 303 |
-
</section>
|
| 304 |
-
|
| 305 |
-
<!-- Why I Stopped -->
|
| 306 |
-
<section class="why-stopped" id="why">
|
| 307 |
-
<div class="container">
|
| 308 |
-
<h2 class="section-title">Why I Really Stopped</h2>
|
| 309 |
-
|
| 310 |
-
<div class="reasons-grid">
|
| 311 |
-
<div class="reason-card">
|
| 312 |
-
<div class="reason-number">01</div>
|
| 313 |
-
<h3>It Was Boring</h3>
|
| 314 |
-
<p>There's no creativity in distillation. You're just making a smaller copy of someone else's breakthrough. Where's the fun in that?</p>
|
| 315 |
-
</div>
|
| 316 |
-
|
| 317 |
-
<div class="reason-card">
|
| 318 |
-
<div class="reason-number">02</div>
|
| 319 |
-
<h3>Diminishing Returns</h3>
|
| 320 |
-
<p>Every 1% of parameter reduction came with a measurable drop in capability. The tradeoff wasn't worth it.</p>
|
| 321 |
-
</div>
|
| 322 |
-
|
| 323 |
-
<div class="reason-card">
|
| 324 |
-
<div class="reason-number">03</div>
|
| 325 |
-
<h3>Overnight Runs</h3>
|
| 326 |
-
<p>Who leaves their computer on overnight to clone someone else's work and compress it? The process lacks real creation. Mere photocopying in disguise.</p>
|
| 327 |
-
</div>
|
| 328 |
-
|
| 329 |
-
<div class="reason-card">
|
| 330 |
-
<div class="reason-number">04</div>
|
| 331 |
-
<h3>A Better Question</h3>
|
| 332 |
-
<p>Instead of asking "how do I make this smaller?", I started asking "what if it was designed to be small from the start?"</p>
|
| 333 |
-
</div>
|
| 334 |
-
</div>
|
| 335 |
-
|
| 336 |
-
<div class="comparison-section">
|
| 337 |
-
<h3>The Old Way vs. The New Way</h3>
|
| 338 |
-
<div class="comparison-grid">
|
| 339 |
-
<div class="comparison-item old">
|
| 340 |
-
<h4>Model Compression</h4>
|
| 341 |
-
<ul>
|
| 342 |
-
<li>Start with 7B parameters</li>
|
| 343 |
-
<li>Distill, quantize, prune</li>
|
| 344 |
-
<li>End with 1B parameters</li>
|
| 345 |
-
<li>Capability: ???</li>
|
| 346 |
-
<li>Time: Weeks of compute</li>
|
| 347 |
-
</ul>
|
| 348 |
-
</div>
|
| 349 |
-
|
| 350 |
-
<div class="comparison-arrow">-></div>
|
| 351 |
-
|
| 352 |
-
<div class="comparison-item new">
|
| 353 |
-
<h4>Small by Design</h4>
|
| 354 |
-
<ul>
|
| 355 |
-
<li>Start with ~100K parameters</li>
|
| 356 |
-
<li>Novel architecture</li>
|
| 357 |
-
<li>End with ~100K parameters</li>
|
| 358 |
-
<li>Capability: Emergent</li>
|
| 359 |
-
<li>Time: One GPU, one night</li>
|
| 360 |
-
</ul>
|
| 361 |
-
</div>
|
| 362 |
-
</div>
|
| 363 |
-
</div>
|
| 364 |
-
</div>
|
| 365 |
-
</section>
|
| 366 |
-
|
| 367 |
-
<!-- Roadmap -->
|
| 368 |
-
<section class="roadmap-section" id="roadmap">
|
| 369 |
-
<div class="container">
|
| 370 |
-
<h2 class="section-title">Roadmap</h2>
|
| 371 |
-
<p class="section-subtitle">Where we're headed (everything is subject to change)</p>
|
| 372 |
-
|
| 373 |
-
<div class="roadmap-timeline">
|
| 374 |
-
<div class="roadmap-item completed">
|
| 375 |
-
<div class="roadmap-marker"></div>
|
| 376 |
-
<div class="roadmap-content">
|
| 377 |
-
<h4>Phase 1: Core Architecture</h4>
|
| 378 |
-
<p>FMN neurons with rank 40, REINFORCE based dynamic routing, recurrent mixer, QK normalization, gated residuals. The foundation is complete.</p>
|
| 379 |
-
<span class="roadmap-status">Completed</span>
|
| 380 |
-
</div>
|
| 381 |
-
</div>
|
| 382 |
-
|
| 383 |
-
<div class="roadmap-item in-progress">
|
| 384 |
-
<div class="roadmap-marker"></div>
|
| 385 |
-
<div class="roadmap-content">
|
| 386 |
-
<h4>Phase 2: Training Pipeline</h4>
|
| 387 |
-
<p>Character-level tokenization (491 vocab), 7 instruction datasets, pretraining on English-Pretraining-Dataset, AdamW optimizer, bfloat16 precision.</p>
|
| 388 |
-
<span class="roadmap-status">In Progress</span>
|
| 389 |
-
</div>
|
| 390 |
-
</div>
|
| 391 |
-
|
| 392 |
-
<div class="roadmap-item">
|
| 393 |
-
<div class="roadmap-marker"></div>
|
| 394 |
-
<div class="roadmap-content">
|
| 395 |
-
<h4>Phase 3: CoT Reasoning</h4>
|
| 396 |
-
<p>Teaching the model to think step-by-step with explicit reasoning traces.</p>
|
| 397 |
-
<span class="roadmap-status">Planned</span>
|
| 398 |
-
</div>
|
| 399 |
-
</div>
|
| 400 |
-
|
| 401 |
-
<div class="roadmap-item">
|
| 402 |
-
<div class="roadmap-marker"></div>
|
| 403 |
-
<div class="roadmap-content">
|
| 404 |
-
<h4>Phase 4: Evaluation Suite</h4>
|
| 405 |
-
<p>Comprehensive benchmarks to measure what 100K parameters can actually do.</p>
|
| 406 |
-
<span class="roadmap-status">Planned</span>
|
| 407 |
-
</div>
|
| 408 |
-
</div>
|
| 409 |
-
|
| 410 |
-
<div class="roadmap-item">
|
| 411 |
-
<div class="roadmap-marker"></div>
|
| 412 |
-
<div class="roadmap-content">
|
| 413 |
-
<h4>Phase 5: Model Release</h4>
|
| 414 |
-
<p>Open weights on HuggingFace for the community to experiment with.</p>
|
| 415 |
-
<span class="roadmap-status">Planned</span>
|
| 416 |
-
</div>
|
| 417 |
-
</div>
|
| 418 |
-
</div>
|
| 419 |
-
</div>
|
| 420 |
-
</section>
|
| 421 |
-
|
| 422 |
-
<!-- Dataset Credits -->
|
| 423 |
-
<section class="credits-section" id="credits">
|
| 424 |
-
<div class="container">
|
| 425 |
-
<h2 class="section-title">Dataset Credits</h2>
|
| 426 |
-
<p class="section-subtitle">The data that trains our model (subject to change)</p>
|
| 427 |
-
|
| 428 |
-
<div class="credits-grid">
|
| 429 |
-
<div class="credit-card">
|
| 430 |
-
<h4>Pretraining Dataset</h4>
|
| 431 |
-
<p class="credit-name">shuyuej/English-Pretraining-Dataset</p>
|
| 432 |
-
<p>Large-scale English text for initial language understanding and general knowledge.</p>
|
| 433 |
-
<a href="https://huggingface.co/datasets/shuyuej/English-Pretraining-Dataset" target="_blank">View on HuggingFace</a>
|
| 434 |
-
</div>
|
| 435 |
-
|
| 436 |
-
<div class="credit-card">
|
| 437 |
-
<h4>Instruction Dataset 1</h4>
|
| 438 |
-
<p class="credit-name">TeichAI/Pony-Alpha-15k</p>
|
| 439 |
-
<p>Conversational instruction data for learning dialogue patterns and responses.</p>
|
| 440 |
-
<a href="https://huggingface.co/datasets/TeichAI/Pony-Alpha-15k" target="_blank">View on HuggingFace</a>
|
| 441 |
-
</div>
|
| 442 |
-
|
| 443 |
-
<div class="credit-card">
|
| 444 |
-
<h4>Instruction Dataset 2</h4>
|
| 445 |
-
<p class="credit-name">TeichAI/convo-v1</p>
|
| 446 |
-
<p>Multi-turn conversation data for context handling and coherent dialogue.</p>
|
| 447 |
-
<a href="https://huggingface.co/datasets/TeichAI/convo-v1" target="_blank">View on HuggingFace</a>
|
| 448 |
-
</div>
|
| 449 |
-
|
| 450 |
-
<div class="credit-card">
|
| 451 |
-
<h4>Instruction Dataset 3</h4>
|
| 452 |
-
<p class="credit-name">TeichAI/Step-3.5-Flash-2600x</p>
|
| 453 |
-
<p>High-quality instruction-response pairs for fine-tuning reasoning capabilities.</p>
|
| 454 |
-
<a href="https://huggingface.co/datasets/TeichAI/Step-3.5-Flash-2600x" target="_blank">View on HuggingFace</a>
|
| 455 |
-
</div>
|
| 456 |
-
|
| 457 |
-
<div class="credit-card">
|
| 458 |
-
<h4>Instruction Dataset 4</h4>
|
| 459 |
-
<p class="credit-name">TeichAI/sherlock-thinking-alpha-11000x</p>
|
| 460 |
-
<p>Thinking and reasoning data for chain of thought training.</p>
|
| 461 |
-
<a href="https://huggingface.co/datasets/TeichAI/sherlock-thinking-alpha-11000x" target="_blank">View on HuggingFace</a>
|
| 462 |
-
</div>
|
| 463 |
-
|
| 464 |
-
<div class="credit-card">
|
| 465 |
-
<h4>Instruction Dataset 5</h4>
|
| 466 |
-
<p class="credit-name">TeichAI/glm-4.7-2000x</p>
|
| 467 |
-
<p>GLM model outputs for diverse response patterns.</p>
|
| 468 |
-
<a href="https://huggingface.co/datasets/TeichAI/glm-4.7-2000x" target="_blank">View on HuggingFace</a>
|
| 469 |
-
</div>
|
| 470 |
-
|
| 471 |
-
<div class="credit-card">
|
| 472 |
-
<h4>Instruction Dataset 6</h4>
|
| 473 |
-
<p class="credit-name">TeichAI/claude-haiku-4.5-high-reasoning-1700x</p>
|
| 474 |
-
<p>Claude reasoning outputs for advanced thinking patterns.</p>
|
| 475 |
-
<a href="https://huggingface.co/datasets/TeichAI/claude-haiku-4.5-high-reasoning-1700x" target="_blank">View on HuggingFace</a>
|
| 476 |
-
</div>
|
| 477 |
-
|
| 478 |
-
<div class="credit-card">
|
| 479 |
-
<h4>Instruction Dataset 7</h4>
|
| 480 |
-
<p class="credit-name">TeichAI/gemini-3-flash-preview</p>
|
| 481 |
-
<p>Gemini model outputs for additional diversity.</p>
|
| 482 |
-
<a href="https://huggingface.co/datasets/TeichAI/gemini-3-flash-preview" target="_blank">View on HuggingFace</a>
|
| 483 |
-
</div>
|
| 484 |
-
</div>
|
| 485 |
-
</div>
|
| 486 |
-
</section>
|
| 487 |
-
|
| 488 |
-
<!-- Closing -->
|
| 489 |
-
<section class="closing" id="closing">
|
| 490 |
-
<div class="container">
|
| 491 |
-
<div class="closing-content">
|
| 492 |
-
<h2>Beyond Competing with GPT-4</h2>
|
| 493 |
-
<p>The real goal is understanding what's actually necessary for intelligence to emerge. Maybe 100K parameters is enough. Maybe it isn't. But we won't know until we try building from first principles instead of just compressing what already exists.</p>
|
| 494 |
-
|
| 495 |
-
<div class="closing-quote">
|
| 496 |
-
<blockquote>
|
| 497 |
-
"Smaller models serve as a means toward a larger goal. Understanding what makes models work in the first place is the real objective."
|
| 498 |
-
</blockquote>
|
| 499 |
-
</div>
|
| 500 |
-
|
| 501 |
-
<div class="poem-section">
|
| 502 |
-
<p class="poem-intro">It's coming.</p>
|
| 503 |
-
<div class="poem">
|
| 504 |
-
<p>Small enough to hide in plain sight,</p>
|
| 505 |
-
<p>Big enough to twist the stars of night,</p>
|
| 506 |
-
<p>Quiet as a shadow, sharp as a spark,</p>
|
| 507 |
-
<p>A tiny flame that will light the dark.</p>
|
| 508 |
-
</div>
|
| 509 |
-
</div>
|
| 510 |
-
|
| 511 |
-
<div class="cta-section">
|
| 512 |
-
<p>This is an ongoing experiment. Everything described here is subject to change. Follow along if you're curious about what this architecture can actually do.</p>
|
| 513 |
-
</div>
|
| 514 |
-
</div>
|
| 515 |
-
</div>
|
| 516 |
-
</section>
|
| 517 |
-
</main>
|
| 518 |
-
|
| 519 |
-
<!-- Footer -->
|
| 520 |
-
<footer class="footer">
|
| 521 |
-
<div class="container">
|
| 522 |
-
<div class="footer-content">
|
| 523 |
-
<p class="footer-text">Built with curiosity over compute.</p>
|
| 524 |
-
<p class="footer-subtext">FMN-GPT by <a href="https://huggingface.co/CompactAI" target="_blank">CompactAI</a> - 2026</p>
|
| 525 |
-
</div>
|
| 526 |
-
</div>
|
| 527 |
-
</footer>
|
| 528 |
-
|
| 529 |
-
<script src="main.js"></script>
|
| 530 |
-
</body>
|
| 531 |
-
</html>