New blog out!
Browse files- blog-makeshift-mtp.html +112 -0
- blog.html +7 -0
blog-makeshift-mtp.html
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Makeshift MTP: A dumb idea that might work | FMN-GPT - CompactAI</title>
  <!-- Description for search/social previews; summarizes the post body below. -->
  <meta name="description" content="Faking multi-token prediction at inference time: spawn multiple continuations, score them by loss, and keep the winner - no retraining required.">
  <!-- Preconnect to both Google Fonts origins (gstatic serves the font files, CORS-fetched, so crossorigin is required). -->
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
  <style>
  :root{--color-bg:#faf8f5;--color-bg-alt:#f5f0e8;--color-bg-dark:#1a1815;--color-bg-dark-alt:#252220;--color-accent:#e85d3b;--color-accent-light:#ff8a6b;--color-accent-dark:#c44a2d;--color-secondary:#d4a853;--color-text:#2d2a26;--color-text-light:#6b6560;--color-text-muted:#9a948d;--color-border:#e5e0d8;--shadow-md:0 4px 20px rgba(45,42,38,0.12);--font-sans:'Inter',-apple-system,BlinkMacSystemFont,sans-serif;--font-mono:'JetBrains Mono','Fira Code',monospace;--container-max:1200px;--section-padding:100px}
  *,*::before,*::after{box-sizing:border-box;margin:0;padding:0}
  html{scroll-behavior:smooth;font-size:16px}
  body{font-family:var(--font-sans);background:var(--color-bg);color:var(--color-text);line-height:1.7;-webkit-font-smoothing:antialiased;display:flex;flex-direction:column;min-height:100vh}
  main{flex:1}
  .container{max-width:var(--container-max);margin:0 auto;padding:0 24px}
  h1,h2,h3{font-weight:600;line-height:1.2;color:var(--color-text)}
  a{color:var(--color-accent);text-decoration:none;transition:color .2s}
  a:hover{color:var(--color-accent-dark)}
  code{font-family:var(--font-mono);background:var(--color-bg-alt);padding:.2em .5em;border-radius:4px;font-size:.9em;color:var(--color-accent-dark)}
  pre{font-family:var(--font-mono);background:var(--color-bg-dark);color:#f5f0e8;padding:1.5rem;border-radius:12px;overflow-x:auto;font-size:.875rem;line-height:1.6}
  pre code{background:none;padding:0;color:inherit}
  .main-nav{position:fixed;top:0;left:0;right:0;background:rgba(26,24,21,.95);backdrop-filter:blur(10px);z-index:1000;padding:1rem 0}
  .main-nav .container{display:flex;justify-content:space-between;align-items:center}
  .nav-brand{color:#fff;font-size:1.25rem;font-weight:600}
  .nav-links{display:flex;gap:2rem}
  .nav-links a{color:var(--color-text-muted);font-size:.9375rem;transition:color .2s}
  .nav-links a:hover{color:var(--color-accent)}
  .footer{padding:3rem 0;background:var(--color-bg-dark);text-align:center}
  .footer-text{color:#fff;font-size:1.125rem;margin-bottom:.5rem}
  .footer-subtext{color:var(--color-text-muted);font-size:.875rem;margin:0}
  .blog-post-section{padding:var(--section-padding) 0;background:var(--color-bg);flex:1}
  .blog-post-content{max-width:700px;margin:0 auto}
  .blog-back{display:inline-block;color:var(--color-accent);font-weight:500;margin-bottom:2rem}
  .blog-post-header{margin-bottom:3rem}
  .blog-post-header h1{margin-top:1rem}
  .blog-post-body p{font-size:1.125rem;line-height:1.8;margin-bottom:1.75rem;color:var(--color-text)}
  .blog-post-body p:first-of-type{font-size:1.25rem}
  .blog-post-body h2{font-size:1.6rem;margin:2rem 0 .8rem;color:var(--color-accent)}
  .blog-post-body blockquote{border-left:4px solid var(--color-accent);padding:1rem 1.5rem;margin:2rem 0;background:var(--color-bg-alt);border-radius:0 8px 8px 0;font-style:italic;font-size:1.1rem;color:var(--color-text)}
  .blog-post-body blockquote p{margin:0}
  .blog-post-body ul,.blog-post-body ol{margin:1.5rem 0;padding-left:1.5rem}
  .blog-post-body li{margin-bottom:.75rem;color:var(--color-text);line-height:1.7}
  .blog-post-body ul li{list-style-type:disc}
  .blog-post-body hr{border:none;height:2px;background:linear-gradient(to right,transparent,var(--color-border),transparent);margin:3rem 0}
  .blog-post-body pre{margin:1.5rem 0}
  .blog-post-body a{text-decoration:underline;text-underline-offset:2px}
  .blog-post-body strong{color:var(--color-text);font-weight:600}
  .blog-post-body em{color:var(--color-text)}
  .blog-meta{display:flex;gap:1rem;margin-bottom:1rem}
  .blog-date{color:var(--color-text-muted);font-size:.875rem}
  .blog-tag{background:rgba(232,93,59,.1);color:var(--color-accent);font-size:.75rem;font-weight:600;padding:.25rem .75rem;border-radius:50px;text-transform:uppercase;letter-spacing:.05em}
  @media(max-width:768px){:root{--section-padding:60px}}
  </style>
</head>
<body>
  <nav class="main-nav">
    <div class="container">
      <a href="index.html" class="nav-brand">FMN-GPT</a>
      <div class="nav-links">
        <a href="blog.html">Blog</a>
        <a href="status.html">Model Status</a>
        <!-- rel="noopener noreferrer" added: external target="_blank" link. -->
        <a href="https://huggingface.co/CompactAI" target="_blank" rel="noopener noreferrer">HuggingFace</a>
      </div>
    </div>
  </nav>
  <main>
    <article class="blog-post-section">
      <div class="container">
        <div class="blog-post-content">
          <a href="blog.html" class="blog-back">← Back to Blog</a>
          <header class="blog-post-header">
            <div class="blog-meta">
              <span class="blog-date">2026-02-16</span>
              <span class="blog-tag">Technique</span>
            </div>
            <h1>Makeshift MTP: A dumb idea that might work</h1>
          </header>
          <div class="blog-post-body">
            <p>Multi-token prediction is having a moment. DeepMind released a paper on it. Everyone's talking about how models should predict multiple tokens ahead instead of just one. The problem? Most implementations require architecture changes. New training objectives. More parameters. More compute. More everything.</p>
            <p>But what if we could fake it?</p>
            <p>Here's the idea. You have your model and a prompt like "The cat ". Normal inference predicts one token. Boring. But what if we spawned multiple continuations in parallel, each making their own guesses?</p>
            <pre><code>"The cat rna..."
"The cat cank..."
"The cat ran..."</code></pre>
            <p>Each of these runs through the model as a forward pass. Nothing fancy. No architectural changes. Then we compute loss on all of them and pick the winner. The one with the lowest loss gets to continue.</p>
            <p>Think about what this actually buys us. We're running inference X times instead of once, sure. But we're also sampling from the latent space in multiple directions at once. The model is essentially exploring different branches of probability and letting us pick the most coherent one.</p>
            <p>And here's the nice part. The number of branches can be anything. Running on a potato? Generate two continuations and pick the better one. Have a GPU cluster sitting around? Spawn fifty. Time-constrained? Pick based on next-token loss only. Got all day? Evaluate the full generated sequence. The tradeoff between compute and quality becomes a dial you can turn.</p>
            <h2>Why this feels like MTP</h2>
            <p>Traditional multi-token prediction trains the model to output multiple tokens in a single forward pass. The model learns to think ahead. Our approach does something similar at inference time. We explore multiple futures and commit to the best one.</p>
            <p>The difference is we never taught the model to do this. We just throw compute at the problem until it works. Crude? Maybe. But it runs on any model without retraining.</p>
            <h2>The actual benefits</h2>
            <p>First, no more regenerating bad outputs. If a branch goes off the rails, its loss spikes, and we simply don't pick it. The bad branch dies quietly without wasting user time on a regeneration request.</p>
            <p>Second, no architecture changes. Your model stays the same. Your training pipeline stays the same. You just add a wrapper around inference that handles the branching and selection logic.</p>
            <p>Third, compute flexibility. Real MTP baked the multi-token prediction into the model weights. Our approach lets you decide at runtime how much exploration you can afford.</p>
            <h2>Why this is probably a bad idea</h2>
            <p>Loss is a proxy for what we actually want, which is coherence, helpfulness, and correctness. A branch might have lower loss but still say something stupid. The model confidently predicting nonsense still has low loss if it's confidently predicting.</p>
            <p>Also, this scales poorly. If you want to explore N branches for M tokens, you're doing N times the forward passes. At some point, just using a bigger model becomes cheaper.</p>
            <p>But for small models? For experiments? For cases where you have time but not parameters? This might be genuinely useful.</p>
            <hr>
            <p><em>We're planning to test this on FMN-GPT. The model is small enough that running multiple forward passes is actually affordable. Whether it helps or not, we'll write up the results. Probably the failures will be more interesting than the successes.</em></p>
          </div>
        </div>
      </div>
    </article>
  </main>
  <footer class="footer">
    <div class="container">
      <p class="footer-text">Built with curiosity over compute.</p>
      <!-- rel="noopener noreferrer" added: external target="_blank" link. -->
      <p class="footer-subtext">FMN-GPT by <a href="https://huggingface.co/CompactAI" target="_blank" rel="noopener noreferrer">CompactAI</a> - 2026</p>
    </div>
  </footer>
</body>
</html>
|
blog.html
CHANGED
|
@@ -124,6 +124,13 @@ blockquote { border-left: 4px solid var(--color-accent); padding-left: 1.5rem; m
|
|
| 124 |
|
| 125 |
<script>
|
| 126 |
const posts = [
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 127 |
{
|
| 128 |
"file": "blog-built-with-curiosity-over-compute.html",
|
| 129 |
"date": "2026-02-15",
|
|
|
|
| 124 |
|
| 125 |
<script>
|
| 126 |
const posts = [
|
| 127 |
+
{
|
| 128 |
+
"file": "blog-makeshift-mtp.html",
|
| 129 |
+
"date": "2026-02-16",
|
| 130 |
+
"tag": "Technique",
|
| 131 |
+
"title": "Makeshift MTP: A dumb idea that might work",
|
| 132 |
+
"excerpt": "Multi-token prediction is hot right now. But what if we could fake it without retraining? Spawn multiple continuations, compute loss on all of them, and pick the winner. Crude, but it runs on any model."
|
| 133 |
+
},
|
| 134 |
{
|
| 135 |
"file": "blog-built-with-curiosity-over-compute.html",
|
| 136 |
"date": "2026-02-15",
|