<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>What Makes a Tiny Model Smart? Announcing the SupraLabs Open SLM Research Initiative. | SupraLabs Blog</title>
<style>
:root {
--bg: #0f0f0f;
--surface: #1a1a1a;
--border: #333;
--text: #e0e0e0;
--accent: #536bfe;
--muted: #888;
--font-mono: 'JetBrains Mono', 'Fira Code', monospace;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
background-color: var(--bg);
color: var(--text);
font-family: 'Inter', -apple-system, sans-serif;
line-height: 1.6;
padding: 2rem;
}
code, pre, .mono { font-family: var(--font-mono); }
.container { max-width: 900px; margin: 0 auto; }
header {
border-bottom: 2px solid var(--border);
padding-bottom: 2rem;
margin-bottom: 3rem;
display: flex;
justify-content: space-between;
align-items: flex-end;
}
.logo-area h1 {
font-size: 1.2rem;
text-transform: uppercase;
letter-spacing: 2px;
color: var(--accent);
line-height: 1;
display: flex;
align-items: center;
gap: 10px;
}
.logo-area a { text-decoration: none; color: inherit; }
.logo-area {
display: flex;
align-items: center;
gap: 10px;
font-weight: bold;
font-size: 1.2rem;
}
nav a {
color: var(--text);
text-decoration: none;
margin-left: 1.5rem;
font-size: 0.9rem;
border-bottom: 1px solid transparent;
}
nav a:hover { border-bottom: 1px solid var(--accent); }
.post-header { margin-bottom: 3rem; }
.post-header h2 {
font-size: 3rem;
line-height: 1.1;
margin-bottom: 1rem;
font-weight: 800;
}
.post-meta {
font-family: var(--font-mono);
color: var(--accent);
font-size: 0.9rem;
margin-bottom: 2rem;
}
.post-content {
background: var(--surface);
border: 1px solid var(--border);
padding: 3rem;
margin-bottom: 4rem;
}
.post-content h2 {
font-size: 1.8rem;
margin: 2.5rem 0 1rem 0;
color: var(--accent);
}
.post-content h2:first-child { margin-top: 0; }
.post-content p {
margin-bottom: 1.5rem;
font-size: 1.1rem;
color: var(--text);
}
.post-content ul {
margin-bottom: 1.5rem;
padding-left: 1.5rem;
}
.post-content li { margin-bottom: 0.5rem; font-size: 1.1rem; }
.post-content strong { color: #fff; }
.post-content code {
background: #111;
border: 1px solid var(--border);
padding: 2px 6px;
border-radius: 3px;
font-size: 0.95em;
color: var(--accent);
}
.callout {
border-left: 3px solid var(--accent);
background: #111;
padding: 1rem 1.5rem;
margin: 2rem 0;
font-family: var(--font-mono);
font-size: 0.95rem;
color: #ccc;
}
.callout span {
display: block;
color: var(--muted);
font-size: 0.8rem;
margin-bottom: 0.4rem;
}
/* Screenshot box */
.screenshot-box {
border: 1px solid var(--border);
background: #111;
padding: 0;
margin: 2rem 0;
overflow: hidden;
}
.screenshot-box .screenshot-label {
font-family: var(--font-mono);
font-size: 0.75rem;
color: var(--muted);
padding: 0.6rem 1rem;
border-bottom: 1px solid var(--border);
background: #0d0d0d;
}
.screenshot-box img {
width: 100%;
display: block;
}
/* Reddit quote */
.reddit-quote {
border-left: 3px solid #ff4500;
background: #111;
padding: 1.2rem 1.5rem;
margin: 2rem 0;
font-size: 1rem;
color: #ccc;
font-style: italic;
line-height: 1.7;
}
.reddit-quote .reddit-meta {
font-family: var(--font-mono);
font-size: 0.72rem;
color: #ff4500;
font-style: normal;
margin-bottom: 0.6rem;
display: block;
}
/* Output example */
.output-example {
border: 1px solid var(--border);
background: #111;
padding: 1.5rem;
margin: 1.5rem 0;
}
.output-example .prompt-label {
font-family: var(--font-mono);
font-size: 0.75rem;
color: var(--accent);
margin-bottom: 0.4rem;
}
.output-example .prompt-text {
font-weight: 700;
color: #fff;
margin-bottom: 1rem;
font-size: 1rem;
}
.output-example .output-label {
font-family: var(--font-mono);
font-size: 0.75rem;
color: var(--muted);
margin-bottom: 0.4rem;
}
.output-example .output-text {
color: var(--text);
font-size: 0.95rem;
font-style: italic;
line-height: 1.7;
}
.tags { display: flex; gap: 0.5rem; margin-top: 2rem; flex-wrap: wrap; }
.tag {
font-family: var(--font-mono);
font-size: 0.7rem;
padding: 2px 8px;
border: 1px solid var(--border);
border-radius: 4px;
color: var(--muted);
}
footer {
margin-top: 6rem;
padding-bottom: 2rem;
font-size: 0.8rem;
color: var(--muted);
text-align: center;
}
@media (max-width: 600px) {
.post-header h2 { font-size: 2rem; }
.post-content { padding: 1.5rem; }
header { flex-direction: column; align-items: flex-start; gap: 1rem; }
nav a { margin-left: 0; margin-right: 1rem; }
}
</style>
</head>
<body>
<div class="container">
<header>
<div class="logo-area" style="font-size: 1.5em;">
<a href="./index.html"><h1><img src="./image.png" alt="SupraLabs logo" style="height: 2em"> SupraLabs_</h1></a>
</div>
<nav>
<a href="./index.html#news">News</a>
<a href="https://huggingface.co/SupraLabs" target="_blank">HuggingFace</a>
<a href="./index.html#hardware">Hardware</a>
</nav>
</header>
<article>
<div class="post-header">
<div class="post-meta">// 2026-05-29 | Research Roadmap</div>
<h2>What Makes a Tiny Model Smart?<br>Announcing the SupraLabs<br>Open SLM Research Initiative.</h2>
</div>
<div class="post-content">
<p>We are still completely blown away. In the week or so since we released our <strong>Supra-50M Instruct</strong> model, the open-source community has taken it and run with it. Thanks to your incredible support, we hit <strong>Page 1 of Trending Models in Text Generation</strong> and Page 4 across ALL categories on Hugging Face, crossed <strong>7,000 downloads</strong>, and even got featured in a YouTube deep-dive! For a non-profit, 100% open-source garage project, this is unreal. Thank you.</p>
<p>But we aren't stopping there. The surge of interest in small language models (SLMs) shows that the world wants highly efficient, reproducible computing. To push the boundaries of what tiny "brains" can do, SupraLabs is launching a large-scale, fully open, systematic research initiative. We want to find the exact engineering sweet spots for SLMs, and we are open-sourcing every pipeline, log, and weight along the way.</p>
<p>Here is the roadmap of the core experiments we are spinning up right now.</p>
<h2>Experiment 1: The Ultimate Data-Mix Showdown</h2>
<p>Everyone knows data quality is king, but what is the absolute best data recipe when your parameter budget is ultra-tight? We are pitting the top open-source datasets against each other to find the perfect synergy.</p>
<ul>
<li><strong>The Setup:</strong> We are training an ultra-lean <strong>5M-parameter Llama model</strong> using Hugging Face Transformers.</li>
<li><strong>The Data:</strong> Exactly <strong>100 million tokens</strong> per run, testing four configurations:
<br>1. 100% <code>FineWeb-Edu</code>
<br>2. 100% <code>DCLM-Edu</code>
<br>3. 100% <code>Cosmopedia-v2</code>
<br>4. Custom algorithmic token-level mixes of all three.
</li>
</ul>
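<p>The "token-level mix" in configuration 4 can be read as a per-source token quota. The sketch below is one hypothetical way to implement that with the standard library only: each tokenized document is drawn from a source with probability proportional to that source's remaining quota, and the last document from each source is trimmed so it contributes its rounded share of the budget. The function name, its signature, and the sampling rule are our assumptions for illustration, not the pipeline we will publish.</p>

```python
import random
from typing import Dict, Iterable, Iterator, List


def mix_by_token_budget(
    sources: Dict[str, Iterable[List[int]]],
    weights: Dict[str, float],
    total_tokens: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield tokenized documents until each source has contributed about
    weights[name] * total_tokens tokens (hypothetical helper)."""
    rng = random.Random(seed)
    # Token quota per source; sources missing from `weights` get zero.
    quota = {n: round(weights.get(n, 0.0) * total_tokens) for n in sources}
    used = {n: 0 for n in sources}
    iters = {n: iter(docs) for n, docs in sources.items()}
    active = [n for n in sources if quota[n] > 0]

    while active:
        # Sample a source in proportion to its *remaining* quota so all
        # sources drain at roughly the same relative rate.
        remaining = [quota[n] - used[n] for n in active]
        name = rng.choices(active, weights=remaining, k=1)[0]
        doc = next(iters[name], None)
        if doc is None:  # source ran out before reaching its quota
            active.remove(name)
            continue
        doc = doc[: quota[name] - used[name]]  # trim so the quota is exact
        used[name] += len(doc)
        if used[name] >= quota[name]:
            active.remove(name)
        yield doc
```

<p>Giving a single source a weight of 1.0 reproduces configurations 1 to 3 as a degenerate case, so one data loader could in principle drive all four runs.</p>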
<p><strong>The Goal:</strong> Find out whether highly structured synthetic data outperforms heavily curated web scrapes at the 5M scale, or whether a hybrid mix generalizes best on downstream tasks.</p>
<div class="callout">
<span>// THE EVALUATION SUITE (LM-EVAL)</span>
To find the true sweet spot, every single model in our studies will be rigorously evaluated across this standardized zero-shot/few-shot benchmark matrix:<br><br>
• Language & Perplexity → wikitext, lambada<br>
• Commonsense & Logic → hellaswag, piqa, winogrande, boolq<br>
• Science & Knowledge → sciq, openbookqa, arc_easy, arc_challenge<br>
• Grammar & Syntax → blimp
</div>
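<p>For the language-and-perplexity row, wikitext-style tasks score whole documents by their summed log-likelihood and then normalize that sum by word count or byte count. As we understand the harness, its wikitext task reports word perplexity, byte perplexity, and bits per byte. The helper below is a minimal sketch of those standard definitions. It is not lm-eval's own code, and the function name is hypothetical.</p>

```python
import math


def rolling_metrics(total_loglik: float, num_words: int, num_bytes: int) -> dict:
    """Word perplexity, byte perplexity and bits per byte from the summed
    natural-log likelihood of a corpus (hypothetical helper)."""
    nll = -total_loglik  # total negative log-likelihood, in nats
    return {
        "word_perplexity": math.exp(nll / num_words),
        "byte_perplexity": math.exp(nll / num_bytes),
        # Convert nats per byte to bits per byte.
        "bits_per_byte": nll / (num_bytes * math.log(2)),
    }
```

<p>Normalizing by words or bytes rather than tokens matters when comparing models with different tokenizers, because per-token perplexity is not comparable across vocabularies.</p>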
<h2>Experiment 2: Scaling Law Realities for Tiny Models</h2>
<p>Chinchilla scaling laws tell us how to split a fixed compute budget between model size and training data for billion-parameter giants. But do those rules break down when you scale down to the absolute edge? We are conducting a dedicated scaling study to map out the returns on adding parameters.</p>
<ul>
<li><strong>The Setup:</strong> Dataset size is fixed at exactly <strong>2 billion tokens</strong> of <code>FineWeb-Edu (sample-10BT)</code>.</li>
<li><strong>The Core Matrix:</strong> We will train four distinct Llama architectures: <strong>10M, 25M, 50M, and 100M parameters</strong>.</li>
</ul>
<p><strong>The Goal:</strong> Identify the exact point of diminishing returns. Does a 25M model fully utilize 2B tokens, or does the 100M model show a massive performance leap on the exact same token footprint? We want to chart the efficiency frontier.</p>
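<p>A quick back-of-the-envelope calculation shows where each model in the matrix sits relative to the widely quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter, along with the common 6 × N × D approximation of training FLOPs. This is plain arithmetic on the numbers above. The helper name is hypothetical, and counting total rather than non-embedding parameters is a simplifying assumption.</p>

```python
TOKENS = 2_000_000_000            # fixed FineWeb-Edu budget in Experiment 2
CHINCHILLA_TOKENS_PER_PARAM = 20  # widely quoted rule of thumb


def budget_report(params: int, tokens: int = TOKENS) -> dict:
    """Tokens-per-parameter ratio and approximate training FLOPs."""
    return {
        "tokens_per_param": tokens / params,
        "chinchilla_tokens": CHINCHILLA_TOKENS_PER_PARAM * params,
        # Common approximation: about 6 FLOPs per parameter per training token.
        "train_flops": 6 * params * tokens,
    }


for n in (10_000_000, 25_000_000, 50_000_000, 100_000_000):
    r = budget_report(n)
    print(f"{n // 1_000_000:>4}M: {r['tokens_per_param']:>4.0f} tokens/param, "
          f"~{r['train_flops']:.1e} training FLOPs")
```

<p>With 2B tokens, the 10M model sees 200 tokens per parameter (about ten times the rule of thumb), the 25M model 80, the 50M model 40, and the 100M model lands right at about 20. The matrix therefore spans the range from heavily over-trained to roughly compute-optimal by Chinchilla's yardstick.</p>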
<h2>Experiment 3: Is One Epoch Really All You Need for SLMs?</h2>
<p>The standard convention for LLMs, popularized by several landmark papers, is "one epoch and move on" to avoid overfitting. But small models trained on high-quality educational data might be a completely different beast. Can they chew on the same high-signal data multiple times?</p>
<ul>
<li><strong>The Setup:</strong> A <strong>10M-parameter Llama model</strong> trained on exactly <strong>500 million tokens</strong> of <code>FineWeb-Edu</code>.</li>
<li><strong>The Epoch Matrix:</strong> We are running 5 identical setups, changing only the epoch count: <strong>1 epoch vs. 2, 3, 4, and 5 epochs</strong>.</li>
</ul>
<p><strong>The Goal:</strong> Pinpoint exactly where overfitting begins for an SLM. If <code>lm-eval</code> scores keep improving past epoch 2 or 3 without held-out perplexity degrading, data scarcity for edge AI may be much easier to work around than we think.</p>
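<p>One simple way to operationalize "where overfitting begins" is to measure loss on a held-out FineWeb-Edu split after every epoch and flag the first epoch whose loss rises above the best value seen so far. The sketch below assumes such a held-out split exists and that losses are collected once per epoch. The function names and the tolerance parameter are hypothetical.</p>

```python
from typing import List, Optional

UNIQUE_TOKENS = 500_000_000  # size of the Experiment 3 training set


def tokens_seen(epochs: int, unique_tokens: int = UNIQUE_TOKENS) -> int:
    # Every epoch replays the same data: compute grows, unique data does not.
    return epochs * unique_tokens


def overfitting_onset(val_losses: List[float], tol: float = 0.0) -> Optional[int]:
    """Return the 1-based epoch whose held-out loss first exceeds the best
    earlier loss by more than `tol`, or None if it never does."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss > best + tol:
            return epoch
        best = min(best, loss)
    return None
```

<p>At 5 epochs the model processes 2.5B tokens while still seeing only 500M unique ones. A small positive <code>tol</code> keeps run-to-run evaluation noise from being mistaken for the onset of overfitting.</p>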
<h2>Expanding the Frontier: More Ideas We're Testing</h2>
<p>While configuring our cluster for the three core studies above, we realized we have a golden opportunity to answer even more design questions. We have officially added these four bonus dimensions to our upcoming research pipeline:</p>
<h3>A. The Tokenizer Bottleneck</h3>
<p>Modern tokenizers use massive vocabularies (like Llama 3's 128k). In a 10M-parameter model, a huge vocabulary means the embedding layer eats up almost all of the parameters, leaving very little for the actual transformer layers. We will run identical 10M models comparing the Llama 3 tokenizer (128k), the Llama 2 tokenizer (32k), and custom-built 8k and 16k vocabularies to see where the parameter balance lies.</p>
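<p>The arithmetic behind the bottleneck is simple: the input embedding matrix has <code>vocab_size × d_model</code> parameters, doubled if the output head is not tied to it. The sketch below assumes tied embeddings and an illustrative hidden size of 64, and shows how much of a 10M budget each vocabulary consumes before a single transformer layer is added. The 8,192 and 16,384 sizes for the custom vocabularies are assumptions.</p>

```python
BUDGET = 10_000_000  # total parameter budget for the tokenizer study

VOCABS = {
    "Llama 3 (128k)": 128_256,
    "Llama 2 (32k)": 32_000,
    "custom 16k": 16_384,  # assumed exact size
    "custom 8k": 8_192,    # assumed exact size
}


def split_budget(vocab_size: int, d_model: int, tied: bool = True) -> tuple:
    """Return (embedding params, params left for transformer blocks)."""
    # Input embedding is vocab_size x d_model; an untied LM head doubles it.
    emb = vocab_size * d_model * (1 if tied else 2)
    return emb, BUDGET - emb


for name, vocab in VOCABS.items():
    emb, left = split_budget(vocab, d_model=64)  # d_model=64 is illustrative
    print(f"{name:15} embeddings={emb:>10,} left={left:>10,} ({emb / BUDGET:.0%})")
```

<p>At <code>d_model=64</code> the Llama 3 table alone takes about 82% of the budget. At 128 it would need about 16.4M parameters, exceeding the entire 10M budget by itself.</p>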
<h3>B. Depth vs. Width (Architecture Tweaks)</h3>
<p>If you have a strict budget of 25M parameters, how should you spend them? We're testing a "deep and narrow" configuration (e.g., 24 layers with a smaller hidden dimension) against a "shallow and wide" setup (e.g., 6 layers with a much larger hidden dimension) to evaluate which layout reasons better on standard benchmarks.</p>
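<p>To make the comparison concrete, here is a rough parameter counter for a Llama-style decoder, plus a search for the widest hidden size that fits a 25M budget at a given depth. It assumes no biases, no grouped-query attention, tied embeddings, a Llama 2-sized vocabulary of 32,000, and an MLP width of about 8/3 × <code>d_model</code> rounded up to a multiple of 32. All of these are illustrative choices rather than our final configs, and the helper names are hypothetical.</p>

```python
def llama_params(d: int, layers: int, vocab: int, ffn: int, tied: bool = True) -> int:
    """Approximate Llama-style parameter count (no biases, no GQA)."""
    attn = 4 * d * d    # q, k, v and output projections
    mlp = 3 * d * ffn   # SwiGLU gate, up and down projections
    norms = 2 * d       # two RMSNorm weight vectors per block
    emb = vocab * d * (1 if tied else 2)
    return layers * (attn + mlp + norms) + d + emb  # "+ d" is the final RMSNorm


def ffn_size(d: int, multiple: int = 32) -> int:
    # About 8/3 * d, rounded up to a multiple of `multiple` (assumed rule).
    f = 8 * d // 3
    return -(-f // multiple) * multiple


def widest_fit(budget: int, layers: int, vocab: int, step: int = 32) -> int:
    """Largest hidden size (a multiple of `step`) whose model fits the budget."""
    d = 0
    while budget >= llama_params(d + step, layers, vocab, ffn_size(d + step)):
        d += step
    return d


for layers in (24, 6):
    d = widest_fit(25_000_000, layers, vocab=32_000)
    total = llama_params(d, layers, 32_000, ffn_size(d))
    print(f"{layers:>2} layers -> d_model={d}, ~{total / 1e6:.1f}M params")
```

<p>Under these assumptions, 24 layers top out at <code>d_model=224</code> (about 21.8M parameters) while 6 layers reach 384 (about 22.9M). In the shallow model the embedding table alone accounts for more than half of the parameters, which ties this question back to the tokenizer study.</p>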
<h3>C. The Sequence Length Penalty</h3>
<p>Does forcing a longer context window ruin a tiny model's general capability? We will train identical models with 512-, 1024-, and 2048-token context windows to see whether extending context capacity directly penalizes the model's core knowledge density.</p>
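<p>Because Llama models use rotary position embeddings rather than a learned position table, changing the context window does not change the parameter count. What changes is the shape of each batch and the cost of attention. The sketch below assumes a hypothetical fixed budget of 262,144 tokens per optimizer step and uses the usual rough estimate of about 4 × <code>seq_len</code> × <code>d_model</code> forward FLOPs per token per layer for the attention scores plus the weighted sum over values.</p>

```python
TOKENS_PER_STEP = 262_144  # hypothetical fixed batch size in tokens (2**18)


def step_shape(seq_len: int, d_model: int, tokens_per_step: int = TOKENS_PER_STEP) -> dict:
    """Batch layout and rough attention cost for a given context length."""
    return {
        # Same tokens per optimizer step, so longer sequences mean fewer of them.
        "sequences_per_step": tokens_per_step // seq_len,
        # Forward FLOPs per token per layer for QK^T scores plus the weighted
        # sum over V (about 2 * seq_len * d_model each, causal masking ignored).
        "attn_flops_per_token_per_layer": 4 * seq_len * d_model,
    }


for seq_len in (512, 1024, 2048):
    s = step_shape(seq_len, d_model=256)  # d_model=256 is illustrative
    print(seq_len, s["sequences_per_step"], s["attn_flops_per_token_per_layer"])
```

<p>Going from 512 to 2048 tokens cuts the number of sequences per step by 4x and raises per-token attention cost by 4x, while the per-token cost of the projections and MLP stays the same.</p>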
<h3>D. LR Schedule Optimization for Ultra-Short Runs</h3>
<p>Standard cosine decay schedules are meant for trillions of tokens. For short 1B–2B token runs, we will experiment with aggressive linear decays and constant learning rates with sudden drops to establish the absolute fastest convergence paths for indie researchers.</p>
<h2>Everything Will Be Open. Everything.</h2>
<p>SupraLabs is entirely non-profit, and our commitment to open science means we won't just publish a PDF with pretty graphs. When these runs complete, we will be releasing:</p>
<ul>
<li>Every single checkpoint and weight file on Hugging Face.</li>
<li>Complete, unedited <code>lm-eval</code> logs and raw data points.</li>
<li>Our training configurations and custom setup code so anyone can replicate our work on their own hardware.</li>
</ul>
<p>We're getting the compute nodes warmed up as you read this. Stay tuned for the raw data drops—we're about to find out exactly how much power we can pack into these tiny architectures.</p>
<div class="callout">
<span>// JOIN THE INITIATIVE</span>
Track our progress directly on Hugging Face: <a href="https://huggingface.co/SupraLabs" target="_blank" style="color: var(--accent); text-decoration: underline;">huggingface.co/SupraLabs</a><br>
Codebase, configs, and automation tools will be linked there as the runs kick off.
</div>
<div class="tags">
<span class="tag">#open-research</span>
<span class="tag">#slm-scaling</span>
<span class="tag">#supralabs</span>
<span class="tag">#open-source</span>
<span class="tag">#data-science</span>
<span class="tag">#tinyml</span>
<span class="tag">#benchmarking</span>
</div>
</div>
</article>
<footer>
<p class="mono">© 2026 SupraLabs // Built for the community.</p>
</footer>
</div>
</body>
</html>