<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>What Makes a Tiny Model Smart? Announcing the SupraLabs Open SLM Research Initiative. | SupraLabs Blog</title>
    <style>
        :root {
            --bg: #0f0f0f;
            --surface: #1a1a1a;
            --border: #333;
            --text: #e0e0e0;
            --accent: #536bfe;
            --muted: #888;
            --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
        }
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            background-color: var(--bg);
            color: var(--text);
            font-family: 'Inter', -apple-system, sans-serif;
            line-height: 1.6;
            padding: 2rem;
        }
        code, pre, .mono { font-family: var(--font-mono); }
        .container { max-width: 900px; margin: 0 auto; }

        header {
            border-bottom: 2px solid var(--border);
            padding-bottom: 2rem;
            margin-bottom: 3rem;
            display: flex;
            justify-content: space-between;
            align-items: flex-end;
        }
        .logo-area h1 {
            font-size: 1.2rem;
            text-transform: uppercase;
            letter-spacing: 2px;
            color: var(--accent);
            line-height: 1;
            display: flex;
            align-items: center;
            gap: 10px;
        }
        .logo-area a { text-decoration: none; color: inherit; }
        .logo-area {
            display: flex;
            align-items: center;
            gap: 10px;
            font-weight: bold;
            font-size: 1.2rem;
        }
        nav a {
            color: var(--text);
            text-decoration: none;
            margin-left: 1.5rem;
            font-size: 0.9rem;
            border-bottom: 1px solid transparent;
        }
        nav a:hover { border-bottom: 1px solid var(--accent); }

        .post-header { margin-bottom: 3rem; }
        .post-header h2 {
            font-size: 3rem;
            line-height: 1.1;
            margin-bottom: 1rem;
            font-weight: 800;
        }
        .post-meta {
            font-family: var(--font-mono);
            color: var(--accent);
            font-size: 0.9rem;
            margin-bottom: 2rem;
        }
        .post-content {
            background: var(--surface);
            border: 1px solid var(--border);
            padding: 3rem;
            margin-bottom: 4rem;
        }
        .post-content h2 {
            font-size: 1.8rem;
            margin: 2.5rem 0 1rem 0;
            color: var(--accent);
        }
        .post-content h2:first-child { margin-top: 0; }
        .post-content p {
            margin-bottom: 1.5rem;
            font-size: 1.1rem;
            color: var(--text);
        }
        .post-content ul {
            margin-bottom: 1.5rem;
            padding-left: 1.5rem;
        }
        .post-content li { margin-bottom: 0.5rem; font-size: 1.1rem; }
        .post-content strong { color: #fff; }

        .post-content code {
            background: #111;
            border: 1px solid var(--border);
            padding: 2px 6px;
            border-radius: 3px;
            font-size: 0.95em;
            color: var(--accent);
        }

        .callout {
            border-left: 3px solid var(--accent);
            background: #111;
            padding: 1rem 1.5rem;
            margin: 2rem 0;
            font-family: var(--font-mono);
            font-size: 0.95rem;
            color: #ccc;
        }
        .callout span {
            display: block;
            color: var(--muted);
            font-size: 0.8rem;
            margin-bottom: 0.4rem;
        }

        /* Screenshot box */
        .screenshot-box {
            border: 1px solid var(--border);
            background: #111;
            padding: 0;
            margin: 2rem 0;
            overflow: hidden;
        }
        .screenshot-box .screenshot-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--muted);
            padding: 0.6rem 1rem;
            border-bottom: 1px solid var(--border);
            background: #0d0d0d;
        }
        .screenshot-box img {
            width: 100%;
            display: block;
        }

        /* Reddit quote */
        .reddit-quote {
            border-left: 3px solid #ff4500;
            background: #111;
            padding: 1.2rem 1.5rem;
            margin: 2rem 0;
            font-size: 1rem;
            color: #ccc;
            font-style: italic;
            line-height: 1.7;
        }
        .reddit-quote .reddit-meta {
            font-family: var(--font-mono);
            font-size: 0.72rem;
            color: #ff4500;
            font-style: normal;
            margin-bottom: 0.6rem;
            display: block;
        }

        /* Output example */
        .output-example {
            border: 1px solid var(--border);
            background: #111;
            padding: 1.5rem;
            margin: 1.5rem 0;
        }
        .output-example .prompt-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--accent);
            margin-bottom: 0.4rem;
        }
        .output-example .prompt-text {
            font-weight: 700;
            color: #fff;
            margin-bottom: 1rem;
            font-size: 1rem;
        }
        .output-example .output-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--muted);
            margin-bottom: 0.4rem;
        }
        .output-example .output-text {
            color: var(--text);
            font-size: 0.95rem;
            font-style: italic;
            line-height: 1.7;
        }

        .tags { display: flex; gap: 0.5rem; margin-top: 2rem; flex-wrap: wrap; }
        .tag {
            font-family: var(--font-mono);
            font-size: 0.7rem;
            padding: 2px 8px;
            border: 1px solid var(--border);
            border-radius: 4px;
            color: var(--muted);
        }

        footer {
            margin-top: 6rem;
            padding-bottom: 2rem;
            font-size: 0.8rem;
            color: var(--muted);
            text-align: center;
        }

        @media (max-width: 600px) {
            .post-header h2 { font-size: 2rem; }
            .post-content { padding: 1.5rem; }
            header { flex-direction: column; align-items: flex-start; gap: 1rem; }
            nav a { margin-left: 0; margin-right: 1rem; }
        }
    </style>
</head>
<body>

    <div class="container">
        <header>
            <div class="logo-area" style="font-size: 1.5em;">
                <a href="./index.html"><h1><img src="./image.png" alt="SupraLabs logo" style="height: 2em"> SupraLabs_</h1></a>
            </div>
            <nav>
                <a href="./index.html#news">News</a>
                <a href="https://huggingface.co/SupraLabs" target="_blank">HuggingFace</a>
                <a href="./index.html#hardware">Hardware</a>
            </nav>
        </header>

        <article>
            <div class="post-header">
                <div class="post-meta">// 2026-05-29 | Research Roadmap</div>
                <h2>What Makes a Tiny Model Smart?<br>Announcing the SupraLabs<br>Open SLM Research Initiative.</h2>
            </div>
        
            <div class="post-content">
                <p>We are still completely blown away. In just about a week since we dropped our <strong>Supra-50M Instruct</strong> model, the open-source community took it and ran with it. Thanks to your incredible support, we hit <strong>Page 1 of Trending Models in Text Generation</strong>, Page 4 across ALL categories on Hugging Face, crossed <strong>7,000+ downloads</strong>, and even got featured in a YouTube deep-dive! For a non-profit, 100% open-source garage project, this is unreal. Thank you.</p>
        
                <p>But we aren't stopping there. The massive interest in small language models (SLMs) proves that the world wants highly efficient, reproducible computing. To push the boundaries of what tiny "brains" can do, SupraLabs is launching a massive, fully open systematic research initiative. We want to find the exact engineering sweet spots for SLMs, and we are open-sourcing every single pipeline, log, and weight along the way.</p>
        
                <p>Here is the roadmap of the core experiments we are spinning up right now.</p>
        
                <h2>Experiment 1: The Ultimate Data-Mix Showdown</h2>
                <p>Everyone knows data quality is king, but what is the absolute best data recipe when your parameter budget is ultra-tight? We are pitting the top open-source datasets against each other to find the perfect synergy.</p>
                
                <ul>
                    <li><strong>The Setup:</strong> We are training an ultra-lean <strong>5M parameter Llama model</strong> using Hugging Face Transformers.</li>
                    <li><strong>The Data:</strong> Exactly <strong>100 Million tokens</strong> total per run, testing four configurations:
                        <br>1. 100% <code>FineWeb-Edu</code>
                        <br>2. 100% <code>DCLM-Edu</code>
                        <br>3. 100% <code>Cosmopedia-v2</code>
                        <br>4. Custom algorithmic token-level mixes of all three.
                    </li>
                </ul>
                <p><strong>The Goal:</strong> Find out whether highly structured synthetic data outpaces heavily curated web scrapes at the 5M scale, or whether a hybrid mix yields the best downstream generalization.</p>
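                <p>As a rough illustration of configuration 4, the sketch below splits the fixed 100M-token budget across the three datasets by weight. The <code>split_budget</code> helper and the example weights are hypothetical placeholders, not the mixes used in the actual runs.</p>
<pre>
```python
# Hypothetical helper: split a fixed token budget across datasets by weight.
TOTAL_TOKENS = 100_000_000  # per-run budget stated in the post


def split_budget(weights, total=TOTAL_TOKENS):
    """Return integer token counts per dataset that sum exactly to `total`."""
    weight_sum = sum(weights.values())
    counts = {name: total * w // weight_sum for name, w in weights.items()}
    # Hand any rounding remainder to the largest share so the total is exact.
    largest = max(counts, key=counts.get)
    counts[largest] += total - sum(counts.values())
    return counts


# Integer "parts" per dataset; these values are illustrative assumptions only.
example_mix = {"FineWeb-Edu": 5, "DCLM-Edu": 3, "Cosmopedia-v2": 2}
budget = split_budget(example_mix)
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")
```
</pre>
                <p>Using integer parts instead of fractional weights keeps the arithmetic exact, so every run in the comparison can be held to the same token count.</p>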
        
                <div class="callout">
                    <span>// THE EVALUATION SUITE (LM-EVAL)</span>
                    To find the true sweet spot, every single model in our studies will be rigorously evaluated across this standardized zero-shot/few-shot benchmark matrix:<br><br>
                    • Language &amp; Perplexity → wikitext, lambada<br>
                    • Commonsense &amp; Logic &nbsp;→ hellaswag, piqa, winogrande, boolq<br>
                    • Science &amp; Knowledge &nbsp;→ sciq, openbookqa, arc_easy, arc_challenge<br>
                    • Grammar &amp; Syntax &nbsp;&nbsp;&nbsp;&nbsp;→ blimp
                </div>
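                <p>For anyone who wants to run the same matrix locally, here is a minimal sketch that assembles an <code>lm_eval</code> command line for one checkpoint. The checkpoint path, the output directory, and the <code>build_eval_command</code> helper are placeholders we made up for illustration, and the exact task identifiers can differ between versions of the harness.</p>
<pre>
```python
import shlex

# Task names copied from the benchmark matrix above; the exact identifiers
# available depend on the installed lm-evaluation-harness version.
TASKS = [
    "wikitext", "lambada",
    "hellaswag", "piqa", "winogrande", "boolq",
    "sciq", "openbookqa", "arc_easy", "arc_challenge",
    "blimp",
]


def build_eval_command(checkpoint, num_fewshot=0, output_path="results"):
    """Assemble an lm_eval CLI call for one checkpoint (placeholder paths)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={checkpoint}",
        "--tasks", ",".join(TASKS),
        "--num_fewshot", str(num_fewshot),
        "--output_path", output_path,
    ]


# "path/to/checkpoint" is a hypothetical placeholder, not a real model ID.
command = build_eval_command("path/to/checkpoint")
print(shlex.join(command))
```
</pre>
                <p>Building the command as a list keeps it safe to pass to <code>subprocess.run</code> without shell quoting issues.</p>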
        
                <h2>Experiment 2: Scaling Law Realities for Tiny Models</h2>
                <p>Chinchilla scaling laws describe how to balance model size and training tokens for a given compute budget, and they are usually quoted for billion-parameter giants. But do those rules shatter when you scale down to the absolute edge? We are conducting a dedicated scaling study to map out the returns on parameter expansion.</p>
                
                <ul>
                    <li><strong>The Setup:</strong> Keeping dataset size fixed at exactly <strong>2 Billion tokens</strong> of <code>FineWeb-Edu (sample-10BT)</code>.</li>
                    <li><strong>The Core Matrix:</strong> We will train four distinct Llama architectures: <strong>10M, 25M, 50M, and 100M parameters</strong>.</li>
                </ul>
                <p><strong>The Goal:</strong> Identify the exact point of diminishing returns. Does a 25M model fully utilize 2B tokens, or does the 100M model show a massive performance leap on the exact same token footprint? We want to chart the efficiency frontier.</p>
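                <p>To put the fixed 2B-token budget in context, the sketch below computes tokens per parameter for each model size. The "about 20 tokens per parameter" figure is the commonly quoted Chinchilla rule of thumb, used here only as a reference point and not as a claim about what is optimal at this scale.</p>
<pre>
```python
TOKENS = 2_000_000_000  # fixed dataset size from the setup above
MODEL_SIZES = {
    "10M": 10_000_000,
    "25M": 25_000_000,
    "50M": 50_000_000,
    "100M": 100_000_000,
}
# Commonly quoted rule of thumb; treated as a reference point only.
CHINCHILLA_RATIO = 20


def tokens_per_param(tokens, params):
    """Training tokens available per model parameter."""
    return tokens / params


ratios = {name: tokens_per_param(TOKENS, p) for name, p in MODEL_SIZES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f} tokens/param "
          f"({ratio / CHINCHILLA_RATIO:.0f}x the rule of thumb)")
```
</pre>
                <p>Under this budget the 100M model sits right at the rule of thumb, while the 10M model sees ten times as many tokens per parameter, so the study spans both sides of the question.</p>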
        
                <h2>Experiment 3: Is One Epoch Really All You Need for SLMs?</h2>
                <p>The standard convention for LLMs is "one epoch and move on" to avoid overfitting, popularized by several landmark papers. But small models training on high-quality educational data might be a completely different beast. Can they chew on the same high-signal data multiple times?</p>
                
                <ul>
                    <li><strong>The Setup:</strong> A <strong>10M parameter Llama model</strong> trained on exactly <strong>500 Million tokens</strong> of <code>FineWeb-Edu</code>.</li>
                    <li><strong>The Epoch Matrix:</strong> We are running five setups that differ only in epoch count: <strong>1, 2, 3, 4, and 5 epochs</strong> over the same 500M tokens.</li>
                </ul>
                <p><strong>The Goal:</strong> Pinpoint exactly where overfitting begins for an SLM. If performance on <code>lm-eval</code> keeps scaling up past epoch 2 or 3 without destroying perplexity, it could mean data-scarcity solutions for edge AI are much easier than we think.</p>
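                <p>One simple way to read "where overfitting begins" is the last epoch before held-out loss starts rising. The sketch below implements that reading. The <code>overfitting_onset</code> helper is hypothetical, and the loss values are made-up placeholders, not measured results.</p>
<pre>
```python
def overfitting_onset(val_losses):
    """Return the last epoch (1-based) before validation loss first rises.

    `val_losses[0]` is the loss after epoch 1. Returns None if the loss
    never increases from one epoch to the next.
    """
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1]:
            return i  # 1-based number of the epoch just before the rise
    return None


# Made-up placeholder losses for epochs 1..5, NOT measured results.
placeholder_losses = [3.90, 3.70, 3.62, 3.65, 3.71]
onset = overfitting_onset(placeholder_losses)
print(onset)
```
</pre>
                <p>A real analysis would likely also compare benchmark scores across epochs, since held-out loss and downstream accuracy do not always move together.</p>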
        
                <h2>Expanding the Frontier: More Ideas We're Testing</h2>
                <p>While configuring our cluster for the three core studies above, we realized we have a golden opportunity to squeeze in even more architectural answers. We have officially added these four bonus dimensions to our upcoming research pipeline:</p>
        
                <h3>A. The Tokenizer Bottleneck</h3>
                <p>Modern tokenizers use massive vocabularies (like Llama 3's 128k). In a 10M parameter model, the embedding matrix grows with vocabulary size times hidden size, so a huge vocabulary can eat up most of your parameters and leave very little for the actual transformer layers. We will run identical 10M models comparing the Llama 3 tokenizer (128k), the Llama 2 tokenizer (32k), and custom-built 8k and 16k vocabularies to see where the parameter balance lies.</p>
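                <p>The arithmetic behind this bottleneck is easy to sketch. The example below counts embedding parameters for each vocabulary at a hypothetical hidden size of 64, assuming tied input and output embeddings. The hidden size and the exact 8k/16k vocabulary sizes (8,192 and 16,384) are our assumptions for illustration.</p>
<pre>
```python
VOCAB_SIZES = {
    "Llama 3 (128k)": 128_256,
    "Llama 2 (32k)": 32_000,
    "custom 16k": 16_384,  # assumed exact size
    "custom 8k": 8_192,    # assumed exact size
}
HIDDEN_SIZE = 64     # hypothetical width chosen for illustration
BUDGET = 10_000_000  # total parameter budget of the model


def embedding_params(vocab_size, hidden_size, tied=True):
    """Parameters in the token embedding (plus the output head if untied)."""
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden_size


for name, vocab in VOCAB_SIZES.items():
    emb = embedding_params(vocab, HIDDEN_SIZE)
    print(f"{name}: {emb:,} embedding params ({emb / BUDGET:.0%} of 10M)")
```
</pre>
                <p>Even at this narrow width, the 128k vocabulary takes roughly four fifths of the 10M budget, while the 8k vocabulary takes about a twentieth.</p>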
        
                <h3>B. Depth vs. Width (Architecture Tweaks)</h3>
                <p>If you have a strict budget of 25M parameters, how should you spend them? We're testing a "deep and narrow" configuration (e.g., 24 layers, smaller hidden dimensions) against a "shallow and wide" setup (e.g., 6 layers, massive hidden dimensions) to evaluate which layout reasons better on standard benchmarks.</p>
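                <p>To show how two very different shapes can land on the same budget, here is a rough parameter estimate for a Llama-style decoder. It assumes full multi-head attention (no grouped-query attention), a gated MLP, tied embeddings, and a 16,384-token vocabulary. Both example configurations are hypothetical and are not the ones we will train.</p>
<pre>
```python
def llama_param_count(layers, hidden, intermediate, vocab=16_384, tied=True):
    """Approximate parameter count for a Llama-style decoder (no biases)."""
    attention = 4 * hidden * hidden      # q, k, v, o projections
    mlp = 3 * hidden * intermediate      # gate, up, down projections
    norms = 2 * hidden                   # two RMSNorm weights per block
    embeddings = vocab * hidden * (1 if tied else 2)
    final_norm = hidden
    return layers * (attention + mlp + norms) + embeddings + final_norm


# Hypothetical shapes, picked only so both land near 25M parameters.
deep_narrow = llama_param_count(layers=24, hidden=256, intermediate=768)
shallow_wide = llama_param_count(layers=6, hidden=512, intermediate=1152)

print(f"deep and narrow:  {deep_narrow:,}")
print(f"shallow and wide: {shallow_wide:,}")
```
</pre>
                <p>Note that the wide model spends about twice as much of its budget on embeddings, so vocabulary size and layout choices interact.</p>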
        
                <h3>C. The Sequence Length Penalty</h3>
                <p>Does forcing a longer context window ruin a tiny model's general capability? We will train identical models across 512, 1024, and 2048 context windows to see if extending context capacity directly penalizes the model's core knowledge density.</p>
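                <p>One reason context length matters under a fixed token budget is that longer windows mean fewer, longer training sequences, and more attention compute per sequence. The sketch below makes that trade-off concrete. The 500M-token budget is an assumed value for illustration, and the cost function only models the quadratic attention-score term.</p>
<pre>
```python
TOKEN_BUDGET = 500_000_000  # assumed budget, for illustration only
CONTEXT_LENGTHS = [512, 1024, 2048]


def sequences_for_budget(token_budget, context_length):
    """Number of full-length training sequences a token budget yields."""
    return token_budget // context_length


def relative_attention_cost(context_length, baseline=512):
    """Attention-score cost per sequence, relative to the baseline length."""
    return (context_length / baseline) ** 2


for length in CONTEXT_LENGTHS:
    n_seq = sequences_for_budget(TOKEN_BUDGET, length)
    cost = relative_attention_cost(length)
    print(f"context {length}: {n_seq:,} sequences, "
          f"{cost:.0f}x attention cost per sequence")
```
</pre>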
        
                <h3>D. LR Schedule Optimization for Ultra-Short Runs</h3>
                <p>Standard cosine decay schedules are typically tuned for runs spanning hundreds of billions to trillions of tokens. For short 1B–2B token runs, we will experiment with aggressive linear decays and constant learning rates with sudden drops to find the fastest convergence paths for indie researchers.</p>
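                <p>The three schedule families can be sketched as plain functions of the training step. This is a minimal version without warmup. The peak learning rate, the drop point at 80% of training, and the 10x drop factor are assumed values, not our final hyperparameters.</p>
<pre>
```python
import math


def cosine_decay(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine curve from peak_lr at step 0 down to min_lr at the last step."""
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


def linear_decay(step, total_steps, peak_lr, min_lr=0.0):
    """Straight line from peak_lr at step 0 down to min_lr at the last step."""
    progress = step / total_steps
    return peak_lr - (peak_lr - min_lr) * progress


def constant_with_drop(step, total_steps, peak_lr, drop_at=0.8, drop_factor=0.1):
    """Hold peak_lr, then multiply by drop_factor once drop_at is reached."""
    if step >= drop_at * total_steps:
        return peak_lr * drop_factor
    return peak_lr


# Assumed values for illustration only.
TOTAL_STEPS, PEAK_LR = 1000, 1e-3
for step in (0, 500, 900):
    print(step,
          f"{cosine_decay(step, TOTAL_STEPS, PEAK_LR):.2e}",
          f"{linear_decay(step, TOTAL_STEPS, PEAK_LR):.2e}",
          f"{constant_with_drop(step, TOTAL_STEPS, PEAK_LR):.2e}")
```
</pre>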
        
                <h2>Everything Will Be Open. Everything.</h2>
                <p>SupraLabs is entirely non-profit, and our commitment to open science means we won't just publish a PDF with pretty graphs. When these runs complete, we will be releasing:</p>
                
                <ul>
                    <li>Every single checkpoint and weight file on Hugging Face.</li>
                    <li>Complete, unedited <code>lm-eval</code> logs and raw data points.</li>
                    <li>Our training configurations and custom setup code so anyone can replicate our work on their own hardware.</li>
                </ul>
        
                <p>We're getting the compute nodes warmed up as you read this. Stay tuned for the raw data drops—we're about to find out exactly how much power we can pack into these tiny architectures.</p>
        
                <div class="callout">
                    <span>// JOIN THE INITIATIVE</span>
                    Track our progress directly on Hugging Face: <a href="https://huggingface.co/SupraLabs" target="_blank" style="color: var(--accent); text-decoration: underline;">huggingface.co/SupraLabs</a><br>
                    Codebase, configs, and automation tools will be linked there as the runs kick off.
                </div>
        
                <div class="tags">
                    <span class="tag">#open-research</span>
                    <span class="tag">#slm-scaling</span>
                    <span class="tag">#supralabs</span>
                    <span class="tag">#open-source</span>
                    <span class="tag">#data-science</span>
                    <span class="tag">#tinyml</span>
                    <span class="tag">#benchmarking</span>
                </div>
            </div>
        </article>

        <footer>
            <p class="mono">&copy; 2026 SupraLabs // Built for the community.</p>
        </footer>
    </div>

</body>
</html>