<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>What Makes a Tiny Model Smart? Announcing the SupraLabs Open SLM Research Initiative. | SupraLabs Blog</title>
    <style>
        :root {
            --bg: #0f0f0f;
            --surface: #1a1a1a;
            --border: #333;
            --text: #e0e0e0;
            --accent: #536bfe;
            --muted: #888;
            --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
        }
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            background-color: var(--bg);
            color: var(--text);
            font-family: 'Inter', -apple-system, sans-serif;
            line-height: 1.6;
            padding: 2rem;
        }
        code, pre, .mono { font-family: var(--font-mono); }
        .container { max-width: 900px; margin: 0 auto; }

        header {
            border-bottom: 2px solid var(--border);
            padding-bottom: 2rem;
            margin-bottom: 3rem;
            display: flex;
            justify-content: space-between;
            align-items: flex-end;
        }
        .logo-area h1 {
            font-size: 1.2rem;
            text-transform: uppercase;
            letter-spacing: 2px;
            color: var(--accent);
            line-height: 1;
            display: flex;
            align-items: center;
            gap: 10px;
        }
        .logo-area a { text-decoration: none; color: inherit; }
        .logo-area {
            display: flex;
            align-items: center;
            gap: 10px;
            font-weight: bold;
            font-size: 1.2rem;
        }
        nav a {
            color: var(--text);
            text-decoration: none;
            margin-left: 1.5rem;
            font-size: 0.9rem;
            border-bottom: 1px solid transparent;
        }
        nav a:hover { border-bottom: 1px solid var(--accent); }

        .post-header { margin-bottom: 3rem; }
        .post-header h2 {
            font-size: 3rem;
            line-height: 1.1;
            margin-bottom: 1rem;
            font-weight: 800;
        }
        .post-meta {
            font-family: var(--font-mono);
            color: var(--accent);
            font-size: 0.9rem;
            margin-bottom: 2rem;
        }
        .post-content {
            background: var(--surface);
            border: 1px solid var(--border);
            padding: 3rem;
            margin-bottom: 4rem;
        }
        .post-content h2 {
            font-size: 1.8rem;
            margin: 2.5rem 0 1rem 0;
            color: var(--accent);
        }
        .post-content h2:first-child { margin-top: 0; }
        .post-content p {
            margin-bottom: 1.5rem;
            font-size: 1.1rem;
            color: var(--text);
        }
        .post-content ul {
            margin-bottom: 1.5rem;
            padding-left: 1.5rem;
        }
        .post-content li { margin-bottom: 0.5rem; font-size: 1.1rem; }
        .post-content strong { color: #fff; }

        .post-content code {
            background: #111;
            border: 1px solid var(--border);
            padding: 2px 6px;
            border-radius: 3px;
            font-size: 0.95em;
            color: var(--accent);
        }

        .callout {
            border-left: 3px solid var(--accent);
            background: #111;
            padding: 1rem 1.5rem;
            margin: 2rem 0;
            font-family: var(--font-mono);
            font-size: 0.95rem;
            color: #ccc;
        }
        .callout span {
            display: block;
            color: var(--muted);
            font-size: 0.8rem;
            margin-bottom: 0.4rem;
        }

        /* Screenshot box */
        .screenshot-box {
            border: 1px solid var(--border);
            background: #111;
            padding: 0;
            margin: 2rem 0;
            overflow: hidden;
        }
        .screenshot-box .screenshot-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--muted);
            padding: 0.6rem 1rem;
            border-bottom: 1px solid var(--border);
            background: #0d0d0d;
        }
        .screenshot-box img {
            width: 100%;
            display: block;
        }

        /* Reddit quote */
        .reddit-quote {
            border-left: 3px solid #ff4500;
            background: #111;
            padding: 1.2rem 1.5rem;
            margin: 2rem 0;
            font-size: 1rem;
            color: #ccc;
            font-style: italic;
            line-height: 1.7;
        }
        .reddit-quote .reddit-meta {
            font-family: var(--font-mono);
            font-size: 0.72rem;
            color: #ff4500;
            font-style: normal;
            margin-bottom: 0.6rem;
            display: block;
        }

        /* Output example */
        .output-example {
            border: 1px solid var(--border);
            background: #111;
            padding: 1.5rem;
            margin: 1.5rem 0;
        }
        .output-example .prompt-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--accent);
            margin-bottom: 0.4rem;
        }
        .output-example .prompt-text {
            font-weight: 700;
            color: #fff;
            margin-bottom: 1rem;
            font-size: 1rem;
        }
        .output-example .output-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--muted);
            margin-bottom: 0.4rem;
        }
        .output-example .output-text {
            color: var(--text);
            font-size: 0.95rem;
            font-style: italic;
            line-height: 1.7;
        }

        .tags { display: flex; gap: 0.5rem; margin-top: 2rem; flex-wrap: wrap; }
        .tag {
            font-family: var(--font-mono);
            font-size: 0.7rem;
            padding: 2px 8px;
            border: 1px solid var(--border);
            border-radius: 4px;
            color: var(--muted);
        }

        footer {
            margin-top: 6rem;
            padding-bottom: 2rem;
            font-size: 0.8rem;
            color: var(--muted);
            text-align: center;
        }

        @media (max-width: 600px) {
            .post-header h2 { font-size: 2rem; }
            .post-content { padding: 1.5rem; }
            header { flex-direction: column; align-items: flex-start; gap: 1rem; }
            nav a { margin-left: 0; margin-right: 1rem; }
        }
    </style>
</head>
<body>

    <div class="container">
        <header>
            <div class="logo-area" style="font-size: 1.5em;">
                <a href="./index.html"><h1><img src="./image.png" alt="SupraLabs logo" style="height: 2em"> SupraLabs_</h1></a>
            </div>
            <nav>
                <a href="./index.html#news">News</a>
                <a href="https://huggingface.co/SupraLabs" target="_blank">HuggingFace</a>
                <a href="./index.html#hardware">Hardware</a>
            </nav>
        </header>

        <article>
            <div class="post-header">
                <div class="post-meta">// 2026-05-29 | Research Roadmap</div>
                <h2>What Makes a Tiny Model Smart?<br>Announcing the SupraLabs<br>Open SLM Research Initiative.</h2>
            </div>
        
            <div class="post-content">
                <p>We are still completely blown away. In just about a week since we dropped our <strong>Supra-50M Instruct</strong> model, the open-source community took it and ran with it. Thanks to your incredible support, we hit <strong>Page 1 of Trending Models in Text Generation</strong>, Page 4 across ALL categories on Hugging Face, crossed <strong>7,000+ downloads</strong>, and even got featured in a YouTube deep-dive! For a non-profit, 100% open-source garage project, this is unreal. Thank you.</p>
        
                <p>But we aren't stopping there. The massive interest in small language models (SLMs) proves that the world wants highly efficient, reproducible computing. To push the boundaries of what tiny "brains" can do, SupraLabs is launching a massive, fully open systematic research initiative. We want to find the exact engineering sweet spots for SLMs, and we are open-sourcing every single pipeline, log, and weight along the way.</p>
        
                <p>Here is the roadmap of the core experiments we are spinning up right now.</p>
        
                <h2>Experiment 1: The Ultimate Data-Mix Showdown</h2>
                <p>Everyone knows data quality is king, but what is the absolute best data recipe when your parameter budget is ultra-tight? We are pitting the top open-source datasets against each other to find the perfect synergy.</p>
                
                <ul>
                    <li><strong>The Setup:</strong> We are training an ultra-lean <strong>5M-parameter Llama model</strong> using Hugging Face Transformers.</li>
                    <li><strong>The Data:</strong> Exactly <strong>100 million tokens</strong> total per run, testing four configurations:
                        <br>1. 100% <code>FineWeb-Edu</code>
                        <br>2. 100% <code>DCLM-Edu</code>
                        <br>3. 100% <code>Cosmopedia-v2</code>
                        <br>4. Custom algorithmic token-level mixes of all three.
                    </li>
                </ul>
                <p><strong>The Goal:</strong> Find out whether fully synthetic data (<code>Cosmopedia-v2</code>) outperforms heavily filtered web data (<code>FineWeb-Edu</code>, <code>DCLM-Edu</code>) at the 5M scale, or whether a hybrid mix generalizes best on downstream tasks.</p>
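                <p>To make the fixed-budget idea concrete, here is a minimal sketch of how a 100M-token run could be split across the three datasets. The helper <code>token_budget</code> and the 50/25/25 weights are illustrative assumptions, not our final mixing recipe.</p>

```python
# Hypothetical sketch: split a fixed token budget across datasets by weight.
TOTAL_TOKENS = 100_000_000  # per-run budget stated in the post


def token_budget(weights, total=TOTAL_TOKENS):
    """Return tokens per dataset for mixing weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    names = list(weights)
    budget = {name: int(total * weights[name]) for name in names}
    # Hand any rounding remainder to the first dataset so the total is exact.
    budget[names[0]] += total - sum(budget.values())
    return budget


# Example mix (the 50/25/25 split is illustrative, not from the post).
mix = token_budget({"FineWeb-Edu": 0.5, "DCLM-Edu": 0.25, "Cosmopedia-v2": 0.25})
print(mix)
```

                <p>Keeping the total pinned to the same number in every run is what makes the four configurations comparable: only the composition changes, never the amount of data.</p>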
        
                <div class="callout">
                    <span>// THE EVALUATION SUITE (LM-EVAL)</span>
                    To find the true sweet spot, every single model in our studies will be rigorously evaluated across this standardized zero-shot/few-shot benchmark matrix:<br><br>
                    • Language & Perplexity → wikitext, lambada<br>
                    • Commonsense & Logic &nbsp;→ hellaswag, piqa, winogrande, boolq<br>
                    • Science & Knowledge &nbsp;→ sciq, openbookqa, arc_easy, arc_challenge<br>
                    • Grammar & Syntax &nbsp;&nbsp;&nbsp;&nbsp;→ blimp
                </div>
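                <p>For anyone who wants to run the same matrix, the sketch below assembles an <code>lm_eval</code> command line from the task names listed above. The flags are the ones we understand the lm-evaluation-harness CLI to accept, so check them against your installed version. The model id is a placeholder, and the helper name <code>lm_eval_command</code> is hypothetical.</p>

```python
import shlex

# Task names copied from the evaluation matrix above.
TASKS = [
    "wikitext", "lambada",
    "hellaswag", "piqa", "winogrande", "boolq",
    "sciq", "openbookqa", "arc_easy", "arc_challenge",
    "blimp",
]


def lm_eval_command(model_id, num_fewshot=0, output_path="results"):
    """Build an lm-eval CLI invocation for one checkpoint."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(TASKS),
        "--num_fewshot", str(num_fewshot),
        "--output_path", output_path,
    ]


# "your-org/your-model" is a placeholder, not a published checkpoint.
cmd = lm_eval_command("your-org/your-model")
print(shlex.join(cmd))
```

                <p>Building the command from one shared task list keeps every checkpoint on exactly the same benchmark set, which matters more than any single score when comparing runs.</p>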
        
                <h2>Experiment 2: Scaling Law Realities for Tiny Models</h2>
                <p>Chinchilla scaling laws tell us how to scale compute and data optimally for billion-parameter giants. But do those rules shatter when you scale down to the absolute edge? We are conducting a dedicated scaling study to map out the returns on parameter expansion.</p>
                
                <ul>
                    <li><strong>The Setup:</strong> The dataset stays fixed at exactly <strong>2 billion tokens</strong> of <code>FineWeb-Edu (sample-10BT)</code> for every run.</li>
                    <li><strong>The Core Matrix:</strong> We will train four distinct Llama architectures: <strong>10M, 25M, 50M, and 100M parameters</strong>.</li>
                </ul>
                <p><strong>The Goal:</strong> Identify the exact point of diminishing returns. Does a 25M model fully utilize 2B tokens, or does the 100M model show a massive performance leap on the exact same token footprint? We want to chart the efficiency frontier.</p>
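                <p>A quick way to see what this matrix covers is the tokens-per-parameter ratio of each run. The sketch below uses the nominal model sizes and the commonly quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter. That figure is an approximation taken from the scaling-law literature, not something we have measured.</p>

```python
TOKENS = 2_000_000_000  # fixed dataset size from the setup above
MODEL_SIZES = {
    "10M": 10_000_000,
    "25M": 25_000_000,
    "50M": 50_000_000,
    "100M": 100_000_000,
}
CHINCHILLA_RATIO = 20  # rough rule of thumb, tokens per parameter

# Tokens seen per parameter for each nominal model size.
ratios = {name: TOKENS / params for name, params in MODEL_SIZES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f} tokens/param, "
          f"{ratio / CHINCHILLA_RATIO:.0f}x the rule of thumb")
```

                <p>Under these assumptions the 100M model sits right at the rule of thumb (20 tokens per parameter), while the 10M model sees ten times that. The four runs therefore span the range from "compute-optimal" to heavily over-trained on a single fixed dataset.</p>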
        
                <h2>Experiment 3: Is One Epoch Really All You Need for SLMs?</h2>
                <p>The standard convention for LLMs is "one epoch and move on" to avoid overfitting, popularized by several landmark papers. But small models training on high-quality educational data might be a completely different beast. Can they chew on the same high-signal data multiple times?</p>
                
                <ul>
                    <li><strong>The Setup:</strong> A <strong>10M-parameter Llama model</strong> trained on exactly <strong>500 million tokens</strong> of <code>FineWeb-Edu</code>.</li>
                    <li><strong>The Epoch Matrix:</strong> We are running 5 identical setups, changing only the epoch count: <strong>1 Epoch vs. 2, 3, 4, and 5 Epochs</strong>.</li>
                </ul>
                <p><strong>The Goal:</strong> Pinpoint where overfitting begins for an SLM. If <code>lm-eval</code> scores keep improving past epoch 2 or 3 without degrading perplexity, repeating high-quality data could be a practical answer to data scarcity for edge AI.</p>
        
                <h2>Expanding the Frontier: More Ideas We're Testing</h2>
                <p>While configuring our cluster for the three core studies above, we realized we have a golden opportunity to squeeze in even more architectural answers. We have officially added these four bonus dimensions to our upcoming research pipeline:</p>
        
                <h3>A. The Tokenizer Bottleneck</h3>
                <p>Modern tokenizers use massive vocabularies (like Llama 3's 128k). In a 10M parameter model, a huge vocabulary means the embedding layer eats up almost all your parameters, leaving nothing for the actual transformer layers. We will run identical 10M models comparing the Llama 3 tokenizer (128k), Llama 2 tokenizer (32k), and a custom-built 8k/16k vocabulary to see where the parameter balance lies.</p>
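                <p>The arithmetic behind this is simple: an embedding matrix holds vocabulary size times hidden size parameters. Here is a minimal sketch, assuming a hidden size of 256 and tied input/output embeddings. The vocabulary sizes for Llama 3 (128,256) and Llama 2 (32,000) are the published ones. The hidden size and the exact 8k/16k values are placeholders for illustration.</p>

```python
HIDDEN_SIZE = 256          # assumed width, not a SupraLabs config
PARAM_BUDGET = 10_000_000  # the 10M model from the text

VOCABS = {
    "Llama 3 (128k)": 128_256,
    "Llama 2 (32k)": 32_000,
    "custom 16k": 16_384,
    "custom 8k": 8_192,
}


def embedding_params(vocab_size, hidden_size=HIDDEN_SIZE):
    """Parameters in one vocab x hidden embedding matrix (tied input/output)."""
    return vocab_size * hidden_size


# Fraction of the whole parameter budget consumed by the embedding matrix.
shares = {name: embedding_params(v) / PARAM_BUDGET for name, v in VOCABS.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the 10M budget")
```

                <p>With these numbers the Llama 3 vocabulary alone needs about 32.8M parameters, more than three times the entire budget, while an 8k vocabulary takes roughly a fifth of it. Untied embeddings would double every figure.</p>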
        
                <h3>B. Depth vs. Width (Architecture Tweaks)</h3>
                <p>If you have a strict budget of 25M parameters, how should you spend them? We're testing a "deep and narrow" configuration (e.g., 24 layers, smaller hidden dimensions) against a "shallow and wide" setup (e.g., 6 layers, massive hidden dimensions) to evaluate which layout reasons better on standard benchmarks.</p>
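                <p>To show how two very different shapes can land on the same budget, the sketch below estimates the parameter count of a Llama-style decoder. It assumes full multi-head attention (no grouped-query sharing), no bias terms, tied embeddings, and a 16,384-token vocabulary. The function <code>llama_params</code> and both shapes are illustrative guesses, not our final configurations.</p>

```python
def llama_params(vocab, hidden, layers, intermediate, tied=True):
    """Approximate parameter count for a Llama-style decoder.

    Assumes full multi-head attention (no grouped-query sharing), no bias
    terms, and a gated MLP with three projection matrices.
    """
    attention = 4 * hidden * hidden       # q, k, v, o projections
    mlp = 3 * hidden * intermediate       # gate, up, down projections
    norms = 2 * hidden                    # two RMSNorm weights per layer
    embeddings = vocab * hidden * (1 if tied else 2)
    final_norm = hidden
    return embeddings + layers * (attention + mlp + norms) + final_norm


# Both shapes are illustrative guesses aimed at roughly 25M parameters.
VOCAB = 16_384
deep_narrow = llama_params(VOCAB, hidden=256, layers=24, intermediate=768)
shallow_wide = llama_params(VOCAB, hidden=512, layers=6, intermediate=1152)
print(f"deep and narrow:  {deep_narrow:,}")
print(f"shallow and wide: {shallow_wide:,}")
```

                <p>Both shapes come out within about 2% of 25M, but they spend the budget differently: the wide model puts about a third of its parameters into the embedding matrix, the deep one about a sixth. That interaction with vocabulary size is one reason the comparison needs to be run rather than guessed.</p>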
        
                <h3>C. The Sequence Length Penalty</h3>
                <p>Does forcing a longer context window ruin a tiny model's general capability? We will train identical models across 512, 1024, and 2048 context windows to see if extending context capacity directly penalizes the model's core knowledge density.</p>
        
                <h3>D. LR Schedule Optimization for Ultra-Short Runs</h3>
                <p>The cosine decay schedules used in most large-scale recipes were tuned for runs of hundreds of billions to trillions of tokens. For short 1B–2B token runs, we will experiment with aggressive linear decays and constant learning rates with sudden drops to find the fastest convergence paths for indie researchers.</p>
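                <p>For reference, the three schedule families can be sketched as plain functions of the training step. This is a minimal illustration with warmup left out. The peak rate, run length, drop point, and drop factor are placeholder values, not tuned settings from our runs.</p>

```python
import math


def cosine(step, total, peak):
    """Cosine decay from peak to zero over the whole run."""
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total))


def linear(step, total, peak):
    """Straight-line decay from peak to zero."""
    return peak * (1.0 - step / total)


def constant_with_drop(step, total, peak, drop_at=0.8, factor=0.1):
    """Hold the peak rate, then cut it once late in the run."""
    return peak * factor if step >= drop_at * total else peak


TOTAL_STEPS = 1_000   # illustrative run length
PEAK_LR = 3e-3        # illustrative peak, not a tuned value
for step in (0, 500, 900, 1_000):
    print(step,
          f"{cosine(step, TOTAL_STEPS, PEAK_LR):.2e}",
          f"{linear(step, TOTAL_STEPS, PEAK_LR):.2e}",
          f"{constant_with_drop(step, TOTAL_STEPS, PEAK_LR):.2e}")
```

                <p>Cosine and linear decay meet at the halfway point and at the end of the run. They differ in how long they stay near the peak, and that difference is exactly what a short run is sensitive to.</p>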
        
                <h2>Everything Will Be Open. Everything.</h2>
                <p>SupraLabs is entirely non-profit, and our commitment to open science means we won't just publish a PDF with pretty graphs. When these runs complete, we will be releasing:</p>
                
                <ul>
                    <li>Every single checkpoint and weight file on Hugging Face.</li>
                    <li>Complete, unedited <code>lm-eval</code> logs and raw data points.</li>
                    <li>Our training configurations and custom setup code so anyone can replicate our work on their own hardware.</li>
                </ul>
        
                <p>We're getting the compute nodes warmed up as you read this. Stay tuned for the raw data drops. We're about to find out exactly how much power we can pack into these tiny architectures.</p>
        
                <div class="callout">
                    <span>// JOIN THE INITIATIVE</span>
                    Track our progress directly on Hugging Face: <a href="https://huggingface.co/SupraLabs" target="_blank" style="color: var(--accent); text-decoration: underline;">huggingface.co/SupraLabs</a><br>
                    Codebase, configs, and automation tools will be linked there as the runs kick off.
                </div>
        
                <div class="tags">
                    <span class="tag">#open-research</span>
                    <span class="tag">#slm-scaling</span>
                    <span class="tag">#supralabs</span>
                    <span class="tag">#open-source</span>
                    <span class="tag">#data-science</span>
                    <span class="tag">#tinyml</span>
                    <span class="tag">#benchmarking</span>
                </div>
            </div>
        </article>

        <footer>
            <p class="mono">&copy; 2026 SupraLabs // Built for the community.</p>
        </footer>
    </div>

</body>
</html>