Update blog.html
blog.html
CHANGED
@@ -277,6 +277,15 @@
 <section class="blog-section">
 <div class="container">
 <div class="blog-grid">
+<a href="The Chinchilla Effect: Why Tiny Models Have to Be Picky.html" class="blog-card">
+<div class="blog-meta">
+<span class="blog-date">2026-03-7</span>
+<span class="blog-tag"> Scaling Laws</span>
+</div>
+<h2>The Chinchilla Effect: Why Tiny Models Have to Be Picky</h2>
+<p>The Chinchilla paper told us something elegant. For compute optimal training, aim for roughly twenty tokens per parameter. A 70 billion parameter model wants 1.4 trillion tokens. A 1 million parameter model wants 20 million tokens. The math is clean. The implication is messy.</p>
+<span class="blog-read-more">Read more</span>
+</a>
 <a href="Teaching AI to Regret: The Backspace Token Theory.html" class="blog-card">
 <div class="blog-meta">
 <span class="blog-date">2026-03-6</span>
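The twenty-tokens-per-parameter arithmetic in the new post's teaser can be sketched as follows. This is a minimal illustration only; the constant is a rule of thumb drawn from the Chinchilla paper, not an exact law, and the function name is ours.

```python
# Chinchilla rule of thumb: roughly 20 training tokens per model
# parameter for compute-optimal training (an approximation, not exact).
TOKENS_PER_PARAM = 20

def optimal_tokens(n_params: int) -> int:
    """Approximate compute-optimal training-token budget for a model."""
    return TOKENS_PER_PARAM * n_params

print(optimal_tokens(70_000_000_000))  # 70B params -> 1.4 trillion tokens
print(optimal_tokens(1_000_000))       # 1M params -> 20 million tokens
```

Both figures match the numbers quoted in the post: 20 × 70e9 = 1.4e12 and 20 × 1e6 = 2e7.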