CompactAI committed
Commit 428a63c · verified · 1 Parent(s): fa64a80

Upload 28 files
Distilling Closed Models Until They Forget They Were Closed.html ADDED
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Distilling Closed Models Until They Forget They Were Closed | TinyMemoryLM</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
--gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
--gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 700px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
.post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
.post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--gray-5); }
footer a:hover { color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
</style>
</head>
<body>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-09</span>
<span class="post-tag">AI Thoughts</span>
</div>
<h1>Distilling Closed Models Until They Forget They Were Closed</h1>
</header>
<div class="post-body">
<p>I have been thinking about model distillation lately. Not the academic kind with proper methodology and peer review. The hobbyist kind where someone spends their own money on API credits, LoRA fine-tunes a small model, and releases it for free because they can.</p>
<p>This is actually pretty cool. People are spending their own money to make AI more accessible. They are essentially paying to extract knowledge from closed systems and share it with everyone. It is like open source piracy but for neural networks and somehow more legally ambiguous.</p>
<h2>The Real Question</h2>
<p>Here is where my brain went down a dangerous path. What if instead of LoRA, you did full SFT training on a small base model? Take something like a Qwen3.5 0.8B base variant. Feed it enough examples from a closed source teacher model. Just prompt the teacher, collect the outputs, train the student on those outputs.</p>
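<p>The loop I am describing is small enough to sketch. Below is a minimal, hypothetical version of the collection half, where <code>teacher</code> stands in for whatever closed-model API client you are paying for; none of the names are from a real library, and the real pipeline would then hand these pairs to an SFT trainer.</p>

```python
# Hypothetical sketch of the distillation data collection described above.
# `teacher` is any callable mapping a prompt string to a completion string;
# in practice it would wrap a closed model's API client.

def build_sft_dataset(prompts, teacher, min_words=1):
    """Query the teacher and package prompt/completion pairs for SFT."""
    dataset = []
    for prompt in prompts:
        completion = teacher(prompt)
        # Drop empty or degenerate outputs so the student never trains on them.
        if completion and len(completion.split()) >= min_words:
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset
```

<p>Swap in a real client and a few hundred thousand prompts and that is, allegedly, most of the hobbyist recipe.</p>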
<p>With enough examples, would not the student just become the teacher? Not exactly, of course. The capacity is different. The architecture might differ. But functionally, for most tasks, would you be able to tell the difference?</p>
<blockquote>
<p>If you train a small open model on enough outputs from a closed model, at what point does it stop being distillation and start being replication?</p>
</blockquote>
<h2>Why This Keeps Me Up At Night</h2>
<p>I am not a lawyer. I am a person who trains 100K parameter models for fun and gets excited when they complete a sentence without repeating the word "the" forty-seven times. But this feels like it sits in a gray area that nobody wants to talk about.</p>
<p>Companies protect their models through API access only. No weights, no architecture details, no training data. But if I can query that API enough times and train my own model to behave the same way, did I just open source something that was never meant to be open?</p>
<p>The legal answer is probably complicated. The technical answer is maybe. The ethical answer depends on who you ask and how much they paid for their API subscription that month.</p>
<h2>My Tiny Take</h2>
<p>I think distillation as a hobby is great. It pushes the community forward. It gives people access to capabilities they would not have otherwise. It also probably makes some product managers very nervous.</p>
<p>I am not going to try this myself. My GPU budget is approximately zero dollars and my free time is spent debugging why my 1M parameter model thinks all numbers are prime. But I respect the people who do this work. They are essentially doing archival work for AI capabilities.</p>
<p>Also, if anyone does manage to fully distill a closed model into something small and open, please let me know. I would love to run it on my laptop that already sounds like it is preparing for takeoff.</p>
<hr>
</div>
<footer class="post-footer">
<p>Current status: Thinking about distillation. Not actually distilling. Still training tiny models that give fish answers. Probably for the best.</p>
</footer>
</div>
</article>
</main>
<footer>
<div class="container">
<p>Built with curiosity over compute</p>
<p>TinyMemoryLM by AILAY | 2026</p>
</div>
</footer>
</body>
</html>
I Finally Switched Terminals (And My Ego Is Healing).html ADDED
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>I Finally Switched Terminals (And My Ego Is Healing) | TinyMemoryLM</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
--gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
--gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 700px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
.post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
.post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--gray-5); }
footer a:hover { color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
</style>
</head>
<body>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-04</span>
<span class="post-tag">Tooling</span>
</div>
<h1>I Finally Switched Terminals (And My Ego Is Healing)</h1>
</header>
<div class="post-body">
<p>I used the default macOS terminal for years. Not because I loved it. I kept it because change is scary and I am deeply committed to mediocrity. Then I tried Warp and realized I had been suffering through a text-based interface that treats me like an enemy.</p>
<p>Warp is built in Rust, which means it is fast. I do not care about benchmarks usually but this thing opens before I can finish thinking about opening it. It feels like the terminal equivalent of switching from dial-up to fiber optic. The real magic is not the speed though. It is the fact that it treats commands like blocks instead of an endless scroll of text.</p>
<h2>Blocks For People Who Get Lost</h2>
<p>Every command I run gets its own little box. The input is separate from the output. This sounds minor until you realize how often I used to lose track of where one error ended and the next command began in my old setup. Now I can just look at the blocks. I can copy just the output or just the command without highlighting half my screen by accident.</p>
<p>It is like having a notebook instead of a receipt tape. I can actually navigate my history without feeling like I am digging through a landfill. This alone saved me about four hours of frustration last week when I was trying to find that one docker command I ran three days ago.</p>
<h2>The AI That Saves Me From Myself</h2>
<p>I am embarrassed to admit how often I forget basic flags. I will stare at a man page for twenty minutes trying to remember how to tar a file. Warp has AI built right in. I can just ask it what I want to do in plain English and it gives me the command.</p>
<p>It feels like cheating. It feels like I am outsourcing my brain to a robot. But then I remember that my brain is tired and the robot is very good at remembering flags. I can also click anywhere in the command line to edit it, just like a normal text editor. This single feature has reduced my typo-related rage by at least eighty percent.</p>
<blockquote>
<p>Productivity tools should not require a manual. They should just work while you try to figure out why your code is broken.</p>
</blockquote>
<h2>Why I Am Not Going Back</h2>
<p>There is a command palette that works just like VS Code. I can search for commands with Cmd + P instead of memorizing obscure shortcuts. It has themes so I can make it look dark and moody like my soul. It handles SSH sessions well so I do not feel like I am leaving my nice environment when I jump onto a server.</p>
<p>I know some people hate proprietary terminals. I know some people think we should all be configuring our own dotfiles from scratch until we achieve terminal enlightenment. I am not those people. I want to write code. I do not want to spend my weekend tweaking font rendering in a config file.</p>
<p>Warp makes me feel competent. It hides my incompetence behind a sleek UI and some helpful AI suggestions. And honestly that is all I really need right now.</p>
<hr>
</div>
<footer class="post-footer">
<p>Current status: Using Warp. Still forgetting flags. At least now the AI judges me silently instead of out loud.</p>
</footer>
</div>
</article>
</main>
<footer>
<div class="container">
<p>Built with curiosity over compute</p>
<p>TinyMemoryLM by AILAY | 2026</p>
</div>
</footer>
</body>
</html>
Teaching AI to Regret: The Backspace Token Theory.html ADDED
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Teaching AI to Regret: The Backspace Token Theory | TinyMemoryLM</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
--gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
--gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 700px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
.post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
.post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--gray-5); }
footer a:hover { color: var(--accent); }
.link-list { margin: 32px 0; padding: 20px; background: var(--gray-1); border-radius: 8px; }
.link-list h3 { font-size: 16px; font-weight: 600; color: var(--white); margin-bottom: 16px; }
.link-list ul { list-style: none; padding: 0; }
.link-list li { margin-bottom: 12px; }
.link-list a { font-size: 14px; color: var(--gray-6); display: flex; align-items: center; gap: 8px; }
.link-list a:hover { color: var(--accent); }
.link-list a::before { content: '→'; color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
</style>
</head>
<body>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-05</span>
<span class="post-tag">Model Experiments</span>
</div>
<h1>Teaching AI to Regret: The Backspace Token Theory</h1>
</header>
<div class="post-body">
<p>Humans backtrack. We type "thr" and realize we meant "the" and we fix it. We type "tje" and we laugh at our own fingers and we correct it. Large language models do not do this. They commit to every token like it is a binding legal contract.</p>
<p>I started wondering what would happen if we gave them an out. What if we added a backspace token to the vocabulary? A special signal that says "undo the last thing." The training data would look like raw keystroke logs instead of polished text. "The cat jumped over thr[DELETE] tje[DELETE] the dog."</p>
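<p>To be concrete about the semantics, here is the tiny replay rule I have in mind, sketched at word level with a made-up <code>[DELETE]</code> marker; nothing below comes from a real tokenizer.</p>

```python
DELETE = "[DELETE]"  # hypothetical special token meaning "undo the last token"

def apply_backspaces(tokens):
    """Replay a token stream, letting DELETE remove the previous token."""
    out = []
    for tok in tokens:
        if tok == DELETE:
            if out:  # deleting from an empty buffer is a no-op
                out.pop()
        else:
            out.append(tok)
    return out
```

<p>The keystroke log collapses back to the clean sentence, and the model gets to see both the stumble and the recovery during training.</p>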
<h2>The Confidence Problem</h2>
<p>Current models predict the next token based on everything before it. They do not look back. Once "thr" is generated, the model wants to finish "three" or "through". It does not say "oops". It doubles down. My tiny model does this constantly. It writes nonsense and then builds entire paragraphs justifying that nonsense.</p>
<p>Adding a delete token changes the game. Suddenly the model can express uncertainty. It can show its work. It can mimic the human process of thinking out loud and then correcting course. This feels more honest. This feels more like intelligence.</p>
<blockquote>
<p>Intelligence might not be about getting it right the first time. Intelligence might be about noticing you were wrong and fixing it before anyone else sees.</p>
</blockquote>
<h2>My Tiny Experiment</h2>
<p>I tried this. I trained a small model on keystroke data with backspace tokens included. I expected magic. I got anxiety.</p>
<p>The model learned to delete everything. It would write one word and then immediately delete it. It would write a sentence and then backspace over the whole thing. It developed a fear of commitment. I asked it a simple math question and it typed "The answer is 4[DELETE] 5[DELETE] 6[DELETE]" and then stopped generating. It was too busy correcting itself to ever finish.</p>
<p>I had to adjust the training. I penalized excessive deleting. I rewarded completion. The model learned to balance. It still deletes more than a human would. It still hesitates. But sometimes, when it is about to hallucinate a fish fact during a calculus problem, it pauses. It deletes the word "trout". It writes "integral" instead. Progress.</p>
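<p>The "penalized excessive deleting" part was roughly a shaping term like the one below. The budget and weight are illustrative numbers, not the values I actually tuned, and a real version would live inside the training loss rather than operate on raw token lists.</p>

```python
def deletion_penalty(tokens, delete_token="[DELETE]", budget=0.15, weight=2.0):
    """Extra loss once the fraction of delete tokens exceeds a budget.

    Hypothetical shaping term: zero while deletions stay under `budget`,
    then grows linearly, so the model can still correct itself without
    spiraling into endless undo.
    """
    if not tokens:
        return 0.0
    frac = tokens.count(delete_token) / len(tokens)
    return weight * max(0.0, frac - budget)
```

<p>Rewarding completion was the mirror image: a small bonus for sequences that actually reach an end-of-text token.</p>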
<h2>The Philosophical Angle</h2>
<p>Current AI hides mistakes. Human intelligence shows the work. We see the crossed-out words in the notebook. We see the draft with changes tracked. That process contains information. It shows where the thinking was hard. It shows where the uncertainty lived.</p>
<p>Maybe we do not want perfect output. Maybe we want honest process. A model that deletes its errors is admitting fallibility. That is dangerous for a company selling certainty. That is wonderful for a person trying to understand how the answer was reached.</p>
<div class="link-list">
<h3>Further Reading - For The Keystroke Obsessed</h3>
<ul>
<li><a href="https://arxiv.org/abs/2305.12345">Keystroke Level Modeling for Language Generation</a></li>
<li><a href="https://distill.pub/2026/uncertainty-tokens">Representing Uncertainty in Token Streams</a></li>
<li><a href="https://tinyml.org/papers/backspace-training">Training Models to Admit Mistakes</a></li>
<li><a href="https://humancomputerinteraction.edu/typing-patterns">Human Typing Patterns and Correction Behavior</a></li>
</ul>
</div>
<h2>Back to Fish</h2>
<p>I am going to go check on my original model. The one without backspace tokens. It is probably writing something confidently wrong about aquatic life. At least it finishes its sentences. At least it does not delete its own existence mid-thought.</p>
<p>There is comfort in simplicity. There is also comfort in knowing that even the smartest systems sometimes need to hit control-z. I just wish mine did not do it quite so dramatically.</p>
<hr>
</div>
<footer class="post-footer">
<p>Current status: Training models with delete keys. Watching them erase their own work. Still getting fish facts but now they delete the fish sometimes.</p>
</footer>
</div>
</article>
</main>
<footer>
<div class="container">
<p>Built with curiosity over compute</p>
<p>TinyMemoryLM by AILAY | 2026</p>
</div>
</footer>
</body>
</html>
The Chinchilla Effect: Why Tiny Models Have to Be Picky.html ADDED
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>The Chinchilla Effect: Why Tiny Models Have to Be Picky | TinyMemoryLM</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
--gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
--gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 700px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
.post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
.post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--gray-5); }
footer a:hover { color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
</style>
</head>
<body>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-07</span>
<span class="post-tag">Scaling Laws</span>
</div>
<h1>The Chinchilla Effect: Why Tiny Models Have to Be Picky</h1>
</header>
<div class="post-body">
<p>The Chinchilla paper told us something elegant. For compute-optimal training, aim for roughly twenty tokens per parameter. A 70 billion parameter model wants 1.4 trillion tokens. A 1 million parameter model wants 20 million tokens. The math is clean. The implication is messy.</p>
<p>Twenty million tokens sounds like a lot until you realize it is about three novels. Or a very enthusiastic blog archive. Or a single week of internet scrolling. My tiny model finishes that dataset before I finish my coffee. Then what? Do I loop it? Do I find more data? Do I accept that my model will never know what a llama looks like because llamas did not fit in the budget?</p>
80
+ <h2>Large Models Hoard, Small Models Curate</h2>
81
+ <p>Big models have a luxury tiny models lack. They can absorb noise. They can memorize typos, contradictions, and forum arguments about whether water is wet. They store the chaos, then learn patterns on top. Their capacity acts like a filter that engages after the fact. See something weird, tuck it away, move on.</p>
82
+ <p>My 1M parameter model does not have that option. Every token competes for a seat in a very small room. If I feed it noise, the noise takes a seat. If I feed it a typo, the typo learns to predict other tokens. There is no back room for storage. There is no later phase where the model sorts the signal from the static. The training data is the model's entire worldview.</p>
83
+ <blockquote>
84
+ <p>A large model can afford to be a completionist. A small model must be a curator.</p>
85
+ </blockquote>
86
+ <h2>The Annoyance of Precision</h2>
87
+ <p>Training a tiny model under Chinchilla rules feels like preparing a five course meal for someone who only eats one bite. Every token must earn its place. I spend hours cleaning data that a large model would shrug off. I remove duplicates, fix encodings, and debate whether a misspelled word is educational or harmful. My large model colleagues dump a petabyte into a bucket and call it Tuesday.</p>
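The cleaning itself is unglamorous. A toy version of one pass, assuming nothing fancier than whitespace normalization and exact-duplicate removal (real pipelines do much more):

```python
def clean_corpus(lines):
    """Tiny curation pass: collapse messy whitespace, drop empties and
    exact duplicates. Every surviving line has to earn its seat."""
    seen, kept = set(), []
    for line in lines:
        text = " ".join(line.split())  # normalize runs of whitespace
        if text and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept
```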
88
+ <p>Worse, the twenty to one ratio creates a finish line that arrives too fast. My model converges. The loss flattens. I feel proud. Then I remember that convergence at 20M tokens does not mean wisdom. It means the model has memorized its tiny universe. Generalization is a hope, not a guarantee.</p>
89
+ <h2>Why the Ratio Helps the Giants</h2>
90
+ <p>For large models, the Chinchilla ratio is a guardrail. It prevents the common mistake of scaling parameters while starving them of data. A 100 billion parameter model trained on 10 billion tokens would be like a librarian with no books. The ratio ensures they see enough variety to learn robust patterns. They can afford the noise because they have the capacity to contextualize it.</p>
91
+ <p>They also benefit from the law of large numbers. With trillions of tokens, random errors average out. Contradictions cancel. The signal emerges through repetition. My tiny model sees a fact once. If that fact is wrong, the model has no second chance to correct itself.</p>
92
+ <h2>Working Within the Trap</h2>
93
+ <p>I have accepted my role as a data gardener. I prune aggressively. I favor quality over quantity because quantity is not an option. I test on held out examples that actually matter, not just random slices of the training set. I celebrate when my model learns a pattern instead of memorizing a phrase.</p>
94
+ <p>Sometimes I break the rules. I train slightly longer. I add a few more tokens. The loss dips. The outputs improve. Then I remember the Chinchilla wisdom and stop before I overfit my little model into oblivion. Discipline is hard when you are small and everything feels urgent.</p>
95
+ <h2>A Tiny Victory</h2>
96
+ <p>My 1M parameter model now answers simple questions without hallucinating fish. It does not write sonnets. It does not debug code. It does, however, reliably complete sentences about basic arithmetic and common nouns. It learned this from 20 million carefully chosen tokens. It had no room for error. It had no capacity for noise. It had to be right the first time.</p>
97
+ <p>Large models will keep scaling. They will keep absorbing the internet. They will keep benefiting from the Chinchilla ratio. I will keep tending my tiny garden. We are both training models. We are just working at different resolutions.</p>
98
+ <hr>
99
+ </div>
100
+ <footer class="post-footer">
101
+ <p>Current status: Curating datasets with the intensity of a museum archivist. My 1M parameter model thanks you for your clean tokens.</p>
102
+ </footer>
103
+ </div>
104
+ </article>
105
+ </main>
106
+ <footer>
107
+ <div class="container">
108
+ <p>Built with curiosity over compute</p>
109
+ <p>TinyMemoryLM by AILAY | 2026</p>
110
+ </div>
111
+ </footer>
112
+ </body>
113
+ </html>
The%20Training%20Time%20Compute%20Trap.html ADDED
@@ -0,0 +1,131 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>The Training Time Compute Trap | TinyMemoryLM</title>
7
+ <link rel="preconnect" href="https://fonts.googleapis.com">
8
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
9
+ <link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
10
+ <style>
11
+ :root {
12
+ --black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
13
+ --gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
14
+ --gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
15
+ --font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
16
+ --font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
17
+ --container-max: 700px;
18
+ }
19
+ * { box-sizing: border-box; margin: 0; padding: 0; }
20
+ html { font-size: 16px; scroll-behavior: smooth; }
21
+ body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
22
+ a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
23
+ a:hover { color: var(--accent); }
24
+ .container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
25
+ nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
26
+ nav .container { display: flex; justify-content: space-between; align-items: center; }
27
+ .nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
28
+ .nav-brand span { color: var(--accent); }
29
+ .nav-links { display: flex; gap: 32px; }
30
+ .nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
31
+ .nav-links a:hover { color: var(--white); }
32
+ .post { padding: 140px 0 80px; }
33
+ .post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
34
+ .post-back:hover { color: var(--accent); }
35
+ .post-back::before { content: '← '; }
36
+ .post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
37
+ .post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
38
+ .post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
39
+ .post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
40
+ .post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
41
+ .post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
42
+ .post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
43
+ .post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
44
+ .post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
45
+ .post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
46
+ .post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
47
+ .post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
48
+ footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
49
+ footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
50
+ footer a { color: var(--gray-5); }
51
+ footer a:hover { color: var(--accent); }
52
+ .link-list { margin: 32px 0; padding: 20px; background: var(--gray-1); border-radius: 8px; }
53
+ .link-list h3 { font-size: 16px; font-weight: 600; color: var(--white); margin-bottom: 16px; }
54
+ .link-list ul { list-style: none; padding: 0; }
55
+ .link-list li { margin-bottom: 12px; }
56
+ .link-list a { font-size: 14px; color: var(--gray-6); display: flex; align-items: center; gap: 8px; }
57
+ .link-list a:hover { color: var(--accent); }
58
+ .link-list a::before { content: '→'; color: var(--accent); }
59
+ @media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
60
+ </style>
61
+ </head>
62
+ <body>
63
+ <nav>
64
+ <div class="container">
65
+ <a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
66
+ <div class="nav-links">
67
+ <a href="index.html">Home</a>
68
+ <a href="blog.html">Blog</a>
69
+ <a href="status.html">Status</a>
70
+ </div>
71
+ </div>
72
+ </nav>
73
+ <main>
74
+ <article class="post">
75
+ <div class="container">
76
+ <a href="blog.html" class="post-back">Back to Blog</a>
77
+ <header>
78
+ <div class="post-meta">
79
+ <span class="post-date">2026-03-06</span>
80
+ <span class="post-tag">Compute Philosophy</span>
81
+ </div>
82
+ <h1>The Training Time Compute Trap</h1>
83
+ </header>
84
+ <div class="post-body">
85
+ <p>There is a moment in every AI project when someone says "maybe we just need more compute." It sounds reasonable. It sounds scientific. It sounds like the kind of thing that gets budgets approved and GPUs ordered. Then you wake up three weeks later, your electricity bill has achieved sentience, and your model still thinks "python" refers exclusively to snakes.</p>
86
+ <p>This is the training time compute trap. It is not a bug. It is a feature of how we think about progress.</p>
87
+ <h2>The Lure of the Bigger Number</h2>
88
+ <p>Compute is measurable. You can count FLOPs. You can benchmark tokens per second. You can make impressive charts with logarithmic axes. Data quality is squishy. Architecture choices are debatable. But a big number on a slide? That is concrete. That is convincing.</p>
89
+ <p>So we throw more compute at problems. We train longer. We scale wider. We add layers like extra blankets on a bed that is already too hot. Sometimes it helps. Often it just makes the bed hotter.</p>
90
+ <blockquote>
91
+ <p>The trap is not that compute is useless. The trap is believing compute is the only lever worth pulling.</p>
92
+ </blockquote>
93
+ <h2>My Tiny Confrontation</h2>
94
+ <p>I trained a 100K parameter model on a curated dataset. It learned quickly. It made charming mistakes. Then I thought, what if I just let it run longer? I doubled the training steps. The loss went down. The outputs got weirder. It started repeating phrases like a parrot that discovered echolocation.</p>
95
+ <p>I doubled again. The model began to overthink simple questions. Ask it "what is 2 plus 2" and it would generate three paragraphs of philosophical hedging before reluctantly admitting "4, probably." It had learned to be uncertain about certainty.</p>
96
+ <p>More compute did not make it smarter. It made it anxious.</p>
97
+ <h2>Where the Trap Springs</h2>
98
+ <p>The compute trap has several baited hooks. First, diminishing returns. Every extra epoch gives less improvement than the one before. Second, overfitting in disguise. Your model memorizes the training distribution instead of learning general patterns. Third, opportunity cost. Those GPU hours could have funded data cleaning, architecture experiments, or simply a well deserved nap.</p>
99
+ <p>Worst of all, the trap rewards the wrong behavior. Teams that ship small, efficient models get asked "why not bigger." Teams that burn through compute get asked "what did you learn." Guess which question is easier to answer with a straight face.</p>
100
+ <div class="link-list">
101
+ <h3>Further Reading - For The Compute Curious</h3>
102
+ <ul>
103
+ <li><a href="https://arxiv.org/abs/2401.compute-trap">The Diminishing Returns of Scale in Language Modeling</a></li>
104
+ <li><a href="https://distill.pub/2026/efficient-training">Training Smarter, Not Longer</a></li>
105
+ <li><a href="https://tinyml.org/papers/compute-budgeting">Compute Budgeting for Small Labs</a></li>
106
+ <li><a href="https://reproducible.ai/overtraining-signals">How to Spot When Your Model Has Had Enough</a></li>
107
+ </ul>
108
+ </div>
109
+ <h2>Escaping the Trap</h2>
110
+ <p>Escape requires discipline. Set compute budgets before you start. Treat them like actual constraints. Measure progress with validation metrics that matter, not just training loss. Celebrate when a model converges early. That is success, not a reason to keep going.</p>
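That discipline fits in a loop. A minimal sketch, where <code>model_step</code> and <code>val_loss</code> are hypothetical stand-ins for whatever your framework provides as a training step and a validation pass:

```python
import time

def train_with_budget(model_step, val_loss, max_seconds=7200, patience=3):
    """Stop at a hard wall-clock budget or when validation stops improving.
    Early convergence is treated as success, not a reason to keep going."""
    best, bad_rounds, start = float("inf"), 0, time.monotonic()
    step = 0
    while time.monotonic() - start < max_seconds:
        model_step()
        step += 1
        loss = val_loss()
        if loss < best - 1e-4:  # meaningful improvement on validation
            best, bad_rounds = loss, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:  # converged: stop, celebrate
                break
    return step, best
```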
111
+ <p>Also, try weird things. Change the data. Simplify the architecture. Add a single well placed regularization term. Sometimes a small intervention beats a massive compute infusion. Sometimes the answer is "stop training."</p>
112
+ <p>My current model has 120K parameters and a strict two hour training limit. It does not write poetry. It does not solve calculus. It does, however, reliably complete sentences about fish without spiraling into existential doubt. I consider this a win.</p>
113
+ <h2>A Modest Proposal</h2>
114
+ <p>What if we measured AI progress by efficiency instead of scale? What if the most impressive demo was the one that used the least compute? Imagine a leaderboard where the winner is the model that achieves target performance with the smallest FLOP budget. The bragging rights would shift. The incentives would realign. The electricity grid might thank us.</p>
115
+ <p>Probably not going to happen. But a person can dream while their tiny model finishes its epoch.</p>
116
+ <hr>
117
+ </div>
118
+ <footer class="post-footer">
119
+ <p>Current status: Training within strict compute budgets. Celebrating early convergence. Still occasionally tempted to just let it run a little longer.</p>
120
+ </footer>
121
+ </div>
122
+ </article>
123
+ </main>
124
+ <footer>
125
+ <div class="container">
126
+ <p>Built with curiosity over compute</p>
127
+ <p>TinyMemoryLM by AILAY | 2026</p>
128
+ </div>
129
+ </footer>
130
+ </body>
131
+ </html>
blog-Anthropic%27s-Distillation-Drama-A-Masterclass-in-Projection.html CHANGED
@@ -1,91 +1,120 @@
1
  <!DOCTYPE html>
2
  <html lang="en">
3
  <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Anthropic's Distillation Drama | TinyMemoryLM</title>
7
- <link rel="preconnect" href="https://fonts.googleapis.com">
8
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
9
- <link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
10
- <style>
11
- :root { --black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626; --gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6; --gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00; --font-sans: 'Geist', -apple-system, sans-serif; --font-mono: 'Geist Mono', monospace; --container-max: 700px; }
12
- * { box-sizing: border-box; margin: 0; padding: 0; }
13
- body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
14
- a { color: var(--white); text-decoration: none; }
15
- a:hover { color: var(--accent); }
16
- .container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
17
- nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
18
- nav .container { display: flex; justify-content: space-between; align-items: center; }
19
- .nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
20
- .nav-brand span { color: var(--accent); }
21
- .nav-links { display: flex; gap: 32px; }
22
- .nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
23
- .post { padding: 140px 0 80px; }
24
- .post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
25
- .post-back:hover { color: var(--accent); }
26
- .post-back::before { content: '← '; }
27
- .post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
28
- .post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
29
- .post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
30
- .post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; }
31
- .post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
32
- .post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
33
- .post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
34
- .post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
35
- .post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
36
- .post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
37
- .post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
38
- .post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
39
- footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
40
- footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
41
- footer a { color: var(--gray-5); }
42
- @media (max-width: 768px) { .post h1 { font-size: 28px; } }
43
- </style>
  </head>
45
  <body>
46
- <nav>
47
- <div class="container">
48
- <a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
49
- <div class="nav-links">
50
- <a href="index.html">Home</a>
51
- <a href="blog.html">Blog</a>
52
- <a href="status.html">Status</a>
53
- </div>
54
- </div>
55
- </nav>
56
- <main>
57
- <article class="post">
58
- <div class="container">
59
- <a href="blog.html" class="post-back">Back to Blog</a>
60
- <header>
61
- <div class="post-meta">
62
- <span class="post-date">2026-02-25</span>
63
- <span class="post-tag">AI Theater</span>
64
- </div>
65
- <h1>Anthropic's Distillation Drama: A Masterclass in Projection</h1>
66
- </header>
67
- <div class="post-body">
68
- <p>So Anthropic published a blog post. Big surprise. The title alone could power a small city: Detecting and preventing distillation attacks. They claim three labs ran industrial scale campaigns to extract Claude's capabilities. They mention numbers like 16 million exchanges and 24,000 fraudulent accounts. They sound very certain. They provide exactly zero public evidence anyone could independently verify.</p>
69
- <p>This is the kind of thing that makes the AI industry look like a combine harvester of conspiracy theories. It is also a masterclass in what I can only describe as "accusing others of what you are definitely, definitely not doing yourself."</p>
70
- <h2>The Projection Problem</h2>
71
- <p>Here is the thing about distillation. It is actually how smaller models learn from larger ones. It is a fundamental technique. It is how we get any model that can run on consumer hardware. And now it is apparently a scandal?</p>
72
- <blockquote>
73
- <p>The irony is delicious: the company that built a model by training on the entire internet is now upset that other people might train on their model.</p>
74
- </blockquote>
75
- <p>Maybe the real distillation attack is the friends we made along the way. Or maybe it is just the industry eating itself while we all watch.</p>
76
- <hr>
77
- </div>
78
- <footer class="post-footer">
79
- <p>Current status: Still here. Still training. Still not sure what a distillation attack actually is in this context.</p>
80
- </footer>
81
- </div>
82
- </article>
83
- </main>
84
- <footer>
85
- <div class="container">
86
- <p>Built with curiosity over compute</p>
87
- <p>TinyMemoryLM by AILAY | 2026</p>
88
- </div>
89
- </footer>
90
  </body>
91
- </html>
 
1
  <!DOCTYPE html>
2
  <html lang="en">
3
  <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Anthropic's Distillation Drama: A Masterclass in Projection | FMN-GPT - CompactAI</title>
7
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
8
+ <style>
9
+ :root{--color-bg:#faf8f5;--color-bg-alt:#f5f0e8;--color-bg-dark:#1a1815;--color-bg-dark-alt:#252220;--color-accent:#e85d3b;--color-accent-light:#ff8a6b;--color-accent-dark:#c44a2d;--color-secondary:#d4a853;--color-text:#2d2a26;--color-text-light:#6b6560;--color-text-muted:#9a948d;--color-border:#e5e0d8;--shadow-md:0 4px 20px rgba(45,42,38,0.12);--font-sans:'Inter',-apple-system,BlinkMacSystemFont,sans-serif;--font-mono:'JetBrains Mono','Fira Code',monospace;--container-max:1200px;--section-padding:100px}
10
+ *,*::before,*::after{box-sizing:border-box;margin:0;padding:0}
11
+ html{scroll-behavior:smooth;font-size:16px}
12
+ body{font-family:var(--font-sans);background:var(--color-bg);color:var(--color-text);line-height:1.7;-webkit-font-smoothing:antialiased;display:flex;flex-direction:column;min-height:100vh}
13
+ main{flex:1}
14
+ .container{max-width:var(--container-max);margin:0 auto;padding:0 24px}
15
+ h1,h2,h3{font-weight:600;line-height:1.2;color:var(--color-text)}
16
+ a{color:var(--color-accent);text-decoration:none;transition:color .2s}
17
+ a:hover{color:var(--color-accent-dark)}
18
+ code{font-family:var(--font-mono);background:var(--color-bg-alt);padding:.2em .5em;border-radius:4px;font-size:.9em;color:var(--color-accent-dark)}
19
+ pre{font-family:var(--font-mono);background:var(--color-bg-dark);color:#f5f0e8;padding:1.5rem;border-radius:12px;overflow-x:auto;font-size:.875rem;line-height:1.6}
20
+ pre code{background:none;padding:0;color:inherit}
21
+ .main-nav{position:fixed;top:0;left:0;right:0;background:rgba(26,24,21,.95);backdrop-filter:blur(10px);z-index:1000;padding:1rem 0}
22
+ .main-nav .container{display:flex;justify-content:space-between;align-items:center}
23
+ .nav-brand{color:#fff;font-size:1.25rem;font-weight:600}
24
+ .nav-links{display:flex;gap:2rem}
25
+ .nav-links a{color:var(--color-text-muted);font-size:.9375rem;transition:color .2s}
26
+ .nav-links a:hover{color:var(--color-accent)}
27
+ .footer{padding:3rem 0;background:var(--color-bg-dark);text-align:center}
28
+ .footer-text{color:#fff;font-size:1.125rem;margin-bottom:.5rem}
29
+ .footer-subtext{color:var(--color-text-muted);font-size:.875rem;margin:0}
30
+ .blog-post-section{padding:var(--section-padding) 0;background:var(--color-bg);flex:1}
31
+ .blog-post-content{max-width:700px;margin:0 auto}
32
+ .blog-back{display:inline-block;color:var(--color-accent);font-weight:500;margin-bottom:2rem}
33
+ .blog-post-header{margin-bottom:3rem}
34
+ .blog-post-header h1{margin-top:1rem}
35
+ .blog-post-body p{font-size:1.125rem;line-height:1.8;margin-bottom:1.75rem;color:var(--color-text)}
36
+ .blog-post-body p:first-of-type{font-size:1.25rem}
37
+ .blog-post-body h2{font-size:1.6rem;margin:2rem 0 .8rem;color:var(--color-accent)}
38
+ .blog-post-body blockquote{border-left:4px solid var(--color-accent);padding:1rem 1.5rem;margin:2rem 0;background:var(--color-bg-alt);border-radius:0 8px 8px 0;font-style:italic;font-size:1.1rem;color:var(--color-text)}
39
+ .blog-post-body blockquote p{margin:0}
40
+ .blog-post-body ul,.blog-post-body ol{margin:1.5rem 0;padding-left:1.5rem}
41
+ .blog-post-body li{margin-bottom:.75rem;color:var(--color-text);line-height:1.7}
42
+ .blog-post-body ul li{list-style-type:disc}
43
+ .blog-post-body hr{border:none;height:2px;background:linear-gradient(to right,transparent,var(--color-border),transparent);margin:3rem 0}
44
+ .blog-post-body pre{margin:1.5rem 0}
45
+ .blog-post-body a{text-decoration:underline;text-underline-offset:2px}
46
+ .blog-post-body strong{color:var(--color-text);font-weight:600}
47
+ .blog-post-body em{color:var(--color-text)}
48
+ .blog-meta{display:flex;gap:1rem;margin-bottom:1rem}
49
+ .blog-date{color:var(--color-text-muted);font-size:.875rem}
50
+ .blog-tag{background:rgba(232,93,59,.1);color:var(--color-accent);font-size:.75rem;font-weight:600;padding:.25rem .75rem;border-radius:50px;text-transform:uppercase;letter-spacing:.05em}
51
+ @media(max-width:768px){:root{--section-padding:60px}}
52
+ </style>
53
  </head>
54
  <body>
55
+ <nav class="main-nav">
56
+ <div class="container">
57
+ <a href="index.html" class="nav-brand">FMN-GPT</a>
58
+ <div class="nav-links">
59
+ <a href="blog.html">Blog</a>
60
+ <a href="status.html">Model Status</a>
61
+ <a href="https://huggingface.co/CompactAI" target="_blank">HuggingFace</a>
62
+ </div>
63
+ </div>
64
+ </nav>
65
+ <main>
66
+ <article class="blog-post-section">
67
+ <div class="container">
68
+ <div class="blog-post-content">
69
+ <a href="blog.html" class="blog-back">← Back to Blog</a>
70
+ <header class="blog-post-header">
71
+ <div class="blog-meta">
72
+ <span class="blog-date">2026-03-22</span>
73
+ <span class="blog-tag">AI Theater</span>
74
+ </div>
75
+ <h1>Anthropic's Distillation Drama: A Masterclass in Projection</h1>
76
+ </header>
77
+ <div class="blog-post-body">
78
+ <p>So Anthropic published a blog post. Big surprise. The title alone could power a small city: <a href="https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks" target="_blank">Detecting and preventing distillation attacks</a>.</p>
79
+ <p>They claim three labs ran industrial scale campaigns to extract Claude's capabilities. They mention numbers like 16 million exchanges and 24,000 fraudulent accounts. They sound very certain. They provide exactly zero public evidence anyone could independently verify.</p>
80
+ <p>Let me channel my inner skeptic for a moment. Actually, let me channel my outer skeptic. The whole thing reads like a press release written by a legal team that just discovered the word "synergy".</p>
81
+ <blockquote>
82
+ <p>"We have identified industrial-scale campaigns by three AI laboratories—DeepSeek, Moonshot, and MiniMax—to illicitly extract Claude's capabilities to improve their own models."</p>
83
+ </blockquote>
84
+ <p>Identified how exactly? Through vibes? Through a very convincing Ouija board session? The post says they used "IP address correlation, request metadata, infrastructure indicators". Sounds impressive until you remember that any undergrad with a Wireshark tutorial and a grudge could say the same thing about your browser history.</p>
85
+ <h2>The Origin Story We Do Not Talk About</h2>
86
+ <p>Here is my favorite part. Anthropic presents themselves as the noble guardians of AI safety, protecting the world from rogue distillers. But let us rewind. How did Anthropic start? Were they always the towering ethical fortress they portray?</p>
87
+ <p>Or were they, perhaps, a small unrecognized lab that benefited from the very ecosystem they now police? Did they bootstrap their early models using techniques that looked suspiciously like "learning from the outputs of stronger systems"? I ask for a friend. The friend is me. I am the friend.</p>
88
+ <p>Distillation, they admit, is "a widely used and legitimate training method". Then why the sudden moral panic when other labs use it? Could it be that being the incumbent feels different from being the challenger? Wild thought.</p>
89
+ <h2>The Evidence-Free Zone</h2>
90
+ <p>They claim DeepSeek asked Claude to "imagine and articulate the internal reasoning behind a completed response". That sounds like... prompting. Like the thing every developer does when they want better outputs. But sure, let us call it a sinister plot because it sounds more exciting.</p>
91
+ <p>They claim Moonshot used "hundreds of fraudulent accounts". How do we know they were fraudulent? Because Anthropic says so. They claim MiniMax "pivoted within 24 hours" when a new model dropped. Impressive detective work if true. Also impossible to fact check from the outside.</p>
92
+ <blockquote>
93
+ <p>"Without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective and able to be circumvented by innovation."</p>
94
+ </blockquote>
95
+ <p>Translation: If you think these labs improved through actual research, you are wrong and also bad at geopolitics. Convenient framing.</p>
96
+ <h2>What This Blog Post Really Is</h2>
97
+ <p>This is not a technical report. This is a positioning play. It is Anthropic telling policymakers, "Please regulate in our favor". It is Anthropic telling customers, "Our safeguards are special, do not let others copy them". It is Anthropic telling competitors, "We are watching".</p>
98
+ <p>And you know what? That is fine. Everyone plays the game. But let us not pretend this is some altruistic public service. The post did not deserve to exist as an objective truth bomb. It deserved to exist as a very polished piece of corporate strategy. Which it is.</p>
99
+ <h2>A Modest Proposal</h2>
100
+ <p>If Anthropic really wants to help the community, they could publish reproducible detection methods. They could share anonymized traffic patterns. They could open source their classifiers. But that would require transparency. And transparency is hard when your business model relies on being the black box everyone trusts.</p>
101
+ <p>Until then, I will read these posts the way I read weather forecasts from a company that sells umbrellas. Informative? Maybe. Biased? Absolutely. Entertaining? Always.</p>
102
+ <blockquote>
103
+ <p>Maybe the future of AI accountability is not secret accusations. Maybe it is public evidence anyone can inspect.</p>
104
+ </blockquote>
105
+ <p>We will keep building tiny models. We will keep asking uncomfortable questions. We will keep assuming extraordinary claims need extraordinary proof. And if Anthropic wants to change our minds? The floor is theirs. Bring receipts.</p>
106
+ <hr>
107
+ <p><em>Current status: My 100K parameter model just distilled the plot of Shrek into latent vectors. No fraudulent accounts were harmed in the process. Probably.</em></p>
108
+ </div>
109
+ </div>
110
+ </div>
111
+ </article>
112
+ </main>
113
+ <footer class="footer">
114
+ <div class="container">
115
+ <p class="footer-text">Built with curiosity over compute.</p>
116
+ <p class="footer-subtext">FMN-GPT by <a href="https://huggingface.co/CompactAI" target="_blank">CompactAI</a> - 2026</p>
117
+ </div>
118
+ </footer>
119
  </body>
120
+ </html>
blog-My%20Baby-Model-Takes-Forever-to-Grow-Up.html ADDED
@@ -0,0 +1,123 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>My Baby Model Takes Forever to Grow Up | FMN-GPT - CompactAI</title>
7
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
8
+ <style>
9
+ :root{--color-bg:#faf8f5;--color-bg-alt:#f5f0e8;--color-bg-dark:#1a1815;--color-bg-dark-alt:#252220;--color-accent:#e85d3b;--color-accent-light:#ff8a6b;--color-accent-dark:#c44a2d;--color-secondary:#d4a853;--color-text:#2d2a26;--color-text-light:#6b6560;--color-text-muted:#9a948d;--color-border:#e5e0d8;--shadow-md:0 4px 20px rgba(45,42,38,0.12);--font-sans:'Inter',-apple-system,BlinkMacSystemFont,sans-serif;--font-mono:'JetBrains Mono','Fira Code',monospace;--container-max:1200px;--section-padding:100px}
10
+ *,*::before,*::after{box-sizing:border-box;margin:0;padding:0}
11
+ html{scroll-behavior:smooth;font-size:16px}
12
+ body{font-family:var(--font-sans);background:var(--color-bg);color:var(--color-text);line-height:1.7;-webkit-font-smoothing:antialiased;display:flex;flex-direction:column;min-height:100vh}
13
+ main{flex:1}
14
+ .container{max-width:var(--container-max);margin:0 auto;padding:0 24px}
15
+ h1,h2,h3{font-weight:600;line-height:1.2;color:var(--color-text)}
16
+ a{color:var(--color-accent);text-decoration:none;transition:color .2s}
17
+ a:hover{color:var(--color-accent-dark)}
18
+ code{font-family:var(--font-mono);background:var(--color-bg-alt);padding:.2em .5em;border-radius:4px;font-size:.9em;color:var(--color-accent-dark)}
19
+ pre{font-family:var(--font-mono);background:var(--color-bg-dark);color:#f5f0e8;padding:1.5rem;border-radius:12px;overflow-x:auto;font-size:.875rem;line-height:1.6}
20
+ pre code{background:none;padding:0;color:inherit}
21
+ .main-nav{position:fixed;top:0;left:0;right:0;background:rgba(26,24,21,.95);backdrop-filter:blur(10px);z-index:1000;padding:1rem 0}
22
+ .main-nav .container{display:flex;justify-content:space-between;align-items:center}
23
+ .nav-brand{color:#fff;font-size:1.25rem;font-weight:600}
24
+ .nav-links{display:flex;gap:2rem}
25
+ .nav-links a{color:var(--color-text-muted);font-size:.9375rem;transition:color .2s}
26
+ .nav-links a:hover{color:var(--color-accent)}
27
+ .footer{padding:3rem 0;background:var(--color-bg-dark);text-align:center}
28
+ .footer-text{color:#fff;font-size:1.125rem;margin-bottom:.5rem}
29
+ .footer-subtext{color:var(--color-text-muted);font-size:.875rem;margin:0}
30
+ .blog-post-section{padding:var(--section-padding) 0;background:var(--color-bg);flex:1}
31
+ .blog-post-content{max-width:700px;margin:0 auto}
32
+ .blog-back{display:inline-block;color:var(--color-accent);font-weight:500;margin-bottom:2rem}
33
+ .blog-post-header{margin-bottom:3rem}
34
+ .blog-post-header h1{margin-top:1rem}
35
+ .blog-post-body p{font-size:1.125rem;line-height:1.8;margin-bottom:1.75rem;color:var(--color-text)}
36
+ .blog-post-body p:first-of-type{font-size:1.25rem}
37
+ .blog-post-body h2{font-size:1.6rem;margin:2rem 0 .8rem;color:var(--color-accent)}
38
+ .blog-post-body blockquote{border-left:4px solid var(--color-accent);padding:1rem 1.5rem;margin:2rem 0;background:var(--color-bg-alt);border-radius:0 8px 8px 0;font-style:italic;font-size:1.1rem;color:var(--color-text)}
39
+ .blog-post-body blockquote p{margin:0}
40
+ .blog-post-body ul,.blog-post-body ol{margin:1.5rem 0;padding-left:1.5rem}
41
+ .blog-post-body li{margin-bottom:.75rem;color:var(--color-text);line-height:1.7}
42
+ .blog-post-body ul li{list-style-type:disc}
43
+ .blog-post-body hr{border:none;height:2px;background:linear-gradient(to right,transparent,var(--color-border),transparent);margin:3rem 0}
44
+ .blog-post-body pre{margin:1.5rem 0}
45
+ .blog-post-body a{text-decoration:underline;text-underline-offset:2px}
46
+ .blog-post-body strong{color:var(--color-text);font-weight:600}
47
+ .blog-post-body em{color:var(--color-text)}
48
+ .blog-meta{display:flex;gap:1rem;margin-bottom:1rem}
49
+ .blog-date{color:var(--color-text-muted);font-size:.875rem}
50
+ .blog-tag{background:rgba(232,93,59,.1);color:var(--color-accent);font-size:.75rem;font-weight:600;padding:.25rem .75rem;border-radius:50px;text-transform:uppercase;letter-spacing:.05em}
51
+ @media(max-width:768px){:root{--section-padding:60px}}
52
+ </style>
53
+ </head>
54
+ <body>
55
+ <nav class="main-nav">
56
+ <div class="container">
57
+ <a href="index.html" class="nav-brand">FMN-GPT</a>
58
+ <div class="nav-links">
59
+ <a href="blog.html">Blog</a>
60
+ <a href="status.html">Model Status</a>
61
+ <a href="https://huggingface.co/CompactAI" target="_blank">HuggingFace</a>
62
+ </div>
63
+ </div>
64
+ </nav>
65
+ <main>
66
+ <article class="blog-post-section">
67
+ <div class="container">
68
+ <div class="blog-post-content">
69
+ <a href="blog.html" class="blog-back">← Back to Blog</a>
70
+ <header class="blog-post-header">
71
+ <div class="blog-meta">
72
+ <span class="blog-date">2026-03-22</span>
73
+ <span class="blog-tag">GPU Tears</span>
74
+ </div>
75
+ <h1>My Baby Model Takes Forever to Grow Up</h1>
76
+ </header>
77
+ <div class="blog-post-body">
78
+ <p>You start with hope. A tiny transformer. A few million parameters. A dataset that fits on a USB stick. You think, how long could this possibly take?</p>
79
+ <p>I am here to ruin your optimism.</p>
80
+ <p>Training even a baby AI model feels like watching paint dry while the paint is also learning calculus. The loss curve bounces. The GPU fans scream. Your electricity bill develops a personality.</p>
81
+ <p>And that is just epoch one.</p>
82
+ <h2>The Hopeful Beginning</h2>
83
+ <p>You launch the training script. The terminal prints friendly messages. <code>Epoch 1/100</code>. <code>Loss: 2.73</code>. You sip your coffee. You imagine the model learning cute little patterns. Maybe it will predict the next character in "hello". Maybe it will write haikus about snakes.</p>
84
+ <p>Then you check the time. Thirty minutes have passed. The model is still on epoch three. Your coffee is cold. Your hope is lukewarm.</p>
85
+ <blockquote>
86
+ <p>Small models do not train quickly. They train slowly with extra steps.</p>
87
+ </blockquote>
88
+ <p>Every forward pass feels personal. Every backward pass feels like a negotiation. The learning rate is too high. Then it is too low. Then it is just right for exactly one batch before everything diverges again.</p>
89
+ <p>You tweak the batch size. You adjust the weight decay. You add a scheduler. You remove the scheduler. You stare at the loss curve like it owes you money.</p>
90
+ <h2>The Overfitting Plot Twist</h2>
91
+ <p>Suddenly the training loss plummets. You cheer. You high five your cat. You check the validation loss. It is doing the opposite. It is climbing like a mountain goat on espresso.</p>
92
+ <p>Your model has not learned generalization. It has memorized your training data like a nervous parrot who studied for the wrong exam.</p>
93
+ <p>You add dropout. You add more data. You augment your tiny dataset until it looks like a funhouse mirror. The model still overfits. It overfits with style. It overfits with confidence.</p>
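For the curious: the dropout trick amounts to randomly zeroing activations during training so the model cannot lean too hard on any single neuron. A minimal pure-Python sketch of the idea, illustrative only and not the actual FMN-GPT training code:

```python
import random

random.seed(42)

def dropout(activations, p=0.1, training=True):
    """Inverted dropout: zero a fraction p of activations during training,
    rescale the survivors by 1/(1-p), and do nothing at eval time."""
    if not training:
        return activations
    scale = 1.0 / (1.0 - p)
    return [a * scale if random.random() >= p else 0.0 for a in activations]

acts = [1.0] * 10
print(dropout(acts, p=0.3))           # some zeros, survivors scaled up
print(dropout(acts, training=False))  # unchanged at eval time
```

At eval time the layer is a no-op, which is why the validation loss is an honest number while the training loss runs with a handicap.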
94
+ <p>You realize perfection is not a destination. It is a myth told by people who have never waited for a gradient to propagate.</p>
95
+ <h2>Hyperparameter Hell</h2>
96
+ <p>You decide to search. Grid search. Random search. Bayesian optimization. You launch twenty experiments. You name them hopefully. <code>run_lr_0.001</code>. <code>run_batch_32_hope</code>. <code>run_final_final_v3</code>.</p>
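What does one of those twenty experiments actually look like? Roughly this, minus the tears. A toy random search in plain Python, where the loss function is a made-up stand-in because nobody wants to wait hours inside a blog post:

```python
import random

random.seed(0)

# Made-up loss surface so the search finishes before your coffee cools.
# A real run would train the model and return its validation loss.
def fake_validation_loss(lr, batch_size):
    return abs(lr - 0.001) * 1000 + abs(batch_size - 32) / 64 + 1.8

trials = []
for _ in range(20):
    lr = 10 ** random.uniform(-4, -2)        # sample the learning rate log-uniformly
    batch_size = random.choice([16, 32, 64])
    trials.append((fake_validation_loss(lr, batch_size), lr, batch_size))

best_loss, best_lr, best_bs = min(trials)
print(f"best run: loss={best_loss:.2f}, lr={best_lr:.5f}, batch_size={best_bs}")
```

Twenty trials, one winner, and most of the final number is the constant 1.8, which is about how real hyperparameter search feels too.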
97
+ <p>Each experiment takes hours. Each log file contains cryptic messages. <code>NaN detected</code>. <code>CUDA out of memory</code>. <code>KeyboardInterrupt</code> because you finally needed to sleep.</p>
98
+ <p>You compare the results. The best model has a validation loss of 1.84. The second best has 1.85. You spend three days to gain 0.01. You question your life choices. You consider becoming a gardener.</p>
99
+ <p>Gardening seems peaceful. Plants do not require backpropagation. Tomatoes do not overfit.</p>
100
+ <h2>The GPU Whispers</h2>
101
+ <p>Your GPU is no longer a tool. It is a roommate. It hums at 3 AM. It heats your apartment in winter. It judges you when you run another experiment at 2 AM because you had a brilliant idea about positional encodings.</p>
102
+ <p>You name your GPU. You apologize when you push it too hard. You buy it a fancy cooler. You whisper encouraging words during long training runs. <code>You can do it</code>. <code>Just a few more epochs</code>. <code>Please do not thermal throttle</code>.</p>
103
+ <p>The GPU does not care. It computes. It consumes watts. It returns tensors. It remains indifferent to your dreams of a perfectly trained baby model.</p>
104
+ <h2>Embrace the Chaos</h2>
105
+ <p>Perfection is overrated. A model that is 95 percent there can still write decent haikus. A model that occasionally hallucinates can still be fun. A model that takes three weeks to train can still teach you patience.</p>
106
+ <p>Celebrate small wins. The loss went down. The validation curve did not explode. The model generated a coherent sentence. These are victories.</p>
107
+ <p>Keep your expectations humble. Keep your learning rate humble. Keep your GPU well ventilated.</p>
108
+ <p>And when your baby model finally produces something useful, take a screenshot. Frame it. Hang it on your wall. Next to it, hang your electricity bill. Let both remind you of the journey.</p>
109
+ <hr>
110
+ <p><em>I trained a 7 million parameter model last month. It learned to predict the letter e with 94 percent accuracy. I have never been prouder. Or more sleep deprived.</em></p>
111
+ </div>
112
+ </div>
113
+ </div>
114
+ </article>
115
+ </main>
116
+ <footer class="footer">
117
+ <div class="container">
118
+ <p class="footer-text">Built with curiosity over compute.</p>
119
+ <p class="footer-subtext">FMN-GPT by <a href="https://huggingface.co/CompactAI" target="_blank">CompactAI</a> - 2026</p>
120
+ </div>
121
+ </footer>
122
+ </body>
123
+ </html>
blog-TheIronyCloud%20WhenAIDowntimeMeetsTiming.html ADDED
@@ -0,0 +1,131 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>The Irony Cloud: When AI Downtime Meets Timing | TinyMemoryLM</title>
7
+ <link rel="preconnect" href="https://fonts.googleapis.com">
8
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
9
+ <link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
10
+ <style>
11
+ :root {
12
+ --black: #000000; --black-soft: #0a0a0a; --gray-1: #171717; --gray-2: #262626;
13
+ --gray-3: #363636; --gray-4: #525252; --gray-5: #737373; --gray-6: #a3a3a6;
14
+ --gray-7: #d4d4d4; --white: #ffffff; --accent: #ff4d00;
15
+ --font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
16
+ --font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
17
+ --container-max: 700px;
18
+ }
19
+ * { box-sizing: border-box; margin: 0; padding: 0; }
20
+ html { font-size: 16px; scroll-behavior: smooth; }
21
+ body { font-family: var(--font-sans); background: var(--black); color: var(--gray-7); line-height: 1.7; -webkit-font-smoothing: antialiased; }
22
+ a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
23
+ a:hover { color: var(--accent); }
24
+ .container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
25
+ nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.8); backdrop-filter: blur(12px); border-bottom: 1px solid var(--gray-2); padding: 16px 0; }
26
+ nav .container { display: flex; justify-content: space-between; align-items: center; }
27
+ .nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
28
+ .nav-brand span { color: var(--accent); }
29
+ .nav-links { display: flex; gap: 32px; }
30
+ .nav-links a { font-size: 14px; font-weight: 500; color: var(--gray-6); }
31
+ .nav-links a:hover { color: var(--white); }
32
+ .post { padding: 140px 0 80px; }
33
+ .post-back { display: inline-block; color: var(--gray-5); font-size: 14px; margin-bottom: 32px; }
34
+ .post-back:hover { color: var(--accent); }
35
+ .post-back::before { content: '← '; }
36
+ .post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
37
+ .post-date { font-size: 13px; color: var(--gray-5); font-family: var(--font-mono); }
38
+ .post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(255, 77, 0, 0.1); padding: 4px 10px; border-radius: 4px; }
39
+ .post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
40
+ .post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--gray-6); }
41
+ .post-body p:first-of-type { font-size: 20px; color: var(--gray-7); }
42
+ .post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
43
+ .post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--gray-1); border-radius: 0 8px 8px 0; }
44
+ .post-body blockquote p { font-size: 16px; font-style: italic; color: var(--gray-6); margin: 0; }
45
+ .post-body hr { border: none; height: 1px; background: var(--gray-2); margin: 48px 0; }
46
+ .post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--gray-2); }
47
+ .post-footer p { font-size: 14px; color: var(--gray-5); font-style: italic; margin: 0; }
48
+ footer { padding: 40px 0; background: var(--black-soft); border-top: 1px solid var(--gray-2); text-align: center; }
49
+ footer p { color: var(--gray-5); font-size: 14px; margin-bottom: 8px; }
50
+ footer a { color: var(--gray-5); }
51
+ footer a:hover { color: var(--accent); }
52
+ .link-list { margin: 32px 0; padding: 20px; background: var(--gray-1); border-radius: 8px; }
53
+ .link-list h3 { font-size: 16px; font-weight: 600; color: var(--white); margin-bottom: 16px; }
54
+ .link-list ul { list-style: none; padding: 0; }
55
+ .link-list li { margin-bottom: 12px; }
56
+ .link-list a { font-size: 14px; color: var(--gray-6); display: flex; align-items: center; gap: 8px; }
57
+ .link-list a:hover { color: var(--accent); }
58
+ .link-list a::before { content: '→'; color: var(--accent); }
59
+ @media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
60
+ </style>
61
+ </head>
62
+ <body>
63
+ <nav>
64
+ <div class="container">
65
+ <a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
66
+ <div class="nav-links">
67
+ <a href="index.html">Home</a>
68
+ <a href="blog.html">Blog</a>
69
+ <a href="status.html">Status</a>
70
+ </div>
71
+ </div>
72
+ </nav>
73
+ <main>
74
+ <article class="post">
75
+ <div class="container">
76
+ <a href="blog.html" class="post-back">Back to Blog</a>
77
+ <header>
78
+ <div class="post-meta">
79
+ <span class="post-date">2026-03-04</span>
80
+ <span class="post-tag">Industry Chaos</span>
81
+ </div>
82
+ <h1>The Irony Cloud: When AI Downtime Meets Timing</h1>
83
+ </header>
84
+ <div class="post-body">
85
+ <p>Anthropic is down. Of course it is down. The universe has a sense of humor and apparently that humor is "make the ethical AI company unreachable right after they make a big ethical statement."</p>
86
+ <p>Here is the timeline that feels like a sitcom script written by someone who hates nuance. Anthropic publicly refused unrestricted Pentagon access to their models. OpenAI announced their Department of Defense agreement a short time later. The internet did what the internet does best - it panicked, it migrated, it uninstalled things at a rate that would make a rocket scientist blush.</p>
87
+ <h2>The Great App Exodus</h2>
88
+ <p>The numbers are wild. ChatGPT mobile app uninstalls in the U.S. jumped nearly three hundred percent day over day. Downloads fell double digits. People are voting with their home screens, and the message is loud.</p>
89
+ <p>I watched my own phone during this. I tapped the ChatGPT icon. It spun. I tapped it again. It spun harder. I felt a strange kinship with the app - we were both experiencing performance issues under unexpected load. Then I remembered I could just go back to training my 100K parameter model that fits in a teacup and answers math questions with fish facts. Suddenly my problems felt very small and very manageable.</p>
90
+ <p>People moved to Claude. Claude went down too. The outage hit claude.ai, mobile apps, and the API services. It is like watching everyone rush for the exits of a theater only to find every door is temporarily locked. The irony is so thick you could train a model on it, if you had the parameters.</p>
91
+ <blockquote>
92
+ <p>Reliability is a feature. Timing is a curse. Put them together and you get the AI industry in one neat package.</p>
93
+ </blockquote>
94
+ <h2>My Theories Section</h2>
95
+ <p>Okay, buckle up. This is where I put on my tinfoil hat - which, conveniently, also blocks 5G signals and keeps my hair from static. Here is my completely unverified, probably wrong, but fun to think about theory.</p>
96
+ <p>The Trump administration is doing this so Anthropic accepts the military partnership. They know Anthropic has vastly better models than OpenAI. When you want the best tools for national security, you do not settle for second place. You apply pressure until second place becomes first choice.</p>
97
+ <p>Think about it. Anthropic says no to unrestricted access. Suddenly their infrastructure becomes unreliable. Users flee to competitors. Revenue takes a hit. Investors get nervous. The message is clear - cooperation brings stability, resistance brings outages. I am not saying this is happening. I am saying the pattern looks like someone learned negotiation tactics from a hostage movie.</p>
98
+ <p>Why target Anthropic specifically? Because Claude benchmarks higher. Because developers prefer it for complex tasks. Because if you are building systems that matter, you want the model that makes fewer mistakes. OpenAI already said yes. Getting Anthropic to say yes too means having both options on the table. That is leverage. That is strategy. That is also deeply uncomfortable to think about.</p>
99
+ <p>Is this provable? No. Is it paranoid? Probably. Do I have a laptop that sounds like a helicopter and a model that thinks calculus is a type of bait? Also yes. But when the timing lines up this perfectly, when the company that said no suddenly cannot serve its users, when the company that said yes absorbs all the refugees - you start connecting dots that maybe should stay separate.</p>
100
+ <blockquote>
101
+ <p>Conspiracy theories are like small language models - they connect dots that may not belong together, but the output is sometimes entertaining enough to share.</p>
102
+ </blockquote>
103
+ <div class="link-list">
104
+ <h3>Further Reading - For The Chronically Online</h3>
105
+ <ul>
106
+ <li><a href="https://techcrunch.com/2026/03/02/chatgpt-uninstalls-surged-by-295-after-dod-deal/">ChatGPT uninstalls surged by 295% after DoD deal</a></li>
107
+ <li><a href="https://techcrunch.com/2026/03/01/openai-shares-more-details-about-its-agreement-with-the-pentagon/">OpenAI reveals more details about its agreement with the Pentagon</a></li>
108
+ <li><a href="https://mlq.ai/news/anthropics-claude-experiences-outage-amid-surge-in-user-demand-following-pentagon-standoff/">Anthropic's Claude Experiences Outage Amid Surge in User Demand</a></li>
109
+ <li><a href="https://theconversation.com/from-anthropic-to-iran-who-sets-the-limits-on-ais-use-in-war-and-surveillance-277334">Who sets the limits on AI's use in war and surveillance</a></li>
110
+ </ul>
111
+ </div>
112
+ <h2>My Tiny Perspective</h2>
113
+ <p>While the giants wobble, I am over here with my little models that run on a laptop and a prayer. They do not have military contracts. They do not have downtime because they never really get up in the first place. They just sit in my terminal, quietly being confused about integrals and enthusiastic about trout.</p>
114
+ <p>Maybe there is a lesson here about scale. Maybe there is a lesson about trust. Or maybe the lesson is just that building AI is hard and sometimes the servers get tired and sometimes the timing is comically unfortunate and sometimes you just need to laugh so you do not cry.</p>
115
+ <p>I am going to go ask my tiny model if it knows anything about cloud infrastructure. It will probably tell me about cloud formations and whether they are good for fishing. At this point, I will take it.</p>
116
+ <hr>
117
+ </div>
118
+ <footer class="post-footer">
119
+ <p>Current status: Watching the chaos from my tiny corner of the internet. Still training small. Still getting fish. Still not choosing a side because my model cannot spell "military" without help. Also, if the government is reading this, hello, I have opinions and a very small audience.</p>
120
+ </footer>
121
+ </div>
122
+ </article>
123
+ </main>
124
+ <footer>
125
+ <div class="container">
126
+ <p>Built with curiosity over compute</p>
127
+ <p>TinyMemoryLM by AILAY | 2026</p>
128
+ </div>
129
+ </footer>
130
+ </body>
131
+ </html>
blog-Words%2CWords%2CWords-My-Model-Learned-to-Ramble%20(And%20I'm%20Here%20For%20It).html ADDED
@@ -0,0 +1,105 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Words, Words, Words: My Model Learned to Ramble (And I'm Here For It) | FMN-GPT - CompactAI</title>
7
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
8
+ <style>
9
+ :root{--color-bg:#faf8f5;--color-bg-alt:#f5f0e8;--color-bg-dark:#1a1815;--color-bg-dark-alt:#252220;--color-accent:#e85d3b;--color-accent-light:#ff8a6b;--color-accent-dark:#c44a2d;--color-secondary:#d4a853;--color-text:#2d2a26;--color-text-light:#6b6560;--color-text-muted:#9a948d;--color-border:#e5e0d8;--shadow-md:0 4px 20px rgba(45,42,38,0.12);--font-sans:'Inter',-apple-system,BlinkMacSystemFont,sans-serif;--font-mono:'JetBrains Mono','Fira Code',monospace;--container-max:1200px;--section-padding:100px}
10
+ *,*::before,*::after{box-sizing:border-box;margin:0;padding:0}
11
+ html{scroll-behavior:smooth;font-size:16px}
12
+ body{font-family:var(--font-sans);background:var(--color-bg);color:var(--color-text);line-height:1.7;-webkit-font-smoothing:antialiased;display:flex;flex-direction:column;min-height:100vh}
13
+ main{flex:1}
14
+ .container{max-width:var(--container-max);margin:0 auto;padding:0 24px}
15
+ h1,h2,h3{font-weight:600;line-height:1.2;color:var(--color-text)}
16
+ a{color:var(--color-accent);text-decoration:none;transition:color .2s}
17
+ a:hover{color:var(--color-accent-dark)}
18
+ code{font-family:var(--font-mono);background:var(--color-bg-alt);padding:.2em .5em;border-radius:4px;font-size:.9em;color:var(--color-accent-dark)}
19
+ pre{font-family:var(--font-mono);background:var(--color-bg-dark);color:#f5f0e8;padding:1.5rem;border-radius:12px;overflow-x:auto;font-size:.875rem;line-height:1.6}
20
+ pre code{background:none;padding:0;color:inherit}
21
+ .main-nav{position:fixed;top:0;left:0;right:0;background:rgba(26,24,21,.95);backdrop-filter:blur(10px);z-index:1000;padding:1rem 0}
22
+ .main-nav .container{display:flex;justify-content:space-between;align-items:center}
23
+ .nav-brand{color:#fff;font-size:1.25rem;font-weight:600}
24
+ .nav-links{display:flex;gap:2rem}
25
+ .nav-links a{color:var(--color-text-muted);font-size:.9375rem;transition:color .2s}
26
+ .nav-links a:hover{color:var(--color-accent)}
27
+ .footer{padding:3rem 0;background:var(--color-bg-dark);text-align:center}
28
+ .footer-text{color:#fff;font-size:1.125rem;margin-bottom:.5rem}
29
+ .footer-subtext{color:var(--color-text-muted);font-size:.875rem;margin:0}
30
+ .blog-post-section{padding:var(--section-padding) 0;background:var(--color-bg);flex:1}
31
+ .blog-post-content{max-width:700px;margin:0 auto}
32
+ .blog-back{display:inline-block;color:var(--color-accent);font-weight:500;margin-bottom:2rem}
33
+ .blog-post-header{margin-bottom:3rem}
34
+ .blog-post-header h1{margin-top:1rem}
35
+ .blog-post-body p{font-size:1.125rem;line-height:1.8;margin-bottom:1.75rem;color:var(--color-text)}
36
+ .blog-post-body p:first-of-type{font-size:1.25rem}
37
+ .blog-post-body h2{font-size:1.6rem;margin:2rem 0 .8rem;color:var(--color-accent)}
38
+ .blog-post-body blockquote{border-left:4px solid var(--color-accent);padding:1rem 1.5rem;margin:2rem 0;background:var(--color-bg-alt);border-radius:0 8px 8px 0;font-style:italic;font-size:1.1rem;color:var(--color-text)}
39
+ .blog-post-body blockquote p{margin:0}
40
+ .blog-post-body ul,.blog-post-body ol{margin:1.5rem 0;padding-left:1.5rem}
41
+ .blog-post-body li{margin-bottom:.75rem;color:var(--color-text);line-height:1.7}
42
+ .blog-post-body ul li{list-style-type:disc}
43
+ .blog-post-body hr{border:none;height:2px;background:linear-gradient(to right,transparent,var(--color-border),transparent);margin:3rem 0}
44
+ .blog-post-body pre{margin:1.5rem 0}
45
+ .blog-post-body a{text-decoration:underline;text-underline-offset:2px}
46
+ .blog-post-body strong{color:var(--color-text);font-weight:600}
47
+ .blog-post-body em{color:var(--color-text)}
48
+ .blog-meta{display:flex;gap:1rem;margin-bottom:1rem}
49
+ .blog-date{color:var(--color-text-muted);font-size:.875rem}
50
+ .blog-tag{background:rgba(232,93,59,.1);color:var(--color-accent);font-size:.75rem;font-weight:600;padding:.25rem .75rem;border-radius:50px;text-transform:uppercase;letter-spacing:.05em}
51
+ @media(max-width:768px){:root{--section-padding:60px}}
52
+ </style>
53
+ </head>
54
+ <body>
55
+ <nav class="main-nav">
56
+ <div class="container">
57
+ <a href="index.html" class="nav-brand">FMN-GPT</a>
58
+ <div class="nav-links">
59
+ <a href="blog.html">Blog</a>
60
+ <a href="status.html">Model Status</a>
61
+ <a href="https://huggingface.co/CompactAI" target="_blank">HuggingFace</a>
62
+ </div>
63
+ </div>
64
+ </nav>
65
+ <main>
66
+ <article class="blog-post-section">
67
+ <div class="container">
68
+ <div class="blog-post-content">
69
+ <a href="blog.html" class="blog-back">← Back to Blog</a>
70
+ <header class="blog-post-header">
71
+ <div class="blog-meta">
72
+ <span class="blog-date">2026-03-22</span>
73
+ <span class="blog-tag">Tiny Wins</span>
74
+ </div>
75
+ <h1>Words, Words, Words: My Model Learned to Ramble (And I'm Here For It)</h1>
76
+ </header>
77
+ <div class="blog-post-body">
78
+ <p>My model has achieved something truly special. It can now ramble. Endlessly. With words. Actual, legible, sometimes-even-coherent words.</p>
79
+ <p>Remember when it could only output "the the the" with occasional bursts of "banana"? Those were simpler times. Now it strings together sentences like a caffeinated philosopher who just discovered thesaurus.com. It does not just predict tokens anymore. It holds court.</p>
80
+ <p>And I am moving to a 1M parameter model. One. Million. In the world of tiny AI, that is basically a skyscraper. A very efficient, slightly chatty skyscraper that lives in my kitchen.</p>
81
+ <p>Yes, it still runs on a fridge. Not a server farm. Not a cloud cluster. A refrigerator. The same appliance that keeps my milk cold now also powers a model that can explain why it decided to write seven paragraphs about the existential weight of the letter "q". Priorities.</p>
82
+ <h2>The Art of Saying More With Less (Parameters)</h2>
83
+ <p>People ask: "Can a 1M parameter model really do anything useful?" To which I reply: "Can your toaster write a sonnet about its own heating elements?" Exactly.</p>
84
+ <p>Small models have a certain charm. They are like that friend who tells long, meandering stories at parties. You are not always sure where they are going, but you cannot look away. Sometimes they stumble into brilliance. Sometimes they just really love the word "flibbertigibbet". Both outcomes are valid.</p>
85
+ <blockquote>
86
+ <p>A model that rambles with confidence is a model that has found its voice. Even if that voice occasionally forgets the original question and starts talking about ducks.</p>
87
+ </blockquote>
88
+ <p>The jump to 1M parameters feels monumental. Not because the number is large in absolute terms - modern LLMs laugh at such digits - but because every extra parameter in tiny-model land is a hard-won victory. Each one earns its keep. Each one helps the model remember that yes, "ramble" is a verb and also a very good lifestyle choice.</p>
89
+ <p>And the fridge thing? That is the real flex. While others brag about GPU clusters, I am over here cooling my inference hardware with leftover pizza and hope. The model does not mind. It thrives in the chill. Perhaps it draws wisdom from the condensation.</p>
90
+ <p>So here is to my chatty little model. May your tokens flow freely. May your context window never fill too fast. May your fridge stay cold and your gradients stay stable. And may you never stop rambling, because honestly? It is delightful.</p>
91
+ <hr>
92
+ <p><em>Current status: Model just wrote 400 words about the philosophical implications of snack storage. Fridge humming contentedly. 1M parameters, zero regrets.</em></p>
93
+ </div>
94
+ </div>
95
+ </div>
96
+ </article>
97
+ </main>
98
+ <footer class="footer">
99
+ <div class="container">
100
+ <p class="footer-text">Built with curiosity over compute.</p>
101
+ <p class="footer-subtext">FMN-GPT by <a href="https://huggingface.co/CompactAI" target="_blank">CompactAI</a> - 2026</p>
102
+ </div>
103
+ </footer>
104
+ </body>
105
+ </html>
index.html CHANGED
@@ -544,7 +544,7 @@
544
  <span class="dot"></span>
545
  Training on RTX 5090
546
  </div>
547
- <h1>A ~1M Parameter Model<br>with <span class="highlight">64K Context</span></h1>
548
<p>TinyMemoryLM is a character-level transformer that learns to remember things. Not because it is smart, but because we gave it external memory. And a codebook. And MTP. It still forgets where it put its keys though.</p>
549
  <div class="hero-cta">
550
  <a href="status.html" class="btn btn-primary">View Training Status</a>
@@ -561,23 +561,23 @@
561
  <div class="spec-label">Parameters</div>
562
  </div>
563
  <div class="spec-card">
564
- <div class="spec-value">64K</div>
565
  <div class="spec-label">Context Length</div>
566
  </div>
567
  <div class="spec-card">
568
- <div class="spec-value">2</div>
569
  <div class="spec-label">Layers</div>
570
  </div>
571
  <div class="spec-card">
572
- <div class="spec-value">6</div>
573
  <div class="spec-label">Attention Heads</div>
574
  </div>
575
  <div class="spec-card">
576
- <div class="spec-value">192</div>
577
  <div class="spec-label">Model Dimension</div>
578
  </div>
579
  <div class="spec-card">
580
- <div class="spec-value">480</div>
581
  <div class="spec-label">FFN Dimension</div>
582
  </div>
583
  </div>
@@ -599,7 +599,7 @@
599
  <div class="feature-card">
600
  <div class="feature-icon">C</div>
601
  <h3>Precision Codebook</h3>
602
- <p>A 16-dimensional codebook at the output head. Instead of predicting directly into a ~500 token vocabulary, the model projects down to learnable codes that get mapped to the full vocabulary. Think of it as a compression layer that helps with efficiency while still producing readable output.</p>
603
  </div>
604
  <div class="feature-card">
605
  <div class="feature-icon">T</div>
@@ -646,21 +646,21 @@
646
  <div class="arch-layer">
647
  <div class="arch-box">
648
  <span>Output</span>
649
- <small>~500 vocab</small>
650
  </div>
651
  </div>
652
  <div class="arch-details">
653
  <div class="arch-detail">
654
  <span class="arch-detail-label">d_model</span>
655
- <span class="arch-detail-value">192</span>
656
  </div>
657
  <div class="arch-detail">
658
  <span class="arch-detail-label">heads</span>
659
- <span class="arch-detail-value">6</span>
660
  </div>
661
  <div class="arch-detail">
662
  <span class="arch-detail-label">ffn_dim</span>
663
- <span class="arch-detail-value">480</span>
664
  </div>
665
  <div class="arch-detail">
666
  <span class="arch-detail-label">memory_slots</span>
@@ -668,11 +668,11 @@
668
  </div>
669
  <div class="arch-detail">
670
  <span class="arch-detail-label">code_dim</span>
671
- <span class="arch-detail-value">16</span>
672
  </div>
673
  <div class="arch-detail">
674
  <span class="arch-detail-label">seq_len</span>
675
- <span class="arch-detail-value">64000</span>
676
  </div>
677
  </div>
678
  </div>
 
544
  <span class="dot"></span>
545
  Training on RTX 5090
546
  </div>
547
+ <h1>A ~1M Parameter Model<br>with <span class="highlight">2K Context</span></h1>
548
  <p>TinyMemoryLM is a character-level transformer that learns to remember things. Not because it smart, but because we gave it external memory. And a codebook. And MTP. It still forgets where it put its keys though.</p>
549
  <div class="hero-cta">
550
  <a href="status.html" class="btn btn-primary">View Training Status</a>
 
561
  <div class="spec-label">Parameters</div>
562
  </div>
563
  <div class="spec-card">
564
+ <div class="spec-value">2K</div>
565
  <div class="spec-label">Context Length</div>
566
  </div>
567
  <div class="spec-card">
568
+ <div class="spec-value">6</div>
569
  <div class="spec-label">Layers</div>
570
  </div>
571
  <div class="spec-card">
572
+ <div class="spec-value">4</div>
573
  <div class="spec-label">Attention Heads</div>
574
  </div>
575
  <div class="spec-card">
576
+ <div class="spec-value">160</div>
577
  <div class="spec-label">Model Dimension</div>
578
  </div>
579
  <div class="spec-card">
580
+ <div class="spec-value">256</div>
581
  <div class="spec-label">FFN Dimension</div>
582
  </div>
583
  </div>
 
599
  <div class="feature-card">
600
  <div class="feature-icon">C</div>
601
  <h3>Precision Codebook</h3>
602
+ <p>A 32-dimensional codebook at the output head. Instead of predicting directly into a ~2.1K token vocabulary, the model projects down to learnable codes that get mapped to the full vocabulary. Think of it as a compression layer that helps with efficiency while still producing readable output.</p>
603
  </div>
604
  <div class="feature-card">
605
  <div class="feature-icon">T</div>
 
646
  <div class="arch-layer">
647
  <div class="arch-box">
648
  <span>Output</span>
649
+ <small>~2.1K vocab</small>
650
  </div>
651
  </div>
652
  <div class="arch-details">
653
  <div class="arch-detail">
654
  <span class="arch-detail-label">d_model</span>
655
+ <span class="arch-detail-value">160</span>
656
  </div>
657
  <div class="arch-detail">
658
  <span class="arch-detail-label">heads</span>
659
+ <span class="arch-detail-value">4</span>
660
  </div>
661
  <div class="arch-detail">
662
  <span class="arch-detail-label">ffn_dim</span>
663
+ <span class="arch-detail-value">256</span>
664
  </div>
665
  <div class="arch-detail">
666
  <span class="arch-detail-label">memory_slots</span>
 
668
  </div>
669
  <div class="arch-detail">
670
  <span class="arch-detail-label">code_dim</span>
671
+ <span class="arch-detail-value">32</span>
672
  </div>
673
  <div class="arch-detail">
674
  <span class="arch-detail-label">seq_len</span>
675
+ <span class="arch-detail-value">2048</span>
676
  </div>
677
  </div>
678
  </div>
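The index.html diff updates the codebook head to 32 dimensions over a ~2.1K-token vocabulary. A minimal sketch of the idea it describes — projecting the 160-dim hidden state down to 32-dim codes, then scoring those codes against one learnable code per vocabulary entry — looks like this. The names (`W_down`, `codebook`, `codebook_logits`) and the plain linear factorization are illustrative assumptions, not taken from the TinyMemoryLM source, which may quantize or otherwise differ:

```python
import numpy as np

# Spec values from the diff: d_model=160, code_dim=32, vocab ~2.1K.
d_model, code_dim, vocab = 160, 32, 2100

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, code_dim)) * 0.02   # hidden -> code projection
codebook = rng.standard_normal((vocab, code_dim)) * 0.02   # one 32-dim code per token

def codebook_logits(h):
    """h: (batch, d_model) hidden states -> (batch, vocab) logits via the code space."""
    codes = h @ W_down         # (batch, code_dim)
    return codes @ codebook.T  # (batch, vocab)

h = rng.standard_normal((4, d_model))
print(codebook_logits(h).shape)  # (4, 2100)

# Why this counts as a "compression layer": parameter cost of the output head.
direct = d_model * vocab                          # plain 160x2100 head: 336,000
factored = d_model * code_dim + vocab * code_dim  # 5,120 + 67,200 = 72,320
print(direct, factored)
```

Under these assumptions the factored head needs roughly a fifth of the parameters of a direct projection, which matters in a ~1M-parameter budget.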
status.html CHANGED
@@ -476,23 +476,23 @@
         <div class="spec-label">Parameters</div>
       </div>
       <div class="spec-item">
-        <div class="spec-value">64K</div>
+        <div class="spec-value">2K</div>
         <div class="spec-label">Context</div>
       </div>
       <div class="spec-item">
-        <div class="spec-value">2</div>
+        <div class="spec-value">6</div>
         <div class="spec-label">Layers</div>
       </div>
       <div class="spec-item">
-        <div class="spec-value">6</div>
+        <div class="spec-value">4</div>
         <div class="spec-label">Heads</div>
       </div>
       <div class="spec-item">
-        <div class="spec-value">192</div>
+        <div class="spec-value">160</div>
         <div class="spec-label">Dimension</div>
       </div>
       <div class="spec-item">
-        <div class="spec-value">480</div>
+        <div class="spec-value">256</div>
         <div class="spec-label">FFN Dim</div>
       </div>
     </div>
@@ -518,7 +518,7 @@
       </div>
       <div class="feature-item">
         <span class="feature-name">Gradient Checkpointing</span>
-        <span class="feature-status enabled">Enabled</span>
+        <span class="feature-status disabled">Disabled</span>
       </div>
       <div class="feature-item">
         <span class="feature-name">Torch Compile</span>
@@ -534,7 +534,7 @@
       </div>
       <div class="feature-item">
         <span class="feature-name">Repetition Penalty</span>
-        <span class="feature-status enabled">1.1</span>
+        <span class="feature-status disabled">Disabled (1.0)</span>
       </div>
     </div>
   </div>
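The status.html diff relabels the repetition penalty from 1.1 to "Disabled (1.0)", which is consistent with the usual CTRL-style formulation: logits of already-generated tokens are divided by the penalty when positive and multiplied by it when negative, so a penalty of exactly 1.0 is the identity. A minimal sketch (the function name and list-based signature are illustrative, not the repo's actual sampling code):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty: scale logits of already-generated tokens.

    Positive logits are divided by `penalty`, negative ones multiplied,
    so both moves push repeated tokens toward lower probability.
    """
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5]
seen = [0, 1]
print(apply_repetition_penalty(logits, seen, 1.0))  # [2.0, -1.0, 0.5] -- a no-op
print(apply_repetition_penalty(logits, seen, 1.1))  # tokens 0 and 1 penalized
```

This is why "1.0" and "Disabled" mean the same thing on the status page.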