fchis committed on
Commit
7f97afb
·
verified ·
1 Parent(s): 22b85f7

Upload index.html with huggingface_hub

Files changed (1)
  1. index.html +238 -19
index.html CHANGED
@@ -1,19 +1,238 @@
1
- <!doctype html>
2
- <html>
3
- <head>
4
- <meta charset="utf-8" />
5
- <meta name="viewport" content="width=device-width" />
6
- <title>My static Space</title>
7
- <link rel="stylesheet" href="style.css" />
8
- </head>
9
- <body>
10
- <div class="card">
11
- <h1>Welcome to your static Space!</h1>
12
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
13
- <p>
14
- Also don't forget to check the
15
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
16
- </p>
17
- </div>
18
- </body>
19
- </html>
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <style>
7
+ body {
8
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
9
+ max-width: 800px;
10
+ margin: 0 auto;
11
+ padding: 20px 40px;
12
+ line-height: 1.7;
13
+ color: #1a1a2e;
14
+ background: #fafafa;
15
+ }
16
+ h1 { font-size: 2em; margin-top: 1em; color: #0f0f23; }
17
+ h2 { font-size: 1.5em; margin-top: 1.5em; color: #16213e; border-bottom: 2px solid #e2e8f0; padding-bottom: 0.3em; }
18
+ h3 { font-size: 1.2em; color: #1a1a4e; }
19
+ code { background: #e8ecf1; padding: 2px 6px; border-radius: 3px; font-size: 0.9em; }
20
+ pre { background: #1e1e2e; color: #cdd6f4; padding: 16px; border-radius: 8px; overflow-x: auto; }
21
+ pre code { background: none; color: inherit; padding: 0; }
22
+ table { border-collapse: collapse; width: 100%; margin: 1em 0; }
23
+ th, td { border: 1px solid #d1d5db; padding: 8px 12px; text-align: left; }
24
+ th { background: #e8ecf1; font-weight: 600; }
25
+ tr:nth-child(even) { background: #f3f4f6; }
26
+ blockquote { border-left: 4px solid #6366f1; margin: 1em 0; padding: 0.5em 1em; background: #eef2ff; color: #312e81; }
27
+ a { color: #4f46e5; text-decoration: none; }
28
+ a:hover { text-decoration: underline; }
29
+ strong { color: #0f172a; }
30
+ hr { border: none; border-top: 2px solid #e2e8f0; margin: 2em 0; }
31
+ </style>
32
+ </head>
33
+ <body>
34
+ <h1>I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked</h1>
35
+ <p><strong>Florinel Chis</strong> · March 2026</p>
36
+ <hr />
37
+ <p>Most fine-tuning tutorials show you the happy path. This is the full path — including 6 training rounds that taught the model absolutely nothing, OOM crashes that killed my machine, and the realization that the real problem was never about the model.</p>
38
+ <p><strong>The end result:</strong> A Laravel PHP code generator that produces 26/26 valid PHP files with 20/20 Pest tests passing. Trained on 49 examples. Runs on an Apple M2 Pro with 16GB RAM. Total cloud GPU cost: $0.</p>
39
+ <p>Here's how I actually got there.</p>
40
+ <h2>The Hardware</h2>
41
+ <ul>
42
+ <li>Apple M2 Pro, 16GB unified memory</li>
43
+ <li>Qwen2.5-Coder-7B-Instruct, 4-bit quantized</li>
44
+ <li>MLX framework with LoRA</li>
45
+ <li>Target: Laravel 13.x PHP code generation</li>
46
+ </ul>
47
+ <p>The 16GB constraint shaped every architectural decision. You can't load two 7B models. You can't train with <code>max_seq_length=4096</code>. You close LM Studio before training or your machine crashes.</p>
48
+ <h2>Phase 1: Six Sprints of Nothing (The Silent Truncation Bug)</h2>
49
+ <p>I started with 90 training examples and grew to 261 across 6 sprints. <code>val_loss</code> kept dropping. By Sprint 6, it hit <strong>0.000</strong>. Perfect.</p>
50
+ <p>Except the generated code wasn't getting better. At all.</p>
51
+ <h3>The Root Cause</h3>
52
+ <p>The system prompt (guidelines for the model) had grown organically across sprints to <strong>2,380 tokens</strong>. My <code>max_seq_length</code> was <strong>1,500</strong>.</p>
53
+ <p>MLX truncates training examples silently at <code>max_seq_length</code>. Every single training example was cut off before the code completion even started. The model was being trained to predict its own system prompt — and it got really good at that (hence val_loss=0.000).</p>
54
+ <p><strong>Six sprints. Hundreds of examples. Zero code learning.</strong></p>
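<p>A rough sketch of the arithmetic behind this failure. It assumes truncation keeps only the first <code>max_seq_length</code> tokens, as described above, that <code>max_seq_length</code> stays at 1,500, and a hypothetical 400-token completion (the post does not give real completion lengths):</p>

```python
def surviving_completion_tokens(prompt_tokens: int,
                                completion_tokens: int,
                                max_seq_length: int) -> int:
    """Completion tokens left when only the first max_seq_length tokens are kept."""
    room_after_prompt = max(0, max_seq_length - prompt_tokens)
    return min(completion_tokens, room_after_prompt)


# Hypothetical completion size, chosen only for illustration.
HYPOTHETICAL_COMPLETION_TOKENS = 400

# 2,380-token system prompt: nothing of the completion survives.
before = surviving_completion_tokens(2380, HYPOTHETICAL_COMPLETION_TOKENS, 1500)

# 843-token system prompt: the whole completion fits.
after = surviving_completion_tokens(843, HYPOTHETICAL_COMPLETION_TOKENS, 1500)

print(f"completion tokens seen in training: before={before}, after={after}")
```

<p>With the long prompt, the loss is computed only over prompt tokens, which is consistent with a loss that collapses while code quality stays flat.</p>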
55
+ <h3>The Fix</h3>
56
+ <div class="codehilite"><pre><span></span><code><span class="c1"># BEFORE: 2380 tokens of verbose guidelines</span>
57
+ <span class="n">SYSTEM</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;You are an expert Laravel developer. When writing models,</span>
58
+ <span class="s2">always use the HasFactory trait. The HasFactory trait enables...</span>
59
+ <span class="s2">[2380 tokens of examples and explanations]&quot;&quot;&quot;</span>
60
+
61
+ <span class="c1"># AFTER: 843 tokens, compressed</span>
62
+ <span class="n">SYSTEM</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;Laravel 13.x code generator. Output ONLY PHP.</span>
63
+ <span class="s2">- model: use HasFactory, add relationships from spec</span>
64
+ <span class="s2">- controller: import Controller, destroy() returns noContent()</span>
65
+ <span class="s2">...&quot;&quot;&quot;</span>
66
+ </code></pre></div>
67
+
68
+ <p>And the verification I should have done from the start:</p>
69
+ <div class="codehilite"><pre><span></span><code><span class="c1"># Check that completions aren&#39;t truncated</span>
70
+ <span class="k">for</span> <span class="n">example</span> <span class="ow">in</span> <span class="n">dataset</span><span class="p">:</span>
71
+     <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">example</span><span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">])</span>
72
+     <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">max_seq_length</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;Truncated at </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span><span class="si">}</span><span class="s2"> tokens&quot;</span>
73
+ </code></pre></div>
74
+
75
+ <p><strong>Lesson: treat <code>val_loss=0.000</code> as a sign that training is broken, not that the model is perfect. Always verify that every tokenized training example still contains its completion after truncation.</strong></p>
76
+ <h2>Phase 2: Targeted Bug Fixing (The 10-15 Example Rule)</h2>
77
+ <p>After fixing the truncation bug, real training started. val_loss: 0.080 (not 0.000!).</p>
78
+ <p>I discovered that <strong>every systematic bug I hit could be fixed with a small batch of targeted examples, typically 10-15</strong> (the fixes below took between 5 and 20):</p>
79
+ <table>
80
+ <thead>
81
+ <tr>
82
+ <th>Bug</th>
83
+ <th style="text-align: center;">Examples needed</th>
84
+ <th>Result</th>
85
+ </tr>
86
+ </thead>
87
+ <tbody>
88
+ <tr>
89
+ <td><code>'optional'</code> validation rule (not a Laravel rule)</td>
90
+ <td style="text-align: center;">10</td>
91
+ <td>Fixed — generates <code>'nullable'</code></td>
92
+ </tr>
93
+ <tr>
94
+ <td><code>wasRecentlyCreated</code> in resources</td>
95
+ <td style="text-align: center;">5</td>
96
+ <td>Fixed — uses correct timestamps</td>
97
+ </tr>
98
+ <tr>
99
+ <td>Cross-resource missing imports</td>
100
+ <td style="text-align: center;">13</td>
101
+ <td>Fixed — 12 bugs → 0</td>
102
+ </tr>
103
+ <tr>
104
+ <td>Missing <code>HasFactory</code> trait</td>
105
+ <td style="text-align: center;">20 (fixed existing)</td>
106
+ <td>Fixed — 5 bugs → 0</td>
107
+ </tr>
108
+ </tbody>
109
+ </table>
110
+ <p>The model already knows PHP. You're nudging a trained distribution, not teaching from scratch. 10-15 diverse examples of the correct pattern is enough.</p>
111
+ <h3>The Eval Script Trap</h3>
112
+ <p>I built an automated bug checker. It flagged <code>StoreBookRequest $request</code> as "missing <code>Illuminate\Http\Request</code> import" because its pattern <code>'Request $request'</code> was not anchored to a word boundary, so it also matched inside <code>StoreBookRequest $request</code>.</p>
113
+ <p><strong>Test your eval script on correct code before trusting it.</strong></p>
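<p>A minimal sketch of that false positive and one possible fix. The function names and the sample controller are hypothetical and are not taken from the actual eval script. The fix uses a lookbehind so that <code>Request</code> only counts when it is not the tail of a longer class name or of a fully qualified name:</p>

```python
import re

# Known-good controller: it type-hints a form request, not the base Request.
CORRECT_CONTROLLER = r"""<?php
namespace App\Http\Controllers;

use App\Http\Requests\StoreBookRequest;

class BookController extends Controller
{
    public function store(StoreBookRequest $request)
    {
        return response()->noContent();
    }
}
"""


def naive_uses_base_request(php: str) -> bool:
    """Buggy check: plain substring match, also hits 'StoreBookRequest $request'."""
    return "Request $request" in php


# 'Request' must not be preceded by an identifier character or a backslash.
BASE_REQUEST = re.compile(r"(?<![A-Za-z0-9_\\])Request\s+\$request")


def uses_base_request(php: str) -> bool:
    """Stricter check: only the bare 'Request' type hint counts."""
    return BASE_REQUEST.search(php) is not None


print("naive checker flags correct code:", naive_uses_base_request(CORRECT_CONTROLLER))
print("anchored checker flags correct code:", uses_base_request(CORRECT_CONTROLLER))
```

<p>Running every checker against a small set of known-good files like this one is a cheap way to catch false positives before they steer training decisions.</p>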
114
+ <h3>Where I Hit the Wall</h3>
115
+ <p>After Sprint 9: 52/58 Pest tests passing. 6 failures remained. All were <strong>semantic hallucinations</strong>:</p>
116
+ <ul>
117
+ <li>Model invents a <code>user()</code> relationship that doesn't exist</li>
118
+ <li>Controller uses closure-based eager loading when array format is correct</li>
119
+ <li>Model generates <code>-&gt;withHttpStatus()</code> — a method that doesn't exist</li>
120
+ </ul>
121
+ <p>Adding more natural-language (NL) training examples didn't help. The model was filling the ambiguity in the prompt with its pretraining priors. The problem wasn't the model; it was the input format.</p>
122
+ <h2>Phase 3: The Spec Pivot (The Real Breakthrough)</h2>
123
+ <p>Instead of natural language:</p>
124
+ <blockquote>
125
+ <p>"Create a Post model with author relationship, fillable title and body, soft deletes"</p>
126
+ </blockquote>
127
+ <p>I switched to structured JSON specs:</p>
128
+ <div class="codehilite"><pre><span></span><code><span class="p">{</span>
129
+ <span class="w"> </span><span class="nt">&quot;artifact&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;model&quot;</span><span class="p">,</span>
130
+ <span class="w"> </span><span class="nt">&quot;class&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Post&quot;</span><span class="p">,</span>
131
+ <span class="w"> </span><span class="nt">&quot;table&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;posts&quot;</span><span class="p">,</span>
132
+ <span class="w"> </span><span class="nt">&quot;has_factory&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
133
+ <span class="w"> </span><span class="nt">&quot;soft_deletes&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
134
+ <span class="w"> </span><span class="nt">&quot;fillable&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;title&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;body&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;user_id&quot;</span><span class="p">],</span>
135
+ <span class="w"> </span><span class="nt">&quot;relationships&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
136
+ <span class="w"> </span><span class="p">{</span><span class="nt">&quot;type&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;BelongsTo&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;model&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;User&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;method&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;author&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;foreign_key&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;user_id&quot;</span><span class="p">}</span>
137
+ <span class="w"> </span><span class="p">]</span>
138
+ <span class="p">}</span>
139
+ </code></pre></div>
140
+
141
+ <h3>First test: 28 examples, 100 iterations</h3>
142
+ <p>Result: <strong>26/26 eval perfect. Zero semantic hallucinations.</strong> (Compare: 308 NL examples still had 5 hallucinations.)</p>
143
+ <p>The model can't invent a <code>user()</code> relationship if <code>relationships[]</code> explicitly lists only <code>author</code>. The spec removes the model's ability to hallucinate about <em>what</em> to generate. It only decides <em>how</em>.</p>
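<p>Because the spec lists every relationship explicitly, the generated file can also be checked against it mechanically. The sketch below is an illustration only and is not part of the published pipeline. The regex is a rough heuristic for common Eloquent relationship methods, and the sample PHP is invented for the example:</p>

```python
import re

# Heuristic: a public method whose body calls a common Eloquent relationship helper.
RELATION_METHOD = re.compile(
    r"public\s+function\s+(\w+)\s*\([^)]*\)[^{]*\{[^}]*"
    r"\$this->(?:belongsTo|hasOne|hasMany|belongsToMany)\("
)


def unexpected_relationships(spec: dict, php: str) -> set[str]:
    """Relationship methods present in the PHP but absent from the spec."""
    declared = {rel["method"] for rel in spec.get("relationships", [])}
    generated = set(RELATION_METHOD.findall(php))
    return generated - declared


SPEC = {
    "artifact": "model",
    "class": "Post",
    "relationships": [
        {"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"}
    ],
}

# Invented output containing one hallucinated relationship, user().
GENERATED_PHP = r"""
class Post extends Model
{
    public function author(): BelongsTo
    {
        return $this->belongsTo(User::class, 'user_id');
    }

    public function user(): BelongsTo
    {
        return $this->belongsTo(User::class);
    }
}
"""

print(unexpected_relationships(SPEC, GENERATED_PHP))
```

<p>A check like this turns "the model invented a relationship" from something found during manual review into something a script can flag.</p>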
144
+ <h3>The Spec Compiler</h3>
145
+ <p>I built a compiler that validates specs before generation:</p>
146
+ <div class="codehilite"><pre><span></span><code>$<span class="w"> </span>python3<span class="w"> </span>spec_compiler.py<span class="w"> </span>bad_spec.json
147
+
148
+ SpecCompileError:<span class="w"> </span>rules<span class="o">[</span><span class="s1">&#39;venue_id&#39;</span><span class="o">]</span><span class="w"> </span>contains<span class="w"> </span>conditional<span class="w"> </span>token
149
+ <span class="s1">&#39;required_on_post&#39;</span>.<span class="w"> </span>Use<span class="w"> </span><span class="s1">&#39;conditional_rules&#39;</span><span class="w"> </span>dict<span class="w"> </span>instead.
150
+ </code></pre></div>
151
+
152
+ <p>Validation takes &lt;1ms; generation takes ~30s per file. Catching a bad spec before generation is far cheaper than discovering it in the output.</p>
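<p>A minimal sketch of what one such validation rule could look like. This is not the code of <code>spec_compiler.py</code>. The spec shape (a <code>rules</code> dict mapping field names to lists of rule tokens) and the contents of <code>CONDITIONAL_TOKENS</code> are assumptions inferred from the error message above:</p>

```python
class SpecCompileError(ValueError):
    """Raised when a spec is rejected before any code is generated."""


# Assumed token list; only 'required_on_post' appears in the post.
CONDITIONAL_TOKENS = {"required_on_post"}


def validate_rules(spec: dict) -> None:
    """Reject conditional tokens placed in the plain 'rules' mapping."""
    for field, tokens in spec.get("rules", {}).items():
        for token in tokens:
            if token in CONDITIONAL_TOKENS:
                raise SpecCompileError(
                    f"rules[{field!r}] contains conditional token {token!r}. "
                    "Use 'conditional_rules' dict instead."
                )


bad_spec = {"rules": {"venue_id": ["required_on_post", "integer"]}}
good_spec = {"rules": {"venue_id": ["required", "integer"]}}

validate_rules(good_spec)  # passes silently
try:
    validate_rules(bad_spec)
except SpecCompileError as error:
    print(f"SpecCompileError: {error}")
```

<p>The check is a few dictionary lookups, which is why rejecting a spec costs almost nothing compared with a generation pass.</p>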
153
+ <h3>Final Results: adapters_spec_v4</h3>
154
+ <table>
155
+ <thead>
156
+ <tr>
157
+ <th>Metric</th>
158
+ <th style="text-align: center;">NL Pipeline (308 ex)</th>
159
+ <th style="text-align: center;">Spec Pipeline (49 ex)</th>
160
+ </tr>
161
+ </thead>
162
+ <tbody>
163
+ <tr>
164
+ <td>PHP valid</td>
165
+ <td style="text-align: center;">26/26</td>
166
+ <td style="text-align: center;">26/26</td>
167
+ </tr>
168
+ <tr>
169
+ <td>Pest pass</td>
170
+ <td style="text-align: center;">52/58</td>
171
+ <td style="text-align: center;"><strong>20/20</strong></td>
172
+ </tr>
173
+ <tr>
174
+ <td>Manual fixes</td>
175
+ <td style="text-align: center;">5</td>
176
+ <td style="text-align: center;">4</td>
177
+ </tr>
178
+ <tr>
179
+ <td>Semantic hallucinations</td>
180
+ <td style="text-align: center;">5</td>
181
+ <td style="text-align: center;"><strong>0</strong></td>
182
+ </tr>
183
+ <tr>
184
+ <td>Training time</td>
185
+ <td style="text-align: center;">~30 min</td>
186
+ <td style="text-align: center;">~15 min</td>
187
+ </tr>
188
+ </tbody>
189
+ </table>
190
+ <h2>The Debugging Checklist</h2>
191
+ <p>Distilled from 34 steps of hitting walls:</p>
192
+ <p><strong>Before training:</strong></p><ol>
193
+ <li>Tokenize ALL examples. Check <code>max(total_tokens) &lt; max_seq_length</code>.</li>
194
+ <li>Check <code>min(completion_tokens) &gt; 0</code>. If zero, system prompt is too long.</li>
195
+ <li>Close all GPU-using processes. Check memory with <code>vm_stat</code>.</li>
196
+ <li>Use <code>--num-layers 8</code> (not <code>--lora-layers 8</code>) on 16GB machines.</li></ol>
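<p>The first two checks can be run as one pass over the dataset. The sketch below assumes each example is stored as separate <code>prompt</code> and <code>completion</code> strings (a hypothetical layout) and uses a whitespace split as a stand-in for token counting. A real audit should pass in the tokenizer of the model being trained:</p>

```python
from typing import Callable, Iterable


def audit_lengths(examples: Iterable[dict],
                  count_tokens: Callable[[str], int],
                  max_seq_length: int) -> dict:
    """Report the longest example and the smallest completion that survives truncation."""
    totals, surviving = [], []
    for example in examples:
        prompt_len = count_tokens(example["prompt"])
        completion_len = count_tokens(example["completion"])
        totals.append(prompt_len + completion_len)
        # Assumes truncation keeps the first max_seq_length tokens.
        surviving.append(max(0, min(completion_len, max_seq_length - prompt_len)))
    return {
        "max_total_tokens": max(totals),
        "min_completion_tokens": min(surviving),
        "at_or_over_limit": sum(total >= max_seq_length for total in totals),
    }


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer for the demo only; real token counts will be higher."""
    return len(text.split())


demo = [
    {"prompt": "a b c", "completion": "d e"},
    {"prompt": "a b c d e f", "completion": "x"},
]
report = audit_lengths(demo, whitespace_token_count, max_seq_length=5)
print(report)
```

<p>Any example counted in <code>at_or_over_limit</code>, or a <code>min_completion_tokens</code> of zero, is a reason to stop before starting a training run.</p>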
197
+ <p><strong>After training:</strong></p><ol start="5">
198
+ <li>If <code>val_loss = 0.000</code>: training is broken, not perfect.</li>
199
+ <li>Generate 3-5 test files and inspect manually before full benchmark.</li>
200
+ <li>Run <code>php -l</code> on all output (syntax check).</li></ol>
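<p>The syntax check is easy to script. This sketch assumes the <code>php</code> binary is on <code>PATH</code> and relies on <code>php -l</code> exiting with a non-zero status when a file has a syntax error. The helper name and the <code>./generated</code> directory in the usage comment are illustrative:</p>

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def lint_php(paths: Iterable, run: Callable = subprocess.run) -> list:
    """Run `php -l` on each file and return (path, message) for every failure."""
    failures = []
    for path in paths:
        result = run(["php", "-l", str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            message = (result.stdout + result.stderr).strip()
            failures.append((str(path), message))
    return failures


# Example usage (requires PHP to be installed):
#   failures = lint_php(sorted(Path("./generated").rglob("*.php")))
#   for path, message in failures:
#       print(f"{path}: {message}")
```

<p>The <code>run</code> parameter exists only so the function can be exercised without PHP installed, for example by passing a stub in a unit test.</p>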
201
+ <p><strong>When bugs persist:</strong></p><ol start="8">
202
+ <li>Classify: is it a training data gap or a model capability limit?</li>
203
+ <li>If data gap: write 10-15 targeted examples with diverse contexts.</li>
204
+ <li>If capability limit: change the input format (structured specs).</li>
205
+ <li>If hallucinations persist after targeted training: the problem is <strong>ontological</strong>. The model's pretraining domain model diverges from yours. Give it an explicit ontology (structured spec) rather than fighting it with more NL examples.</li></ol>
206
+ <h2>What 7B Models Do Well vs Poorly</h2>
207
+ <p><strong>Does well:</strong></p><ul>
208
+ <li>Individual class generation with clear patterns</li>
209
+ <li>PHP syntax (very rare errors after basic fine-tuning)</li>
210
+ <li>Following explicit rules in the system prompt</li>
211
+ <li>CRUD operations with a single model</li></ul>
212
+ <p><strong>Does poorly:</strong></p><ul>
213
+ <li>Multi-file consistency (imports across files)</li>
214
+ <li>Knowing what NOT to add (hallucinated relationships)</li>
215
+ <li>Distinguishing Laravel API versions (mixes 9.x and 13.x patterns)</li>
216
+ <li>Complex relationship traversal</li></ul>
217
+ <p><strong>The key insight:</strong> 7B models don't reason about code. They pattern-match against pretraining. Every persistent bug is a missing pattern. The first fix is to add examples. If that's not enough, change the input format to remove the decision from the model entirely.</p>
218
+ <h2>Try It Yourself</h2>
219
+ <p>Everything is open source:</p>
220
+ <ul>
221
+ <li><strong>Spec-trained model</strong>: <a href="https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec">fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec</a></li>
222
+ <li><strong>Training data</strong>: <a href="https://huggingface.co/datasets/fchis/laravel-buildspec-training">fchis/laravel-buildspec-training</a> (49 examples)</li>
223
+ <li><strong>Full pipeline</strong>: <a href="https://github.com/florinel-chis/laravel-ai-gen">github.com/florinel-chis/laravel-ai-gen</a></li>
224
+ </ul>
225
+ <div class="codehilite"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>mlx-lm
226
+
227
+ <span class="c1"># Full pipeline: NL → specs → compile → PHP files</span>
228
+ python3<span class="w"> </span>pipeline_spec.py<span class="w"> </span><span class="s2">&quot;Create a REST API for managing blog posts with tags&quot;</span>
229
+
230
+ <span class="c1"># Or use a spec directly</span>
231
+ python3<span class="w"> </span>pipeline_spec.py<span class="w"> </span>--spec<span class="w"> </span>my_specs.json<span class="w"> </span>--output<span class="w"> </span>./generated
232
+ </code></pre></div>
233
+
234
+ <p>Runs entirely on Apple Silicon. M1/M2/M3/M4 with 16GB+ RAM.</p>
235
+ <hr />
236
+ <p><em>This post is an abbreviated version of: "From Hallucination to Ontology: 34 Steps Building a Domain-Specific Code Generator on Consumer Hardware" (Chis, 2026). The full paper with detailed results, bug taxonomy, and infrastructure lessons is available as a preprint.</em></p>
237
+ </body>
238
+ </html>