Space: fchis/34-steps-code-generator

fchis committed on Mar 28

Commit

7f97afb

1 Parent(s): 22b85f7

Upload index.html with huggingface_hub

Files changed (1)

index.html +238 -19

@@ -1,19 +1,238 @@
-<!doctype html>
-<html>
-	<head>
-		<meta charset="utf-8" />
-		<meta name="viewport" content="width=device-width" />
-		<title>My static Space</title>
-		<link rel="stylesheet" href="style.css" />
-	</head>
-	<body>
-		<div class="card">
-			<h1>Welcome to your static Space!</h1>
-			<p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
-			<p>
-				Also don't forget to check the
-				<a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
-			</p>
-		</div>
-	</body>
-</html>

+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<style>
+body {
+    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+    max-width: 800px;
+    margin: 0 auto;
+    padding: 20px 40px;
+    line-height: 1.7;
+    color: #1a1a2e;
+    background: #fafafa;
+}
+h1 { font-size: 2em; margin-top: 1em; color: #0f0f23; }
+h2 { font-size: 1.5em; margin-top: 1.5em; color: #16213e; border-bottom: 2px solid #e2e8f0; padding-bottom: 0.3em; }
+h3 { font-size: 1.2em; color: #1a1a4e; }
+code { background: #e8ecf1; padding: 2px 6px; border-radius: 3px; font-size: 0.9em; }
+pre { background: #1e1e2e; color: #cdd6f4; padding: 16px; border-radius: 8px; overflow-x: auto; }
+pre code { background: none; color: inherit; padding: 0; }
+table { border-collapse: collapse; width: 100%; margin: 1em 0; }
+th, td { border: 1px solid #d1d5db; padding: 8px 12px; text-align: left; }
+th { background: #e8ecf1; font-weight: 600; }
+tr:nth-child(even) { background: #f3f4f6; }
+blockquote { border-left: 4px solid #6366f1; margin: 1em 0; padding: 0.5em 1em; background: #eef2ff; color: #312e81; }
+a { color: #4f46e5; text-decoration: none; }
+a:hover { text-decoration: underline; }
+strong { color: #0f172a; }
+hr { border: none; border-top: 2px solid #e2e8f0; margin: 2em 0; }
+</style>
+</head>
+<body>
+<h1>I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked</h1>
+<p><strong>Florinel Chis</strong> · March 2026</p>
+<hr />
+<p>Most fine-tuning tutorials show you the happy path. This is the full path — including 6 training rounds that taught the model absolutely nothing, OOM crashes that killed my machine, and the realization that the real problem was never about the model.</p>
+<p><strong>The end result:</strong> A Laravel PHP code generator that produces 26/26 valid PHP files with 20/20 Pest tests passing. Trained on 49 examples. Runs on an Apple M2 Pro with 16GB RAM. Total cloud GPU cost: $0.</p>
+<p>Here's how I actually got there.</p>
+<h2>The Hardware</h2>
+<ul>
+<li>Apple M2 Pro, 16GB unified memory</li>
+<li>Qwen2.5-Coder-7B-Instruct, 4-bit quantized</li>
+<li>MLX framework with LoRA</li>
+<li>Target: Laravel 13.x PHP code generation</li>
+</ul>
+<p>The 16GB constraint shaped every architectural decision. You can't load two 7B models. You can't train with <code>max_seq_length=4096</code>. You close LM Studio before training or your machine crashes.</p>
+<h2>Phase 1: Six Sprints of Nothing (The Silent Truncation Bug)</h2>
+<p>I started with 90 training examples and grew to 261 across 6 sprints. <code>val_loss</code> kept dropping. By Sprint 6, it hit <strong>0.000</strong>. Perfect.</p>
+<p>Except the generated code wasn't getting better. At all.</p>
+<h3>The Root Cause</h3>
+<p>The system prompt (guidelines for the model) had grown organically across sprints to <strong>2,380 tokens</strong>. My <code>max_seq_length</code> was <strong>1,500</strong>.</p>
+<p>MLX truncates training examples silently at <code>max_seq_length</code>. Because the 2,380-token system prompt alone exceeded the 1,500-token limit, every single training example was cut off before the code completion even started. The model was being trained to predict its own system prompt, and it got really good at that (hence val_loss=0.000).</p>
+<p><strong>Six sprints. Hundreds of examples. Zero code learning.</strong></p>
+<h3>The Fix</h3>
+<div class="codehilite"><pre><span></span><code><span class="c1"># BEFORE: 2380 tokens of verbose guidelines</span>
+<span class="n">SYSTEM</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;You are an expert Laravel developer. When writing models,</span>
+<span class="s2">always use the HasFactory trait. The HasFactory trait enables...</span>
+<span class="s2">[2380 tokens of examples and explanations]&quot;&quot;&quot;</span>
+<span class="c1"># AFTER: 843 tokens, compressed</span>
+<span class="n">SYSTEM</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;Laravel 13.x code generator. Output ONLY PHP.</span>
+<span class="s2">- model: use HasFactory, add relationships from spec</span>
+<span class="s2">- controller: import Controller, destroy() returns noContent()</span>
+<span class="s2">...&quot;&quot;&quot;</span>
+</code></pre></div>
+<p>And the verification I should have done from the start:</p>
+<div class="codehilite"><pre><span></span><code><span class="c1"># Check that completions aren&#39;t truncated</span>
+<span class="k">for</span> <span class="n">example</span> <span class="ow">in</span> <span class="n">dataset</span><span class="p">:</span>
+    <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">example</span><span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">])</span>
+    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">max_seq_length</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;Truncated at </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span><span class="si">}</span><span class="s2"> tokens&quot;</span>
+</code></pre></div>
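+<p>The same check can be split into prompt and completion so that a failure says <em>why</em> an example is cut off. This is a minimal sketch, not the author's script: the <code>prompt</code>/<code>completion</code> field names, both helper functions, and the whitespace token counter in the usage note are assumptions for illustration. With a real tokenizer, pass something like <code>lambda s: len(tokenizer.encode(s))</code> as <code>count_tokens</code>.</p>

```python
from typing import Callable, Dict, List, Tuple


def completion_budget(prompt_tokens: int, max_seq_length: int) -> int:
    """Tokens left for the completion once the prompt has been counted."""
    return max(0, max_seq_length - prompt_tokens)


def find_truncated(
    examples: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    max_seq_length: int,
) -> List[Tuple[int, int, int]]:
    """Return (index, prompt_tokens, total_tokens) for each example that
    would not fit strictly below max_seq_length.

    Assumes (hypothetically) that each example has separate "prompt" and
    "completion" fields; summing the two counts is an approximation of
    tokenizing the concatenated text.
    """
    problems = []
    for i, example in enumerate(examples):
        prompt_len = count_tokens(example["prompt"])
        total_len = prompt_len + count_tokens(example["completion"])
        if total_len >= max_seq_length:
            problems.append((i, prompt_len, total_len))
    return problems


# The numbers from this post: prompt size vs. max_seq_length = 1500.
print(completion_budget(2380, 1500))  # 0   -> no completion tokens survive
print(completion_budget(843, 1500))   # 657 -> room left for the PHP output
```

+<p>For a quick dry run without loading a model, a crude stand-in such as <code>lambda s: len(s.split())</code> is enough to exercise the logic, but only a count from the real tokenizer tells you whether a given example fits.</p>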
+<p><strong>Lesson: a <code>val_loss</code> of 0.000 is a red flag that training is broken, not proof that everything is perfect. Always verify that your training data reaches the completions.</strong></p>
+<h2>Phase 2: Targeted Bug Fixing (The 10-15 Example Rule)</h2>
+<p>After fixing the truncation bug, real training started. val_loss: 0.080 (not 0.000!).</p>
+<p>I discovered that <strong>every systematic bug I hit could be fixed with a small batch of targeted examples, typically 10-15</strong>:</p>
+<table>
+<thead>
+<tr>
+<th>Bug</th>
+<th style="text-align: center;">Examples needed</th>
+<th>Result</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>'optional'</code> validation rule (not a Laravel rule)</td>
+<td style="text-align: center;">10</td>
+<td>Fixed — generates <code>'nullable'</code></td>
+</tr>
+<tr>
+<td><code>wasRecentlyCreated</code> in resources</td>
+<td style="text-align: center;">5</td>
+<td>Fixed — uses correct timestamps</td>
+</tr>
+<tr>
+<td>Cross-resource missing imports</td>
+<td style="text-align: center;">13</td>
+<td>Fixed — 12 bugs → 0</td>
+</tr>
+<tr>
+<td>Missing <code>HasFactory</code> trait</td>
+<td style="text-align: center;">20 (existing examples corrected)</td>
+<td>Fixed — 5 bugs → 0</td>
+</tr>
+</tbody>
+</table>
+<p>The model already knows PHP. You're nudging a trained distribution, not teaching from scratch. 10-15 diverse examples of the correct pattern are usually enough.</p>
+<h3>The Eval Script Trap</h3>
+<p>I built an automated bug checker. It flagged <code>StoreBookRequest $request</code> as "missing <code>Illuminate\Http\Request</code> import" because its search pattern <code>'Request $request'</code> also matches as a substring of <code>StoreBookRequest $request</code>.</p>
+<p><strong>Test your eval script on correct code before trusting it.</strong></p>
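+<p>A sketch of the trap and one possible fix, assuming the checker works on raw PHP source as a string. The function names and the two sample signatures are made up for illustration; the original eval script is not shown in this post. The stricter pattern refuses to match when <code>Request</code> is preceded by an identifier character or a namespace backslash.</p>

```python
import re


def naive_uses_request(php: str) -> bool:
    """Plain substring search: also fires on StoreBookRequest $request."""
    return "Request $request" in php


# `Request` must not be preceded by a letter, digit, underscore, or backslash,
# so FormRequest subclasses and fully qualified names are ignored.
BARE_REQUEST = re.compile(r"(?<![A-Za-z0-9_\\])Request\s+\$request\b")


def strict_uses_request(php: str) -> bool:
    """True only when the bare `Request` class is type-hinted."""
    return BARE_REQUEST.search(php) is not None


# Hypothetical controller signatures used to test the checker itself.
correct_code = "public function store(StoreBookRequest $request): JsonResponse"
needs_import = "public function index(Request $request): JsonResponse"

print(naive_uses_request(correct_code))   # True  (false positive)
print(strict_uses_request(correct_code))  # False
print(strict_uses_request(needs_import))  # True
```

+<p>Running both versions against known-good code first is what exposes the false positive before it pollutes the bug counts.</p>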
+<h3>Where I Hit the Wall</h3>
+<p>After Sprint 9: 52/58 Pest tests passing. 6 failures remained. All were <strong>semantic hallucinations</strong>:</p>
+<ul>
+<li>Model invents a <code>user()</code> relationship that doesn't exist</li>
+<li>Controller uses closure-based eager loading when array format is correct</li>
+<li>Model generates <code>-&gt;withHttpStatus()</code> — a method that doesn't exist</li>
+</ul>
+<p>Adding more natural-language (NL) training examples didn't help. The model was filling prompt ambiguity with its pretraining priors. The problem wasn't the model; it was the input format.</p>
+<h2>Phase 3: The Spec Pivot (The Real Breakthrough)</h2>
+<p>Instead of natural language:</p>
+<blockquote>
+<p>"Create a Post model with author relationship, fillable title and body, soft deletes"</p>
+</blockquote>
+<p>I switched to structured JSON specs:</p>
+<div class="codehilite"><pre><span></span><code><span class="p">{</span>
+<span class="w">  </span><span class="nt">&quot;artifact&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;model&quot;</span><span class="p">,</span>
+<span class="w">  </span><span class="nt">&quot;class&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Post&quot;</span><span class="p">,</span>
+<span class="w">  </span><span class="nt">&quot;table&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;posts&quot;</span><span class="p">,</span>
+<span class="w">  </span><span class="nt">&quot;has_factory&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
+<span class="w">  </span><span class="nt">&quot;soft_deletes&quot;</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
+<span class="w">  </span><span class="nt">&quot;fillable&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&quot;title&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;body&quot;</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;user_id&quot;</span><span class="p">],</span>
+<span class="w">  </span><span class="nt">&quot;relationships&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
+<span class="w">    </span><span class="p">{</span><span class="nt">&quot;type&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;BelongsTo&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;model&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;User&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;method&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;author&quot;</span><span class="p">,</span><span class="w"> </span><span class="nt">&quot;foreign_key&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;user_id&quot;</span><span class="p">}</span>
+<span class="w">  </span><span class="p">]</span>
+<span class="p">}</span>
+</code></pre></div>
+<h3>First test: 28 examples, 100 iterations</h3>
+<p>Result: <strong>26/26 on the eval, with zero semantic hallucinations.</strong> (For comparison, the NL pipeline trained on 308 examples still produced 5 hallucinations.)</p>
+<p>The model can't invent a <code>user()</code> relationship if <code>relationships[]</code> explicitly lists only <code>author</code>. The spec removes the model's ability to hallucinate about <em>what</em> to generate. It only decides <em>how</em>.</p>
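+<p>Because the spec lists every relationship by method name, an invented one becomes mechanically detectable. The helper below is a hypothetical illustration, not part of the pipeline described here: it scans generated PHP for methods whose body calls a relationship builder and reports any that the spec does not declare. The regex is deliberately simple and assumes conventionally formatted Eloquent methods.</p>

```python
import re

# Matches `function name(...) ... { ... $this->belongsTo(` and similar,
# without crossing a closing brace, so only single-level method bodies match.
RELATION_METHOD = re.compile(
    r"function\s+(\w+)\s*\([^)]*\)[^{]*\{[^}]*"
    r"\$this->(?:belongsTo|hasMany|hasOne|belongsToMany)\("
)


def undeclared_relationships(spec: dict, php: str) -> set:
    """Relationship methods present in the PHP but absent from the spec."""
    declared = {rel["method"] for rel in spec.get("relationships", [])}
    found = set(RELATION_METHOD.findall(php))
    return found - declared


spec = {
    "artifact": "model",
    "class": "Post",
    "relationships": [
        {"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"}
    ],
}

# Hypothetical model output containing one declared and one invented relationship.
generated_php = """
class Post extends Model
{
    public function author(): BelongsTo
    {
        return $this->belongsTo(User::class, 'user_id');
    }

    public function user(): BelongsTo
    {
        return $this->belongsTo(User::class);
    }
}
"""

print(undeclared_relationships(spec, generated_php))  # {'user'}
```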
+<h3>The Spec Compiler</h3>
+<p>I built a compiler that validates specs before generation:</p>
+<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>python3<span class="w"> </span>spec_compiler.py<span class="w"> </span>bad_spec.json
+SpecCompileError:<span class="w"> </span>rules<span class="o">[</span><span class="s1">&#39;venue_id&#39;</span><span class="o">]</span><span class="w"> </span>contains<span class="w"> </span>conditional<span class="w"> </span>token
+<span class="s1">&#39;required_on_post&#39;</span>.<span class="w"> </span>Use<span class="w"> </span><span class="s1">&#39;conditional_rules&#39;</span><span class="w"> </span>dict<span class="w"> </span>instead.
+</code></pre></div>
+<p>Validation: &lt;1ms. Generation: ~30s per file. Catch errors early.</p>
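+<p>The real <code>spec_compiler.py</code> is in the linked repository; the sketch below only illustrates the kind of check that could produce the error above. The spec layout (a <code>rules</code> dict mapping field names to lists of rule strings) and the contents of <code>CONDITIONAL_TOKENS</code> are assumptions; only <code>required_on_post</code> appears in the output shown.</p>

```python
class SpecCompileError(ValueError):
    """Raised when a spec is rejected before any generation happens."""


# Hypothetical token set; only 'required_on_post' is taken from the post.
CONDITIONAL_TOKENS = {"required_on_post", "required_on_put"}


def validate_rules(spec: dict) -> None:
    """Reject conditional tokens placed in the plain `rules` dict."""
    for field, rules in spec.get("rules", {}).items():
        for token in rules:
            if token in CONDITIONAL_TOKENS:
                raise SpecCompileError(
                    f"rules[{field!r}] contains conditional token {token!r}. "
                    "Use 'conditional_rules' dict instead."
                )


bad_spec = {"rules": {"venue_id": ["required_on_post", "integer"]}}
good_spec = {"rules": {"venue_id": ["nullable", "integer"]}}

validate_rules(good_spec)  # passes silently
try:
    validate_rules(bad_spec)
except SpecCompileError as err:
    print(f"SpecCompileError: {err}")
```

+<p>A check like this is a pure dictionary walk, which is why it can run in well under a millisecond while each generation takes tens of seconds.</p>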
+<h3>Final Results: adapters_spec_v4</h3>
+<table>
+<thead>
+<tr>
+<th>Metric</th>
+<th style="text-align: center;">NL Pipeline (308 ex)</th>
+<th style="text-align: center;">Spec Pipeline (49 ex)</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PHP valid</td>
+<td style="text-align: center;">26/26</td>
+<td style="text-align: center;">26/26</td>
+</tr>
+<tr>
+<td>Pest pass</td>
+<td style="text-align: center;">52/58</td>
+<td style="text-align: center;"><strong>20/20</strong></td>
+</tr>
+<tr>
+<td>Manual fixes</td>
+<td style="text-align: center;">5</td>
+<td style="text-align: center;">4</td>
+</tr>
+<tr>
+<td>Semantic hallucinations</td>
+<td style="text-align: center;">5</td>
+<td style="text-align: center;"><strong>0</strong></td>
+</tr>
+<tr>
+<td>Training time</td>
+<td style="text-align: center;">~30 min</td>
+<td style="text-align: center;">~15 min</td>
+</tr>
+</tbody>
+</table>
+<h2>The Debugging Checklist</h2>
+<p>Distilled from 34 steps of hitting walls:</p>
+<p><strong>Before training:</strong></p>
+<ol>
+<li>Tokenize ALL examples. Check <code>max(total_tokens) &lt; max_seq_length</code>.</li>
+<li>Check <code>min(completion_tokens) &gt; 0</code>. If zero, system prompt is too long.</li>
+<li>Close all GPU-using processes. Check memory with <code>vm_stat</code>.</li>
+<li>Use <code>--num-layers 8</code> (not <code>--lora-layers 8</code>) on 16GB machines.</li>
+</ol>
+<p><strong>After training:</strong></p>
+<ol start="5">
+<li>If <code>val_loss = 0.000</code>: training is broken, not perfect.</li>
+<li>Generate 3-5 test files and inspect manually before full benchmark.</li>
+<li>Run <code>php -l</code> on all output (syntax check).</li>
+</ol>
+<p><strong>When bugs persist:</strong></p>
+<ol start="8">
+<li>Classify: is it a training data gap or a model capability limit?</li>
+<li>If data gap: write 10-15 targeted examples with diverse contexts.</li>
+<li>If capability limit: change the input format (structured specs).</li>
+<li>If hallucinations persist after targeted training: the problem is <strong>ontological</strong>. The model's pretraining domain model diverges from yours. Give it an explicit ontology (structured spec), don't fight with more NL examples.</li>
+</ol>
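+<p>The <code>php -l</code> step in the checklist above is easy to automate. This is a small sketch under stated assumptions: the function name and the <code>linter</code> parameter are illustrative, and it relies on <code>php -l</code> exiting with a non-zero status when a file has a syntax error.</p>

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence


def lint_files(
    paths: Iterable, linter: Sequence[str] = ("php", "-l")
) -> List[Path]:
    """Run the linter once per file and return the paths that failed.

    A file counts as failed when the linter exits with a non-zero status.
    The `linter` argument exists so another command can be swapped in.
    """
    failed = []
    for path in paths:
        result = subprocess.run(
            [*linter, str(path)], capture_output=True, text=True
        )
        if result.returncode != 0:
            failed.append(Path(path))
    return failed
```

+<p>Typical use would be <code>lint_files(sorted(Path("./generated").rglob("*.php")))</code>, treating an empty return value as "all files parse". It requires the <code>php</code> binary on your <code>PATH</code>.</p>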
+<h2>What 7B Models Do Well vs Poorly</h2>
+<p><strong>Does well:</strong></p>
+<ul>
+<li>Individual class generation with clear patterns</li>
+<li>PHP syntax (very rare errors after basic fine-tuning)</li>
+<li>Following explicit rules in the system prompt</li>
+<li>CRUD operations with a single model</li>
+</ul>
+<p><strong>Does poorly:</strong></p>
+<ul>
+<li>Multi-file consistency (imports across files)</li>
+<li>Knowing what NOT to add (hallucinated relationships)</li>
+<li>Distinguishing Laravel API versions (mixes 9.x and 13.x patterns)</li>
+<li>Complex relationship traversal</li>
+</ul>
+<p><strong>The key insight:</strong> 7B models don't reason about code. They pattern-match against pretraining. Every persistent bug is a missing pattern. The first fix is to add examples. If that's not enough, change the input format to remove the decision from the model entirely.</p>
+<h2>Try It Yourself</h2>
+<p>Everything is open source:</p>
+<ul>
+<li><strong>Spec-trained model</strong>: <a href="https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec">fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec</a></li>
+<li><strong>Training data</strong>: <a href="https://huggingface.co/datasets/fchis/laravel-buildspec-training">fchis/laravel-buildspec-training</a> (49 examples)</li>
+<li><strong>Full pipeline</strong>: <a href="https://github.com/florinel-chis/laravel-ai-gen">github.com/florinel-chis/laravel-ai-gen</a></li>
+</ul>
+<div class="codehilite"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>mlx-lm
+<span class="c1"># Full pipeline: NL → specs → compile → PHP files</span>
+python3<span class="w"> </span>pipeline_spec.py<span class="w"> </span><span class="s2">&quot;Create a REST API for managing blog posts with tags&quot;</span>
+<span class="c1"># Or use a spec directly</span>
+python3<span class="w"> </span>pipeline_spec.py<span class="w"> </span>--spec<span class="w"> </span>my_specs.json<span class="w"> </span>--output<span class="w"> </span>./generated
+</code></pre></div>
+<p>Runs entirely on Apple Silicon. M1/M2/M3/M4 with 16GB+ RAM.</p>
+<hr />
+<p><em>This post is an abbreviated version of: "From Hallucination to Ontology: 34 Steps Building a Domain-Specific Code Generator on Consumer Hardware" (Chis, 2026). The full paper with detailed results, bug taxonomy, and infrastructure lessons is available as a preprint.</em></p>
+</body>
+</html>