fchis committed on
Commit
22b85f7
·
verified ·
1 Parent(s): d30942d

Upload README.md with huggingface_hub

  1. README.md +216 -4
README.md CHANGED
@@ -1,10 +1,222 @@
1
  ---
2
- title: 34 Steps Code Generator
3
- emoji: 👀
4
- colorFrom: gray
5
  colorTo: yellow
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: "I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked"
3
+ emoji: "🛠️"
4
+ colorFrom: green
5
  colorTo: yellow
6
  sdk: static
7
  pinned: false
8
+ license: mit
9
+ tags:
10
+ - code-generation
11
+ - fine-tuning
12
+ - mlx
13
+ - lora
14
+ - laravel
15
+ - php
16
+ - apple-silicon
17
+ - experience-report
18
  ---
19
 
20
+ # I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked
21
+
22
+ **Florinel Chis** · March 2026
23
+
24
+ ---
25
+
26
+ Most fine-tuning tutorials show you the happy path. This is the full path — including 6 training rounds that taught the model absolutely nothing, OOM crashes that killed my machine, and the realization that the real problem was never about the model.
27
+
28
+ **The end result:** A Laravel PHP code generator that produces 26/26 valid PHP files with 20/20 Pest tests passing. Trained on 49 examples. Runs on an Apple M2 Pro with 16GB RAM. Total cloud GPU cost: $0.
29
+
30
+ Here's how I actually got there.
31
+
32
+ ## The Hardware
33
+
34
+ - Apple M2 Pro, 16GB unified memory
35
+ - Qwen2.5-Coder-7B-Instruct, 4-bit quantized
36
+ - MLX framework with LoRA
37
+ - Target: Laravel 13.x PHP code generation
38
+
39
+ The 16GB constraint shaped every architectural decision. You can't load two 7B models. You can't train with `max_seq_length=4096`. You close LM Studio before training or your machine crashes.
40
+
41
+ ## Phase 1: Six Sprints of Nothing (The Silent Truncation Bug)
42
+
43
+ I started with 90 training examples and grew to 261 across 6 sprints. `val_loss` kept dropping. By Sprint 6, it hit **0.000**. Perfect.
44
+
45
+ Except the generated code wasn't getting better. At all.
46
+
47
+ ### The Root Cause
48
+
49
+ The system prompt (guidelines for the model) had grown organically across sprints to **2,380 tokens**. My `max_seq_length` was **1,500**.
50
+
51
+ MLX truncates training examples silently at `max_seq_length`. Every single training example was cut off before the code completion even started. The model was being trained to predict its own system prompt — and it got really good at that (hence val_loss=0.000).
52
+
53
+ **Six sprints. Hundreds of examples. Zero code learning.**
54
+
55
+ ### The Fix
56
+
57
+ ```python
58
+ # BEFORE: 2380 tokens of verbose guidelines
59
+ SYSTEM = """You are an expert Laravel developer. When writing models,
60
+ always use the HasFactory trait. The HasFactory trait enables...
61
+ [2380 tokens of examples and explanations]"""
62
+
63
+ # AFTER: 843 tokens, compressed
64
+ SYSTEM = """Laravel 13.x code generator. Output ONLY PHP.
65
+ - model: use HasFactory, add relationships from spec
66
+ - controller: import Controller, destroy() returns noContent()
67
+ ..."""
68
+ ```
69
+
70
+ And the verification I should have done from the start:
71
+
72
+ ```python
73
+ # Check that completions aren't truncated
74
+ for example in dataset:
75
+     tokens = tokenizer.encode(example["text"])
76
+     assert len(tokens) < max_seq_length, f"{len(tokens)} tokens exceeds max_seq_length={max_seq_length}"
77
+ ```
78
+
79
+ **Lesson: a `val_loss` of 0.000 is a red flag that the model is fitting something trivial, not proof that training worked. Always verify that your training data reaches the completions.**
80
+
81
+ ## Phase 2: Targeted Bug Fixing (The 10-15 Example Rule)
82
+
83
+ After fixing the truncation bug, real training started. val_loss: 0.080 (not 0.000!).
84
+
85
+ I found that **most systematic bugs can be fixed with a small set of targeted examples, typically 10-15**:
86
+
87
+ | Bug | Examples needed | Result |
88
+ |-----|:-:|---|
89
+ | `'optional'` validation rule (not a Laravel rule) | 10 | Fixed — generates `'nullable'` |
90
+ | `wasRecentlyCreated` in resources | 5 | Fixed — uses correct timestamps |
91
+ | Cross-resource missing imports | 13 | Fixed — 12 bugs → 0 |
92
+ | Missing `HasFactory` trait | 20 (fixed existing) | Fixed — 5 bugs → 0 |
93
+
94
+ The model already knows PHP. You're nudging a trained distribution, not teaching from scratch. For bugs like these, 10-15 diverse examples of the correct pattern are usually enough.
95
+
96
+ ### The Eval Script Trap
97
+
98
+ I built an automated bug checker. It flagged `StoreBookRequest $request` as "missing `Illuminate\Http\Request` import" because the regex `'Request $request'` matched as a substring.
99
+
100
+ **Test your eval script on correct code before trusting it.**
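The false positive above comes from plain substring matching. A minimal sketch of a stricter check, assuming the checker is written in Python: `uses_base_request` is a hypothetical helper, not the author's actual script. A negative lookbehind rejects any match where `Request` is only the tail of a longer class name or of a fully qualified name.

```python
import re

# Hypothetical helper for illustration; not the author's eval script.
# The lookbehind rejects "Request" when it follows identifier characters
# (StoreBookRequest) or a namespace separator (\Illuminate\Http\Request).
BASE_REQUEST = re.compile(r"(?<![A-Za-z0-9_\\])Request \$request")

def uses_base_request(php_source: str) -> bool:
    """True only when a parameter is type-hinted as the bare `Request` class."""
    return BASE_REQUEST.search(php_source) is not None

correct = "public function store(StoreBookRequest $request)"
needs_import = "public function index(Request $request)"
print(uses_base_request(correct))       # False
print(uses_base_request(needs_import))  # True
```

Running the checker on known-good files like `correct` first is what would have exposed the original bug.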
101
+
102
+ ### Where I Hit the Wall
103
+
104
+ After Sprint 9: 52/58 Pest tests passing. 6 failures remained. All were **semantic hallucinations**:
105
+
106
+ - Model invents a `user()` relationship that doesn't exist
107
+ - Controller uses closure-based eager loading when array format is correct
108
+ - Model generates `->withHttpStatus()` — a method that doesn't exist
109
+
110
+ Adding more NL training examples didn't help. The model was filling prompt ambiguity with its pretraining priors. The problem wasn't the model — it was the input format.
111
+
112
+ ## Phase 3: The Spec Pivot (The Real Breakthrough)
113
+
114
+ Instead of natural language:
115
+
116
+ > "Create a Post model with author relationship, fillable title and body, soft deletes"
117
+
118
+ I switched to structured JSON specs:
119
+
120
+ ```json
121
+ {
122
+   "artifact": "model",
123
+   "class": "Post",
124
+   "table": "posts",
125
+   "has_factory": true,
126
+   "soft_deletes": true,
127
+   "fillable": ["title", "body", "user_id"],
128
+   "relationships": [
129
+     {"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"}
130
+   ]
131
+ }
132
+ ```
133
+
134
+ ### First test: 28 examples, 100 iterations
135
+
136
+ Result: **26/26 eval perfect. Zero semantic hallucinations.** (Compare: 308 NL examples still had 5 hallucinations.)
137
+
138
+ The model can't invent a `user()` relationship if `relationships[]` explicitly lists only `author`. The spec removes the model's ability to hallucinate about *what* to generate. It only decides *how*.
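One way to make that guarantee checkable is to compare the methods in a generated file against the spec. This is a minimal sketch, assuming generated models declare relationships as `public function name()` methods. `undeclared_methods` is a hypothetical helper, not part of the author's pipeline, and it treats every public method as a relationship candidate, so a real checker would need an allow-list for other methods.

```python
import re

def undeclared_methods(php_source: str, spec: dict) -> set:
    """Return public method names in the PHP source that the spec does not list."""
    allowed = {rel["method"] for rel in spec.get("relationships", [])}
    found = set(re.findall(r"public\s+function\s+(\w+)\s*\(", php_source))
    return found - allowed

spec = {
    "class": "Post",
    "relationships": [
        {"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"}
    ],
}

# Illustrative output containing one hallucinated relationship.
php = """
class Post extends Model {
    public function author(): BelongsTo { return $this->belongsTo(User::class, 'user_id'); }
    public function user(): BelongsTo { return $this->belongsTo(User::class); }
}
"""
print(undeclared_methods(php, spec))  # {'user'}
```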
139
+
140
+ ### The Spec Compiler
141
+
142
+ I built a compiler that validates specs before generation:
143
+
144
+ ```
145
+ $ python3 spec_compiler.py bad_spec.json
146
+
147
+ SpecCompileError: rules['venue_id'] contains conditional token
148
+ 'required_on_post'. Use 'conditional_rules' dict instead.
149
+ ```
150
+
151
+ Validation: <1ms. Generation: ~30s per file. Catch errors early.
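The kind of check behind that error message could look roughly like this. It is a sketch under stated assumptions, not the real `spec_compiler.py`: the shape of `rules` (field name mapped to a list of rule strings), the `CONDITIONAL_TOKENS` set, and the function name `validate_rules` are all hypothetical.

```python
class SpecCompileError(ValueError):
    """Raised when a spec fails validation before generation."""

# Assumed token list; the real compiler's rule set is not shown in this post.
CONDITIONAL_TOKENS = {"required_on_post"}

def validate_rules(spec: dict) -> None:
    """Reject conditional tokens that appear inside the plain `rules` mapping."""
    for field, rules in spec.get("rules", {}).items():
        for token in rules:
            if token in CONDITIONAL_TOKENS:
                raise SpecCompileError(
                    f"rules['{field}'] contains conditional token '{token}'. "
                    "Use 'conditional_rules' dict instead."
                )

bad_spec = {"rules": {"venue_id": ["required_on_post", "integer"]}}
try:
    validate_rules(bad_spec)
except SpecCompileError as err:
    print(f"SpecCompileError: {err}")
```

Because a check like this is pure dictionary inspection, it fails fast, before any generation time is spent on a spec that cannot produce valid output.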
152
+
153
+ ### Final Results: adapters_spec_v4
154
+
155
+ | Metric | NL Pipeline (308 ex) | Spec Pipeline (49 ex) |
156
+ |--------|:---:|:---:|
157
+ | PHP valid | 26/26 | 26/26 |
158
+ | Pest pass | 52/58 | **20/20** |
159
+ | Manual fixes | 5 | 4 |
160
+ | Semantic hallucinations | 5 | **0** |
161
+ | Training time | ~30 min | ~15 min |
162
+
163
+ ## The Debugging Checklist
164
+
165
+ Distilled from 34 steps of hitting walls:
166
+
167
+ **Before training:**
168
+ 1. Tokenize ALL examples. Check `max(total_tokens) < max_seq_length`
169
+ 2. Check `min(completion_tokens) > 0`. If zero, system prompt is too long.
170
+ 3. Close all GPU-using processes. Check memory with `vm_stat`.
171
+ 4. Use `--num-layers 8` (not `--lora-layers 8`) on 16GB machines.
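Checks 1 and 2 can be sketched as a single pre-flight function. It assumes each example is split into `prompt` and `completion` strings, that truncation drops the tail of the sequence, and that `encode` is any callable returning a token list (with a real model you would pass the tokenizer's `encode`). `preflight` and the field names are hypothetical, and tokenizing the two parts separately only approximates the length of the joined text. The whitespace tokenizer in the usage lines is a stand-in so the sketch runs without downloading a model.

```python
from typing import Callable, Iterable

def preflight(examples: Iterable[dict], encode: Callable[[str], list], max_seq_length: int) -> dict:
    """Report the longest example and the fewest completion tokens that survive truncation."""
    max_total = 0
    min_surviving = None
    for ex in examples:
        prompt_len = len(encode(ex["prompt"]))
        total = prompt_len + len(encode(ex["completion"]))
        # Tokens past max_seq_length are dropped, so only this many
        # completion tokens can contribute to the loss.
        surviving = max(0, min(total, max_seq_length) - prompt_len)
        max_total = max(max_total, total)
        min_surviving = surviving if min_surviving is None else min(min_surviving, surviving)
    return {"max_total_tokens": max_total, "min_completion_tokens": min_surviving or 0}

# Toy data: a 6-token prompt with a 5-token limit leaves no room for the completion.
examples = [{"prompt": "a b c d e f", "completion": "x y z"}]
report = preflight(examples, str.split, max_seq_length=5)
print(report)  # {'max_total_tokens': 9, 'min_completion_tokens': 0}
```

A `min_completion_tokens` of 0 is the exact failure from Phase 1: the prompt alone fills the window.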
172
+
173
+ **After training:**
174
+ 5. If `val_loss = 0.000`: training is broken, not perfect.
175
+ 6. Generate 3-5 test files and inspect manually before full benchmark.
176
+ 7. Run `php -l` on all output (syntax check).
177
+
178
+ **When bugs persist:**
179
+ 8. Classify: is it a training data gap or a model capability limit?
180
+ 9. If data gap: write 10-15 targeted examples with diverse contexts.
181
+ 10. If capability limit: change the input format (structured specs).
182
+ 11. If hallucinations persist after targeted training: the problem is **ontological** — the model's pretraining domain model diverges from yours. Give it an explicit ontology (structured spec), don't fight with more NL examples.
183
+
184
+ ## What 7B Models Do Well vs Poorly
185
+
186
+ **Does well:**
187
+ - Individual class generation with clear patterns
188
+ - PHP syntax (very rare errors after basic fine-tuning)
189
+ - Following explicit rules in the system prompt
190
+ - CRUD operations with a single model
191
+
192
+ **Does poorly:**
193
+ - Multi-file consistency (imports across files)
194
+ - Knowing what NOT to add (hallucinated relationships)
195
+ - Distinguishing Laravel API versions (mixes 9.x and 13.x patterns)
196
+ - Complex relationship traversal
197
+
198
+ **The key insight:** In my experience, 7B models pattern-match against pretraining more than they reason about code. Most persistent bugs are a missing pattern, and the first fix is to add examples. If that's not enough, change the input format to remove the decision from the model entirely.
199
+
200
+ ## Try It Yourself
201
+
202
+ Everything is open source:
203
+
204
+ - **Spec-trained model**: [fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec](https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec)
205
+ - **Training data**: [fchis/laravel-buildspec-training](https://huggingface.co/datasets/fchis/laravel-buildspec-training) (49 examples)
206
+ - **Full pipeline**: [github.com/florinel-chis/laravel-ai-gen](https://github.com/florinel-chis/laravel-ai-gen)
207
+
208
+ ```bash
209
+ pip install mlx-lm
210
+
211
+ # Full pipeline: NL → specs → compile → PHP files
212
+ python3 pipeline_spec.py "Create a REST API for managing blog posts with tags"
213
+
214
+ # Or use a spec directly
215
+ python3 pipeline_spec.py --spec my_specs.json --output ./generated
216
+ ```
217
+
218
+ Runs entirely on Apple Silicon. M1/M2/M3/M4 with 16GB+ RAM.
219
+
220
+ ---
221
+
222
+ *This post is an abbreviated version of: "From Hallucination to Ontology: 34 Steps Building a Domain-Specific Code Generator on Consumer Hardware" (Chis, 2026). The full paper with detailed results, bug taxonomy, and infrastructure lessons is available as a preprint.*