Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -1,10 +1,222 @@
|
|
| 1 |
---
|
| 2 |
-
title: 34 Steps Code Generator
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
colorTo: yellow
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
| 1 |
---
|
| 2 |
+
title: "I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked"
|
| 3 |
+
emoji: "🛠️"
|
| 4 |
+
colorFrom: green
|
| 5 |
colorTo: yellow
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
+
license: mit
|
| 9 |
+
tags:
|
| 10 |
+
- code-generation
|
| 11 |
+
- fine-tuning
|
| 12 |
+
- mlx
|
| 13 |
+
- lora
|
| 14 |
+
- laravel
|
| 15 |
+
- php
|
| 16 |
+
- apple-silicon
|
| 17 |
+
- experience-report
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# I Spent 34 Steps Building a Code Generator on My MacBook — Here's What Actually Worked
|
| 21 |
+
|
| 22 |
+
**Florinel Chis** · March 2026
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
Most fine-tuning tutorials show you the happy path. This is the full path — including 6 training rounds that taught the model absolutely nothing, OOM crashes that killed my machine, and the realization that the real problem was never about the model.
|
| 27 |
+
|
| 28 |
+
**The end result:** A Laravel PHP code generator that produces 26/26 valid PHP files with 20/20 Pest tests passing. Trained on 49 examples. Runs on an Apple M2 Pro with 16GB RAM. Total cloud GPU cost: $0.
|
| 29 |
+
|
| 30 |
+
Here's how I actually got there.
|
| 31 |
+
|
| 32 |
+
## The Hardware
|
| 33 |
+
|
| 34 |
+
- Apple M2 Pro, 16GB unified memory
|
| 35 |
+
- Qwen2.5-Coder-7B-Instruct, 4-bit quantized
|
| 36 |
+
- MLX framework with LoRA
|
| 37 |
+
- Target: Laravel 13.x PHP code generation
|
| 38 |
+
|
| 39 |
+
The 16GB constraint shaped every architectural decision. You can't load two 7B models. You can't train with `max_seq_length=4096`. You close LM Studio before training or your machine crashes.
|
| 40 |
+
|
| 41 |
+
## Phase 1: Six Sprints of Nothing (The Silent Truncation Bug)
|
| 42 |
+
|
| 43 |
+
I started with 90 training examples and grew to 261 across 6 sprints. `val_loss` kept dropping. By Sprint 6, it hit **0.000**. Perfect.
|
| 44 |
+
|
| 45 |
+
Except the generated code wasn't getting better. At all.
|
| 46 |
+
|
| 47 |
+
### The Root Cause
|
| 48 |
+
|
| 49 |
+
The system prompt (guidelines for the model) had grown organically across sprints to **2,380 tokens**. My `max_seq_length` was **1,500**.
|
| 50 |
+
|
| 51 |
+
MLX truncates training examples silently at `max_seq_length`. Every single training example was cut off before the code completion even started. The model was being trained to predict its own system prompt — and it got really good at that (hence val_loss=0.000).
|
| 52 |
+
|
| 53 |
+
**Six sprints. Hundreds of examples. Zero code learning.**
|
| 54 |
+
|
| 55 |
+
### The Fix
|
| 56 |
+
|
| 57 |
+
```python
|
| 58 |
+
# BEFORE: 2380 tokens of verbose guidelines
|
| 59 |
+
SYSTEM = """You are an expert Laravel developer. When writing models,
|
| 60 |
+
always use the HasFactory trait. The HasFactory trait enables...
|
| 61 |
+
[2380 tokens of examples and explanations]"""
|
| 62 |
+
|
| 63 |
+
# AFTER: 843 tokens, compressed
|
| 64 |
+
SYSTEM = """Laravel 13.x code generator. Output ONLY PHP.
|
| 65 |
+
- model: use HasFactory, add relationships from spec
|
| 66 |
+
- controller: import Controller, destroy() returns noContent()
|
| 67 |
+
..."""
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
And the verification I should have done from the start:
|
| 71 |
+
|
| 72 |
+
```python
|
| 73 |
+
# Check that completions aren't truncated
|
| 74 |
+
for example in dataset:
|
| 75 |
+
    tokens = tokenizer.encode(example["text"])
|
| 76 |
+
    assert len(tokens) < max_seq_length, f"Truncated at {len(tokens)} tokens"
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
**Lesson: a `val_loss` of 0.000 is a warning sign that the model is fitting something trivial, not proof that training is perfect. Always verify that your tokenized training examples still contain the completions.**
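
To make the arithmetic concrete, here is a minimal sketch of how many completion tokens survive truncation. It assumes the prompt comes first and that everything past `max_seq_length` is dropped. `surviving_completion_tokens` is a hypothetical helper, not an MLX function, and the 400-token completion length is an illustrative value; the 2,380, 843, and 1,500 figures are the ones from this post.

```python
def surviving_completion_tokens(prompt_tokens: int, completion_tokens: int,
                                max_seq_length: int) -> int:
    """Completion tokens left when the sequence is cut at max_seq_length.

    Assumes prompt tokens come first and truncation keeps only the first
    max_seq_length tokens (an assumption about the trainer's behavior).
    """
    room = max(0, max_seq_length - prompt_tokens)
    return min(completion_tokens, room)


# Verbose system prompt: 2,380 tokens against a 1,500-token limit.
before = surviving_completion_tokens(2380, 400, 1500)
# Compressed system prompt: 843 tokens against the same limit.
after = surviving_completion_tokens(843, 400, 1500)
print(before, after)  # 0 400
```

With the verbose prompt, no completion tokens reach the trainer at all, which matches the symptom of a loss that looks perfect while the generated code does not improve.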
|
| 80 |
+
|
| 81 |
+
## Phase 2: Targeted Bug Fixing (The 10-15 Example Rule)
|
| 82 |
+
|
| 83 |
+
After fixing the truncation bug, real training started. val_loss: 0.080 (not 0.000!).
|
| 84 |
+
|
| 85 |
+
I discovered that **every systematic bug I hit could be fixed with a small batch of targeted examples, typically 10-15**:
|
| 86 |
+
|
| 87 |
+
| Bug | Examples needed | Result |
|
| 88 |
+
|-----|:-:|---|
|
| 89 |
+
| `'optional'` validation rule (not a Laravel rule) | 10 | Fixed — generates `'nullable'` |
|
| 90 |
+
| `wasRecentlyCreated` in resources | 5 | Fixed — uses correct timestamps |
|
| 91 |
+
| Cross-resource missing imports | 13 | Fixed — 12 bugs → 0 |
|
| 92 |
+
| Missing `HasFactory` trait | 20 (fixed existing) | Fixed — 5 bugs → 0 |
|
| 93 |
+
|
| 94 |
+
The model already knows PHP. You're nudging a trained distribution, not teaching from scratch, so 10-15 diverse examples of the correct pattern are usually enough.
|
| 95 |
+
|
| 96 |
+
### The Eval Script Trap
|
| 97 |
+
|
| 98 |
+
I built an automated bug checker. It flagged `StoreBookRequest $request` as "missing `Illuminate\Http\Request` import" because its pattern `'Request $request'` matched as a substring of the form-request type hint.
|
| 99 |
+
|
| 100 |
+
**Test your eval script on correct code before trusting it.**
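
A minimal sketch of that false positive, assuming the checker looked for the literal text `Request $request`. The function names and the PHP snippets are hypothetical and are not taken from the original eval script.

```python
import re


def naive_uses_request(php: str) -> bool:
    # Plain substring test: also fires on "StoreBookRequest $request".
    return "Request $request" in php


# \b requires a word boundary before "Request", so a longer class name
# such as StoreBookRequest no longer matches. The "$" must be escaped
# because it is a regex anchor.
STRICT = re.compile(r"\bRequest \$request\b")


def strict_uses_request(php: str) -> bool:
    return STRICT.search(php) is not None


form_request = "public function store(StoreBookRequest $request)"
base_request = "public function index(Request $request)"

print(naive_uses_request(form_request), strict_uses_request(form_request))  # True False
print(naive_uses_request(base_request), strict_uses_request(base_request))  # True True
```

Running both versions against known-good code like `form_request` is the cheap sanity check: any hit there is a bug in the checker, not in the model.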
|
| 101 |
+
|
| 102 |
+
### Where I Hit the Wall
|
| 103 |
+
|
| 104 |
+
After Sprint 9: 52/58 Pest tests passing. 6 failures remained. All were **semantic hallucinations**:
|
| 105 |
+
|
| 106 |
+
- Model invents a `user()` relationship that doesn't exist
|
| 107 |
+
- Controller uses closure-based eager loading when array format is correct
|
| 108 |
+
- Model generates `->withHttpStatus()` — a method that doesn't exist
|
| 109 |
+
|
| 110 |
+
Adding more natural-language (NL) training examples didn't help. The model was filling prompt ambiguity with its pretraining priors. The problem wasn't the model; it was the input format.
|
| 111 |
+
|
| 112 |
+
## Phase 3: The Spec Pivot (The Real Breakthrough)
|
| 113 |
+
|
| 114 |
+
Instead of natural language:
|
| 115 |
+
|
| 116 |
+
> "Create a Post model with author relationship, fillable title and body, soft deletes"
|
| 117 |
+
|
| 118 |
+
I switched to structured JSON specs:
|
| 119 |
+
|
| 120 |
+
```json
|
| 121 |
+
{
|
| 122 |
+
"artifact": "model",
|
| 123 |
+
"class": "Post",
|
| 124 |
+
"table": "posts",
|
| 125 |
+
"has_factory": true,
|
| 126 |
+
"soft_deletes": true,
|
| 127 |
+
"fillable": ["title", "body", "user_id"],
|
| 128 |
+
"relationships": [
|
| 129 |
+
{"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"}
|
| 130 |
+
]
|
| 131 |
+
}
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
### First test: 28 examples, 100 iterations
|
| 135 |
+
|
| 136 |
+
Result: **26/26 on the eval. Zero semantic hallucinations.** (Compare: the NL pipeline, trained on 308 examples, still produced 5 hallucinations.)
|
| 137 |
+
|
| 138 |
+
The model can't invent a `user()` relationship if `relationships[]` explicitly lists only `author`. The spec removes the model's ability to hallucinate about *what* to generate. It only decides *how*.
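
One way to check this mechanically is to compare the relationship methods in a generated model against the spec. The sketch below is a hypothetical helper, not part of the released pipeline. It relies on a rough regex that only recognizes four common Eloquent relationship calls, and the PHP snippet is made up for illustration.

```python
import re

# Matches methods whose body starts by returning an Eloquent relationship,
# e.g. "public function author(): BelongsTo { return $this->belongsTo(...)".
RELATION = re.compile(
    r"public function (\w+)\s*\([^)]*\)[^{]*\{\s*return \$this->"
    r"(?:belongsTo|hasOne|hasMany|belongsToMany)\("
)


def unexpected_relationships(spec: dict, php: str) -> set[str]:
    """Relationship methods present in the PHP but absent from the spec."""
    allowed = {rel["method"] for rel in spec.get("relationships", [])}
    found = set(RELATION.findall(php))
    return found - allowed


spec = {
    "relationships": [
        {"type": "BelongsTo", "model": "User", "method": "author",
         "foreign_key": "user_id"}
    ]
}

php = """
class Post extends Model
{
    public function author(): BelongsTo
    {
        return $this->belongsTo(User::class, 'user_id');
    }

    public function user(): BelongsTo
    {
        return $this->belongsTo(User::class);
    }
}
"""

print(unexpected_relationships(spec, php))  # {'user'}
```

An empty set means the generated model defines no relationship beyond those the spec lists.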
|
| 139 |
+
|
| 140 |
+
### The Spec Compiler
|
| 141 |
+
|
| 142 |
+
I built a compiler that validates specs before generation:
|
| 143 |
+
|
| 144 |
+
```
|
| 145 |
+
$ python3 spec_compiler.py bad_spec.json
|
| 146 |
+
|
| 147 |
+
SpecCompileError: rules['venue_id'] contains conditional token
|
| 148 |
+
'required_on_post'. Use 'conditional_rules' dict instead.
|
| 149 |
+
```
|
| 150 |
+
|
| 151 |
+
Validation: <1ms. Generation: ~30s per file. Catch errors early.
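
The check behind that error message could look roughly like the sketch below. This is an assumption-heavy illustration, not the actual `spec_compiler.py`: the shape of the `rules` mapping is guessed, and `CONDITIONAL_TOKENS` contains only the one token that appears in the error above.

```python
class SpecCompileError(ValueError):
    """Raised when a spec fails validation before generation."""


# Only 'required_on_post' appears in the error message above; the real
# compiler presumably knows more tokens. This set is a placeholder.
CONDITIONAL_TOKENS = {"required_on_post"}


def validate_rules(spec: dict) -> None:
    """Reject conditional tokens inside the plain 'rules' mapping."""
    for field, rules in spec.get("rules", {}).items():
        # Accept either a list of rules or a pipe-delimited rule string.
        tokens = rules.split("|") if isinstance(rules, str) else rules
        for token in tokens:
            if token in CONDITIONAL_TOKENS:
                raise SpecCompileError(
                    f"rules[{field!r}] contains conditional token {token!r}. "
                    "Use 'conditional_rules' dict instead."
                )


# A spec that passes: no conditional tokens in 'rules'.
validate_rules({"rules": {"title": "required|string"}})
```

Because this is pure dictionary inspection, it costs almost nothing compared with a generation run, which is the point of validating first.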
|
| 152 |
+
|
| 153 |
+
### Final Results: adapters_spec_v4
|
| 154 |
+
|
| 155 |
+
| Metric | NL Pipeline (308 ex) | Spec Pipeline (49 ex) |
|
| 156 |
+
|--------|:---:|:---:|
|
| 157 |
+
| PHP valid | 26/26 | 26/26 |
|
| 158 |
+
| Pest pass | 52/58 | **20/20** |
|
| 159 |
+
| Manual fixes | 5 | 4 |
|
| 160 |
+
| Semantic hallucinations | 5 | **0** |
|
| 161 |
+
| Training time | ~30 min | ~15 min |
|
| 162 |
+
|
| 163 |
+
## The Debugging Checklist
|
| 164 |
+
|
| 165 |
+
Distilled from 34 steps of hitting walls:
|
| 166 |
+
|
| 167 |
+
**Before training:**
|
| 168 |
+
1. Tokenize ALL examples. Check `max(total_tokens) < max_seq_length`
|
| 169 |
+
2. Check `min(completion_tokens) > 0`. If zero, system prompt is too long.
|
| 170 |
+
3. Close all GPU-using processes. Check memory with `vm_stat`.
|
| 171 |
+
4. Use `--num-layers 8` (not `--lora-layers 8`) on 16GB machines.
|
| 172 |
+
|
| 173 |
+
**After training:**
|
| 174 |
+
5. If `val_loss = 0.000`: training is broken, not perfect.
|
| 175 |
+
6. Generate 3-5 test files and inspect manually before full benchmark.
|
| 176 |
+
7. Run `php -l` on all output (syntax check).
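
The syntax check in step 7 is easy to script. The following is a minimal sketch, assuming `php` is on your `PATH`; `lint_php` is a hypothetical helper, and the `run` parameter exists only so the function can be exercised without PHP installed.

```python
import subprocess
from pathlib import Path


def lint_php(directory: str, run=subprocess.run) -> list[str]:
    """Return the PHP files under `directory` that fail `php -l`."""
    failures = []
    for path in sorted(Path(directory).rglob("*.php")):
        # `php -l` exits non-zero when the file has a syntax error.
        result = run(["php", "-l", str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(str(path))
    return failures


# Example call (requires PHP), using the output directory from the
# pipeline command in this post:
# print(lint_php("./generated"))
```

An empty list means every generated file parsed cleanly; it says nothing about whether the code behaves correctly, which is what the Pest tests are for.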
|
| 177 |
+
|
| 178 |
+
**When bugs persist:**
|
| 179 |
+
8. Classify: is it a training data gap or a model capability limit?
|
| 180 |
+
9. If data gap: write 10-15 targeted examples with diverse contexts.
|
| 181 |
+
10. If capability limit: change the input format (structured specs).
|
| 182 |
+
11. If hallucinations persist after targeted training: the problem is **ontological**, meaning the model's pretraining domain model diverges from yours. Give it an explicit ontology (a structured spec) instead of fighting it with more NL examples.
|
| 183 |
+
|
| 184 |
+
## What 7B Models Do Well vs Poorly
|
| 185 |
+
|
| 186 |
+
**Does well:**
|
| 187 |
+
- Individual class generation with clear patterns
|
| 188 |
+
- PHP syntax (very rare errors after basic fine-tuning)
|
| 189 |
+
- Following explicit rules in the system prompt
|
| 190 |
+
- CRUD operations with a single model
|
| 191 |
+
|
| 192 |
+
**Does poorly:**
|
| 193 |
+
- Multi-file consistency (imports across files)
|
| 194 |
+
- Knowing what NOT to add (hallucinated relationships)
|
| 195 |
+
- Distinguishing Laravel API versions (mixes 9.x and 13.x patterns)
|
| 196 |
+
- Complex relationship traversal
|
| 197 |
+
|
| 198 |
+
**The key insight:** 7B models don't reason about code. They pattern-match against pretraining. Every persistent bug is a missing pattern. The first fix is to add examples. If that's not enough, change the input format to remove the decision from the model entirely.
|
| 199 |
+
|
| 200 |
+
## Try It Yourself
|
| 201 |
+
|
| 202 |
+
Everything is open source:
|
| 203 |
+
|
| 204 |
+
- **Spec-trained model**: [fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec](https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec)
|
| 205 |
+
- **Training data**: [fchis/laravel-buildspec-training](https://huggingface.co/datasets/fchis/laravel-buildspec-training) (49 examples)
|
| 206 |
+
- **Full pipeline**: [github.com/florinel-chis/laravel-ai-gen](https://github.com/florinel-chis/laravel-ai-gen)
|
| 207 |
+
|
| 208 |
+
```bash
|
| 209 |
+
pip install mlx-lm
|
| 210 |
+
|
| 211 |
+
# Full pipeline: NL → specs → compile → PHP files
|
| 212 |
+
python3 pipeline_spec.py "Create a REST API for managing blog posts with tags"
|
| 213 |
+
|
| 214 |
+
# Or use a spec directly
|
| 215 |
+
python3 pipeline_spec.py --spec my_specs.json --output ./generated
|
| 216 |
+
```
|
| 217 |
+
|
| 218 |
+
Runs entirely on Apple Silicon. M1/M2/M3/M4 with 16GB+ RAM.
|
| 219 |
+
|
| 220 |
+
---
|
| 221 |
+
|
| 222 |
+
*This post is an abbreviated version of: "From Hallucination to Ontology: 34 Steps Building a Domain-Specific Code Generator on Consumer Hardware" (Chis, 2026). The full paper with detailed results, bug taxonomy, and infrastructure lessons is available as a preprint.*
|