fchis committed on
Commit
3c5eda0
·
verified ·
1 Parent(s): 2ba8e67

Upload README.md with huggingface_hub

  1. README.md +143 -5
README.md CHANGED
@@ -1,10 +1,148 @@
1
  ---
2
- title: Ontological Gap Code Generation
3
- emoji: 🐨
4
- colorFrom: yellow
5
- colorTo: gray
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: "The Ontological Gap: Why Error Type Matters More Than Error Count in AI Code Generation"
3
+ emoji: "🔬"
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: static
7
  pinned: false
8
+ license: mit
9
+ tags:
10
+ - code-generation
11
+ - fine-tuning
12
+ - ontology
13
+ - laravel
14
+ - php
15
+ - lora
16
  ---
17
 
18
+ # The Ontological Gap: Why Error Type Matters More Than Error Count in AI Code Generation
19
+
20
+ **Florinel Chis** · March 2026
21
+
22
+ ---
23
+
24
+ Code generation evaluation obsesses over **how often** models fail: pass@k, syntax validity, test pass rates. But one dimension is rarely measured: **how** they fail.
25
+
26
+ We found that shifting from natural language prompts to structured JSON specifications barely moved our error count (5 → 4). But it **fundamentally changed the error type**: from semantic hallucinations that require runtime debugging to mechanical gaps of the kind a spec compiler can catch in under 1 millisecond.
27
+
28
+ That shift matters more than the count.
29
+
30
+ ## The Setup
31
+
32
+ We fine-tuned [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) (4-bit quantized) with LoRA on an Apple M2 Pro with 16GB RAM to generate Laravel 13.x PHP code. Two pipelines, same model, same 3-app benchmark (26 PHP files):
33
+
34
+ | | Natural Language → PHP | BuildSpec JSON → PHP |
35
+ |--|--|--|
36
+ | Training examples | 308 | **49** |
37
+ | PHP syntax valid | 26/26 | 26/26 |
38
+ | Manual fixes needed | 5 | 4 |
39
+ | **Error type** | **Semantic hallucination** | **Mechanical/spec gap** |
40
+
41
+ The numbers look similar. The errors are completely different.
42
+
43
+ ## What Went Wrong with Natural Language
44
+
45
+ With NL prompts like *"Create a Post model with an author relationship and soft deletes"*, the model produced 5 bugs:
46
+
47
+ 1. **Invented a closure-based eager loading pattern** in EventController that doesn't exist in the codebase
48
+ 2. **Dropped a BelongsTo relationship** on Book model despite being explicitly asked for it
49
+ 3. **Used `->load(['user'])` on a model with no user relationship**: hallucinated a relationship from pretraining
50
+ 4. **Generated `->withHttpStatus()`** β€” a method that doesn't exist in Laravel
51
+ 5. **Missing JsonResource import** in SubscriberResource
52
+
53
+ Every one of these is a **semantic hallucination**: the model generated something that doesn't match the developer's intent, and the only way to catch it is to run the code and debug the failure.
54
+
55
+ ## What Went Wrong with BuildSpec
56
+
57
+ With structured specs like this:
58
+
59
+ ```json
60
+ {
61
+ "artifact": "model",
62
+ "class": "Book",
63
+ "table": "books",
64
+ "has_factory": true,
65
+ "fillable": ["title", "isbn", "year", "author_id"],
66
+ "relationships": [
67
+ {"type": "BelongsTo", "model": "Author", "method": "author"}
68
+ ]
69
+ }
70
+ ```
71
+
72
+ The model produced 4 bugs:
73
+
74
+ 1. **Used string-based `unique:books,isbn,...`** instead of `Rule::unique()->ignore()`: wrong PHP pattern for a correct concept
75
+ 2. **Excluded `author_id` from `Book::create()`** unnecessarily: wrong code pattern
76
+ 3. **Migration spec said `published_year`, model spec said `year`**: our spec was inconsistent
77
+ 4. **Factory included `max_attendees`** but migration didn't have that column: our test harness was wrong
78
+
79
+ **Zero semantic hallucinations.** The model never invented a relationship, never used a nonexistent method, never hallucinated a pattern. Bugs 1-2 are wrong *code patterns* for correct *concepts*. Bugs 3-4 are our own spec inconsistencies.
80
+
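Bug 3 is the kind of inconsistency that can be checked mechanically before generation. The sketch below is a minimal illustration, not the project's actual compiler: the model spec follows the `fillable` shape shown above, while the migration spec shape (a `columns` list of `{"name": ...}` objects) and the helper name `missing_columns` are assumptions made for this example.

```python
def missing_columns(model_spec: dict, migration_spec: dict) -> list[str]:
    """Return fillable fields in the model spec that the migration spec does not define."""
    # Assumed migration shape: {"table": ..., "columns": [{"name": ...}, ...]}
    defined = {col["name"] for col in migration_spec.get("columns", [])}
    return [field for field in model_spec.get("fillable", []) if field not in defined]


model_spec = {
    "artifact": "model",
    "class": "Book",
    "table": "books",
    "fillable": ["title", "isbn", "year", "author_id"],
}

# Hypothetical migration spec reproducing the year / published_year mismatch.
migration_spec = {
    "artifact": "migration",
    "table": "books",
    "columns": [
        {"name": "title"},
        {"name": "isbn"},
        {"name": "published_year"},
        {"name": "author_id"},
    ],
}

print(missing_columns(model_spec, migration_spec))  # ['year']
```

A check like this only compares names across two specs, so it would flag the `year` mismatch without generating any PHP.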
81
+ ## Why? The Ontological Gap
82
+
83
+ Here's our hypothesis:
84
+
85
+ > **Semantic hallucinations are caused by ontological misalignment.** The model has its own implicit "ontology": a domain model learned from pretraining on millions of PHP files. When prompted with natural language, gaps in the prompt are filled from this implicit ontology. Where it diverges from the developer's intent, hallucinations occur.
86
+
87
+ The model's pretraining ontology says:
88
+ - "Models usually have a `user()` relationship"
89
+ - "Validation includes `'optional'`" (not a real Laravel rule)
90
+ - "Controllers use closure-based eager loading"
91
+
92
+ The developer's ontology says:
93
+ - "Posts belong to Authors, not Users"
94
+ - "Validation uses `'nullable'`"
95
+ - "Simple array-based eager loading suffices"
96
+
97
+ **BuildSpec closes this gap.** When you write `"relationships": [{"type": "BelongsTo", "model": "Author"}]`, there's no room for the model to substitute its own prior about what relationships a Post "should" have.
98
+
99
+ ## The Spec Compiler: An Ontological Reasoner
100
+
101
+ Before any code is generated, the spec compiler validates every spec in <1ms:
102
+
103
+ ```
104
+ $ python3 spec_compiler.py event_request.json
105
+
106
+ SpecCompileError: rules['venue_id'] contains conditional token
107
+ 'required_on_post'. Use 'conditional_rules' dict instead.
108
+ Example: {"conditional_rules": {"venue_id": {"POST": ["required"],
109
+ "PUT": ["sometimes"]}}}
110
+ ```
111
+
112
+ The compiler catches ontological violations (wrong field names, missing required properties, invalid constraint expressions) before the model ever sees the spec. Generation takes ~30 seconds per file. Validation takes <1ms. **Validate aggressively, generate only validated specs.**
113
+
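To make the idea concrete, here is a minimal sketch of one such check, modeled on the error message shown above. It is not the real `spec_compiler.py`: the `rules` shape (field name mapped to a list of rule tokens) is inferred from the error text, and the regular expression defining a "conditional token" is an assumption for illustration.

```python
import re


class SpecCompileError(ValueError):
    """Raised when a spec violates the (assumed) schema."""


# Assumption: a token ending in "_on_<http verb>" counts as a conditional token.
CONDITIONAL_TOKEN = re.compile(r"^\w+_on_(post|put|patch|delete)$")


def check_rules(spec: dict) -> None:
    """Reject conditional tokens inside the flat `rules` mapping."""
    for field, tokens in spec.get("rules", {}).items():
        for token in tokens:
            if CONDITIONAL_TOKEN.match(token):
                raise SpecCompileError(
                    f"rules[{field!r}] contains conditional token {token!r}. "
                    "Use 'conditional_rules' dict instead."
                )


# Hypothetical request spec that triggers the check.
bad_spec = {"artifact": "request", "rules": {"venue_id": ["required_on_post", "integer"]}}

try:
    check_rules(bad_spec)
except SpecCompileError as err:
    print(f"SpecCompileError: {err}")
```

Because the check is a dictionary walk plus a regex match, it is plausible for it to finish far faster than a single generation call, which is the point of validating first.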
114
+ ## Data Efficiency: 49 vs 308 Examples
115
+
116
+ The spec pipeline needed about **6x fewer training examples** (49 vs 308) for comparable results (26/26 syntax-valid files, 4 vs 5 manual fixes). Why?
117
+
118
+ With natural language, the model must learn two things:
119
+ 1. **What to generate** (the domain ontology: which entities, relationships, rules)
120
+ 2. **How to generate it** (the code mapping: PHP syntax, Laravel patterns)
121
+
122
+ With BuildSpec, the ontology is *given*. The model only learns the mapping. One of the two learning problems is removed by making the input explicit.
123
+
124
+ ## The Takeaway
125
+
126
+ If you're building domain-specific code generation:
127
+
128
+ 1. **Measure error type, not just error count.** 4 mechanical bugs are better than 4 semantic hallucinations.
129
+ 2. **Make the developer's ontology explicit.** Structured specs remove the model's ability to hallucinate about *what* to generate.
130
+ 3. **Validate inputs, not just outputs.** A spec compiler catches errors roughly 30,000x faster than generating the code first (<1ms of validation vs ~30 seconds of generation per file).
131
+ 4. **You need fewer examples than you think.** Structured input is more data-efficient because the model isn't learning domain concepts, just code patterns.
132
+
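The two ratios quoted in this post follow directly from the reported figures. The short check below uses only numbers stated above; treating "<1ms" as exactly 1 ms is an assumption, which makes the speedup a lower bound.

```python
# Figures reported in the post.
generation_ms = 30_000   # "~30 seconds per file"
validation_ms = 1        # "<1ms", taken at its upper bound (assumption)

speedup = generation_ms // validation_ms
print(speedup)  # 30000

nl_examples, spec_examples = 308, 49
example_ratio = nl_examples / spec_examples
print(round(example_ratio, 1))  # 6.3
```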
133
+ ## Try It
134
+
135
+ - **Model**: [fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec](https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec)
136
+ - **Dataset**: [fchis/laravel-buildspec-training](https://huggingface.co/datasets/fchis/laravel-buildspec-training)
137
+ - **Code**: [github.com/florinel-chis/laravel-ai-gen](https://github.com/florinel-chis/laravel-ai-gen)
138
+
139
+ ```bash
140
+ pip install mlx-lm
141
+ python3 pipeline_spec.py "Create a REST API for managing blog posts with tags"
142
+ ```
143
+
144
+ Runs entirely on Apple Silicon. No cloud GPU needed.
145
+
146
+ ---
147
+
148
+ *This post summarizes findings from: "The Ontological Gap: Why Error Type Matters More Than Error Count in AI Code Generation" (Chis, 2026).*