rockypod committed · Commit 44044f7 · verified · 1 Parent(s): 25a2f9f

Update README for v3.1 dual-size release: surface 8B (100.00%) and 4B (99.31%) branches; correct param counts (14.8B / 8.2B / 4.0B); document Q87 grader patch

  1. README.md +105 -130
README.md CHANGED
@@ -4,7 +4,10 @@ license_name: neotoi-coder-community-license
  language:
  - en
  - vi
- base_model: Qwen/Qwen3-Coder-14B
  tags:
  - dioxus
  - rust
@@ -14,115 +17,106 @@ tags:
  - raft
  - code
  - server-functions
  pipeline_tag: text-generation
  ---

- # Neotoi Coder v3.1

- A Rust/Dioxus 0.7 specialist fine-tuned from Qwen3-Coder-14B using RAFT
- (Retrieval-Augmented Fine-Tuning). v3.1 closes the T2 RSX regression
- that shipped in v3.0 and broadens coverage into DaisyUI, deeper signals,
- router patterns, and async/server-function composition.

- ## What's New in v3.1

- - **T1 Fundamentals → 100%** (+8.3 pts vs v3.0)
- - **T6 Hard Reasoning → 100%** (+25 pts vs v3.0, clean sweep)
- - **T8 GlobalSignal/i18n → 100%** (+12.5 pts)
- - **T9 Static Navigator → 100%** (held perfect)
- - **T10 Dioxus 0.7.4 → 100%** (+16.7 pts)
- - **New dataset topics:**
-   - **T39** - v3.0 exam-gap corrections
-   - **T40** - DaisyUI 5 component coverage on Tailwind v4
-   - **T41** - Signals deep-dive (`use_signal`, `Signal<T>`, `GlobalSignal`,
-     `.peek()`, `.write()`, `ReadOnlySignal`, signal composition)
-   - **T42** - Router patterns (`#[derive(Routable)]`, nested routes,
-     layout routes, route guards, query parameters)
-   - **T43** - Async / server-function composition (`use_resource`
-     three-arm match, cancellation, streaming, `ServerFnError`)
- - **Dataset:** **4,880 curated examples across 43 topics** (up from 4,535)
-
- ## Exam Results
-
- ### v3.1 - 103 Question Weighted Exam
-
- | Tier | Questions | Weight | Score | Max | Status |
  |---|---|---|---|---|---|
- | T1 Fundamentals | Q1–12 | 1.0 | 12.0/12 | 12 | ✅ Perfect |
- | T2 RSX Syntax | Q13–24 | 1.0 | 10.0/12 | 12 | ⚠️ 83.3% |
- | T3 Signal Hygiene | Q25–36 | 1.0 | 11.0/12 | 12 | ✅ 91.7% |
- | T4 WCAG/ARIA | Q37–50 | 1.5 | 16.5/21 | 21 | ⚠️ 78.6% |
- | T5 use_resource | Q51–58 | 1.5 | 12.0/12 | 12 | ✅ Perfect |
- | T6 Hard Reasoning | Q59–68 | 2.0 | 20.0/20 | 20 | ✅ Perfect |
- | T7 Primitives+CSS | Q69–80 | 1.5 | 18.0/18 | 18 | ✅ Perfect |
- | T8 GlobalSignal/i18n | Q81–88 | 1.5 | 12.0/12 | 12 | ✅ Perfect |
- | T9 Static Navigator | Q89–94 | 1.5 | 9.0/9 | 9 | ✅ Perfect |
- | T10 Dioxus 0.7.4 | Q95–100 | 2.0 | 12.0/12 | 12 | ✅ Perfect |
- | T11 Server Functions | Q101–103 | 1.5 | 4.5/4.5 | 4.5 | ✅ Perfect |
- | **Overall** | **Q1–103** | | **137.0/144.5** | **144.5** | **✅ 94.81%** |
-
- **8 tiers at 100%** (T1, T5, T6, T7, T8, T9, T10, T11). Raw: 97/103.
- Publication threshold: 90%. v3.1 clears it with 4.81 points to spare.
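
The Max column in the table above follows from the question ranges and weights: each tier's maximum is its question count times its weight. A minimal Python sketch of that arithmetic, assuming the question ranges are inclusive (the names `TIERS` and `tier_max` are illustrative and are not taken from the exam runner):

```python
# Tier layout copied from the exam table: (name, first question, last question, weight).
# Assumption: question ranges are inclusive on both ends.
TIERS = [
    ("T1 Fundamentals", 1, 12, 1.0),
    ("T2 RSX Syntax", 13, 24, 1.0),
    ("T3 Signal Hygiene", 25, 36, 1.0),
    ("T4 WCAG/ARIA", 37, 50, 1.5),
    ("T5 use_resource", 51, 58, 1.5),
    ("T6 Hard Reasoning", 59, 68, 2.0),
    ("T7 Primitives+CSS", 69, 80, 1.5),
    ("T8 GlobalSignal/i18n", 81, 88, 1.5),
    ("T9 Static Navigator", 89, 94, 1.5),
    ("T10 Dioxus 0.7.4", 95, 100, 2.0),
    ("T11 Server Functions", 101, 103, 1.5),
]


def tier_max(first: int, last: int, weight: float) -> float:
    """Maximum weighted points for a tier: question count times weight."""
    return (last - first + 1) * weight


tier_maxima = {name: tier_max(first, last, weight) for name, first, last, weight in TIERS}
total_questions = sum(last - first + 1 for _, first, last, _ in TIERS)
total_max = sum(tier_maxima.values())

print(total_questions, total_max)  # prints: 103 144.5
```

The totals match the Overall row: 103 questions and a weighted maximum of 144.5.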
-
- ### Remaining Gaps - v3.2 Targets
-
- All 6 failures are rsx! macro drops or cx.render carryover on RSX-heavy
- questions:

- - **Q17, Q22** (T2) - missing `rsx!` in RSX attribute-precision questions
- - **Q30** (T3) - `cx.render` slip on signal hygiene
- - **Q37, Q39, Q43** (T4) - `cx.render` / missing `rsx!` in WCAG answers

- Root cause under investigation. Targeted for v3.2.

- ### Version History

- | Version | Score | Exam | Dataset | Status |
- |---|---|---|---|---|
- | v1.0 | 51/60 (85.0%) | 60Q standard | - | Published |
- | v2.0 | 135.5/140 (96.8%) | 100Q weighted | 4,185 | Published |
- | v3.0 | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 | Published |
- | v3.1 | **137.0/144.5 (94.81%)** | 103Q weighted | **4,880** | **Published** |
-
- ## Model Details
-
- - **Base model:** Qwen3-Coder-14B (fresh base - never fine-tune a fine-tune)
- - **Method:** RAFT (Retrieval-Augmented Fine-Tuning), Unsloth LoRA
- - **Epochs:** 4
- - **Training hardware:** RTX 3090 Ti (homelab)
- - **Dataset:** 4,880 curated examples across 43 topics
- - **Scope:** Rust + Dioxus 0.7.5 + Tailwind v4 + DaisyUI 5 + WCAG 2.2 AAA
-   + fullstack server functions + router
- - **Quantization:** GGUF Q4_K_M (8.4 GB)
- - **Author:** Kevin Miller, Jr.

- ## Install via Ollama

- ```
- ollama pull rockypod/neotoi-coder
- # or a specific version:
- ollama pull rockypod/neotoi-coder:v3.1
  ```

- ## Read the Full Story

- **[Read the whole story on RockyPod.com →](https://rockypod.com/blog/neotoi-coder-v2-release)**

- ---

- ## Files

  | File | Format | Size | Use case |
  |---|---|---|---|
- | `neotoi-coder-v3.1-q4_k_m.gguf` | GGUF Q4_K_M | 8.4 GB | LM Studio, llama.cpp, Ollama |
- | `mlx-v3.1/` | MLX 4-bit (4.5 bpw) | 7.8 GB | Apple Silicon (mlx-lm) |
  | `neotoi-coder-v3-q4_k_m_patched.gguf` | GGUF Q4_K_M | 9 GB | v3.0 legacy |
- | `mlx-v3/` | MLX 4-bit (4.5 bpw) | 7.8 GB | v3.0 legacy (Apple Silicon) |
- | `neotoi-coder-v2-q4_k_m.gguf` | GGUF Q4_K_M | 8.4 GB | v2.0 legacy |
- | `mlx/` | MLX 4-bit | 7.5 GB | v2.0 legacy |

  ## Enabling Thinking Mode

  ### LM Studio

  | Field | Value |
@@ -134,9 +128,9 @@ ollama pull rockypod/neotoi-coder:v3.1
  | Before Assistant | `<\|im_start\|>assistant\n<think>` |
  | After Assistant | `<\|im_end\|>` |

- ### Ollama (GGUF)

- ```
  FROM neotoi-coder-v3.1-q4_k_m.gguf
  PARAMETER temperature 0.2
  PARAMETER num_ctx 16384
@@ -152,8 +146,9 @@ SYSTEM You are Neotoi, an expert Rust and Dioxus 0.7 developer.
  ```

  Or simply pull the published model:
  ```
- ollama pull rockypod/neotoi-coder
  ```

  ### llama.cpp
@@ -168,63 +163,43 @@ ollama pull rockypod/neotoi-coder

  ## What It Knows

- Everything v3.0 knew, plus:
-
- - **DaisyUI 5** components on Tailwind v4 - `btn`, `card`, `drawer`,
-   `modal`, `navbar`, `dropdown`, with `data-theme` discipline
- - **Router patterns** - `#[derive(Routable)]`, nested layouts, query
-   params, route guards, static navigation composition
- - **Signals deep-dive** - `.peek()` vs `.read()`, `ReadOnlySignal`,
-   `Signal<T>` composition, `GlobalSignal::global()` init patterns
- - **Async composition** - `use_resource` cancellation, streaming
-   results, `ServerFnError` error-variant flows
-
- Carried forward from v3.0: Native scoped CSS (`css!()`), CSS modules
- (`.module.css`), `onauxclick` / `onscrollend` event handlers, real
- WebSocket Stream+Sink (`stream.next()`, `sink.send()`), GlobalSignal
- cache rebuilds, T11 server functions (`#[server]` extractors, fullstack
- WebSocket one-liner, `ServerFnError` + HTTP status codes),
- `use_context_provider` / `use_context` placement discipline.
-
- Carried forward from v2.0: Dioxus 0.7 RSX brace syntax (never function-
- call), `use_signal`, `use_resource` three-arm match, `r#for` on labels
- only, `GlobalSignal` `.write()` semantics, WCAG 2.2 AAA (tooltip always
- in DOM, listbox/option nesting, `aria_labelledby` on role containers),
- dioxus-primitives discipline, `styles!()` macro, Tailwind v4 utilities
- and semantic tokens, EN/VI i18n via pre-rsx! let bindings, dark mode
- via `document::eval`, static content navigation with `use_memo`,
- `use_context` panic behavior, `WritableResultExt`.

  ## Known Limitations

- - **rsx! macro drops** on 6 RSX-heavy questions (Q17/22/30/37/39/43);
-   v3.2 target
- - **Non-Dioxus web frameworks** - out of scope by design
-   (SvelteKit coverage lives in `rockypod/svcoder`)
- - **Playwright / E2E testing** - out of scope

  ## Transparency

- Full dataset, exam questions, and per-question model outputs are
- published alongside the weights:
-
  - **Weights:** [HuggingFace - rockypod/neotoi-coder](https://huggingface.co/rockypod/neotoi-coder)
- - **Dataset + exam + per-question results:** [GitHub - rockypod/neotoi-coder](https://github.com/rockypod/neotoi-coder)
- - **Ollama:** `ollama pull rockypod/neotoi-coder`

  ## License

- Neotoi Coder Community License v1.0 - see LICENSE file.
  Commercial use of model outputs permitted.
  Weight redistribution prohibited.
  Mental health deployment requires written permission.

  ## Credits

- Built with:
- - [Unsloth](https://github.com/unslothai/unsloth) - 2x faster fine-tuning
  - [TRL](https://github.com/huggingface/trl) - SFTTrainer
- - [Qwen3-Coder-14B](https://huggingface.co/Qwen/Qwen3-Coder-14B) - base model
- - [MLX](https://github.com/ml-explore/mlx) - Apple Silicon inference
- - [Claude Code](https://claude.ai/code) - dataset pipeline and training infrastructure
  - [Dioxus](https://dioxuslabs.com) - the framework this model specializes in

  language:
  - en
  - vi
+ base_model:
+ - Qwen/Qwen3-Coder-14B
+ - Qwen/Qwen3-8B
+ - Qwen/Qwen3-4B
  tags:
  - dioxus
  - rust

  - raft
  - code
  - server-functions
+ - gguf
+ - qwen3
  pipeline_tag: text-generation
  ---

+ # Neotoi Coder

+ A Rust / Dioxus 0.7 specialist LLM. v3.1 ships in **three sizes** -
+ 8B, 4B, and 14B - all fine-tuned via RAFT (Retrieval-Augmented
+ Fine-Tuning) on Qwen3 base models. Optimized for production-quality
+ Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.

+ ## Variants

+ | Variant | Base | Params | Q4_K_M | Spec exam (103Q weighted, max 144.5) | Files |
  |---|---|---|---|---|---|
+ | **8B** (flagship) | Qwen3-8B | 8.2B (6.95B non-embed) | 4.68 GB | **144.5 / 144.5 - 100.00%** | [`v3.1.0-8b` branch](https://huggingface.co/rockypod/neotoi-coder/tree/v3.1.0-8b) |
+ | 4B | Qwen3-4B | 4.0B (3.6B non-embed, tied) | 2.33 GB | 143.5 / 144.5 - 99.31% | [`v3.1.0-4b` branch](https://huggingface.co/rockypod/neotoi-coder/tree/v3.1.0-4b) |
+ | 14B (legacy) | Qwen3-Coder-14B | 14.8B (13.2B non-embed) | 8.40 GB | 137.0 / 144.5 - 94.81% | this branch (`main`) |

+ All three clear the 90% publication bar. The 8B and 4B also clear the 95% release bar with every per-tier floor passing; the 14B, at 94.81%, falls just short of it. The 8B is the recommended default; pick the 4B if disk / RAM is tight, or the 14B for the broadest coverage.

+ > **The 8B and 4B GGUFs live on separate branches** - switch the branch
+ > dropdown at the top of this page (currently showing `main`) to
+ > `v3.1.0-8b` or `v3.1.0-4b` to see and download them.

+ ## Install via Ollama

+ ```bash
+ # 8B - recommended default
+ ollama pull rockypod/neotoi-coder:8b

+ # 4B - disk / RAM constrained, ~40% faster generation
+ ollama pull rockypod/neotoi-coder:4b

+ # 14B - legacy, broadest coverage
+ ollama pull rockypod/neotoi-coder:15b
  ```

+ ## Spec-exam scorecard - all three variants

+ Re-graded 2026-04-26 with the patched `run_grade_v31.py` (Q87 now also accepts `LANG()` / `THEME()` GlobalSignal accessor calls in addition to the literal `Signal` token, a false-negative fix that unlocked the 8B's perfect score).
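
The Q87 rule described above can be pictured as a two-way acceptance check. The following is a hypothetical Python sketch only: the function name `q87_accepts`, the regular expression, and the plain substring test for the `Signal` token are assumptions made for illustration, and the actual logic in `run_grade_v31.py` may differ.

```python
import re

# Assumption: an accessor call is the bare name LANG or THEME followed by "()".
ACCESSOR_CALL = re.compile(r"\b(?:LANG|THEME)\(\)")


def q87_accepts(answer: str) -> bool:
    """Accept an answer that shows GlobalSignal usage by either route.

    Route 1 (pre-patch behaviour, as described): the literal token "Signal"
    appears somewhere in the answer.
    Route 2 (added by the patch, as described): the answer calls the
    LANG() or THEME() accessor.
    """
    has_signal_token = "Signal" in answer
    has_accessor_call = ACCESSOR_CALL.search(answer) is not None
    return has_signal_token or has_accessor_call


# An answer that only reads the global through its accessor passes after the patch.
print(q87_accepts("let lang = LANG();"))
# An answer with neither the token nor an accessor call is still rejected.
print(q87_accepts("let lang = current_language();"))
```

Under these assumptions, an answer that reads the global purely through `LANG()` would have failed a token-only check, which is the kind of false negative the patch is said to remove.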

+ | Tier | Max wt | 8B | 4B | 14B |
+ |---|---|---|---|---|
+ | T1 Fundamentals | 12.0 | 12.0 ✅ | 11.0 ⚠️ 91.7% | 12.0 ✅ |
+ | T2 RSX Syntax | 12.0 | 12.0 ✅ | 12.0 ✅ | 10.0 ⚠️ 83.3% |
+ | T3 Signal Hygiene | 12.0 | 12.0 ✅ | 12.0 ✅ | 11.0 ✅ 91.7% |
+ | T4 WCAG / ARIA | 21.0 | 21.0 ✅ | 21.0 ✅ | 16.5 ⚠️ 78.6% |
+ | T5 use_resource | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
+ | T6 Hard Reasoning | 20.0 | 20.0 ✅ | 20.0 ✅ | 20.0 ✅ |
+ | T7 Primitives + CSS | 18.0 | 18.0 ✅ | 18.0 ✅ | 18.0 ✅ |
+ | T8 GlobalSignal / i18n | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
+ | T9 Static Navigator | 9.0 | 9.0 ✅ | 9.0 ✅ | 9.0 ✅ |
+ | T10 Dioxus 0.7.4 | 12.0 | 12.0 ✅ | 12.0 ✅ | 12.0 ✅ |
+ | T11 Server Functions | 4.5 | 4.5 ✅ | 4.5 ✅ | 4.5 ✅ |
+ | **Total weighted** | **144.5** | **144.5** | **143.5** | **137.0** |
+ | **Total raw (of 103)** | - | **103** | **102** | **97** |
+ | **Percent** | - | **100.00%** | **99.31%** | **94.81%** |
+
+ Tier floors (82% on weight-1.0 / 1.5 tiers, 88% on weight-2.0 tiers): PASS on every tier for the 8B and 4B. The 14B's T4 WCAG / ARIA score (78.6%) sits below the 82% floor.
+
+ The 4B's only miss is Q8 (T1 RSX conversion), where generation was truncated mid-`<think>` block. The 14B drops points on six RSX-heavy questions (Q17, Q22, Q30, Q37, Q39, Q43); these are a v3.2 target.
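
The totals and percentages in the scorecard can be re-derived from the per-tier weighted scores. A minimal Python sketch, with the numbers copied from the table and the names (`MAX_WEIGHTS`, `SCORES`, `summarize`) chosen here purely for illustration:

```python
# Weighted tier maxima and scores copied from the scorecard, in tier order T1..T11.
MAX_WEIGHTS = [12.0, 12.0, 12.0, 21.0, 12.0, 20.0, 18.0, 12.0, 9.0, 12.0, 4.5]
SCORES = {
    "8B": [12.0, 12.0, 12.0, 21.0, 12.0, 20.0, 18.0, 12.0, 9.0, 12.0, 4.5],
    "4B": [11.0, 12.0, 12.0, 21.0, 12.0, 20.0, 18.0, 12.0, 9.0, 12.0, 4.5],
    "14B": [12.0, 10.0, 11.0, 16.5, 12.0, 20.0, 18.0, 12.0, 9.0, 12.0, 4.5],
}


def summarize(scores):
    """Return (weighted total, overall percent, tiers scoring below their maximum)."""
    total = sum(scores)
    percent = round(100 * total / sum(MAX_WEIGHTS), 2)
    below_max = {
        f"T{index}": round(100 * score / maximum, 1)
        for index, (score, maximum) in enumerate(zip(scores, MAX_WEIGHTS), start=1)
        if score < maximum
    }
    return total, percent, below_max


summary = {variant: summarize(scores) for variant, scores in SCORES.items()}
for variant, (total, percent, below_max) in summary.items():
    print(variant, total, percent, below_max)
```

This reproduces 144.5 (100.0%), 143.5 (99.31%), and 137.0 (94.81%), and lists the tiers where each variant falls short of its maximum: T1 for the 4B, and T2, T3, and T4 for the 14B.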
+
+ ## What's new in v3.1 (vs v3.0)
+
+ - **Two new sizes**: 8B and 4B alongside the 14B base, both surpassing the 14B's score.
+ - **T1 Fundamentals → 100%** on 8B and 14B, 91.7% on 4B (+8.3 pts vs v3.0 14B).
+ - **T6 Hard Reasoning → 100%** clean sweep, all three variants (+25 pts vs v3.0 14B).
+ - **T8 GlobalSignal / i18n → 100%** all three variants.
+ - **T10 Dioxus 0.7.4 → 100%** all three variants.
+ - **8 tiers at 100%** on the 14B; **11 tiers at 100%** on the 8B (perfect).
+ - **Dataset:** 4,880 curated examples across 43 topics (up from 4,535).
+
+ ## Version History
+
+ | Version | Base (params) | Score | Exam | Dataset |
+ |---|---|---|---|---|
+ | v1.0 | Qwen3-Coder-14B (14.8B) | 51/60 (85.0%) | 60Q standard | - |
+ | v2.0 | Qwen3-Coder-14B (14.8B) | 135.5/140 (96.8%) | 100Q weighted | 4,185 |
+ | v3.0 | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 |
+ | v3.1 14B | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 |
+ | **v3.1 8B** | **Qwen3-8B (8.2B)** | **144.5/144.5 (100.00%)** | **103Q weighted** | **4,880** |
+ | v3.1 4B | Qwen3-4B (4.0B, tied) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 |

+ ## Files in this branch (`main`, 14B)

  | File | Format | Size | Use case |
  |---|---|---|---|
+ | `neotoi-coder-v3.1-q4_k_m.gguf` | GGUF Q4_K_M | 8.4 GB | LM Studio, llama.cpp, Ollama (current) |
  | `neotoi-coder-v3-q4_k_m_patched.gguf` | GGUF Q4_K_M | 9 GB | v3.0 legacy |
+ | `neotoi-coder-v2.0-q4_k_m.gguf` | GGUF Q4_K_M | 9 GB | v2.0 legacy |
+ | `neotoi-coder-v1-q4_k_m_final.gguf` | GGUF Q4_K_M | 9 GB | v1.0 legacy |
+
+ For the 8B and 4B Q4_K_M GGUFs (with and without the `qwen3.thinking=true` patch), switch to the `v3.1.0-8b` or `v3.1.0-4b` branch via the dropdown above.

  ## Enabling Thinking Mode

+ This model emits Qwen3 native `<think>...</think>` blocks. Thinking is on by default with the `_patched.gguf` quants on inference backends that honor `qwen3.thinking`.
+
  ### LM Studio

  | Field | Value |

  | Before Assistant | `<\|im_start\|>assistant\n<think>` |
  | After Assistant | `<\|im_end\|>` |

+ ### Ollama (custom Modelfile)

+ ```Modelfile
  FROM neotoi-coder-v3.1-q4_k_m.gguf
  PARAMETER temperature 0.2
  PARAMETER num_ctx 16384

  ```

  Or simply pull the published model:
+
  ```
+ ollama pull rockypod/neotoi-coder:15b
  ```

  ### llama.cpp


  ## What It Knows

+ - Dioxus 0.7 RSX brace syntax - never function-call style
+ - `use_signal`, `use_resource` with the canonical three-arm match
+ - `r#for` on labels only, never inputs
+ - WCAG 2.2 AAA: `aria_labelledby`, `aria_describedby`, live regions, `role="alert"`, `role="dialog"`
+ - dioxus-primitives - no manual ARIA on managed components
+ - `styles!()` macro and native CSS modules
+ - Tailwind v4 utility classes and semantic tokens
+ - DaisyUI 5 components on Tailwind v4
+ - `GlobalSignal` patterns (LANG / THEME), EN/VI i18n, dark-mode toggling via `document::eval`
+ - Router patterns (`#[derive(Routable)]`, nested layouts, query params, route guards)
+ - Dioxus 0.7.4 APIs: `WritableResultExt`, WebSocket Stream+Sink, server-fn extractors

  ## Known Limitations

+ - **rsx! macro drops on the 14B** for 6 RSX-heavy questions (Q17 / 22 / 30 / 37 / 39 / 43); v3.2 target. The 8B and 4B do not reproduce these misses.
+ - **Non-Dioxus web frameworks** - out of scope by design (SvelteKit coverage lives in `rockypod/svcoder`).
+ - **Playwright / E2E testing** - out of scope.

  ## Transparency

  - **Weights:** [HuggingFace - rockypod/neotoi-coder](https://huggingface.co/rockypod/neotoi-coder)
+ - **Exam runner, grader, per-question results:** [GitHub - rockypod/neotoi-coder](https://github.com/rockypod/neotoi-coder)
+ - **Ollama:** `ollama pull rockypod/neotoi-coder:8b` (or `:4b`, or `:15b`)
+
+ The training dataset itself is **not redistributed** - see the GitHub repo for the data-generation pipeline. Tailwind v4 reference material is treated as a competence input, not a shipped artifact.

  ## License

+ Neotoi Coder Community License v1.0 - see `LICENSE`.
  Commercial use of model outputs permitted.
  Weight redistribution prohibited.
  Mental health deployment requires written permission.

  ## Credits

+ - [Unsloth](https://github.com/unslothai/unsloth) - 2× faster fine-tuning
  - [TRL](https://github.com/huggingface/trl) - SFTTrainer
+ - [Qwen3-Coder-14B](https://huggingface.co/Qwen/Qwen3-Coder-14B), [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) - base models
  - [Dioxus](https://dioxuslabs.com) - the framework this model specializes in
+ - [Claude Code](https://claude.ai/code) - dataset pipeline and training infrastructure