Neotoi Coder
A Rust / Dioxus 0.7 specialist LLM fine-tuned on 5,287 curated examples covering the full Dioxus 0.7 series (0.7.0–0.7.9), Tailwind v4, and WCAG 2.2 AAA accessibility. All three v3.2 variants are published.
All variants are fine-tuned via RAFT (Retrieval-Augmented Fine-Tuning) on Qwen3 base models using LoRA adapters (Unsloth), optimized for production-quality Dioxus 0.7 components.
Variants
| Variant | Repo | Base | Params | Q4_K_M | Spec exam |
|---|---|---|---|---|---|
| 15B v3.2 (this repo) | rockypod/neotoi-coder | Qwen3-Coder-14B | 14.8B | 8.4 GB | 156.0 / 164.0 = 95.12% (114Q, 13 tiers) |
| 8B v3.2 | rockypod/neotoi-coder-8b | Qwen3-8B | 8.2B | 4.68 GB | 160.0 / 164.0 = 97.56% (114Q, 13 tiers) |
| 4B v3.2 | rockypod/neotoi-coder-4b | Qwen3-4B | 4.0B | 2.33 GB | 160.0 / 164.0 = 97.56% (114Q, 13 tiers) |
All three clear the 90% publication bar and the 95% release bar.
The 8B and 4B tie at 97.56% with complementary failure patterns:
- 4B scores 100% on T13 SyncStore (8B scored 50%) and 100% on T8 GlobalSignal/i18n (8B scored 87.5%)
- 8B scores 100% on T12 Format Compliance (4B scored 66.7%)
Pick by hardware: 4B (2.3 GB) if disk or RAM is tight, with a perfect SyncStore score; 8B (4.7 GB) for the best format compliance at moderate size; 15B (8.4 GB) for the broadest Dioxus 0.7.4–0.7.9 surface coverage.
MLX format for v3.2 is available at mlx-v3.2/ in this repo (7.7 GB, 4-bit quantized, 2 shards). v3.1 MLX remains at mlx-v3.1/.
Install via Ollama
# 15B v3.2: broadest Dioxus 0.7.4–0.7.9 surface
ollama pull rockypod/neotoi-coder:latest
ollama pull rockypod/neotoi-coder:15b # explicit size tag
# 8B v3.2: tied for the highest weighted score, ~40% faster than 15B, perfect format compliance
ollama pull rockypod/neotoi-coder:8b
# 4B v3.2: disk / RAM constrained, perfect SyncStore
ollama pull rockypod/neotoi-coder:4b
Tags: :latest / :15b, :8b, :4b, :v3.1 (archive). Each Modelfile
sets num_ctx 8192, temperature 0.2, and prefills <think> on the
assistant turn so Qwen3 native chain-of-thought emits by default.
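The Modelfile defaults above can also be passed explicitly per request. The following is a minimal sketch, assuming a local Ollama server and its `/api/chat` endpoint; the helper name `build_chat_request` and the example prompt are illustrative and not part of this repo.

```python
import json

def build_chat_request(prompt, model="rockypod/neotoi-coder:8b"):
    """Build a JSON body for Ollama's /api/chat endpoint (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Mirrors the Modelfile defaults described above; sending them
        # explicitly is optional and shown only for illustration.
        "options": {"num_ctx": 8192, "temperature": 0.2},
    }

body = build_chat_request("Write a Dioxus 0.7 counter component.")
print(json.dumps(body, indent=2))
```

The printed body can be sent with any HTTP client as a POST to `http://localhost:11434/api/chat`, assuming Ollama is listening on its default port.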
v3.2 Scorecards (114Q, max 164.0)
All-variant summary
| Variant | Score | Weighted | Raw | T12 Format | T13 SyncStore |
|---|---|---|---|---|---|
| 8B | 97.56% | 160.0 / 164.0 | 111 / 114 | ✅ 100.0% | ⚠️ 50.0% |
| 4B | 97.56% | 160.0 / 164.0 | 112 / 114 | ⚠️ 66.7% | ✅ 100.0% |
| 15B | 95.12% | 156.0 / 164.0 | 109 / 114 | ⚠️ 83.3% | ⚠️ 0.0% |
15B scorecard
| Tier | Count | Max wt | Raw | Wtd | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|
| T1 Fundamentals | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T2 RSX Syntax | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T3 Signal Hygiene | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T4 WCAG / ARIA | 15 | 22.5 | 15 | 22.5 | 100.0% | 82% | ✅ (was 78.6% in v3.1) |
| T5 use_resource | 8 | 12.0 | 8 | 12.0 | 100.0% | 82% | ✅ |
| T6 Hard Reasoning | 10 | 20.0 | 10 | 20.0 | 100.0% | 88% | ✅ |
| T7 Primitives + CSS | 13 | 19.5 | 12 | 18.0 | 92.3% | 82% | ✅ |
| T8 GlobalSignal / i18n | 8 | 12.0 | 7 | 10.5 | 87.5% | 82% | ✅ |
| T9 Static Navigator | 6 | 9.0 | 6 | 9.0 | 100.0% | 82% | ✅ |
| T10 Dioxus 0.7.4 | 6 | 12.0 | 6 | 12.0 | 100.0% | 88% | ✅ |
| T11 Server Functions | 4 | 6.0 | 4 | 6.0 | 100.0% | 82% | ✅ |
| T12 Format Compliance (NEW) | 6 | 12.0 | 5 | 10.0 | 83.3% | 88% | ⚠️ |
| T13 SyncStore (NEW) | 2 | 3.0 | 0 | 0.0 | 0.0% | 82% | ⚠️ |
| Total | 114 | 164.0 | 109 | 156.0 | 95.12% | - | ✅ |
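The weighted total can be reproduced from the per-tier rows. The sketch below assumes every question inside a tier carries the same weight (tier max weight divided by question count), which matches the figures in the 15B scorecard; the names `TIERS_15B` and `weighted_score` are illustrative.

```python
# (tier, question count, max weighted points, questions passed), from the 15B scorecard
TIERS_15B = [
    ("T1", 12, 12.0, 12), ("T2", 12, 12.0, 12), ("T3", 12, 12.0, 12),
    ("T4", 15, 22.5, 15), ("T5", 8, 12.0, 8), ("T6", 10, 20.0, 10),
    ("T7", 13, 19.5, 12), ("T8", 8, 12.0, 7), ("T9", 6, 9.0, 6),
    ("T10", 6, 12.0, 6), ("T11", 4, 6.0, 4), ("T12", 6, 12.0, 5),
    ("T13", 2, 3.0, 0),
]

def weighted_score(tiers):
    """Return (earned points, max points, percentage rounded to 2 places)."""
    # Assumption: equal per-question weight within a tier.
    earned = sum(max_wt * passed / count for _, count, max_wt, passed in tiers)
    total = sum(max_wt for _, _, max_wt, _ in tiers)
    return earned, total, round(100 * earned / total, 2)

earned, total, pct = weighted_score(TIERS_15B)
print(f"{earned:.1f} / {total:.1f} = {pct}%")
```

Swapping in the 8B or 4B rows gives their totals the same way.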
8B scorecard
| Tier | Count | Max wt | Raw | Wtd | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|
| T1 Fundamentals | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T2 RSX Syntax | 12 | 12.0 | 11 | 11.0 | 91.7% | 82% | ✅ |
| T3 Signal Hygiene | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T4 WCAG / ARIA | 15 | 22.5 | 15 | 22.5 | 100.0% | 82% | ✅ |
| T5 use_resource | 8 | 12.0 | 8 | 12.0 | 100.0% | 82% | ✅ |
| T6 Hard Reasoning | 10 | 20.0 | 10 | 20.0 | 100.0% | 88% | ✅ |
| T7 Primitives + CSS | 13 | 19.5 | 13 | 19.5 | 100.0% | 82% | ✅ |
| T8 GlobalSignal / i18n | 8 | 12.0 | 7 | 10.5 | 87.5% | 82% | ✅ |
| T9 Static Navigator | 6 | 9.0 | 6 | 9.0 | 100.0% | 82% | ✅ |
| T10 Dioxus 0.7.4 | 6 | 12.0 | 6 | 12.0 | 100.0% | 88% | ✅ |
| T11 Server Functions | 4 | 6.0 | 4 | 6.0 | 100.0% | 82% | ✅ |
| T12 Format Compliance | 6 | 12.0 | 6 | 12.0 | 100.0% | 88% | ✅ |
| T13 SyncStore | 2 | 3.0 | 1 | 1.5 | 50.0% | 82% | ⚠️ |
| Total | 114 | 164.0 | 111 | 160.0 | 97.56% | - | ✅ |
The T13 floor failure is structural: with only 2 questions, any single miss yields 50%.
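To make the structural point concrete, the sketch below counts how many misses a tier can absorb before its pass rate drops under the floor. The function name is illustrative; the question counts and floors come from the scorecard columns.

```python
def max_misses_within_floor(question_count, floor_pct):
    """Largest number of misses that keeps the tier at or above its floor."""
    for misses in range(question_count + 1):
        rate = 100 * (question_count - misses) / question_count
        if rate < floor_pct:
            return misses - 1
    return question_count

# (tier, question count, floor %) taken from the scorecard
for tier, count, floor in [("T1", 12, 82), ("T12", 6, 88), ("T13", 2, 82)]:
    print(tier, max_misses_within_floor(count, floor))
```

With 2 questions the only achievable rates are 0%, 50%, and 100%, so T13 tolerates no misses. By the same arithmetic the 6-question T12 tier tolerates none either, since 5 of 6 is 83.3%, below its 88% floor.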
4B scorecard
| Tier | Count | Max wt | Raw | Wtd | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|
| T1 Fundamentals | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T2 RSX Syntax | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T3 Signal Hygiene | 12 | 12.0 | 12 | 12.0 | 100.0% | 82% | ✅ |
| T4 WCAG / ARIA | 15 | 22.5 | 15 | 22.5 | 100.0% | 82% | ✅ |
| T5 use_resource | 8 | 12.0 | 8 | 12.0 | 100.0% | 82% | ✅ |
| T6 Hard Reasoning | 10 | 20.0 | 10 | 20.0 | 100.0% | 88% | ✅ |
| T7 Primitives + CSS | 13 | 19.5 | 13 | 19.5 | 100.0% | 82% | ✅ |
| T8 GlobalSignal / i18n | 8 | 12.0 | 8 | 12.0 | 100.0% | 82% | ✅ |
| T9 Static Navigator | 6 | 9.0 | 6 | 9.0 | 100.0% | 82% | ✅ |
| T10 Dioxus 0.7.4 | 6 | 12.0 | 6 | 12.0 | 100.0% | 88% | ✅ |
| T11 Server Functions | 4 | 6.0 | 4 | 6.0 | 100.0% | 82% | ✅ |
| T12 Format Compliance | 6 | 12.0 | 4 | 8.0 | 66.7% | 88% | ⚠️ |
| T13 SyncStore | 2 | 3.0 | 2 | 3.0 | 100.0% | 82% | ✅ |
| Total | 114 | 164.0 | 112 | 160.0 | 97.56% | - | ✅ |
T12 misses: q111 (old cx.render idiom + orphan </think>), q112 (missing rsx!).
The 4B also scores 100% on T8 GlobalSignal/i18n where the 8B scored 87.5%.
What's new in v3.2
Score deltas vs v3.1
- 15B: 94.81% → 95.12% on a harder, longer exam (114Q vs 103Q, max 164 vs 144.5, two new tiers). T4 WCAG/ARIA: 78.6% → 100.0%.
- 8B: 100.00% → 97.56%. The exam is harder (two new tiers added), and the misses fall in T2 RSX Syntax, T8 GlobalSignal/i18n, and the new T13 SyncStore tier. T7 Primitives+CSS and T12 Format Compliance both hit 100% where the 15B scored 92.3% and 83.3%.
- 4B: 99.31% → 97.56%, with the same exam-difficulty caveat. T13 SyncStore hits 100% (a new tier where the 8B scores 50% and the 15B scores 0%).
New Dioxus 0.7 surface
v3.2 expands coverage from Dioxus 0.7.0 through Dioxus 0.7.9 (full 0.7 series). New training topics:
- T44 Scoped CSS and CSS modules (Dioxus 0.7.3)
- T45 SyncStore + use_store_sync (Dioxus 0.7.2, cross-thread reactive state)
- T46 New events: onauxclick, onscrollend (0.7.3)
- T47 Server-only extractors + serde_qs query string support
- T48 0.7.2 bug-fix awareness: optional callback props, child router layouts, use_drop in prelude
- T49 0.7.4 APIs: WritableResultExt, WebSocketStream + Sink, FFI for Kotlin/Java/Swift, iOS widget bundling
- T50 0.7.6 RSX additions: inert attribute, web panic resilience, IntoAttributeValue for &T, Action::PartialEq
- T51 use_context vs consume_context: panic-on-missing-provider semantics
Eval-driven corrections (T52–T57)
- T52 Format Compliance: fenced-code-only outputs, no prose preamble, no orphan </think>
- T53 Preserve-and-Append: .ftl catalogs, Cargo.toml, route enums (add without replacing)
- T54 Dioxus 0.7 idiom reinforcement: Outlet::<Route>, t!(), DaisyUI v5 / Tailwind v4
- T55 WCAG / ARIA corrections: drives the 78.6% → 100% jump on the 15B
- T56 dioxus-i18n + Fluent: LanguageIdentifier, catalog append
- T57 Scope discipline: answer exactly what was asked
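The T52 format rules lend themselves to a mechanical check. The following is a rough sketch of what such a check might look like, not the exam's actual grader: it assumes a well-formed `<think>...</think>` pair is allowed before the answer, treats any `</think>` left over after removing such pairs as an orphan, and only inspects the start and end of the visible answer. The function name and problem labels are hypothetical.

```python
import re

FENCE = "`" * 3  # a code fence marker, built this way to keep the example readable

def check_format(answer):
    """Return a list of format problems; an empty list means the answer passes."""
    problems = []
    # Drop well-formed thinking blocks before inspecting the visible answer.
    visible = re.sub(r"<think>.*?</think>", "", answer, flags=re.S)
    if "</think>" in visible:
        problems.append("orphan </think>")
        visible = visible.replace("</think>", "")
    visible = visible.strip()
    if not visible.startswith(FENCE):
        problems.append("text before the first fence")
    if not visible.endswith(FENCE):
        problems.append("text after the last fence")
    return problems

sample = "Here is the component:\n" + FENCE + "rust\nfn main() {}\n" + FENCE
print(check_format(sample))
```

A stricter check would also reject prose between two fenced blocks; this sketch leaves that out for brevity.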
Dataset
- 5,287 curated examples across 57 topics (up from 4,880 / 43 in v3.1)
- Cross-stack contamination scan removed 489 rows: fn app( → fn App(, launch(app) → launch(App), and three useEffect( → use_effect( React leaks
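A contamination scan of this kind can be approximated with a few regular expressions. The patterns below are hypothetical, derived only from the three leak examples named above; the actual scan rules are not documented here.

```python
import re

# Hypothetical patterns based on the leaks listed above (not the real scan rules).
LEAK_PATTERNS = {
    "lowercase component fn": re.compile(r"\bfn app\("),
    "lowercase launch target": re.compile(r"\blaunch\(app\)"),
    "React useEffect": re.compile(r"\buseEffect\("),
}

def find_leaks(row_text):
    """Return the names of every leak pattern found in one dataset row."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(row_text)]

rows = [
    "fn App() -> Element { rsx! { div {} } }",
    "fn main() { launch(app); }",
    "useEffect(() => {}, []);",
]
for row in rows:
    print(find_leaks(row))
```

Rows that return a non-empty list would be candidates for removal or correction.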
Version History
| Version | Base (params) | Score | Exam | Dataset |
|---|---|---|---|---|
| v1.0 | Qwen3-Coder-14B (14.8B) | 51/60 (85.0%) | 60Q standard | - |
| v2.0 | Qwen3-Coder-14B (14.8B) | 135.5/140 (96.8%) | 100Q weighted | 4,185 |
| v3.0 | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted, 11 tiers | 4,535 |
| v3.1 15B | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted, 11 tiers | 4,880 |
| v3.1 8B | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted, 11 tiers | 4,880 |
| v3.1 4B | Qwen3-4B (4.0B, tied) | 143.5/144.5 (99.31%) | 103Q weighted, 11 tiers | 4,880 |
| v3.2 15B | Qwen3-Coder-14B (14.8B) | 156.0/164.0 (95.12%) | 114Q weighted, 13 tiers | 5,287 |
| v3.2 8B | Qwen3-8B (8.2B) | 160.0/164.0 (97.56%) | 114Q weighted, 13 tiers | 5,287 |
| v3.2 4B | Qwen3-4B (4.0B, tied) | 160.0/164.0 (97.56%) | 114Q weighted, 13 tiers | 5,287 |
Files in this repo (15B and historical)
| File | Format | Size | Use case |
|---|---|---|---|
| neotoi-coder-v3.2-q4_k_m_patched.gguf | GGUF Q4_K_M | 8.4 GB | Current 15B v3.2: LM Studio, llama.cpp, Ollama |
| mlx-v3.2/ | MLX 4-bit safetensors | 7.7 GB | Current 15B v3.2 MLX: Apple Silicon (mlx-lm) |
| neotoi-coder-v3.1-q4_k_m.gguf | GGUF Q4_K_M | 8.4 GB | v3.1 archive |
| neotoi-coder-v3-q4_k_m_patched.gguf | GGUF Q4_K_M | 9 GB | v3.0 archive |
| neotoi-coder-v2.0-q4_k_m.gguf | GGUF Q4_K_M | 9 GB | v2.0 archive |
| neotoi-coder-v1-q4_k_m_final.gguf | GGUF Q4_K_M | 9 GB | v1.0 archive |
| mlx-v3.1/ | MLX safetensors | - | v3.1 MLX archive |
| mlx-v3/ | MLX safetensors | - | v3.0 MLX archive |
For the 8B v3.2 and 4B v3.2 Q4_K_M GGUFs, see their dedicated repos: rockypod/neotoi-coder-8b and rockypod/neotoi-coder-4b.
Enabling Thinking Mode
This model emits Qwen3 native <think>...</think> blocks. Thinking is on
by default with the _patched.gguf quants on inference backends that
honor qwen3.thinking.
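Client code usually wants the reasoning and the final answer separately. Here is a minimal sketch for splitting them, assuming at most one thinking block per response. Because some setups prefill the opening `<think>` tag on the assistant turn, the sketch also accepts output that contains only the closing tag; the function name is illustrative.

```python
def split_thinking(text):
    """Return (thinking, answer) from a raw model response."""
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        # No thinking block at all: the whole response is the answer.
        return "", text.strip()
    # The opener may be missing when it was prefilled by the backend.
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()

thinking, answer = split_thinking("<think>Use a Signal.</think>\nfn App() -> Element { todo!() }")
print(thinking)
print(answer)
```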
License
Fine-tuned weights: Neotoi Coder Community License v1.0. Commercial use of outputs is permitted, weight redistribution is prohibited, and mental health deployment requires written permission. See LICENSE.
Base model: Qwen3-Coder-14B, Apache 2.0 © Alibaba Cloud.
Built on a homelab RTX 3090 Ti in Washington State.