Instructions for using rockypod/neotoi-coder with libraries, inference servers, and local apps.

Libraries

How to use rockypod/neotoi-coder with llama-cpp-python:

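# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the current 15B v3.2 quant from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="rockypod/neotoi-coder",
	filename="neotoi-coder-v3.2-q4_k_m_patched.gguf",
)

# Run one chat turn and print only the assistant's reply text.
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])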

Local Apps

llama.cpp

How to use rockypod/neotoi-coder with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rockypod/neotoi-coder:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf rockypod/neotoi-coder:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rockypod/neotoi-coder:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf rockypod/neotoi-coder:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rockypod/neotoi-coder:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf rockypod/neotoi-coder:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rockypod/neotoi-coder:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rockypod/neotoi-coder:Q4_K_M

Use Docker

docker model run hf.co/rockypod/neotoi-coder:Q4_K_M

vLLM

How to use rockypod/neotoi-coder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rockypod/neotoi-coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rockypod/neotoi-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use rockypod/neotoi-coder with Ollama:

```
ollama run hf.co/rockypod/neotoi-coder:Q4_K_M
```

Unsloth Studio

How to use rockypod/neotoi-coder with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rockypod/neotoi-coder to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rockypod/neotoi-coder to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rockypod/neotoi-coder to start chatting

Pi

How to use rockypod/neotoi-coder with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf rockypod/neotoi-coder:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "rockypod/neotoi-coder:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use rockypod/neotoi-coder with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf rockypod/neotoi-coder:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default rockypod/neotoi-coder:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use rockypod/neotoi-coder with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf rockypod/neotoi-coder:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "rockypod/neotoi-coder:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner

How to use rockypod/neotoi-coder with Docker Model Runner:

```
docker model run hf.co/rockypod/neotoi-coder:Q4_K_M
```

Lemonade

How to use rockypod/neotoi-coder with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rockypod/neotoi-coder:Q4_K_M

Run and chat with the model

lemonade run user.neotoi-coder-Q4_K_M

List all available models

lemonade list

Neotoi Coder

A Rust / Dioxus 0.7 specialist LLM fine-tuned on 5,287 curated examples covering the full Dioxus 0.7 series (0.7.0–0.7.9), Tailwind v4, and WCAG 2.2 AAA accessibility. All three v3.2 variants are published.

All variants are fine-tuned via RAFT (Retrieval-Augmented Fine-Tuning) on Qwen3 base models using LoRA adapters (Unsloth), optimized for production-quality Dioxus 0.7 components.

Variants

Variant	Repo	Base	Params	Q4_K_M	Spec exam
15B v3.2 (this repo)	`rockypod/neotoi-coder`	Qwen3-Coder-14B	14.8B	8.4 GB	156.0 / 164.0 — 95.12% (114Q, 13 tiers)
8B v3.2	`rockypod/neotoi-coder-8b`	Qwen3-8B	8.2B	4.68 GB	160.0 / 164.0 — 97.56% (114Q, 13 tiers)
4B v3.2	`rockypod/neotoi-coder-4b`	Qwen3-4B	4.0B	2.33 GB	160.0 / 164.0 — 97.56% (114Q, 13 tiers)

All three clear the 90% publication bar and the 95% release bar.

The 8B and 4B tie at 97.56% with complementary failure patterns:

4B scores 100% on T13 SyncStore (8B scored 50%) and 100% on T8 GlobalSignal/i18n (8B scored 87.5%)
8B scores 100% on T12 Format Compliance (4B scored 66.7%)

Pick by hardware: 4B (2.3 GB) if disk or RAM is tight (it also has a perfect SyncStore score); 8B (4.7 GB) for the best format compliance at moderate size; 15B (8.4 GB) for the broadest Dioxus 0.7.4–0.7.9 surface coverage.
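
The size side of that choice can be sketched as a simple filter. This is a minimal sketch, not the author's selection logic: the file sizes come from the Variants table, while `HEADROOM_GB` and the function name `largest_variant_that_fits` are hypothetical, and the headroom value is a placeholder rather than a measured figure. It deliberately ignores the per-tier trade-offs (format compliance, SyncStore).

```python
from typing import Optional

# Q4_K_M file sizes in GB, copied from the Variants table.
VARIANT_SIZES_GB = {"4b": 2.33, "8b": 4.68, "15b": 8.4}

# Assumption: memory reserved for KV cache and runtime overhead.
# This is an illustrative placeholder, not a measured value.
HEADROOM_GB = 2.0


def largest_variant_that_fits(free_memory_gb: float) -> Optional[str]:
    """Return the biggest variant whose weights plus headroom fit, or None."""
    fitting = [
        (size, name)
        for name, size in VARIANT_SIZES_GB.items()
        if size + HEADROOM_GB <= free_memory_gb
    ]
    # max() compares the size first, so the largest fitting file wins.
    return max(fitting)[1] if fitting else None
```

The returned name matches the Ollama size tags (`:4b`, `:8b`, `:15b`), so it can be appended to `rockypod/neotoi-coder:` directly.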

MLX format for v3.2 is available at mlx-v3.2/ in this repo (7.7 GB, 4-bit quantized, 2 shards). v3.1 MLX remains at mlx-v3.1/.

Install via Ollama

# 15B v3.2: broadest Dioxus 0.7.4–0.7.9 surface
ollama pull rockypod/neotoi-coder:latest
ollama pull rockypod/neotoi-coder:15b      # explicit size tag

# 8B v3.2: tied for the highest weighted score (160.0 / 164.0), ~40% faster than 15B, perfect format compliance
ollama pull rockypod/neotoi-coder:8b

# 4B v3.2: disk / RAM constrained, perfect SyncStore
ollama pull rockypod/neotoi-coder:4b

Tags: :latest / :15b, :8b, :4b, :v3.1 (archive). Each Modelfile sets num_ctx 8192 and temperature 0.2, and prefills <think> on the assistant turn so that Qwen3's native chain-of-thought is emitted by default.
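
For scripts that call Ollama over HTTP rather than through `ollama run`, the same two settings can be restated per request. The sketch below assumes Ollama is listening on its default local port (11434) and uses its native `/api/chat` endpoint; the function names `build_ollama_chat` and `ask` are hypothetical. Since the Modelfile already sets these values, passing them again is only a way to make them explicit or to override them.

```python
import json
import urllib.request

# Assumption: a local Ollama daemon on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_ollama_chat(prompt, tag="latest", num_ctx=8192, temperature=0.2):
    """Build (but do not send) a non-streaming chat request."""
    payload = {
        "model": f"rockypod/neotoi-coder:{tag}",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Mirrors the values the model card says each Modelfile sets.
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, tag="latest"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_ollama_chat(prompt, tag)) as resp:
        return json.load(resp)["message"]["content"]
```

With the daemon running and the tag pulled, `ask("Write a Dioxus 0.7 counter component.", tag="8b")` returns the reply as a string.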

v3.2 Scorecards (114Q, max 164.0)

All-variant summary

Variant	Score	Weighted	Raw	T12 Format	T13 SyncStore
8B	97.56%	160.0 / 164.0	111 / 114	✅ 100.0%	⚠️ 50.0%
4B	97.56%	160.0 / 164.0	112 / 114	⚠️ 66.7%	✅ 100.0%
15B	95.12%	156.0 / 164.0	109 / 114	⚠️ 83.3%	⚠️ 0.0%

15B scorecard

Tier	Count	Max wt	Raw	Wtd	Rate	Floor	Status
T1 Fundamentals	12	12.0	12	12.0	100.0%	82%	✅
T2 RSX Syntax	12	12.0	12	12.0	100.0%	82%	✅
T3 Signal Hygiene	12	12.0	12	12.0	100.0%	82%	✅
T4 WCAG / ARIA	15	22.5	15	22.5	100.0%	82%	✅ (was 78.6% in v3.1)
T5 use_resource	8	12.0	8	12.0	100.0%	82%	✅
T6 Hard Reasoning	10	20.0	10	20.0	100.0%	88%	✅
T7 Primitives + CSS	13	19.5	12	18.0	92.3%	82%	✅
T8 GlobalSignal / i18n	8	12.0	7	10.5	87.5%	82%	✅
T9 Static Navigator	6	9.0	6	9.0	100.0%	82%	✅
T10 Dioxus 0.7.4	6	12.0	6	12.0	100.0%	88%	✅
T11 Server Functions	4	6.0	4	6.0	100.0%	82%	✅
T12 Format Compliance (NEW)	6	12.0	5	10.0	83.3%	88%	⚠️
T13 SyncStore (NEW)	2	3.0	0	0.0	0.0%	82%	⚠️
Total	114	164.0	109	156.0	95.12%	—	—
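
The totals in the 15B scorecard can be reproduced from its tier rows. The sketch below assumes every question in a tier carries the same weight (max weight divided by count), which is inferred from the table rather than stated by the author; the function name `score` is hypothetical.

```python
# Rows copied from the 15B scorecard:
# (tier, question count, max weight, raw correct, floor %).
TIERS_15B = [
    ("T1 Fundamentals", 12, 12.0, 12, 82),
    ("T2 RSX Syntax", 12, 12.0, 12, 82),
    ("T3 Signal Hygiene", 12, 12.0, 12, 82),
    ("T4 WCAG / ARIA", 15, 22.5, 15, 82),
    ("T5 use_resource", 8, 12.0, 8, 82),
    ("T6 Hard Reasoning", 10, 20.0, 10, 88),
    ("T7 Primitives + CSS", 13, 19.5, 12, 82),
    ("T8 GlobalSignal / i18n", 8, 12.0, 7, 82),
    ("T9 Static Navigator", 6, 9.0, 6, 82),
    ("T10 Dioxus 0.7.4", 6, 12.0, 6, 88),
    ("T11 Server Functions", 4, 6.0, 4, 82),
    ("T12 Format Compliance", 6, 12.0, 5, 88),
    ("T13 SyncStore", 2, 3.0, 0, 82),
]


def score(tiers):
    """Return (raw total, weighted total, percent, tiers below their floor)."""
    raw_total = 0
    weighted_total = 0.0
    max_total = 0.0
    below_floor = []
    for name, count, max_weight, raw, floor in tiers:
        # Assumption: uniform weight per question within a tier.
        per_question = max_weight / count
        raw_total += raw
        weighted_total += raw * per_question
        max_total += max_weight
        if 100.0 * raw / count < floor:
            below_floor.append(name)
    percent = round(100.0 * weighted_total / max_total, 2)
    return raw_total, weighted_total, percent, below_floor


print(score(TIERS_15B))
```

The two tiers it flags, T12 and T13, are the same two marked ⚠️ in the table.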

8B scorecard

Tier	Count	Max wt	Raw	Wtd	Rate	Floor	Status
T1 Fundamentals	12	12.0	12	12.0	100.0%	82%	✅
T2 RSX Syntax	12	12.0	11	11.0	91.7%	82%	✅
T3 Signal Hygiene	12	12.0	12	12.0	100.0%	82%	✅
T4 WCAG / ARIA	15	22.5	15	22.5	100.0%	82%	✅
T5 use_resource	8	12.0	8	12.0	100.0%	82%	✅
T6 Hard Reasoning	10	20.0	10	20.0	100.0%	88%	✅
T7 Primitives + CSS	13	19.5	13	19.5	100.0%	82%	✅
T8 GlobalSignal / i18n	8	12.0	7	10.5	87.5%	82%	✅
T9 Static Navigator	6	9.0	6	9.0	100.0%	82%	✅
T10 Dioxus 0.7.4	6	12.0	6	12.0	100.0%	88%	✅
T11 Server Functions	4	6.0	4	6.0	100.0%	82%	✅
T12 Format Compliance	6	12.0	6	12.0	100.0%	88%	✅
T13 SyncStore	2	3.0	1	1.5	50.0%	82%	⚠️
Total	114	164.0	111	160.0	97.56%	—	—

The T13 floor failure is structural: with only 2 questions, a single miss drops the tier to 50%, below the 82% floor.
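
The granularity effect is easy to check for any tier size. This small sketch (the helper name `min_correct_for_floor` is hypothetical) computes how many correct answers a tier needs to reach its floor; for T13 (2 questions, 82% floor) and for T12 (6 questions, 88% floor) the answer is every question.

```python
import math


def min_correct_for_floor(count: int, floor_pct: float) -> int:
    """Smallest number of correct answers whose rate meets the floor."""
    return math.ceil(count * floor_pct / 100.0)


# (tier, question count, floor %) taken from the scorecards.
for tier, count, floor in [("T13", 2, 82), ("T12", 6, 88), ("T1", 12, 82)]:
    needed = min_correct_for_floor(count, floor)
    print(f"{tier}: {needed} of {count} needed, {count - needed} miss(es) allowed")
```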

4B scorecard

Tier	Count	Max wt	Raw	Wtd	Rate	Floor	Status
T1 Fundamentals	12	12.0	12	12.0	100.0%	82%	✅
T2 RSX Syntax	12	12.0	12	12.0	100.0%	82%	✅
T3 Signal Hygiene	12	12.0	12	12.0	100.0%	82%	✅
T4 WCAG / ARIA	15	22.5	15	22.5	100.0%	82%	✅
T5 use_resource	8	12.0	8	12.0	100.0%	82%	✅
T6 Hard Reasoning	10	20.0	10	20.0	100.0%	88%	✅
T7 Primitives + CSS	13	19.5	13	19.5	100.0%	82%	✅
T8 GlobalSignal / i18n	8	12.0	8	12.0	100.0%	82%	✅
T9 Static Navigator	6	9.0	6	9.0	100.0%	82%	✅
T10 Dioxus 0.7.4	6	12.0	6	12.0	100.0%	88%	✅
T11 Server Functions	4	6.0	4	6.0	100.0%	82%	✅
T12 Format Compliance	6	12.0	4	8.0	66.7%	88%	⚠️
T13 SyncStore	2	3.0	2	3.0	100.0%	82%	✅
Total	114	164.0	112	160.0	97.56%	—	—

T12 misses: q111 (old cx.render idiom + orphan </think>), q112 (missing rsx!). The 4B also scores 100% on T8 GlobalSignal/i18n where the 8B scored 87.5%.

What's new in v3.2

Score deltas vs v3.1

15B: 94.81% → 95.12% on a harder, longer exam (114Q vs 103Q, max 164 vs 144.5, two new tiers). T4 WCAG/ARIA: 78.6% → 100.0%.
8B: 100.00% → 97.56%. The exam is harder: two new tiers were added, and T13 SyncStore is a fresh weakness at 50.0%. T7 Primitives+CSS and T12 Format Compliance both hit 100%, where the 15B scored 92.3% and 83.3%.
4B: 99.31% → 97.56%, with the same exam-difficulty caveat. T13 SyncStore hits 100%, a new tier where the 8B scores 50.0% and the 15B 0.0%.

New Dioxus 0.7 surface

v3.2 expands coverage to span Dioxus 0.7.0 through 0.7.9 (the full 0.7 series). New training topics:

T44 Scoped CSS and CSS modules (Dioxus 0.7.3)
T45 SyncStore + use_store_sync (Dioxus 0.7.2, cross-thread reactive state)
T46 New events: onauxclick, onscrollend (0.7.3)
T47 Server-only extractors + serde_qs query string support
T48 0.7.2 bug-fix awareness — optional callback props, child router layouts, use_drop in prelude
T49 0.7.4 APIs: WritableResultExt, WebSocket Stream + Sink, FFI for Kotlin/Java/Swift, iOS widget bundling
T50 0.7.6 RSX additions: inert attribute, web panic resilience, IntoAttributeValue for &T, Action::PartialEq
T51 use_context vs consume_context — panic-on-missing-provider semantics

Eval-driven corrections (T52–T57)

T52 Format Compliance — fenced-code-only outputs, no prose preamble, no orphan </think>
T53 Preserve-and-Append — .ftl catalogs, Cargo.toml, route enums: add without replacing
T54 Dioxus 0.7 idiom reinforcement — Outlet::<Route>, t!(), DaisyUI v5 / Tailwind v4
T55 WCAG / ARIA corrections — drives the 78.6% → 100% jump on the 15B
T56 dioxus-i18n + Fluent — LanguageIdentifier, catalog append
T57 Scope discipline — answer exactly what was asked

Dataset

5,287 curated examples across 57 topics (up from 4,880 / 43 in v3.1)
Cross-stack contamination scan removed 489 rows: fn app( → fn App(, launch(app) → launch(App), three useEffect( → use_effect( React leaks
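
A scan of this kind can be sketched with a few regular expressions. The author's actual tooling is not described, so everything here is an assumption: the patterns are modelled on the three leak types named above, and `LEAK_PATTERNS`, `find_leaks`, and `scan` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, one per leak type named in the dataset notes.
LEAK_PATTERNS = {
    "lowercase component fn": re.compile(r"\bfn app\("),
    "lowercase launch target": re.compile(r"\blaunch\(app\)"),
    "React useEffect": re.compile(r"\buseEffect\("),
}


def find_leaks(text: str) -> List[str]:
    """Return the labels of every leak pattern found in one row."""
    return [label for label, pattern in LEAK_PATTERNS.items() if pattern.search(text)]


def scan(rows: List[str]) -> Tuple[List[str], List[str]]:
    """Split rows into (clean, flagged) without modifying them."""
    clean, flagged = [], []
    for row in rows:
        (flagged if find_leaks(row) else clean).append(row)
    return clean, flagged
```

The patterns are case-sensitive on purpose, so idiomatic Dioxus forms such as `fn App(` and `use_effect(` pass through untouched.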

Version History

Version	Base (params)	Score	Exam	Dataset
v1.0	Qwen3-Coder-14B (14.8B)	51/60 (85.0%)	60Q standard	—
v2.0	Qwen3-Coder-14B (14.8B)	135.5/140 (96.8%)	100Q weighted	4,185
v3.0	Qwen3-Coder-14B (14.8B)	124.0/144.5 (85.8%)	103Q weighted, 11 tiers	4,535
v3.1 15B	Qwen3-Coder-14B (14.8B)	137.0/144.5 (94.81%)	103Q weighted, 11 tiers	4,880
v3.1 8B	Qwen3-8B (8.2B)	144.5/144.5 (100.00%)	103Q weighted, 11 tiers	4,880
v3.1 4B	Qwen3-4B (4.0B, tied)	143.5/144.5 (99.31%)	103Q weighted, 11 tiers	4,880
v3.2 15B	Qwen3-Coder-14B (14.8B)	156.0/164.0 (95.12%)	114Q weighted, 13 tiers	5,287
v3.2 8B	Qwen3-8B (8.2B)	160.0/164.0 (97.56%)	114Q weighted, 13 tiers	5,287
v3.2 4B	Qwen3-4B (4.0B, tied)	160.0/164.0 (97.56%)	114Q weighted, 13 tiers	5,287

Files in this repo (15B and historical)

File	Format	Size	Use case
`neotoi-coder-v3.2-q4_k_m_patched.gguf`	GGUF Q4_K_M	8.4 GB	Current 15B v3.2 — LM Studio, llama.cpp, Ollama
`mlx-v3.2/`	MLX 4-bit safetensors	7.7 GB	Current 15B v3.2 MLX — Apple Silicon (mlx-lm)
`neotoi-coder-v3.1-q4_k_m.gguf`	GGUF Q4_K_M	8.4 GB	v3.1 archive
`neotoi-coder-v3-q4_k_m_patched.gguf`	GGUF Q4_K_M	9 GB	v3.0 archive
`neotoi-coder-v2.0-q4_k_m.gguf`	GGUF Q4_K_M	9 GB	v2.0 archive
`neotoi-coder-v1-q4_k_m_final.gguf`	GGUF Q4_K_M	9 GB	v1.0 archive
`mlx-v3.1/`	MLX safetensors	—	v3.1 MLX archive
`mlx-v3/`	MLX safetensors	—	v3.0 MLX archive

For the 8B v3.2 and 4B v3.2 Q4_K_M GGUFs, see their dedicated repos: `rockypod/neotoi-coder-8b` and `rockypod/neotoi-coder-4b`.

Enabling Thinking Mode

This model emits Qwen3 native <think>...</think> blocks. Thinking is on by default with the _patched.gguf quants on inference backends that honor qwen3.thinking.
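
Downstream code usually wants the answer without the reasoning. The following is a minimal sketch of separating the two, with a hypothetical helper name `split_thinking`. It also assumes that when a backend prefills `<think>` on the assistant turn, the returned text may contain only the closing tag, so it restores the opening tag in that case.

```python
import re
from typing import Tuple

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string."""
    # Assumption: a prefilled <think> is not echoed back, leaving only </think>.
    if "</think>" in output and "<think>" not in output:
        output = "<think>" + output
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(output))
    answer = THINK_BLOCK.sub("", output).strip()
    return reasoning, answer
```

For a completion with no think tags at all, the reasoning part is an empty string and the answer is the stripped input.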

License

Fine-tuned weights: Neotoi Coder Community License v1.0 — commercial use of outputs permitted, weight redistribution prohibited, mental health deployment requires written permission. See LICENSE.

Built on a homelab RTX 3090 Ti in Washington State.
