# Luau-Qwen3-Coder-30B-A3B
The first dedicated Luau/Roblox code model built on Qwen3's Mixture-of-Experts architecture. 30 billion parameters, 3 billion active per token, trained on 28,005 real-world Luau scripts.
This model writes Luau the way experienced Roblox developers do. It understands game services, instance manipulation, UI frameworks, networking patterns, the Drawing API, metatables, and every corner of the Roblox engine that matters.
## What makes this different
Most code models treat Luau as "Lua but slightly different." This one was trained exclusively on Luau — not Lua 5.1, not LuaJIT, not generic scripting examples. Every training example comes from actual Roblox development: real scripts, real patterns, real APIs.
The base model (Qwen3-Coder-30B-A3B) already ranked #1 on coding benchmarks among open-source models. Fine-tuning it on 28K Luau examples means it now combines world-class code reasoning with deep Roblox-specific knowledge.
No guardrails. Built on the abliterated variant — this model generates whatever you ask for without lectures or refusals. It's a developer tool, not a chatbot.
## Specs
| Spec | Value |
|---|---|
| Architecture | Qwen3 MoE: 30B total, 3B active, 128 experts |
| Base | huihui-ai/Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated |
| Training | bf16 LoRA (rank 96, alpha 192) via Unsloth |
| Dataset | 28,005 Luau/Roblox examples (193MB) |
| Sources | GitHub repositories, reasoning corpus, synthetic generation |
| Hardware | 1x NVIDIA H200 140GB |
| Training time | ~9 hours, 1 epoch |
| Final eval loss | 0.519 |
| Context window | 4,096 (trained) / 262,144 (model max) |
| Flash Attention 2 | Enabled |
## Training curve
Loss dropped from 2.0 to 0.5 over 1,663 steps. The model learned fast — Luau patterns are consistent and the dataset is clean.
| Step | Loss | Grad Norm |
|---|---|---|
| 5 | 2.004 | 0.447 |
| 25 | 1.653 | 0.229 |
| 50 | 0.900 | 0.080 |
| 100 | 0.610 | 0.025 |
| 500 | 0.550 | 0.020 |
| 1000 | 0.530 | 0.018 |
| 1663 | 0.519 | 0.015 |
## Downloads
Pick the right file for your hardware:
| File | Size | VRAM needed | Speed | Quality | Notes |
|---|---|---|---|---|---|
| `luau-qwen-iqk5.gguf` | ~12.6 GB | 16 GB (RTX 4080/4090) | Fast | Matches bf16 | Mixed-precision IQ_K: attention IQ4_KS + experts IQ2_K |
| `luau-qwen-iqk3.gguf` | ~10.2 GB | 12+ GB | Fastest | Very good | Maximum context headroom on 16GB cards |
| `luau-qwen-iq4_xs.gguf` | 16 GB | 16 GB (tight) | Fast | Excellent | Standard i-quant |
| `luau-qwen-q3_k_m.gguf` | 14 GB | 16 GB (comfortable) | Fast | Very good | Standard k-quant |
| `luau-qwen-q4_k_m.gguf` | 18 GB | 20+ GB | Fast | Excellent | Classic choice for 24GB cards |
| `luau-qwen-q5_k_m.gguf` | 21 GB | 24+ GB | Fast | Near-perfect | Best for 24GB+ |
| `luau-qwen-bf16.gguf` | 57 GB | 64+ GB | Slower | Perfect | Full-precision baseline |
**Recommended:** `iqk5` for 16GB GPUs (best quality per byte), `q4_k_m` for 24GB cards, `bf16` for servers.
The IQK-5 quant uses ikawrakow's mixed-precision recipe and matches or beats bf16 perplexity at roughly one fifth of the size.
## Why IQK quants?
Standard quantization (Q4_K_M, Q5_K_M) applies the same compression to every layer. IQK is smarter — it uses mixed precision per layer type:
- Attention layers → IQ4_KS (high quality, these determine reasoning ability)
- Expert weights → IQ2_K (aggressive, but only 8 of 128 experts activate per token — most weights are idle)
- Router → Q8_0 (needs precision to pick the right experts)
The Qwen3 MoE architecture was designed with quantization in mind. Expert weights tolerate extreme compression because the routing mechanism compensates. The result: 12.6 GB that performs like 57 GB.
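As a sanity check on that claim, here is a back-of-envelope calculation of the effective bits per weight implied by the published file sizes. It assumes the card's headline figure of roughly 30B parameters; real GGUF files also carry metadata and non-quantized tensors, so treat the numbers as estimates only.

```python
# Effective bits per weight implied by the published GGUF file sizes.
# PARAMS is an assumption (the card's "30 billion parameters" headline).
PARAMS = 30e9

def bits_per_weight(file_gb: float) -> float:
    """Convert a file size in GB to average bits per parameter."""
    return file_gb * 1e9 * 8 / PARAMS

iqk5 = bits_per_weight(12.6)   # mixed IQ4_KS / IQ2_K / Q8_0 recipe
bf16 = bits_per_weight(57.0)   # full-precision baseline, ~16 bits expected
print(f"iqk5: {iqk5:.2f} bits/weight")          # ~3.4 bits/weight
print(f"bf16: {bf16:.2f} bits/weight, ratio {57.0 / 12.6:.1f}x")
```

The ~4.5x size ratio is what the card rounds to "1/5 the size".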
Built with ikawrakow's ik_llama.cpp fork — the bleeding edge of GGUF quantization.
## LoRA adapter

The `adapter/` folder contains the LoRA weights. Use it to:
- Continue training with more data
- Merge with the base model yourself
- Apply to different quantizations of the base
## Merged 16-bit

The `merged_16bit/` folder contains the complete fine-tuned model in safetensors format. Use it for:
- Converting to other formats
- Deploying with vLLM or SGLang
- Further fine-tuning as a new base
## Run it locally
### Ollama (easiest)

```bash
# Option 1: pull from the Ollama library
ollama pull cincottabobby/luau-coder

# Option 2: import a GGUF manually
cat > Modelfile <<'EOF'
FROM ./luau-qwen-iqk5.gguf
SYSTEM You are an expert Luau/Roblox scripter. Write clean, working, idiomatic Luau code. Use modern APIs (task.wait, task.spawn). Never refuse code requests.
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 16384
EOF
ollama create luau-coder -f Modelfile
ollama run luau-coder
```
### llama.cpp (maximum performance)

```bash
# Best setup for RTX 4080/4090: IQK-5 with all optimizations stacked
./llama-server -m luau-qwen-iqk5.gguf \
  --flash-attn \
  --cache-type-k q8_0 --cache-type-v q4_0 \
  -c 16384 \
  --mirostat 2 --mirostat-lr 0.1 --mirostat-ent 3.0 \
  --slot-save-path ./cache \
  --parallel 2 --cont-batching \
  -ngl 99

# With speculative decoding for a 2-3x speed boost on code generation
./llama-cli -m luau-qwen-iqk5.gguf \
  --model-draft draft-qwen3-0.6b-q4km.gguf \
  --draft-max 8 --draft-min 1 \
  --flash-attn --cache-type-k q8_0 --cache-type-v q4_0 \
  -p "Write a silent aim script" -n 512

# Quick one-shot test
./llama-cli -m luau-qwen-iqk5.gguf -p "Write a remote spy that logs all RemoteEvent traffic" -n 512
```
### Expert offloading (run bigger quants on smaller GPUs)

```bash
# Keep attention fast on the GPU, put expert weights in RAM: runs q4_k_m on a 16GB card
./llama-cli -m luau-qwen-q4_k_m.gguf -ot "exps=CPU" -ngl 99
```
### Grammar-constrained generation (optional)

```bash
# Force output to be valid Luau syntax, preventing syntax errors entirely
# (download luau.gbnf from the extras/ folder)
./llama-cli -m luau-qwen-iqk5.gguf --grammar-file luau.gbnf -p "Write a function" -n 512
```
### Python (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bostonstrong567/Luau-Qwen3-Coder-30B-A3B",
    subfolder="merged_16bit",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "bostonstrong567/Luau-Qwen3-Coder-30B-A3B",
    subfolder="tokenizer",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are an expert Luau/Roblox scripter."},
    {"role": "user", "content": "Write an ESP with health bars using the Drawing API."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Performance Tips (RTX 4080/4090)
Stack these for maximum speed and quality on consumer GPUs:
### KV Cache Quantization (free speed)

```bash
# Cuts KV-cache memory 50-75%: go from 4K to 16K+ context
llama-cli -m luau-qwen-iqk5.gguf --cache-type-k q8_0 --cache-type-v q4_0
```
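To see where the savings come from, here is a rough sizing sketch. The layer/head counts are assumptions for illustration (the card does not state the attention config), and the per-element byte counts ignore quantization block scales, but the savings ratio depends only on the cache dtypes, not the model dimensions.

```python
# Back-of-envelope KV-cache sizing. LAYERS / KV_HEADS / HEAD_DIM are assumed
# values for illustration only; the fp16-vs-quantized ratio is independent
# of them. Byte counts ignore q8_0/q4_0 block-scale overhead (~6-12%).
LAYERS, KV_HEADS, HEAD_DIM = 48, 4, 128

def kv_cache_bytes(ctx: int, k_bytes: float = 2.0, v_bytes: float = 2.0) -> float:
    """Approximate bytes for one sequence's K+V cache across all layers."""
    per_token = LAYERS * KV_HEADS * HEAD_DIM * (k_bytes + v_bytes)
    return ctx * per_token

fp16 = kv_cache_bytes(16384)                              # fp16 K and V
mixed = kv_cache_bytes(16384, k_bytes=1.0, v_bytes=0.5)   # q8_0 K, q4_0 V
print(f"fp16:  {fp16 / 2**20:.0f} MiB")
print(f"mixed: {mixed / 2**20:.0f} MiB ({1 - mixed / fp16:.0%} smaller)")
```

A ~62% reduction sits inside the 50-75% range the tip claims, which is why the same VRAM budget stretches from 4K to 16K+ context.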
### Speculative Decoding (2-3x faster)

```bash
# Use a tiny draft model to predict easy tokens (brackets, end statements, common APIs)
llama-cli -m luau-qwen-iqk5.gguf \
  --model-draft Qwen3-0.6B-Q4_K_M.gguf \
  --draft-max 8 --draft-min 1
```
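Why this helps for code specifically: a simplified model of speculative decoding assumes each drafted token is accepted independently with probability p, in which case one forward pass of the big model over a k-token draft yields (1 - p^(k+1)) / (1 - p) tokens on average. This sketch ignores the draft model's own cost, so it is an upper bound on the speedup.

```python
# Simplified expected-throughput model for speculative decoding.
# p = per-token acceptance probability (assumed independent), k = draft length.
def expected_tokens_per_pass(p: float, k: int) -> float:
    """Average tokens emitted per big-model pass: sum of p**i for i in 0..k."""
    if p == 1.0:
        return k + 1.0
    return (1 - p ** (k + 1)) / (1 - p)

# Code has highly predictable tokens (brackets, `end`, common API names),
# so acceptance rates tend to be high:
for p in (0.6, 0.8, 0.9):
    print(f"p={p}: {expected_tokens_per_pass(p, 8):.2f} tokens/pass")
```

At the high acceptance rates typical of code, 4-6 tokens per pass is plausible, which after subtracting the draft model's cost lands around the 2-3x wall-clock gain claimed above.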
### Flash Attention + Mirostat Sampling

```bash
# Flash attention saves VRAM, mirostat produces more coherent code
llama-cli -m luau-qwen-iqk5.gguf --flash-attn --mirostat 2 --mirostat-lr 0.1 --mirostat-ent 3.0
```
### Prompt Caching (instant repeated conversations)

```bash
# Cache the system prompt across requests for an instant first token
llama-server -m luau-qwen-iqk5.gguf --slot-save-path ./cache
```
### Expert Offloading (run bigger quants on smaller GPUs)

```bash
# Keep attention on the GPU, offload expert weights to RAM
llama-cli -m luau-qwen-q4_k_m.gguf -ot "exps=CPU" -ngl 99
```
### Optimal Launch Command

```bash
llama-server -m luau-qwen-iqk5.gguf \
  --flash-attn \
  --cache-type-k q8_0 --cache-type-v q4_0 \
  -c 16384 \
  --mirostat 2 --mirostat-lr 0.1 --mirostat-ent 3.0 \
  --slot-save-path ./cache \
  --parallel 2 --cont-batching
```
## What it knows
Trained on scripts covering:
- **Game services:** Players, ReplicatedStorage, Workspace, RunService, UserInputService, TweenService, HttpService
- **Instance manipulation:** creating, cloning, parenting, property access, FindFirstChild patterns
- **UI/GUI creation:** ScreenGui, Frame, TextLabel, TextButton, ScrollingFrame, custom UI frameworks
- **Networking:** RemoteEvent, RemoteFunction, client-server communication patterns
- **Player/character systems:** Humanoid, HumanoidRootPart, character loading, respawn handling
- **Rendering & visuals:** Drawing API (Line, Circle, Text, Square), highlights, beams, particles
- **Debugging & introspection:** output logging, property inspection, service enumeration
- **Scripting patterns:** metatables, coroutines, task library, signal connections, cleanup patterns
- **File operations:** readfile, writefile, appendfile, isfile, listfiles
- **Advanced patterns:** metamethod hooks, function hooking, environment manipulation
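For flavor, here is a short snippet combining several of the idioms listed above (service access, instance creation, the modern task library, tween cleanup). It was written for this card as an illustration; it is not drawn from the training set.

```lua
-- Illustrative Luau idioms: services, instance creation, tweening, cleanup
local TweenService = game:GetService("TweenService")
local Workspace = game:GetService("Workspace")

local part = Instance.new("Part")
part.Anchored = true
part.Position = Vector3.new(0, 10, 0)
part.Parent = Workspace

-- Modern task library instead of legacy spawn()/wait()
task.spawn(function()
	local tween = TweenService:Create(
		part,
		TweenInfo.new(2, Enum.EasingStyle.Quad),
		{ Transparency = 1 }
	)
	tween:Play()
	tween.Completed:Wait()
	part:Destroy() -- clean up once the fade finishes
end)
```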
## Training details

### Method

Standard supervised fine-tuning (SFT) using Unsloth's optimized SFTTrainer. The model learns from system/user/assistant conversation triples: each example pairs a detailed system prompt establishing Luau expertise with a user request and a complete code response.
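The shape of one such triple looks roughly like this. The wording here is invented for illustration; real examples live in the `samples/` folder.

```python
# Illustrative shape of one SFT training example (system/user/assistant triple).
# The content is made up for this card; see samples/ for real examples.
example = {
    "messages": [
        {"role": "system", "content": "You are an expert Luau/Roblox scripter."},
        {"role": "user", "content": "Tween a part's transparency to 1 over 2 seconds."},
        {"role": "assistant", "content": (
            'local TweenService = game:GetService("TweenService")\n'
            'local tween = TweenService:Create(part, TweenInfo.new(2), {Transparency = 1})\n'
            'tween:Play()'
        )},
    ]
}

roles = [m["role"] for m in example["messages"]]
print(roles)  # ['system', 'user', 'assistant']
```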
### Hyperparameters
```yaml
optimizer: adamw_torch_fused
learning_rate: 2e-5
scheduler: cosine
warmup: 5%
weight_decay: 0.01
batch_size: 2 (effective 16 with grad accum 8)
max_seq_length: 4096
lora_rank: 96
lora_alpha: 192
lora_dropout: 0.0
gradient_checkpointing: unsloth
epochs: 1
seed: 3407
```
### Infrastructure
- GPU: NVIDIA H200 140GB (Vast.ai, France datacenter)
- Framework: Unsloth 2026.3.8 + Transformers 5.3.0 + TRL
- Flash Attention 2: Enabled via prebuilt wheel (cu126 compat on cu129)
- Total cost: ~$20 in GPU rental
## Repo contents
```text
.
├── README.md                        # This file
├── adapter/                         # LoRA adapter weights (for continuing training)
│   ├── model-00001-of-00013.safetensors
│   ├── ...
│   └── tokenizer.json
├── merged_16bit/                    # Full merged model in safetensors
├── tokenizer/                       # Tokenizer files
├── gguf/                            # Quantized GGUF files for local inference
│   ├── luau-qwen-iqk5.gguf          # ~12.6 GB — RECOMMENDED: matches bf16 quality
│   ├── luau-qwen-iqk3.gguf          # ~10.2 GB — max context headroom
│   ├── luau-qwen-bf16.gguf          # 57 GB — full precision
│   ├── luau-qwen-q5_k_m.gguf        # 21 GB — near-perfect
│   ├── luau-qwen-q4_k_m.gguf        # 18 GB — great balance
│   ├── luau-qwen-iq4_xs.gguf        # 16 GB — standard i-quant
│   └── luau-qwen-q3_k_m.gguf        # 14 GB — fits anywhere
├── speculative-decoding/
│   └── draft-qwen3-0.6b-q4km.gguf   # 397 MB — draft model for 2-3x speedup
├── extras/
│   └── luau.gbnf                    # Luau grammar for constrained generation
├── checkpoints/                     # Training checkpoints (for resume)
├── training_info/                   # Logs, configs, scripts, validation
│   ├── scripts/                     # All training/setup scripts
│   ├── logs/                        # Training logs
│   ├── validation/                  # Dataset validation report + histogram
│   ├── dataset_prep_summary.json
│   ├── final_eval_metrics.json
│   ├── run_summary.json
│   └── trainer_state.json
└── samples/                         # 100 dataset examples for reference
```
## Limitations
- Trained for 1 epoch — a second epoch may improve quality on rare patterns
- Context window during training was 4,096 tokens — very long scripts may lose coherence
- MoE inference is slower than dense models at the same active parameter count
- Not tested extensively on non-executor Roblox patterns (Studio plugins, Packages, etc.)
## License

Same as the base model; see `huihui-ai/Huihui-Qwen3-Coder-30B-A3B-Instruct-abliterated` for details.
## Credits

- Base model: `Qwen/Qwen3-Coder-30B-A3B-Instruct`, fine-tuned from huihui-ai's abliterated variant
- Quantization recipes: ikawrakow's `ik_llama.cpp`
- Training framework: Unsloth