How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 08210821iy/Qwen3-4B-Coder:Q4_K_M
Use Docker
docker model run hf.co/08210821iy/Qwen3-4B-Coder:Q4_K_M
Quick Links

Qwen3-4B-Coder

A fine-tuned Qwen3-4B model specialized for Python code generation, trained by an elementary school student on an RTX 4060 Laptop GPU (8 GB VRAM).

Qwen3-4Bใ‚’ใƒ™ใƒผใ‚นใซใ€Pythonใ‚ณใƒผใƒ‰็”Ÿๆˆใซ็‰นๅŒ–ใ—ใฆใƒ•ใ‚กใ‚คใƒณใƒใƒฅใƒผใƒ‹ใƒณใ‚ฐใ—ใŸใƒขใƒ‡ใƒซใงใ™ใ€‚ๅฐๅญฆ็”ŸใŒRTX 4060 Laptop GPU (VRAM 8GB) ใงๅญฆ็ฟ’ใ—ใพใ—ใŸใ€‚

Benchmark Results

MBPP-sanitized (Practical Python Tasks)

Model MBPP pass@1 Condition
Qwen3-4B-Coder (this model) 69.3% (178/257) Q4_K_M, temperature=0.0
Qwen3-4B (official) 62.0% FP16, EvalPlus

+7.3 points improvement on practical coding tasks.

HumanEval (Algorithmic Tasks)

Model HumanEval pass@1 Condition
Qwen3-4B-Coder (this model) 47.6% (78/164) Q4_K_M, temperature=0.0
Qwen3-4B (official) 65.6% FP16, EvalPlus

Inference Speed

Benchmark Qwen3-4B-Coder Qwen3-4B (Q4_K_M) Speed Ratio
HumanEval (164 tasks) 793s 3623s 4.6x faster
MBPP (257 tasks) 1274s - -

Syntax error rate on HumanEval: 0% (164/164)

Key Findings

This model demonstrates that SFT for code-only output has two major benefits:

  1. Practical code generation ability improved (MBPP +7.3 points)
  2. Inference speed improved 4.6x by eliminating think blocks and explanations

Training Details

Parameter Value
Base Model Qwen/Qwen3-4B
Method SFT with LoRA (r=16, alpha=32)
Dataset PersonalAILab/AFM-CodeAgent-SFT-Dataset
Training Samples 8,869 (filtered to 512 tokens)
Epochs 3
Final Loss 0.72
MAX_SEQ 512
GPU NVIDIA RTX 4060 Laptop (8 GB VRAM)
Training Time ~5.5 hours
Quantization Q4_K_M (~2.4 GB)

Features

  • Code-only output without extra explanations
  • 4.6x faster inference than base model
  • Supports English and Japanese prompts
  • Optimized for agent pipelines
  • Syntax error rate 0% on HumanEval

AI Code Agent (CLI Tool)

An interactive CLI tool that uses this model to generate, execute, and auto-fix Python code.

git clone https://github.com/jiexiang018-tech/ai-python-agent.git
cd ai-python-agent
pip install -r requirements.txt
python setup.py
python agent.py
Downloads last month
48
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for 08210821iy/Qwen3-4B-Coder

Finetuned
Qwen/Qwen3-4B
Quantized
(218)
this model