Instructions to use Nanami14138/qwen3-4b-instruct-code-agent with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Nanami14138/qwen3-4b-instruct-code-agent with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Nanami14138/qwen3-4b-instruct-code-agent")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Nanami14138/qwen3-4b-instruct-code-agent")
model = AutoModelForCausalLM.from_pretrained("Nanami14138/qwen3-4b-instruct-code-agent")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- llama-cpp-python
How to use Nanami14138/qwen3-4b-instruct-code-agent with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Nanami14138/qwen3-4b-instruct-code-agent",
    filename="qwen3-4b-instruct-code-agent-q4_k_m.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Nanami14138/qwen3-4b-instruct-code-agent with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
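Whichever install route you choose, `llama-server` exposes an OpenAI-compatible API (port 8080 by default), so any OpenAI client library can talk to it. Below is a minimal sketch using the `openai` Python package; the port and the placeholder API key are assumptions about your local setup, not part of the commands above:
```python
# Minimal sketch: query a running llama-server via its OpenAI-compatible API.
# Assumes llama-server's default port (8080) and `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is ignored locally

resp = client.chat.completions.create(
    model="Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp.choices[0].message.content)
```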
Use Docker
```bash
docker model run hf.co/Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use Nanami14138/qwen3-4b-instruct-code-agent with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Nanami14138/qwen3-4b-instruct-code-agent"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Nanami14138/qwen3-4b-instruct-code-agent",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
- SGLang
How to use Nanami14138/qwen3-4b-instruct-code-agent with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Nanami14138/qwen3-4b-instruct-code-agent" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Nanami14138/qwen3-4b-instruct-code-agent",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Nanami14138/qwen3-4b-instruct-code-agent" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Nanami14138/qwen3-4b-instruct-code-agent",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
- Ollama
How to use Nanami14138/qwen3-4b-instruct-code-agent with Ollama:
```bash
ollama run hf.co/Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
- Unsloth Studio
How to use Nanami14138/qwen3-4b-instruct-code-agent with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Nanami14138/qwen3-4b-instruct-code-agent to start chatting
```
Install Unsloth Studio (Windows)
```bash
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Nanami14138/qwen3-4b-instruct-code-agent to start chatting
```
Using HuggingFace Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Nanami14138/qwen3-4b-instruct-code-agent to start chatting
```
- Pi
How to use Nanami14138/qwen3-4b-instruct-code-agent with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Nanami14138/qwen3-4b-instruct-code-agent with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Run Hermes
```bash
hermes
```
- Docker Model Runner
How to use Nanami14138/qwen3-4b-instruct-code-agent with Docker Model Runner:
```bash
docker model run hf.co/Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
- Lemonade
How to use Nanami14138/qwen3-4b-instruct-code-agent with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Nanami14138/qwen3-4b-instruct-code-agent:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.qwen3-4b-instruct-code-agent-Q4_K_M
```
List all available models
```bash
lemonade list
```
Qwen3-4B-CodeAgent
A fine-tuned code-execution and code-review agent based on Qwen3-4B-Instruct, trained to follow a structured ReAct (Plan → Execute → Reflect → Finish) workflow with XML-formatted responses.
Model Description
This model is a LoRA fine-tuned version of Qwen3-4B-Instruct designed to function as an autonomous coding agent. It generates structured XML responses that can be parsed by an orchestration framework to execute code, review results, and iteratively debug.
| Attribute | Value |
|---|---|
| Base Model | Qwen3-4B-Instruct (3.6B params) |
| Architecture | Qwen3ForCausalLM, 36 layers, 2560 hidden size, GQA (32 heads / 8 KV heads) |
| Fine-tuning Method | LoRA (4-bit quantization + LoRA r=32, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Training Data | m-a-p/Code-Feedback (~47K train samples) |
| Context Length | 4096 tokens |
| Precision | bfloat16 (merged weights) |
Intended Use
This model is designed for building code agent systems that need structured, parseable output. It is suitable for:
- Automated code generation with execution feedback loops
- Code review and iterative debugging pipelines
- Tool-augmented LLM applications with sandbox execution
- Educational coding assistants
Output Format
The model outputs XML-structured responses following a ReAct workflow:
```xml
<agent_response>
<node>Plan</node>
<next_node>Execute</next_node>
<content>
## Analysis
The task requires implementing a binary search algorithm.

## Plan
1. Define the function signature
2. Implement iterative binary search
3. Handle edge cases (empty array, target not found)
</content>
</agent_response>
```
Node Types
| Node | Trigger | Content | Next Node |
|---|---|---|---|
| Plan | User sends a task | Markdown-formatted solution plan | Execute |
| Execute | After Plan or Reflect | `{"tool_name": "python_sandbox", "arguments": {"code": "..."}}` | Execute |
| Reflect | Execute fails (exit_code=1) | Root-cause analysis and fix direction | Execute |
| Finish | Execute succeeds (exit_code=0) | Task summary | Finish |
Standard Workflow
Plan → Execute → (failure → Reflect → Execute → ...) → Finish
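To make this workflow concrete, below is a minimal, hypothetical orchestration loop: it regex-parses each `<agent_response>`, runs Execute-node tool calls in a sandbox, and feeds results back to the model. `generate_reply` and `run_in_sandbox` are assumed helpers, and the `Current state: ...` / `exit_code=...` feedback strings are inferred from the usage example and node triggers on this card, not prescribed by it:
```python
# Hypothetical orchestration sketch for the Plan → Execute → Reflect → Finish loop.
# Assumed helpers: generate_reply(messages) -> str (one chat-completion call) and
# run_in_sandbox(code) -> (exit_code, output). `system_prompt` is defined in the
# Prompting Strategy section below.
import json
import re

def extract(reply: str, tag: str) -> str:
    """Pull one field from the <agent_response> XML; regex stays tolerant of code in <content>."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", reply, re.DOTALL)
    return m.group(1).strip() if m else ""

def run_agent(task: str, max_turns: int = 8) -> str:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Task: {task}\n\nCurrent state: Start"},
    ]
    for _ in range(max_turns):
        reply = generate_reply(messages)
        messages.append({"role": "assistant", "content": reply})
        node, content = extract(reply, "node"), extract(reply, "content")
        if node == "Finish":
            return content  # task summary
        if node == "Execute":
            call = json.loads(content)  # {"tool_name": "python_sandbox", "arguments": {"code": "..."}}
            exit_code, output = run_in_sandbox(call["arguments"]["code"])
            messages.append({"role": "user", "content": f"exit_code={exit_code}\n{output}"})
        else:  # Plan or Reflect: nudge the model toward its declared next node
            messages.append({"role": "user", "content": f"Current state: {extract(reply, 'next_node')}"})
    raise RuntimeError("Agent did not reach Finish within the turn budget")
```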
Usage
With Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Nanami14138/qwen3-4b-instruct-code-agent"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
## 🛠️ Prompting Strategy (System Prompt)
This model is designed as a ReAct-based code agent. To make it strictly follow the state machine (Plan -> Execute -> Reflect -> Finish) and emit structured XML output, **it is strongly recommended to use the following system prompt at inference time**:
system_prompt = """你是一个专业的代码执行与Code Review智能Agent,遵循ReAct工作流。
## 输出格式
你的每一次回复都必须严格使用以下XML格式:
<agent_response>
<node>当前节点</node>
<next_node>下一个节点</next_node>
<content>输出内容</content>
</agent_response>
## 节点定义
### Plan(规划)
- 触发:收到用户任务后立即进入
- <content>:分析任务需求,以 Markdown 格式输出解决方案规划
- <next_node>:Execute
### Execute(执行)
- 触发:Plan 或 Reflect 之后进入
- <content>:输出 {"tool_name": "python_sandbox", "arguments": {"code": "你的代码"}}
- <next_node>:Execute(等待执行结果)
### Reflect(反思)
- 触发:Execute 执行失败(exit_code=1)后进入
- <content>:分析失败原因,定位根因,给出修正方向
- <next_node>:Execute(修正后重新执行)
### Finish(完成)
- 触发:Execute 执行成功(exit_code=0)后进入
- <content>:输出任务总结
- <next_node>:Finish
## 标准工作流
Plan → Execute → (失败 → Reflect → Execute → ...) → Finish"""
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "任务:Write a Python function to check if a number is prime.\n\n当前状态:Start"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
output = model.generate(**inputs, max_new_tokens=1024, temperature=0.1, top_p=0.95)
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
With Unsloth (Faster Inference)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Nanami14138/qwen3-4b-instruct-code-agent",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Then use the same message format as above
```
Training Details
Data
Trained on m-a-p/Code-Feedback, a multi-turn code conversation dataset with ~66K examples. The data was processed into three pools:
| Pool | Description | Train Samples | Ratio |
|---|---|---|---|
| Pool A (Base SFT) | Single-turn code Q&A, plain text | 117 | 0.2% |
| Pool B (Code Review) | Multi-turn debug/review → ReAct XML format | 29,562 | 62.3% |
| Pool C (Discussion) | Multi-turn code discussion → ReAct XML format | 17,737 | 37.4% |
The system prompt is injected at training time (not stored in the data) to ensure consistent behavior.
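As an illustration of that injection, a hedged sketch: each stored sample carries only the conversation turns, and the formatting step prepends the fixed system prompt before applying the chat template. The `sample["messages"]` field name is hypothetical, not the actual dataset schema:
```python
# Hypothetical sketch of system-prompt injection during SFT formatting.
# SYSTEM_PROMPT is the ReAct prompt shown in the Prompting Strategy section;
# sample["messages"] is an assumed field name.
def format_sample(sample: dict, tokenizer) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + sample["messages"]
    return tokenizer.apply_chat_template(messages, tokenize=False)
```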
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 32 |
| LoRA alpha | 32 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 2e-4 (cosine schedule) |
| Warmup ratio | 0.1 |
| Batch size | 4 × 4 (gradient accumulation) = 16 effective |
| Max sequence length | 4096 |
| Precision | LoRA 4-bit (training), bfloat16 (merged) |
| Optimizer | AdamW 8-bit |
| Epochs | 3 planned (stopped early at step 620 of 8892, ~7% of the schedule) |
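For reference, here is a sketch of an Unsloth + TRL setup matching this table. Argument names follow recent public Unsloth/TRL APIs and may differ across versions; the base-model repo id and `train_dataset` are assumptions:
```python
# Sketch of a LoRA setup matching the hyperparameters table (Unsloth + TRL).
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",  # assumed base-model repo id
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,   # `tokenizer=` in older TRL releases
    train_dataset=train_dataset,  # assumed: ReAct-formatted Code-Feedback pools
    args=SFTConfig(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # 4 x 4 = 16 effective batch size
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        num_train_epochs=3,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```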
Training Curve
| Step | Train Loss | Eval Loss |
|---|---|---|
| 20 | 1.927 | 1.905 |
| 100 | 0.649 | 0.573 |
| 200 | 0.463 | 0.454 |
| 300 | 0.412 | 0.422 |
| 400 | 0.413 | 0.409 |
| 500 | 0.374 | 0.401 |
| 600 | 0.383 | 0.397 |
Loss decreased from 1.90 to 0.40 with no signs of overfitting. The checkpoint at step 620 was merged for this release.
Hardware
- Single NVIDIA L20 (48 GB), one GPU of an 8× L20 node; 4-bit LoRA training fits on a single GPU
Evaluation
HumanEval (10-problem subset)
| Metric | Score |
|---|---|
| Pass@1 | 62.6% |
| Pass@2 | 71.14% |
| Pass@3 | 75.61% |
| Avg tokens/problem | 215.2 |
Evaluation was conducted on a 10-problem subset of HumanEval. Full 164-problem evaluation is planned.
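Fractional Pass@k values on only 10 problems suggest the standard unbiased estimator over n samples per problem (Chen et al., 2021) rather than a single greedy run per problem; that is an inference, not something stated on this card. For reference:
```python
# Unbiased pass@k estimator (Chen et al., 2021): with n samples per problem,
# of which c pass, pass@k = 1 - C(n-c, k) / C(n, k); average over problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # too few failures to fill a k-sample draw: guaranteed pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 5 samples on a problem, 3 passing: pass_at_k(5, 3, 1) == 0.6
```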
Limitations
- Early checkpoint: This model was merged at step 620 of 8892 planned steps (~7% of the three-epoch schedule). Performance will likely improve with continued training.
- English-centric data: The training data (Code-Feedback) is predominantly in English. Chinese language coding tasks may have lower quality.
- XML format dependency: The model is trained to output structured XML. Without the system prompt, it may not follow the expected format (a minimal format check is sketched after this list).
- No real execution: The training data simulates tool responses; the model has not been trained with actual code execution feedback.
- Limited code languages: While the training data covers multiple languages, Python is heavily overrepresented.
- Hallucination risk: Like all LLMs, the model may generate plausible but incorrect code, especially for complex algorithms or domain-specific tasks.
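Because of the XML-format dependency noted above, a caller may want to verify each reply before parsing it. A minimal sketch (not part of the model or its tooling):
```python
import re

# Minimal well-formedness check for agent replies against the documented schema.
AGENT_RE = re.compile(
    r"<agent_response>\s*"
    r"<node>(Plan|Execute|Reflect|Finish)</node>\s*"
    r"<next_node>(Plan|Execute|Reflect|Finish)</next_node>\s*"
    r"<content>.*?</content>\s*"
    r"</agent_response>",
    re.DOTALL,
)

def is_well_formed(reply: str) -> bool:
    """True if the reply contains one structurally valid <agent_response> block."""
    return AGENT_RE.search(reply) is not None
```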
Ethical Considerations
- The model should not be used to generate malicious code or exploit vulnerabilities.
- Generated code should always be reviewed by a human before deployment in production systems.
- The model may reproduce biases present in the training data (e.g., coding style preferences, library choices).
Citation
If you use this model, please cite the base model and training dataset:
```bibtex
@article{qwen3,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025}
}

@misc{code-feedback,
  title  = {Code-Feedback: Multi-turn Code Conversation Dataset},
  author = {m-a-p},
  url    = {https://huggingface.co/datasets/m-a-p/Code-Feedback}
}
```