# Qwen3-4B-4bit Terminal Assistant
A fine-tuned Qwen3-4B model for terminal command generation, optimized for the LocalTerm app.
Note: HuggingFace shows "0.6B params" for this model; that figure is incorrect. The actual model has 4 billion parameters (4-bit quantized). HuggingFace undercounts parameters for MLX quantized safetensors files because it counts packed tensor elements rather than logical weights.
## Model Details
- Base Model: mlx-community/Qwen3-4B-4bit
- Actual Parameters: 4 billion (same as base model)
- Quantization: 4-bit (MLX format, ~2.3GB file size)
- Fine-tuning: QLoRA on 16 layers
- Training Data: 388 examples, 74 terminal commands
- Accuracy: 98% on test set (147/150 correct)
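The "0.6B params" figure and the ~2.3GB file size can be sanity-checked with some arithmetic. A back-of-envelope sketch, assuming MLX's default quantization group size of 64, with eight 4-bit weights packed per uint32 word and a BF16 scale and bias per group:

```python
# Why HuggingFace reports ~0.6B "params" for a 4B-parameter 4-bit model:
# it counts tensor elements, and MLX packs eight 4-bit weights per uint32.
params = 4_000_000_000
packed_elements = params // 8                     # uint32 words holding the weights
group_size = 64                                   # assumed MLX default group size
scale_bias_elements = 2 * (params // group_size)  # one BF16 scale + bias per group

reported = packed_elements + scale_bias_elements
print(f"reported: {reported / 1e9:.2f}B elements")  # ~0.62B, not 4B

# Approximate on-disk size: 4 bits per weight plus 2 bytes per scale/bias value
size_gb = (params * 4 / 8 + scale_bias_elements * 2) / 1e9
print(f"size: {size_gb:.2f} GB")  # ~2.25 GB
```

The packed uint32 tensors explain the undercount, and the scale/bias overhead explains why the file is slightly larger than a bare 2GB of 4-bit weights.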
## Usage
### With MLX-LM (Python)
```python
from mlx_lm import load, generate

model, tokenizer = load("mlxstudio/qwen3-4b-4bit-terminal")
prompt = "how to create a git repository"
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```
### With LocalTerm (macOS app)
Model auto-downloads on first run. See LocalTerm.
## Training Details
- Method: QLoRA fine-tuning with mlx-lm
- Iterations: 300
- Learning Rate: 1e-5
- Data Format: Qwen3 chat template with /nothink tag
- Train/Test Split: Clean split (0% overlap)
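Prompts closest to the training format should give the best results. Below is a minimal sketch of the Qwen3 chat layout with the /nothink tag; `format_prompt` is a hypothetical helper (in practice `tokenizer.apply_chat_template` produces the canonical string), and the exact placement of the /nothink tag here is an assumption:

```python
def format_prompt(user_msg: str) -> str:
    # Qwen3 uses ChatML-style turn markers; the /nothink tag (placement
    # assumed) suppresses the model's reasoning trace.
    return (
        "<|im_start|>user\n"
        f"{user_msg} /nothink<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(format_prompt("how to create a git repository"))
```

The trailing `<|im_start|>assistant\n` leaves the model positioned to generate the assistant turn directly.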
## Version History
- v2 (2026-01-22): Re-fused model with correct weight format
  - Fixed: `.linear.` prefix issue in LoRA merged weights; now compatible with mlx-swift-lm
- v1 (2026-01-21): Initial release (had loading issues in Swift)
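The v2 weight-format fix can be illustrated with a small key-renaming sketch. This is a hypothetical helper, not the actual re-fuse procedure (which used mlx-lm's fuse tooling); the premise is that MLX's LoRA wrapper stores the underlying layer as a `.linear` submodule, so merged checkpoints can contain keys like `...q_proj.linear.weight` where mlx-swift-lm expects `...q_proj.weight`:

```python
def strip_linear_segment(weights: dict) -> dict:
    # Drop the ".linear." segment introduced by the LoRA wrapper module.
    return {k.replace(".linear.", "."): v for k, v in weights.items()}

fixed = strip_linear_segment({"model.layers.0.self_attn.q_proj.linear.weight": 0})
print(list(fixed))  # ['model.layers.0.self_attn.q_proj.weight']
```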
## License
Apache 2.0 (same as base model)