Qwen3-4B-4bit Terminal Assistant

A fine-tuned Qwen3-4B model for terminal command generation, optimized for the LocalTerm app.

Note: Hugging Face shows "0.6B params" for this repository; that figure is incorrect. The actual model has 4 billion parameters (4-bit quantized). Hugging Face miscalculates the parameter count for MLX-quantized safetensors files.
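
As a sanity check on that number (this is an assumption about how the counter works, not documented Hugging Face behavior): MLX's 4-bit quantization packs eight weights into each uint32 element and stores BF16 scales and biases per group (64 weights per group by default), so a counter that sums raw tensor elements would report a 4B-parameter model as roughly 0.6B:

```python
# Sketch: why a raw element count of a 4-bit MLX checkpoint lands near "0.6B".
# Assumes MLX's default quantization group size of 64; the 8x packing comes
# from storing eight 4-bit weights in one uint32 element.
params = 4_000_000_000
packed_elements = params // 8             # uint32 elements holding the 4-bit weights
scale_bias_elements = 2 * (params // 64)  # one BF16 scale + one bias per 64-weight group
reported_billions = (packed_elements + scale_bias_elements) / 1e9
print(round(reported_billions, 3))  # ≈ 0.625
```

The same packing is consistent with the ~2.3GB file size: 0.5B uint32 elements is about 2GB, plus roughly 0.25GB of BF16 scales and biases.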

Model Details

  • Base Model: mlx-community/Qwen3-4B-4bit
  • Actual Parameters: 4 billion (same as base model)
  • Quantization: 4-bit (MLX format, ~2.3GB file size)
  • Fine-tuning: QLoRA on 16 layers
  • Training Data: 388 examples, 74 terminal commands
  • Accuracy: 98% on test set (147/150 correct)

Usage

With MLX-LM (Python)

from mlx_lm import load, generate

model, tokenizer = load("mlxstudio/qwen3-4b-4bit-terminal")

# The model was fine-tuned on the Qwen3 chat template (see Training Details),
# so wrap the request in chat messages rather than passing raw text.
messages = [{"role": "user", "content": "how to create a git repository"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)

With LocalTerm (macOS app)

The model downloads automatically on first run. See LocalTerm.

Training Details

  • Method: QLoRA fine-tuning with mlx-lm
  • Iterations: 300
  • Learning Rate: 1e-5
  • Data Format: Qwen3 chat template with /nothink tag
  • Train/Test Split: Clean split (0% overlap)
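
For reference, a single training example in the JSONL chat format that mlx-lm's LoRA tooling consumes might look like the following. The command and content are made-up illustrations, and the placement of the /nothink tag is an assumption based on the data-format note above:

```python
import json

# Illustrative (not actual) training example in mlx-lm's JSONL chat format.
# The /nothink tag placement in the user turn is an assumption.
example = {
    "messages": [
        {"role": "user", "content": "how to create a git repository /nothink"},
        {"role": "assistant", "content": "git init"},
    ]
}
print(json.dumps(example))  # one line of the training .jsonl file
```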

Version History

  • v2 (2026-01-22): Re-fused model with the correct weight format
    • Fixed: extraneous .linear. segment in the merged LoRA weight names
    • Now compatible with mlx-swift-lm
  • v1 (2026-01-21): Initial release (failed to load in mlx-swift-lm)
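
The v1 problem above can be sketched as a key remapping (the example weight name is hypothetical, not an actual key from the checkpoint; the idea is that v1 fused weights carried an extra .linear. segment that mlx-swift-lm does not expect):

```python
# Hypothetical sketch of the v2 fix: drop the extra ".linear." segment that
# the v1 fuse step left in weight names. The example key is illustrative.
def strip_linear_segment(key: str) -> str:
    return key.replace(".linear.", ".")

weights = {"model.layers.0.self_attn.q_proj.linear.weight": "tensor"}
fixed = {strip_linear_segment(k): v for k, v in weights.items()}
print(list(fixed))  # ['model.layers.0.self_attn.q_proj.weight']
```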

License

Apache 2.0 (same as base model)

