ARO Teacher (30B)

The full 30B MoE teacher model, fine-tuned on ARO training data. This model is used for:

  • Distillation — generating high-quality training data for the smaller student model
  • Iterative retraining — serving as the starting point for the next training cycle
  • High-quality inference — when maximum accuracy is needed (at the cost of speed/memory)

For deployment and everyday use, prefer the distilled 8B student model: ARO-Lang/aro-coder-4bit

Architecture Qwen3 30B MoE (3.3B active parameters)
Base model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
Quantization 4-bit (MLX)
Size ~16 GB
Training source dpo

Usage

from mlx_lm import load, generate
model, tokenizer = load("ARO-Lang/aro-teacher-30b-4bit")

Or as a base for continued fine-tuning:

python -m mlx_lm lora --model ARO-Lang/aro-teacher-30b-4bit --data ./train_data --train

Links

License

MIT License

Downloads last month
68
Safetensors
Model size
31B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ARO-Lang/aro-teacher-30b-4bit