Qwen3.5 9B - Pi mono tune

Training Data

This model was trained on badlogicgames/pi-mono agent traces at 24k context (80% of the training examples), TeichAI/Claude-Opus-4.6-Reasoning-887x was downsampled and mixed in for stability.

Furthermore the qwen3.6 chat template was used to tune this model. preserve_thinking and enable_thinking are both supported by the model now. (preserve_thinking is still experimental)

Goal

The goal was to make a model small enough to run on consumer hardware, capable of working in long context agent scenarios

How it turned out

The model seems like it picked up some stylistic qualities from both the various models, will have to try again later with some other data from the same model source.

Other than that though, overall it's ability to work in agent harnesses have improved. Code quality is potentially degraded (unsure as i haven't used the base model as a coding agent for reference).

I recommend using the model with the pi agent harness, connected via the pi lm_studio extension.

LoRA adapter can be found here


  • Developed by: armand0e
  • License: apache-2.0
  • Finetuned from model : unsloth/Qwen3.5-9B

This qwen3_5 model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month
26
Safetensors
Model size
10B params
Tensor type
BF16
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for armand0e/Qwen3.5-9B-Pi-Agent

Finetuned
Qwen/Qwen3.5-9B
Finetuned
(81)
this model

Datasets used to train armand0e/Qwen3.5-9B-Pi-Agent