# qwen3-4b-thinking-microagent

LoRA SFT pipeline + scripts + docs for fine-tuning Qwen/Qwen3-4B-Thinking-2507 into a terminal agent.

Target: beat 13% on Terminal-Bench 2.0 with a single A100-40GB.

## What's in this repo

| Path | What |
| --- | --- |
| `README.md` | top-level overview |
| `docs/PROJECT_OVERVIEW.md` | project goals and status |
| `docs/DATA_PIPELINE.md` | how the training corpus is built |
| `docs/FILTER_DESIGN.md` | filter rules deep dive |
| `docs/MODEL_SELECTION.md` | why Qwen3-4B-Thinking-2507 vs. alternatives |
| `docs/HPC_PRINCIPLES.md` | single-A100 training optimization playbook |
| `docs/REPRODUCIBILITY.md` | step-by-step reproduction guide |
| `docs/VAST_AI_SETUP.md` | running on cheap rental A100s |
| `docs/CHANGELOG.md` | v1 → v2 changes |
| `scripts/run_pipeline_v2.py` | builds the training corpus |
| `scripts/convert_code_v2.py` | code-specific filter (recovery + give_up) |
| `scripts/rewrite_giveups.py` | retrospective give_up rewriter |
| `scripts/train_v2.py` | HPC-grade LoRA training (Unsloth + packing + FA2) |
| `scripts/setup_a100.sh` | one-shot A100 installer |
| `scripts/merge_lora.py` | adapter → merged model for vLLM serving |
| `data/pipeline_v2_log.txt` | full v2 pipeline run log |

## Training corpus

Lives in a separate repo: `prometheus04/microagent-train-v2` (26,627 trajectories, ~1 GB).

## Why this exists

There's a lot of public commentary about training small agents on terminal-style data, but much less executable code you can actually run. This repo is the end-to-end recipe: corpus build, filter design rationale, HPC-optimized training, and the reasoning behind every choice.

## Headline numbers (corpus)

- 26,627 trajectories, ~244M training tokens
- 81.7% multi-turn (≥6 turns), avg ~8.5 assistant turns
- 5.1% `<give_up>` examples for honest failure handling
- Math content: 0% (deliberately dropped)
- Code content: 48.4%
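
If you want to sanity-check these numbers against the downloaded corpus, a minimal sketch is below. It assumes each trajectory is one JSON Lines record with a `messages` list of `{"role", "content"}` dicts; the file name and schema are assumptions about the dataset layout, not guarantees.

```python
import json

# Hypothetical schema: one trajectory per line, each with a "messages" list
# of {"role": ..., "content": ...} dicts. Adjust the path/fields to the real corpus.
def corpus_stats(path: str) -> dict:
    total = multi_turn = give_ups = assistant_turns = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            msgs = json.loads(line)["messages"]
            total += 1
            assistant_turns += sum(1 for m in msgs if m["role"] == "assistant")
            if len(msgs) >= 6:  # "multi-turn" per this README's definition
                multi_turn += 1
            if any("<give_up>" in m["content"]
                   for m in msgs if m["role"] == "assistant"):
                give_ups += 1
    return {
        "trajectories": total,
        "multi_turn_pct": 100 * multi_turn / total,
        "avg_assistant_turns": assistant_turns / total,
        "give_up_pct": 100 * give_ups / total,
    }

print(corpus_stats("data/train.jsonl"))  # hypothetical file name
```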

## Headline numbers (training, projected)

- A100-40GB, single GPU
- 4–5 hours wall time for 1 epoch
- ~$5 cost on Vast.ai
- ~80 MB final LoRA adapter

## How to run

See `docs/REPRODUCIBILITY.md` for the full step-by-step guide.

Short version:

```bash
git clone https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent
cd qwen3-4b-thinking-microagent
huggingface-cli download prometheus04/microagent-train-v2 \
  --repo-type dataset --local-dir data
bash scripts/setup_a100.sh
python scripts/train_v2.py --output-dir runs/v1 --epochs 1.0
```
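
After training, `scripts/merge_lora.py` folds the adapter into the base weights for vLLM serving. If you want to do the merge by hand, the standard PEFT pattern looks roughly like this; the adapter and output paths are placeholders, and the actual script may differ:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-4B-Thinking-2507"
ADAPTER = "runs/v1"  # adapter dir written by train_v2.py (assumed)

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights

merged.save_pretrained("runs/v1-merged")  # point vLLM's --model at this dir
AutoTokenizer.from_pretrained(BASE).save_pretrained("runs/v1-merged")
```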

## Format the model learns

```
<think>brief reasoning</think>
<bash>shell commands</bash>
```

Or, to end the episode:

```
<think>verification</think>
<finish>one-line summary</finish>
```

Or an honest stop:

```
<think>three approaches all failed; out of turns</think>
<give_up>tried 3 distinct approaches; last failure: NameError: name 'x' is not defined</give_up>
```
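
An agent loop consuming this format only needs a small tag parser. A minimal sketch is below, assuming the model emits exactly one action tag per turn; the tag names come from the examples above, and the loop structure is illustrative rather than the repo's actual harness:

```python
import re

# Match one action tag and its payload; <think> is ignored by the executor.
ACTION_RE = re.compile(r"<(bash|finish|give_up)>(.*?)</\1>", re.DOTALL)

def parse_turn(text: str):
    """Return (action, payload) from one assistant turn, or (None, None)."""
    m = ACTION_RE.search(text)
    return (m.group(1), m.group(2).strip()) if m else (None, None)

turn = "<think>check disk usage</think>\n<bash>df -h /</bash>"
action, payload = parse_turn(turn)
if action == "bash":
    print("run in shell:", payload)  # execute, feed stdout back as the next user turn
elif action in ("finish", "give_up"):
    print("episode over:", action, payload)
```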

## License

MIT for code. The base model is Apache 2.0. The training corpus is derived from NVIDIA's Nemotron-Terminal-Corpus (NVIDIA Open Model License).
