# qwen3-4b-thinking-microagent

LoRA SFT pipeline + scripts + docs for fine-tuning Qwen/Qwen3-4B-Thinking-2507 into a terminal agent.

Target: beat 13% on Terminal-Bench 2.0 with a single A100-40GB.

## What's in this repo

| Path | What |
| --- | --- |
| `README.md` | top-level overview |
| `docs/PROJECT_OVERVIEW.md` | project goals and status |
| `docs/DATA_PIPELINE.md` | how the training corpus is built |
| `docs/FILTER_DESIGN.md` | filter rules deep dive |
| `docs/MODEL_SELECTION.md` | why Qwen3-4B-Thinking-2507 vs. alternatives |
| `docs/HPC_PRINCIPLES.md` | single-A100 training optimization playbook |
| `docs/REPRODUCIBILITY.md` | step-by-step reproduction guide |
| `docs/VAST_AI_SETUP.md` | running on cheap rental A100s |
| `docs/CHANGELOG.md` | v1 → v2 changes |
| `scripts/run_pipeline_v2.py` | builds the training corpus |
| `scripts/convert_code_v2.py` | code-specific filter (recovery + give_up) |
| `scripts/rewrite_giveups.py` | retrospective give_up rewriter |
| `scripts/train_v2.py` | HPC-grade LoRA training (Unsloth + packing + FA2) |
| `scripts/setup_a100.sh` | one-shot A100 installer |
| `scripts/merge_lora.py` | adapter → merged model for vLLM serving |
| `data/pipeline_v2_log.txt` | full v2 pipeline run log |

## Training corpus

Lives in a separate repo: `prometheus04/microagent-train-v2` (26,627 trajectories, ~1 GB).

## Why this exists

There's a lot of public commentary about training small agents on terminal-style data, but much less executable code you can actually run. This repo is the end-to-end recipe: corpus build, filter design rationale, HPC-optimized training, and the reasoning behind every choice.

## Headline numbers (corpus)

- 26,627 trajectories, ~244M training tokens
- 81.7% multi-turn (≥6 turns), avg ~8.5 assistant turns
- 5.1% `<give_up>` examples for honest failure handling
- Math content: 0% (deliberately dropped)
- Code content: 48.4%
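
If you want to sanity-check these numbers against the downloaded corpus, a minimal sketch is below. It assumes each trajectory is one JSON Lines record with a `messages` list of `{"role", "content"}` dicts; the file name and schema are assumptions about the dataset layout, not guarantees.

```python
import json

# Hypothetical schema: one trajectory per line, each with a "messages" list
# of {"role": ..., "content": ...} dicts. Adjust the path/fields to the real corpus.
def corpus_stats(path: str) -> dict:
    total = multi_turn = give_ups = assistant_turns = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            msgs = json.loads(line)["messages"]
            total += 1
            assistant_turns += sum(1 for m in msgs if m["role"] == "assistant")
            if len(msgs) >= 6:  # "multi-turn" per this README's definition
                multi_turn += 1
            if any("<give_up>" in m["content"]
                   for m in msgs if m["role"] == "assistant"):
                give_ups += 1
    return {
        "trajectories": total,
        "multi_turn_pct": 100 * multi_turn / total,
        "avg_assistant_turns": assistant_turns / total,
        "give_up_pct": 100 * give_ups / total,
    }

print(corpus_stats("data/train.jsonl"))  # hypothetical file name
```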

## Headline numbers (training, projected)

- A100-40GB, single GPU
- 4–5 hours wall time for 1 epoch
- ~$5 cost on Vast.ai
- ~80 MB final LoRA adapter

## How to run

See `docs/REPRODUCIBILITY.md` for the full step-by-step guide.

Short version:

```bash
git clone https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent
cd qwen3-4b-thinking-microagent
huggingface-cli download prometheus04/microagent-train-v2 \
  --repo-type dataset --local-dir data
bash scripts/setup_a100.sh
python scripts/train_v2.py --output-dir runs/v1 --epochs 1.0
```
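
After training, `scripts/merge_lora.py` folds the adapter into the base weights for vLLM serving. If you want to do the merge by hand, the standard PEFT pattern looks roughly like this; the adapter and output paths are placeholders, and the actual script may differ:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-4B-Thinking-2507"
ADAPTER = "runs/v1"  # adapter dir written by train_v2.py (assumed)

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights

merged.save_pretrained("runs/v1-merged")  # point vLLM's --model at this dir
AutoTokenizer.from_pretrained(BASE).save_pretrained("runs/v1-merged")
```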

## Format the model learns

```
<think>brief reasoning</think>
<bash>shell commands</bash>
```

Or, to end the episode:

```
<think>verification</think>
<finish>one-line summary</finish>
```

Or an honest stop:

```
<think>three approaches all failed; out of turns</think>
<give_up>tried 3 distinct approaches; last failure: NameError: name 'x' is not defined</give_up>
```
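
An agent loop consuming this format only needs a small tag parser. A minimal sketch is below, assuming the model emits exactly one action tag per turn; the tag names come from the examples above, and the loop structure is illustrative rather than the repo's actual harness:

```python
import re

# Match one action tag and its payload; <think> is ignored by the executor.
ACTION_RE = re.compile(r"<(bash|finish|give_up)>(.*?)</\1>", re.DOTALL)

def parse_turn(text: str):
    """Return (action, payload) from one assistant turn, or (None, None)."""
    m = ACTION_RE.search(text)
    return (m.group(1), m.group(2).strip()) if m else (None, None)

turn = "<think>check disk usage</think>\n<bash>df -h /</bash>"
action, payload = parse_turn(turn)
if action == "bash":
    print("run in shell:", payload)  # execute, feed stdout back as the next user turn
elif action in ("finish", "give_up"):
    print("episode over:", action, payload)
```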

## License

MIT for code. The base model is Apache 2.0. The training corpus is derived from NVIDIA's Nemotron-Terminal-Corpus (NVIDIA Open Model License).
