# qwen3-4b-thinking-microagent

LoRA SFT pipeline + scripts + docs for fine-tuning `Qwen/Qwen3-4B-Thinking-2507` into a terminal agent.

Target: beat 13% on Terminal-Bench 2.0 with a single A100-40GB.
## What's in this repo

| Path | What |
|---|---|
| `README.md` | top-level overview |
| `docs/PROJECT_OVERVIEW.md` | project goals + status |
| `docs/DATA_PIPELINE.md` | how the training corpus is built |
| `docs/FILTER_DESIGN.md` | filter rules deep dive |
| `docs/MODEL_SELECTION.md` | why Qwen3-4B-Thinking-2507 vs. alternatives |
| `docs/HPC_PRINCIPLES.md` | single-A100 training optimization playbook |
| `docs/REPRODUCIBILITY.md` | step-by-step reproduction guide |
| `docs/VAST_AI_SETUP.md` | running on cheap rental A100s |
| `docs/CHANGELOG.md` | v1 → v2 changes |
| `scripts/run_pipeline_v2.py` | builds the training corpus |
| `scripts/convert_code_v2.py` | code-specific filter (recovery + give_up) |
| `scripts/rewrite_giveups.py` | retrospective give_up rewriter |
| `scripts/train_v2.py` | HPC-grade LoRA training (Unsloth + packing + FA2) |
| `scripts/setup_a100.sh` | one-shot A100 installer |
| `scripts/merge_lora.py` | adapter → merged model for vLLM serving (sketch below) |
| `data/pipeline_v2_log.txt` | full v2 pipeline run log |
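The `scripts/merge_lora.py` step folds the trained adapter back into the base weights so vLLM can serve a plain checkpoint. A minimal sketch of that merge with the standard PEFT API, assuming the adapter was written to `runs/v1` (the script's actual contents may differ):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507", torch_dtype="bfloat16"
)
# Load the ~80 MB LoRA adapter on top of the base, then fold it in.
merged = PeftModel.from_pretrained(base, "runs/v1").merge_and_unload()
merged.save_pretrained("merged-model")  # a plain HF checkpoint vLLM can serve
AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507").save_pretrained("merged-model")
```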
## Training corpus

Lives in a separate repo: `prometheus04/microagent-train-v2` (26,627 trajectories, ~1 GB).
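To poke at the corpus without cloning anything, a minimal sketch with the `datasets` library (this assumes the dataset repo is stored in a format `load_dataset` can auto-detect, e.g. JSONL or Parquet):

```python
from datasets import load_dataset

# Streams the corpus from the Hub on first use, then caches locally.
ds = load_dataset("prometheus04/microagent-train-v2", split="train")
print(len(ds))  # expected: 26,627 trajectories
print(ds[0])    # inspect one trajectory's fields
```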
## Why this exists

There's a lot of public commentary about training small agents on terminal-style data, but much less executable code you can actually run. This repo is the end-to-end recipe: corpus build, filter design rationale, HPC-optimized training, and the reasoning behind every choice.
## Headline numbers (corpus)

- 26,627 trajectories, ~244M training tokens
- 81.7% multi-turn (≥6 turns), avg ~8.5 assistant turns
- 5.1% `<give_up>` examples for honest failure handling
- Math content: 0% (deliberately dropped)
- Code content: 48.4%
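The multi-turn and give_up figures above are straightforward to recompute. A sketch assuming each trajectory is a JSONL record with an OpenAI-style `messages` list (the file name and field names here are assumptions, not the repo's documented schema):

```python
import json

multi_turn = give_up = total = assistant_turns = 0

with open("data/train.jsonl") as f:  # hypothetical file name
    for line in f:
        msgs = json.loads(line)["messages"]  # assumed schema
        replies = [m for m in msgs if m["role"] == "assistant"]
        assistant_turns += len(replies)
        total += 1
        if len(msgs) >= 6:  # the ">=6 turns" threshold above
            multi_turn += 1
        if any("<give_up>" in m["content"] for m in replies):
            give_up += 1

print(f"multi-turn: {multi_turn/total:.1%}, avg assistant turns: {assistant_turns/total:.1f}")
print(f"give_up share: {give_up/total:.1%}")
```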
## Headline numbers (training, projected)

- Single A100-40GB GPU
- 4–5 hours wall time for 1 epoch
- ~$5 cost on Vast.ai
- ~80 MB final LoRA adapter
## How to run

See `docs/REPRODUCIBILITY.md` for the full step-by-step guide. Short version:

```bash
git clone https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent
cd qwen3-4b-thinking-microagent
huggingface-cli download prometheus04/microagent-train-v2 \
    --repo-type dataset --local-dir data
bash scripts/setup_a100.sh
python scripts/train_v2.py --output-dir runs/v1 --epochs 1.0
```
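For orientation, the core of an Unsloth + packing setup like the one `scripts/train_v2.py` implements looks roughly like this. This is a sketch under assumptions (sequence length, LoRA rank, dataset column name), not a copy of the repo's script:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the 4B base with Unsloth's fused kernels; FlashAttention-2 is used
# automatically when flash-attn is installed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    max_seq_length=8192,  # assumed context budget for multi-turn trajectories
    load_in_4bit=True,    # assumed; keeps the footprint well inside 40 GB
)

# Attach LoRA adapters; rank/alpha here are illustrative, not the repo's values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("prometheus04/microagent-train-v2", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # `tokenizer=` on older trl versions
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="runs/v1",
        num_train_epochs=1,
        packing=True,               # sequence packing, as the repo describes
        dataset_text_field="text",  # assumed column name; adjust to the schema
    ),
)
trainer.train()
```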
## Format the model learns

Each assistant turn is a `<think>` block followed by a single action tag:

```
<think>brief reasoning</think>
<bash>shell commands</bash>
```

Or to end:

```
<think>verification</think>
<finish>one-line summary</finish>
```

Or honest stop:

```
<think>three approaches all failed; out of turns</think>
<give_up>tried 3 distinct approaches; last failure: NameError: name 'x' is not defined</give_up>
```
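Since the output format is just a tag pair per block, parsing it in a driver loop is simple. A minimal sketch, with the tag grammar taken from the examples above (the helper name and driver logic are ours, not the repo's):

```python
import re

# Matches <tag>body</tag> for the four tags the model emits.
TAG = re.compile(r"<(think|bash|finish|give_up)>(.*?)</\1>", re.DOTALL)

def parse_turn(text: str) -> dict:
    """Extract the tagged blocks from one assistant turn."""
    return {name: body.strip() for name, body in TAG.findall(text)}

turn = parse_turn("<think>check disk usage</think>\n<bash>df -h</bash>")
if "bash" in turn:
    print("run:", turn["bash"])  # hand off to a shell executor
elif "finish" in turn or "give_up" in turn:
    print("episode over")
```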
## License

MIT for code. Base model is Apache 2.0. Training corpus derived from NVIDIA's Nemotron-Terminal-Corpus (NVIDIA Open Model License).