Qwen3-4B-Agent-v6-DPO

This is the final version of the agent model, optimized with Direct Preference Optimization (DPO) after supervised fine-tuning (SFT) and merging.
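
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from this model's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed token log-probabilities of each response under the trained policy and a frozen reference model.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the card does not state the one used
) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x).
    return _softplus(-logits)
```

When the policy and reference agree, the loss is log 2; it drops below that as the policy shifts probability toward the chosen response relative to the reference.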

Improvements

  • Better SQL error recovery (self-correction)
  • More stable ALFWorld trajectories
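
As an illustration of what SQL error recovery through self-correction can look like in an agent loop, here is a minimal sketch using Python's built-in `sqlite3`. It is an assumption about the general pattern, not this model's evaluation harness: `run_with_self_correction` and `toy_repair` are hypothetical names, and `toy_repair` is a hard-coded stand-in for the step where the model would read the error message and propose a corrected query.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_with_self_correction(
    conn: sqlite3.Connection,
    query: str,
    repair_fn: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[list, List[Tuple[str, str]]]:
    """Run a query; on a database error, ask repair_fn for a fixed query and retry.

    Returns the result rows and a log of (failed_query, error_message) pairs.
    """
    failures: List[Tuple[str, str]] = []
    for _ in range(max_attempts):
        try:
            return conn.execute(query).fetchall(), failures
        except sqlite3.Error as exc:
            # Feed the error text back so the next attempt can correct it.
            failures.append((query, str(exc)))
            query = repair_fn(query, str(exc))
    raise RuntimeError(f"query still failing after {max_attempts} attempts")


def toy_repair(query: str, error_message: str) -> str:
    """Hypothetical stand-in for the model: fixes one known table-name typo."""
    return query.replace("usr", "users")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

rows, failures = run_with_self_correction(conn, "SELECT name FROM usr", toy_repair)
```

In this toy run the first query fails because the table `usr` does not exist, the repair step rewrites it, and the second attempt succeeds; in a real agent the repair step would be a model call.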

Model size: 4B params
Tensor type: BF16
Weights format: Safetensors

Model repository: sei0621/Qwen3-4B-Agent-DB-ALFWorld-v6-DPO (fine-tuned model)