sei0621/Qwen3-4B-Agent-DB-ALFWorld-v6-DPO

Qwen3-4B-Agent-v6-DPO

This is the final version of the agent model. It was optimized with DPO (Direct Preference Optimization) after supervised fine-tuning (SFT) and merging.
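For readers unfamiliar with DPO, the per-pair objective can be sketched as below. This is a minimal illustration of the standard DPO loss, not the training code used for this model; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model. beta=0.1 is an
    assumed value, not one taken from this model card.
    """
    # Implicit rewards: how much more likely the policy makes each
    # response compared with the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below log 2 when the policy prefers the chosen response more strongly than the reference model does, and rises above it when the preference is reversed.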

Improvements

Better SQL error recovery (Self-correction)
More stable ALFWorld trajectories
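
As an illustration of what SQL error recovery means in an agent loop, the sketch below executes a query, feeds any database error back to a repair step, and retries. It is a hypothetical harness, not this model's actual evaluation setup: `run_with_recovery` and `propose_fix` are made-up names, and `propose_fix` stands in for a call to the model.

```python
import sqlite3

def run_with_recovery(conn, query, propose_fix, max_attempts=3):
    """Run `query`; on a database error, ask `propose_fix` for a new query.

    `propose_fix(query, error_message)` is a hypothetical stand-in for the
    agent model. Returns the fetched rows and the list of failed attempts.
    """
    attempts = []
    for _ in range(max_attempts):
        try:
            rows = conn.execute(query).fetchall()
            return rows, attempts
        except sqlite3.Error as exc:
            # Record the failure and let the repair step see the error text.
            attempts.append((query, str(exc)))
            query = propose_fix(query, str(exc))
    raise RuntimeError(f"no valid query after {max_attempts} attempts")

# Toy database and a toy repair rule that corrects a wrong table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def toy_fix(query, error_message):
    return query.replace("FROM user", "FROM users")

rows, attempts = run_with_recovery(conn, "SELECT name FROM user", toy_fix)
```

In a real setup the repair step would prompt the model with the failed query and the error message instead of applying a fixed string replacement.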

Safetensors

Model size: 4B params

Tensor type: BF16

Base model: Qwen/Qwen3-4B-Instruct-2507