
llm-sft-lora-intermediate20260215 (SFT Intermediate Adapter)

This is the SFT-only intermediate LoRA adapter, saved before any DPO training. Use it as the starting point for DPO experiments.

Base Model

Qwen/Qwen3-4B-Instruct-2507
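
If you want to inspect the SFT checkpoint on its own, the adapter can probably be attached to the base model with PEFT. The following is a minimal sketch, assuming this repository holds a standard PEFT-format LoRA adapter and that `transformers` and `peft` are installed; the helper name `load_sft_adapter` is hypothetical and not part of any training script referenced by this card.

```python
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_ID = "centmount/llm-sft-lora-intermediate20260215"


def load_sft_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the SFT LoRA adapter (hypothetical helper).

    Assumes the adapter repository is in standard PEFT format.
    """
    # Imported lazily so the constants above are usable without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the LoRA weights stored in the adapter repo.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_sft_adapter()` downloads both repositories, so it needs network access and enough memory for a 4B-parameter model.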

Usage

Set DPO_SFT_SOURCE="centmount/llm-sft-lora-intermediate20260215" and RUN_MODE="dpo_only" to run DPO from this checkpoint.
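
The card does not say whether these two settings are environment variables or notebook variables. As a sketch under the assumption that they are read from the environment, a training script might resolve them roughly like this; `read_dpo_settings` is a hypothetical helper, and only the names `DPO_SFT_SOURCE`, `RUN_MODE`, and the value `dpo_only` come from this card.

```python
import os
from typing import Mapping, Optional

SFT_ADAPTER_ID = "centmount/llm-sft-lora-intermediate20260215"


def read_dpo_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read RUN_MODE and DPO_SFT_SOURCE (hypothetical helper).

    Assumes both settings are plain string variables, here taken from the
    process environment unless an explicit mapping is passed in.
    """
    env = os.environ if env is None else env
    run_mode = env.get("RUN_MODE", "")
    sft_source = env.get("DPO_SFT_SOURCE", "")
    # A DPO-only run has no SFT stage of its own, so it needs a checkpoint.
    if run_mode == "dpo_only" and not sft_source:
        raise ValueError("RUN_MODE='dpo_only' requires DPO_SFT_SOURCE to be set")
    return {"run_mode": run_mode, "sft_source": sft_source}


# Example: the configuration described in this card.
settings = read_dpo_settings(
    {"DPO_SFT_SOURCE": SFT_ADAPTER_ID, "RUN_MODE": "dpo_only"}
)
print(settings)
```

Passing the mapping explicitly keeps the example independent of the real environment; in an actual run you would export the two variables (or assign them in the notebook) before starting training.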
