This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Direct Preference Optimization (DPO)**.
This repository contains a **LoRA adapter** trained with DPO on top of the SFT adapter.

You need to load the base model and then load this adapter using PEFT.

The model before DPO can be viewed at the following URL:

https://huggingface.co/tmdoi/lora-structeval-sft-0204
## Training Objective

This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
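For reference, DPO trains the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of the loss, assuming per-sequence log-probabilities are already computed; the `beta=0.1` default here is an illustrative assumption, not the value used in this training run:

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)),
    # where each log-ratio is log p(chosen) - log p(rejected) for that model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

When the policy and reference assign identical log-probabilities, the inner term is zero and the loss equals -log(0.5) ≈ 0.693; training lowers it by widening the policy's chosen-vs-rejected margin beyond the reference's.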