tmdoi/lora-structeval-sft-0204-merged

tmdoi committed on Feb 6

Commit d7f3af1 · verified · 1 Parent(s): b7ef716

Update README.md

Files changed (1)

README.md +2 -1

@@ -20,8 +20,9 @@ This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Di
 This repository contains a **LoRA adapter** trained with DPO on top of the SFT adapter.
 You need to load the base model and then load this adapter using PEFT.
 The model before DPO can be viewed at the following URL.
-https://huggingface.co/tmdoi/lora-structeval-sft-0204
+- https://huggingface.co/tmdoi/lora-structeval-sft-0204
 ## Training Objective
 This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
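
The README text in this hunk says to load the base model first and then attach the adapter with PEFT. A minimal sketch of that loading order follows. It assumes the adapter is published under this commit's repository id and that the base model is the one named in the hunk header; `load_dpo_model` is a hypothetical helper name, not something the README defines. If the repository actually holds merged weights, as its name might suggest, loading it directly with `AutoModelForCausalLM` would apply instead.

```python
# Base model named in the README hunk header.
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
# Assumption: the adapter lives in the repository this commit belongs to.
ADAPTER_ID = "tmdoi/lora-structeval-sft-0204-merged"


def load_dpo_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Hypothetical helper: load the base model, then attach the LoRA adapter."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_dpo_model()` downloads both repositories on first use, so it needs network access and enough memory for a 4B-parameter model.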