tmdoi committed on
Commit b7ef716 · verified · 1 Parent(s): 719063d

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -20,6 +20,8 @@ This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Di
 
 This repository contains a **LoRA adapter** trained with DPO on top of the SFT adapter.
 You need to load the base model and then load this adapter using PEFT.
+The model before DPO can be viewed at the following URL.
+https://huggingface.co/tmdoi/lora-structeval-sft-0204
 
 ## Training Objective
 This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
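
The README text in this diff says to load the base model first and then attach the adapter using PEFT. A minimal sketch of that step might look like the following. It assumes the `transformers` and `peft` packages are installed; the helper names `repo_id_from_url` and `load_with_adapter` are hypothetical and introduced only for illustration. The diff does not state this repository's own id, so the adapter id is left as a parameter, and whether the DPO adapter expects the SFT adapter to be merged into the base model first is not specified in the source.

```python
from urllib.parse import urlparse

# Both values are taken from the README text shown in the diff.
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
SFT_ADAPTER_URL = "https://huggingface.co/tmdoi/lora-structeval-sft-0204"


def repo_id_from_url(url: str) -> str:
    """Turn a Hub model URL into an 'owner/name' repo id (hypothetical helper)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def load_with_adapter(adapter_id: str, base_id: str = BASE_MODEL_ID):
    """Load the base model, then attach a LoRA adapter with PEFT (hypothetical helper)."""
    # Imported lazily so repo_id_from_url works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored under adapter_id.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

For example, `load_with_adapter(repo_id_from_url(SFT_ADAPTER_URL))` would load the pre-DPO SFT adapter linked in the added lines. To load the DPO adapter, pass this repository's id instead, which the diff does not show.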