tmdoi committed · Commit d7f3af1 · verified · 1 Parent(s): b7ef716

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -20,8 +20,9 @@ This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Di
 
  This repository contains a **LoRA adapter** trained with DPO on top of the SFT adapter.
  You need to load the base model and then load this adapter using PEFT.
+
  The model before DPO can be viewed at the following URL.
- https://huggingface.co/tmdoi/lora-structeval-sft-0204
+ - https://huggingface.co/tmdoi/lora-structeval-sft-0204
 
  ## Training Objective
  This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
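
The README text in this diff says to load the base model first and then attach the adapter with PEFT. A minimal sketch of that loading order follows. It assumes the `transformers` and `peft` packages are installed. The base model ID is the one named in the README; `ADAPTER_ID` is a hypothetical placeholder, because the diff does not state this repository's ID, and the helper name `load_model_with_adapter` is likewise illustrative. It also assumes the adapter can be applied directly to the base model, as the README describes.

```python
# Base model named in the README.
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
# Hypothetical placeholder: replace with this repository's actual ID.
ADAPTER_ID = "your-namespace/your-dpo-adapter"


def load_model_with_adapter(base_id: str = BASE_MODEL_ID,
                            adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the LoRA adapter with PEFT."""
    # Imported lazily so this module can be imported without the
    # heavy dependencies being installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Step 1: load the base model weights.
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Step 2: wrap the base model with the LoRA adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_model_with_adapter()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 4B-parameter model. The order matters because the adapter holds only the low-rank weight deltas and cannot run without the base model underneath it.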