This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Direct Preference Optimization (DPO)**.
This repository contains a **LoRA adapter** trained with DPO on top of the SFT adapter.

You need to load the base model and then load this adapter using PEFT.

The model before DPO can be viewed at the following URL:

https://huggingface.co/tmdoi/lora-structeval-sft-0204
## Training Objective

This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
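For reference, DPO trains the policy to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of the loss, assuming per-sequence log-probabilities are already computed; the `beta=0.1` default here is an illustrative assumption, not the value used in this training run:

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)),
    # where each log-ratio is log p(chosen) - log p(rejected) for that model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

When the policy and reference assign identical log-probabilities, the inner term is zero and the loss equals -log(0.5) ≈ 0.693; training lowers it by widening the policy's chosen-vs-rejected margin beyond the reference's.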