# LLM2026_DPO_SFT19_v2
This model is a LoRA adapter for makotonlo/LLM2026_SFT_finalv19_7B, trained with Direct Preference Optimization (DPO).
## Training Configuration
- Base SFT Model: makotonlo/LLM2026_SFT_finalv19_7B
- Method: DPO
- Epochs: 1
- Learning rate: 1e-06
- Beta: 0.1
- Max sequence length: 1024
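The hyperparameters above map onto a standard DPO setup. As a minimal sketch (not the author's actual training code), the settings can be collected as keyword arguments in the style of TRL's `DPOConfig`, alongside the per-example DPO loss that `beta` controls:

```python
import math

# Hypothetical sketch: the hyperparameters listed above, in the style of
# TRL's DPOConfig keyword arguments (argument names assumed).
dpo_kwargs = {
    "beta": 0.1,              # temperature of the implicit KL penalty
    "learning_rate": 1e-6,
    "num_train_epochs": 1,
    "max_length": 1024,       # max sequence length
}

def dpo_loss(logp_chosen, logp_ref_chosen,
             logp_rejected, logp_ref_rejected, beta):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio
    minus rejected log-ratio)), given policy and reference log-probs."""
    margin = beta * ((logp_chosen - logp_ref_chosen)
                     - (logp_rejected - logp_ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal policy and reference log-probabilities the margin is zero and the loss is log 2, the starting point of DPO training.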
## Usage
Load the adapter via the evaluation script's `adapter_merge` mode.
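The evaluation script itself is not included here, but its `adapter_merge` step can be approximated with PEFT's standard adapter-merging API. A hypothetical sketch, assuming the model IDs from this card:

```python
# Hypothetical sketch of adapter merging with PEFT (not the actual
# evaluation script); model IDs are taken from this card.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "makotonlo/LLM2026_SFT_finalv19_7B"
ADAPTER_ID = "makotonlo/LLM2026_DPO_SFT19_v2"

def load_merged(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the SFT base, apply the DPO LoRA adapter, and merge the
    adapter weights into the base model for plain inference."""
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    return model.merge_and_unload()  # fold LoRA deltas into base weights

if __name__ == "__main__":
    model = load_merged()
    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
```

After `merge_and_unload()` the result behaves like an ordinary `AutoModelForCausalLM` with no adapter machinery at inference time.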
## Model tree for makotonlo/LLM2026_DPO_SFT19_v2

- Base model: Qwen/Qwen2.5-7B
  - Finetuned: Qwen/Qwen2.5-7B-Instruct
    - Quantized: unsloth/Qwen2.5-7B-Instruct-bnb-4bit