# LLM2026_DPO_SFT19_v7
This model is a LoRA adapter fine-tuned from makotonlo/LLM2026_SFT_finalv19_7B using Direct Preference Optimization (DPO).
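For reference, DPO trains the policy directly on preference pairs $(x, y_w, y_l)$ (prompt, chosen response, rejected response), with the frozen SFT model serving as the reference policy $\pi_{\text{ref}}$:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

The $\beta$ value of 0.5 listed below controls how strongly the tuned policy is kept close to the SFT reference model: larger $\beta$ means a stronger implicit KL penalty.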
## Training Configuration
- Base SFT Model: makotonlo/LLM2026_SFT_finalv19_7B
- Method: DPO
- Epochs: 3.0
- Learning rate: 1e-05
- Beta: 0.5
- Max sequence length: 1024
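A minimal sketch of how a run with this configuration could be reproduced using the TRL library. The preference dataset name, output directory, and LoRA settings are placeholders (assumptions); the actual training script for this model is not published.

```python
# Sketch of a DPO run matching the configuration above, using TRL + PEFT.
# "your/preference-dataset" is a placeholder; the real dataset is not listed
# on this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "makotonlo/LLM2026_SFT_finalv19_7B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your/preference-dataset", split="train")

config = DPOConfig(
    output_dir="LLM2026_DPO_SFT19_v7",
    num_train_epochs=3.0,   # epochs from the card
    learning_rate=1e-5,     # learning rate from the card
    beta=0.5,               # DPO beta from the card
    max_length=1024,        # max sequence length from the card
)

# Train a LoRA adapter rather than the full weights, as the card describes.
peft_config = LoraConfig(task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```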
## Usage
Load the adapter via the evaluation script's `adapter_merge` mode.
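If you do not use that script, the adapter can also be loaded and merged directly with the PEFT library. A minimal sketch, assuming this repository contains standard PEFT adapter files (`adapter_config.json` and adapter weights):

```python
# Sketch: load the adapter onto the SFT base model with PEFT and merge it.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "makotonlo/LLM2026_SFT_finalv19_7B"
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# Attach the DPO adapter, then fold the LoRA weights into the base model.
model = PeftModel.from_pretrained(model, "makotonlo/LLM2026_DPO_SFT19_v7")
model = model.merge_and_unload()

prompt = "Hello!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```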
## Model tree for makotonlo/LLM2026_DPO_SFT19_v7
- Base model: Qwen/Qwen2.5-7B
  - Fine-tuned: Qwen/Qwen2.5-7B-Instruct
    - Quantized: unsloth/Qwen2.5-7B-Instruct-bnb-4bit