LLM2026_DPO_SFT19_v2

This model is a LoRA adapter trained on top of makotonlo/LLM2026_SFT_finalv19_7B using Direct Preference Optimization (DPO).

Training Configuration

  • Base SFT Model: makotonlo/LLM2026_SFT_finalv19_7B
  • Method: DPO
  • Epochs: 1
  • Learning rate: 1e-06
  • Beta: 0.1
  • Max sequence length: 1024
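With beta = 0.1, the DPO objective penalizes the policy when the reference model prefers the rejected response more strongly than the policy prefers the chosen one. A minimal sketch of the per-example loss, using only the standard library (the actual training code is not shown on this card, so the function name and signature here are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed log-probabilities.

    pi_*  : log-prob of the chosen/rejected response under the policy
    ref_* : log-prob of the same responses under the frozen reference (SFT) model
    beta  : strength of the KL-style constraint toward the reference (0.1 here)
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response than the reference model does, minus the same for rejected.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin): small when the policy's preference
    # for the chosen response exceeds the reference's, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A zero margin gives a loss of log 2 ≈ 0.693; the loss shrinks toward 0 as the policy widens its preference for the chosen response relative to the reference model.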

Usage

Load via the evaluation script's adapter_merge mode.
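The evaluation script itself is not included on this card, but adapter merging typically follows the standard transformers + PEFT workflow. A hedged sketch (the helper name and deferred imports are illustrative, not the project's actual code):

```python
def load_merged_model(base_id="makotonlo/LLM2026_SFT_finalv19_7B",
                      adapter_id="makotonlo/LLM2026_DPO_SFT19_v2"):
    """Load the base SFT model, attach the DPO LoRA adapter, and merge it.

    Imports are deferred so the sketch can be read without transformers/peft
    installed; the real evaluation script's adapter_merge mode may differ.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)
    # Attach the LoRA adapter, then fold its weights into the base model
    # so inference runs without the PEFT wrapper.
    model = PeftModel.from_pretrained(model, adapter_id)
    model = model.merge_and_unload()
    return model, tokenizer
```

After merging, the returned model behaves like a plain causal LM and can be used with the usual `generate` API.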

