makotonlo committed on
Commit 21fb4ac · verified · 1 Parent(s): 09069e8

Upload README.md with huggingface_hub

Files changed (1): README.md (+5 −4)
README.md CHANGED

````diff
@@ -6,26 +6,27 @@ tags:
 - silent-expert
 - lora
 - adapter
+- structured-output
 ---

 # LLM2026_DPO_SFT19_v13 (Silent Expert Version)

-This model is a LoRA adapter evolved from the SFT model **makotonlo/LLM2026_SFT_finalv19_7B**.
+This model is a LoRA adapter evolved from the highly intelligent SFT model **makotonlo/LLM2026_SFT_finalv19_7B (v19)**.
 It has been fine-tuned using **Direct Preference Optimization (DPO)** to eliminate conversational chatter and enforce strict raw data output.

 ## 🎯 Optimization Goal (Strict No-Preamble)
-The primary objective of this version is to ensure the model outputs **ONLY** raw data (JSON, XML, YAML, CSV) without any preambles (e.g., "Certainly!"), markdown backticks (```), or explanations.
+The primary objective of this version is to ensure the model outputs **ONLY** raw data (JSON, XML, YAML, CSV) without any preambles (e.g., "Certainly!"), markdown backticks (```), or explanations, to comply with strict competition rules.

 ## 🛠 Training Configuration
 - **Base Intelligence**: makotonlo/LLM2026_SFT_finalv19_7B (v19)
 - **Method**: DPO (Direct Preference Optimization)
 - **Learning Rate**: 5e-06
-- **Beta**: 0.1 (Strong penalty for conversational fillers)
+- **Beta**: 0.1 (Strong penalty for conversational responses)
 - **Max Steps**: 500
 - **LoRA Config**: r=64, alpha=64

 ## ⚠️ Important: Usage Note
-Please use the ChatML template for inference. The model is trained to start its response directly with data-starting characters like `{`, `[`, or `<`.
+When using this model, please use the **ChatML** prompt template. The model is trained to ensure the output starts directly with `{`, `[`, or `<`.

 ## Framework versions
 - PEFT 0.13.2
````
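The usage note in the diff above can be sketched as a small contract check. This is a minimal illustration, not part of the release: the ChatML delimiters (`<|im_start|>` / `<|im_end|>`) are the commonly used ones and the tokenizer's own chat template should take precedence; the helper names below are hypothetical.

```python
# Hypothetical helpers illustrating the "silent expert" output contract
# described in the model card. Assumptions: standard ChatML delimiters;
# the validation rule (reply must open with '{', '[', or '<', with no
# preamble or markdown fence) is taken from the card's usage note.

FENCE = "`" * 3  # a markdown code fence ("```")


def build_chatml_prompt(system: str, user: str) -> str:
    """Format a request using the ChatML template the card asks for."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def is_raw_data_output(text: str) -> bool:
    """Check the no-preamble contract: no markdown fence, and the
    reply must start directly with a data-opening character."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        return False  # wrapped in a markdown code block -> violation
    return stripped[:1] in ("{", "[", "<")


prompt = build_chatml_prompt("Return JSON only.", "List two primes as JSON.")
print(is_raw_data_output('{"primes": [2, 3]}'))    # True
print(is_raw_data_output("Certainly! Here it is"))  # False
```

A check like this is useful as a post-generation guard when the model's output feeds directly into a JSON/XML parser.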