Upload README.md with huggingface_hub
tags:
- silent-expert
- lora
- adapter
- structured-output
---

# LLM2026_DPO_SFT19_v13 (Silent Expert Version)

This model is a LoRA adapter trained on top of the SFT model **makotonlo/LLM2026_SFT_finalv19_7B** (v19).
It has been fine-tuned with **Direct Preference Optimization (DPO)** to eliminate conversational chatter and enforce strict raw data output.
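As a quick orientation, here is a minimal loading sketch using `transformers` and `peft`; the adapter repo id is an assumption inferred from the model name, so verify it before use:

```python
# Minimal sketch: load the base SFT model, then attach this LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "makotonlo/LLM2026_SFT_finalv19_7B"
adapter_id = "makotonlo/LLM2026_DPO_SFT19_v13"  # assumption: repo id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # apply the LoRA weights
```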

## 🎯 Optimization Goal (Strict No-Preamble)
The primary objective of this version is to ensure the model outputs **ONLY** raw data (JSON, XML, YAML, CSV) without any preambles (e.g., "Certainly!"), markdown code fences, or explanations, to comply with strict competition rules.
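To make the target behavior concrete, here is a hypothetical preference pair of the kind DPO optimizes over (illustrative only, not drawn from the actual training data):

```python
# Hypothetical preference pair illustrating the target behavior
# (illustrative only, not taken from the actual DPO dataset):
preference_example = {
    "prompt": "Convert to JSON: name=Alice, age=30",
    # chosen: raw data only, no preamble, no code fences
    "chosen": '{"name": "Alice", "age": 30}',
    # rejected: the same data wrapped in conversational chatter
    "rejected": 'Certainly! Here is the JSON you requested: {"name": "Alice", "age": 30}',
}
```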

## 🛠 Training Configuration
- **Base Model**: makotonlo/LLM2026_SFT_finalv19_7B (v19)
- **Method**: DPO (Direct Preference Optimization)
- **Learning Rate**: 5e-06
- **Beta**: 0.1 (strong penalty for conversational responses)
- **Max Steps**: 500
- **LoRA Config**: r=64, alpha=64 (see the sketch below)
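For illustration, these hyperparameters might map onto a TRL `DPOTrainer` setup roughly as follows. This is a sketch, not the actual training script: the one-row dataset reuses the hypothetical pair above, and exact argument names vary across TRL versions.

```python
# Illustrative mapping of the hyperparameters above onto TRL's DPO API.
from datasets import Dataset
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# A pairwise preference dataset of {"prompt", "chosen", "rejected"} rows,
# here seeded with the single illustrative pair from the previous example.
preference_dataset = Dataset.from_list([preference_example])

lora_config = LoraConfig(r=64, lora_alpha=64, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="LLM2026_DPO_SFT19_v13",
    beta=0.1,            # how strongly preferences are enforced vs. the reference model
    learning_rate=5e-6,
    max_steps=500,
)

trainer = DPOTrainer(
    model=model,                       # the base SFT model, not the merged adapter
    args=training_args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,        # `tokenizer=` in older TRL versions
    peft_config=lora_config,
)
trainer.train()
```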

## ⚠️ Important: Usage Note
When using this model, please use the **ChatML** prompt template. The model is trained so that the output starts directly with `{`, `[`, or `<`.
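Continuing the loading sketch above, a minimal ChatML-formatted generation example (the prompt content is illustrative; if the tokenizer ships a chat template, `tokenizer.apply_chat_template` produces this format for you):

```python
# Build a ChatML prompt by hand; <|im_start|>/<|im_end|> are the standard
# ChatML control markers this card asks for.
prompt = (
    "<|im_start|>user\n"
    "Convert to JSON: name=Alice, age=30<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)  # expected to begin directly with '{', '[', or '<'
```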

## Framework versions
- PEFT 0.13.2