makotonlo committed on
Commit 21fb4ac · verified · 1 Parent(s): 09069e8

Upload README.md with huggingface_hub

Files changed (1): README.md (+5 −4)
README.md CHANGED

````diff
@@ -6,26 +6,27 @@ tags:
 - silent-expert
 - lora
 - adapter
+- structured-output
 ---

 # LLM2026_DPO_SFT19_v13 (Silent Expert Version)

-This model is a LoRA adapter evolved from the SFT model **makotonlo/LLM2026_SFT_finalv19_7B**.
+This model is a LoRA adapter evolved from the highly intelligent SFT model **makotonlo/LLM2026_SFT_finalv19_7B (v19)**.
 It has been fine-tuned using **Direct Preference Optimization (DPO)** to eliminate conversational chatter and enforce strict raw data output.

 ## 🎯 Optimization Goal (Strict No-Preamble)
-The primary objective of this version is to ensure the model outputs **ONLY** raw data (JSON, XML, YAML, CSV) without any preambles (e.g., "Certainly!"), markdown backticks (```), or explanations.
+The primary objective of this version is to ensure the model outputs **ONLY** raw data (JSON, XML, YAML, CSV) without any preambles (e.g., "Certainly!"), markdown backticks (```), or explanations, to comply with strict competition rules.

 ## 🛠 Training Configuration
 - **Base Intelligence**: makotonlo/LLM2026_SFT_finalv19_7B (v19)
 - **Method**: DPO (Direct Preference Optimization)
 - **Learning Rate**: 5e-06
-- **Beta**: 0.1 (Strong penalty for conversational fillers)
+- **Beta**: 0.1 (Strong penalty for conversational responses)
 - **Max Steps**: 500
 - **LoRA Config**: r=64, alpha=64

 ## ⚠️ Important: Usage Note
-Please use the ChatML template for inference. The model is trained to start its response directly with data-starting characters like `{`, `[`, or `<`.
+When using this model, please use the **ChatML** prompt template. The model is trained to ensure the output starts directly with `{`, `[`, or `<`.

 ## Framework versions
 - PEFT 0.13.2
````
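The usage note in the diff above can be sketched as a small contract check. This is a minimal illustration, not part of the release: the ChatML delimiters (`<|im_start|>` / `<|im_end|>`) are the commonly used ones and the tokenizer's own chat template should take precedence; the helper names below are hypothetical.

```python
# Hypothetical helpers illustrating the "silent expert" output contract
# described in the model card. Assumptions: standard ChatML delimiters;
# the validation rule (reply must open with '{', '[', or '<', with no
# preamble or markdown fence) is taken from the card's usage note.

FENCE = "`" * 3  # a markdown code fence ("```")


def build_chatml_prompt(system: str, user: str) -> str:
    """Format a request using the ChatML template the card asks for."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def is_raw_data_output(text: str) -> bool:
    """Check the no-preamble contract: no markdown fence, and the
    reply must start directly with a data-opening character."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        return False  # wrapped in a markdown code block -> violation
    return stripped[:1] in ("{", "[", "<")


prompt = build_chatml_prompt("Return JSON only.", "List two primes as JSON.")
print(is_raw_data_output('{"primes": [2, 3]}'))    # True
print(is_raw_data_output("Certainly! Here it is"))  # False
```

A check like this is useful as a post-generation guard when the model's output feeds directly into a JSON/XML parser.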