---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- lora
- peft
- qwen
- structured-data
- alignment
---

# Qwen3-4B Structured Data Expert (Exp13 - DPO with System Prompt)

This model is a fine-tuned version of **[Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)** using **Direct Preference Optimization (DPO)**.

This repository contains a **LoRA adapter** trained for structured data generation tasks (JSON, YAML, TOML, XML, CSV, etc.).

## Key Feature

Training and inference formats are **fully aligned**: the same system prompt used at inference is embedded in the DPO training data, which significantly improves output quality.
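
For illustration, the pairing step might look like the sketch below. This is not the actual preprocessing script: the `prompt`/`chosen`/`rejected` column names follow TRL's conversational DPO dataset format, and `to_dpo_row` is a hypothetical helper.

```python
# Hypothetical sketch: prepend the inference-time system prompt to every
# preference pair so training and inference see identical chat formats.
# Column names follow TRL's conversational DPO dataset format; the exact
# preprocessing used for this checkpoint may differ.
SYSTEM_PROMPT = "You are a structured data expert. ..."  # full text below

def to_dpo_row(question: str, chosen: str, rejected: str) -> dict:
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "chosen": [{"role": "assistant", "content": chosen}],
        "rejected": [{"role": "assistant", "content": rejected}],
    }

row = to_dpo_row(
    "Convert to JSON: name=Alice, age=30",
    '{"name": "Alice", "age": 30}',                # raw data (preferred)
    '```json\n{"name": "Alice", "age": 30}\n```',  # fenced output (dispreferred)
)
```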

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen3-4B-Instruct-2507 + SFT (Exp5) |
| Method | DPO (Direct Preference Optimization) |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Learning rate | 5e-7 |
| Epochs | 2 |
| Batch size | 4 (gradient accumulation: 2) |
| Beta | 0.1 |
| Max length | 1024 tokens |
| Max prompt length | 512 tokens |
| Optimizer | AdamW |
| Warmup ratio | 0.1 |
| Seed | 3407 |
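
Under the assumption that TRL's `DPOTrainer` was used (this card does not include the training script), the table roughly maps onto configuration objects as in the minimal sketch below; `model`, `tokenizer`, and `train_dataset` are placeholders.

```python
# Minimal sketch mapping the table above onto TRL + PEFT configuration.
# Assumes trl and peft are installed; the actual training script may differ.
from trl import DPOConfig, DPOTrainer
from peft import LoraConfig

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="qwen3-4b-dpo-exp13",
    beta=0.1,                        # DPO temperature from the table
    learning_rate=5e-7,
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    max_length=1024,
    max_prompt_length=512,
    warmup_ratio=0.1,
    seed=3407,
)

trainer = DPOTrainer(
    model=model,                     # SFT (Exp5) checkpoint, placeholder
    args=training_args,
    train_dataset=train_dataset,     # u-10bei/dpo-dataset-qwen-cot rows
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```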

## System Prompt (used at inference)

```
You are a structured data expert. Output the requested format directly without any explanation, preamble, or markdown code blocks. Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. Output only the raw structured data.
```

## Key Improvements over baseline

- **System prompt embedded in DPO training**: training and inference formats are fully consistent
- **Clean chosen responses**: only the structured data portion is kept (no code blocks, no preamble)
- **Code block suppression**: 0% code-block usage at inference (vs ~70% in the baseline DPO run); see the spot check below
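
The code-block rate can be spot-checked with a simple detector like the one below (a hypothetical check, not the evaluation harness behind the figures above).

```python
# Hypothetical spot check for the code-block rate quoted above.
def uses_code_block(text: str) -> bool:
    stripped = text.lstrip()
    return stripped.startswith("```") or "\n```" in text

samples = ['{"name": "Alice"}', '```json\n{"name": "Alice"}\n```']
rate = sum(uses_code_block(s) for s in samples) / len(samples)
print(f"code-block rate: {rate:.0%}")  # 50% on this toy sample
```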

## Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_PATH = "tenyyprn/qwen3-4b-structeval-exp13"

SYSTEM_PROMPT = (
    "You are a structured data expert. "
    "Output the requested format directly without any explanation, "
    "preamble, or markdown code blocks. "
    "Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. "
    "Output only the raw structured data."
)

# Load the base model, apply the LoRA adapter, and merge it for faster inference.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()
model.eval()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30, city=Tokyo"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
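
Because the adapter is trained to suppress fences and preamble, the decoded output above should parse directly. A quick sanity check, where `generated` stands in for the decoded string from the snippet above:

```python
import json

# Hypothetical sanity check: the output should be raw JSON with no
# markdown fences to strip first. Exact value types may vary.
generated = '{"name": "Alice", "age": 30, "city": "Tokyo"}'  # example output
data = json.loads(generated)
print(data["name"], data["age"], data["city"])
```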

## Citations

```bibtex
@inproceedings{rafailov2023direct,
    title     = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author    = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year      = {2023},
    booktitle = {Advances in Neural Information Processing Systems 36},
    url       = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
}
```