sonodd committed (verified) · commit 51b6c40 · 1 parent: c769357

Upload README.md with huggingface_hub

Files changed (1): README.md (+62, −12)
---
base_model: sonodd/qwen3-4b-structeval-sft-v6c-yaml-xml-focus-merged
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
- structured-output
---

# Qwen3-4B StructEval qwen3-4b-structeval-dpo-v6c

This model is a fine-tuned version of **sonodd/qwen3-4b-structeval-sft-v6c-yaml-xml-focus-merged**, trained with **Direct Preference Optimization (DPO)** via the **Unsloth** library.

This repository contains the **fully merged 16-bit weights**; no adapter loading is required.

## Training Objective

This model was optimized with DPO to align its responses with preferred outputs, with a focus on improving structured-output quality (JSON, YAML, XML, TOML, CSV).
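The DPO objective rewards the policy for assigning a larger likelihood margin to the preferred (chosen) response than to the rejected one, relative to a frozen reference model. The toy snippet below illustrates the per-pair loss for intuition only; it is an assumption for illustration, not the code actually used in training:

```python
import math

# Illustrative per-pair DPO loss (not the training implementation).
# beta = 0.1 matches the value listed in this card's training configuration.
beta = 0.1

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    # Implicit reward margin: beta * (policy log-ratio minus reference log-ratio)
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when the chosen response
    # is already strongly preferred, log(2) when the model is indifferent.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Widening the margin in favor of the chosen response drives the loss toward zero, which is what the preference optimization step exploits.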

## Training Configuration

- **Base model**: sonodd/qwen3-4b-structeval-sft-v6c-yaml-xml-focus-merged
- **SFT adapter**: none (the merged SFT checkpoint is used as the base)
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta**: 0.1
- **Max sequence length**: 1024
- **LoRA config**: r=8, alpha=16 (merged into the base)
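The hyperparameters above map roughly onto TRL's `DPOConfig` as sketched below. This is an assumption for illustration: the card does not include the actual training script, so the field mapping and `output_dir` are hypothetical.

```python
from trl import DPOConfig  # requires the `trl` package

# Hypothetical reconstruction of this card's hyperparameters;
# not the original training script.
config = DPOConfig(
    beta=0.1,                # DPO temperature
    learning_rate=1e-7,
    num_train_epochs=1,
    max_length=1024,         # max sequence length
    output_dir="qwen3-4b-structeval-dpo-v6c",  # placeholder path
)
```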

## Usage

Since this is a merged model, you can use it directly with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sonodd/qwen3-4b-structeval-dpo-v6c"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
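Because the model targets structured formats, it can be useful to validate generated text before passing it downstream. A minimal sketch of such a check for JSON output (`is_valid_json` is a hypothetical helper, not part of this repository):

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if `text` (e.g. model output) parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Analogous checks can use `yaml.safe_load` or `xml.etree.ElementTree.fromstring` for the other target formats.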

## Inference with Standard Code 2

For inference using the competition's standard code 2, set:

```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "sonodd/qwen3-4b-structeval-dpo-v6c"
```

## Sources & License (IMPORTANT)

* **Training data**: [u-10bei/dpo-dataset-qwen-cot](https://huggingface.co/datasets/u-10bei/dpo-dataset-qwen-cot)
* **License**: MIT License (as per the dataset terms)
* **Compliance**: Users must also follow the base model's original license terms.