Pinkstack committed on
Commit 6973458 · verified · 1 Parent(s): 4c63228

Upload 10 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,107 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - multilingual
6
+ base_model:
7
+ - Qwen/Qwen3-0.6B-Base
8
+ pipeline_tag: text-generation
9
+ tags:
10
+ - reasoning
11
+ - code
12
+ - math
13
+ - syngen
14
+ ---
15
+
16
+ # 👋 SYNGEN REASONING 0.6B
17
+ This is a 0.6B-parameter LLM designed to generate the synthetic reasoning that would sit between a prompt and an existing final model output. It is intended mainly for dataset modification, but it can also be used in other cases that require reasoning.
18
+
19
+ For example, this model allows you to turn any chat dataset into a reasoning dataset as if it were generated by DeepSeek R1 or OpenAI's GPT OSS!
20
+
21
+ # 🤔 HOW TO USE
22
+
23
+ You should set the following:
24
+ 1) ```MIN P = 0.05```
25
+ 2) ```TOP P = 0.85```
26
+ 3) ```TEMPERATURE = 0.55```
27
+ 4) ```MAX_NEW_TOKENS = 16384``` (can be higher or lower, up to 32k)
28
+
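The recommended settings above can be collected into one reusable mapping. The sketch below is an illustration, not part of the commit: the key names assume the keyword arguments accepted by `generate` in Hugging Face `transformers`, `do_sample` is an added assumption (sampling has to be enabled for the other values to matter), and `with_token_budget` is a hypothetical helper. The 32768 cap matches `max_position_embeddings` in this commit's `config.json`.

```python
# Sampling settings copied from the README's recommendations.
SAMPLING_KWARGS = {
    "do_sample": True,        # assumption: not stated in the README
    "min_p": 0.05,
    "top_p": 0.85,
    "temperature": 0.55,
    "max_new_tokens": 16384,  # README default; may be raised or lowered
}

MAX_CONTEXT = 32768  # "up to 32k" in the README


def with_token_budget(max_new_tokens: int) -> dict:
    """Return a copy of the settings with a different generation budget.

    Hypothetical helper: rejects budgets outside 1..MAX_CONTEXT.
    """
    if not 1 <= max_new_tokens <= MAX_CONTEXT:
        raise ValueError(f"max_new_tokens must be in 1..{MAX_CONTEXT}")
    settings = dict(SAMPLING_KWARGS)
    settings["max_new_tokens"] = max_new_tokens
    return settings
```

With a loaded model, the mapping would typically be unpacked into the generation call, for example `model.generate(**inputs, **with_token_budget(8192))`.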
29
+ Syngen uses the ChatML format, but it is not a chat model. This is how you run inference with it:
30
+ ```jinja2
31
+ <|im_start|>system
32
+
33
+
34
+ THINKING GENERATION ONLY = TRUE
35
+
36
+ THINKING GENERATION STYLE = (DEEPSEEK_R1 OR GPT_OSS)<|im_end|>
37
+ <|im_start|>user
38
+ Generate thinking for the following:
39
+
40
+ The user message was: {human_msg}
41
+
42
+ The assistant response was: {assistant_msg}
43
+
44
+ Original system prompt was:{system_msg} (for GPT OSS please put "Reasoning effort: high/medium/low")<|im_end|>
45
+ <|im_start|>assistant
46
+ ```
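The template above can be filled in programmatically. The following is a minimal sketch, assuming the blank lines shown in the template are significant and that the prompt should end with a newline after `<|im_start|>assistant` (as the examples in this README suggest). The function name `build_syngen_prompt` and its parameters are hypothetical; for the GPT OSS style, the reasoning effort is appended to the original system prompt on its own line, mirroring the GPT OSS example.

```python
STYLES = ("DEEPSEEK_R1", "GPT_OSS")
EFFORTS = ("high", "medium", "low")


def build_syngen_prompt(human_msg, assistant_msg, system_msg="",
                        style="DEEPSEEK_R1", reasoning_effort=None):
    """Fill the README's ChatML template for one (user, assistant) pair."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    if reasoning_effort is not None:
        if style != "GPT_OSS" or reasoning_effort not in EFFORTS:
            raise ValueError("reasoning_effort applies only to GPT_OSS "
                             f"and must be one of {EFFORTS}")
        # Mirrors the GPT OSS example: effort goes on its own line.
        system_msg = f"{system_msg}\nReasoning effort: {reasoning_effort}"
    return (
        "<|im_start|>system\n"
        "\n\n"
        "THINKING GENERATION ONLY = TRUE\n"
        "\n"
        f"THINKING GENERATION STYLE = {style}<|im_end|>\n"
        "<|im_start|>user\n"
        "Generate thinking for the following:\n"
        "\n"
        f"The user message was: {human_msg}\n"
        "\n"
        f"The assistant response was: {assistant_msg}\n"
        "\n"
        f"Original system prompt was:{system_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

For instance, `build_syngen_prompt("Hello!", "Hi! How may I assist you today?", "You are an Ai", style="GPT_OSS", reasoning_effort="low")` produces the prompt portion of the GPT OSS example, up to the point where the model starts generating.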
47
+
48
+
49
+ Here are some examples:
50
+
51
+ ## DeepSeek R1 possible reasoning
52
+
53
+ ```jinja2
54
+ <|im_start|>system
55
+
56
+
57
+ THINKING GENERATION ONLY = TRUE
58
+
59
+ THINKING GENERATION STYLE = DEEPSEEK_R1<|im_end|>
60
+ <|im_start|>user
61
+ Generate thinking for the following:
62
+
63
+ The user message was: Your response should contain at least 3 sentences. Your answer must contain exactly 3 bullet points. Use the markdown bullet points such as: * This is point 1. Include keywords [food, water, shelter] in the response.
64
+
65
+ The assistant response was:* To ensure survival, one must have access to adequate food which provides the necessary nutrients and energy.
66
+ * Water is essential for hydration and maintaining bodily functions, and it is crucial to have a clean and reliable source.
67
+ * Shelter protects individuals from the elements and provides a safe environment, which is fundamental for overall well-being.
68
+
69
+ Original system prompt was:You are an Ai<|im_end|>
70
+ <|im_start|>assistant
71
+ <generated_thinking_dpsk>
72
+ Okay, the user wants me to generate a response with three bullet points using markdown. The keywords they provided are food, water, and shelter. Let's think about how to structure each bullet point.
73
+
74
+ First, maybe start with something related to food. Need to make sure it includes all three words. Maybe mention needing food to sustain life or staying hydrated? That covers both water and food indirectly but directly links them together. Then talk about shelter because without protection, you can't eat properly or stay alive.
75
+
76
+ Wait, do I need to include water again here? If shelter means avoiding dehydration, then yes. But let's check if the first bullet starts with "To ensure..." that ties everything together. Also, keep it concise but cover all required keywords. Let me draft:
77
+
78
+ * To ensure survival, one must have access to adequate food which provides the necessary nutrients and energy.
79
+ * Water is essential for hydration and maintaining bodily functions; it’s critical to have a clean and reliable source.
80
+ * Shelter protects individuals from the elements and offers a safe environment—key factors for overall well-being.
81
+
82
+ That works. Each bullet has two key points (food and water) and shelters. Check keyword usage: food, water, shelter. Yes. Three bullets. Good. Make sure each sentence ends with period. Alright.
83
+ </generated_thinking_dpsk>
84
+ ```
85
+
86
+ ## GPT OSS possible reasoning
87
+ ```jinja2
88
+ <|im_start|>system
89
+
90
+
91
+ THINKING GENERATION ONLY = TRUE
92
+
93
+ THINKING GENERATION STYLE = GPT_OSS<|im_end|>
94
+ <|im_start|>user
95
+ Generate thinking for the following:
96
+
97
+ The user message was: Hello!
98
+
99
+ The assistant response was: Hi! How may I assist you today?
100
+
101
+ Original system prompt was:You are an Ai
102
+ Reasoning effort: low<|im_end|>
103
+ <|im_start|>assistant
104
+ <generated_thinking_gpt>
105
+ Must produce a simple greeting.
106
+ </generated_thinking_gpt>
107
+ ```
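Both examples wrap the generated reasoning in a style-specific tag pair. A small sketch of how the reasoning could be pulled out of a decoded completion follows; `extract_thinking` is a hypothetical helper, and it assumes the completion was decoded with the tags left in place. Since this commit registers the tags as special tokens, decoding with special tokens skipped would likely strip them.

```python
import re

# Matches either tag pair; \1 forces the closing tag to match the opening one.
_THINKING_RE = re.compile(
    r"<generated_thinking_(dpsk|gpt)>\s*(.*?)\s*</generated_thinking_\1>",
    re.DOTALL,
)

_TAG_TO_STYLE = {"dpsk": "DEEPSEEK_R1", "gpt": "GPT_OSS"}


def extract_thinking(completion: str):
    """Return (style, reasoning_text), or None if no complete tag pair exists."""
    match = _THINKING_RE.search(completion)
    if match is None:
        return None
    return _TAG_TO_STYLE[match.group(1)], match.group(2)
```

Returning `None` for an unterminated block makes it easy to drop rows where generation hit the token budget before the closing tag.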
added_tokens.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "</generated_thinking_dpsk>": 151672,
3
+ "</generated_thinking_gpt>": 151670,
4
+ "</think>": 151668,
5
+ "</tool_call>": 151658,
6
+ "</tool_response>": 151666,
7
+ "<generated_thinking_dpsk>": 151671,
8
+ "<generated_thinking_gpt>": 151669,
9
+ "<think>": 151667,
10
+ "<tool_call>": 151657,
11
+ "<tool_response>": 151665,
12
+ "<|box_end|>": 151649,
13
+ "<|box_start|>": 151648,
14
+ "<|endoftext|>": 151643,
15
+ "<|file_sep|>": 151664,
16
+ "<|fim_middle|>": 151660,
17
+ "<|fim_pad|>": 151662,
18
+ "<|fim_prefix|>": 151659,
19
+ "<|fim_suffix|>": 151661,
20
+ "<|im_end|>": 151645,
21
+ "<|im_start|>": 151644,
22
+ "<|image_pad|>": 151655,
23
+ "<|object_ref_end|>": 151647,
24
+ "<|object_ref_start|>": 151646,
25
+ "<|quad_end|>": 151651,
26
+ "<|quad_start|>": 151650,
27
+ "<|repo_name|>": 151663,
28
+ "<|video_pad|>": 151656,
29
+ "<|vision_end|>": 151653,
30
+ "<|vision_pad|>": 151654,
31
+ "<|vision_start|>": 151652,
32
+ "DEEPSEEK_R1": 151674,
33
+ "GPT_OSS": 151673
34
+ }
config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "eos_token_id": 151643,
8
+ "head_dim": 128,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 1024,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 3072,
13
+ "layer_types": [
14
+ "full_attention",
15
+ "full_attention",
16
+ "full_attention",
17
+ "full_attention",
18
+ "full_attention",
19
+ "full_attention",
20
+ "full_attention",
21
+ "full_attention",
22
+ "full_attention",
23
+ "full_attention",
24
+ "full_attention",
25
+ "full_attention",
26
+ "full_attention",
27
+ "full_attention",
28
+ "full_attention",
29
+ "full_attention",
30
+ "full_attention",
31
+ "full_attention",
32
+ "full_attention",
33
+ "full_attention",
34
+ "full_attention",
35
+ "full_attention",
36
+ "full_attention",
37
+ "full_attention",
38
+ "full_attention",
39
+ "full_attention",
40
+ "full_attention",
41
+ "full_attention"
42
+ ],
43
+ "max_position_embeddings": 32768,
44
+ "max_window_layers": 28,
45
+ "model_type": "qwen3",
46
+ "num_attention_heads": 16,
47
+ "num_hidden_layers": 28,
48
+ "num_key_value_heads": 8,
49
+ "pad_token_id": 151654,
50
+ "rms_norm_eps": 1e-06,
51
+ "rope_scaling": null,
52
+ "rope_theta": 1000000,
53
+ "sliding_window": null,
54
+ "tie_word_embeddings": true,
55
+ "torch_dtype": "bfloat16",
56
+ "transformers_version": "4.51.3",
57
+ "unsloth_fixed": true,
58
+ "unsloth_version": "2025.8.10",
59
+ "use_cache": true,
60
+ "use_sliding_window": false,
61
+ "vocab_size": 151675
62
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "eos_token_id": 151643,
4
+ "max_length": 32768,
5
+ "max_new_tokens": 32768,
6
+ "pad_token_id": 151654,
7
+ "transformers_version": "4.51.3"
8
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:216403be9fc9d94faee5fd100d75dc08ee92b0ff2ba8860552bb3f08e7dca13d
3
+ size 1191600568
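As a sanity check, the file size above can be compared with a parameter count derived from `config.json`. The sketch below assumes the usual Qwen3 layer layout (bias-free q/k/v/o projections, per-head RMSNorm on queries and keys, a gated MLP with three projections, two RMSNorms per layer, one final norm) and tied input/output embeddings stored once. The layout is an assumption about the architecture, not something this diff states.

```python
# Values copied from config.json in this commit.
CFG = {
    "vocab_size": 151675,
    "hidden_size": 1024,
    "intermediate_size": 3072,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
}


def count_params(c: dict) -> int:
    """Count weights under the assumed Qwen3 layout (tied embeddings)."""
    h = c["hidden_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h   # q, k, v, o projections
    qk_norm = 2 * c["head_dim"]                     # q_norm and k_norm
    mlp = 3 * h * c["intermediate_size"]            # gate, up, down
    layer_norms = 2 * h                             # input and post-attention
    per_layer = attn + qk_norm + mlp + layer_norms
    embed = c["vocab_size"] * h                     # shared with the LM head
    return embed + c["num_hidden_layers"] * per_layer + h  # + final norm


TOTAL_PARAMS = count_params(CFG)
BF16_BYTES = 2 * TOTAL_PARAMS            # torch_dtype is bfloat16
FILE_BYTES = 1191600568                  # size from the LFS pointer
print(TOTAL_PARAMS, BF16_BYTES, FILE_BYTES - BF16_BYTES)
```

Under these assumptions the count is about 596M parameters, and the weights account for all but roughly 35 KB of the file, which is consistent with a small metadata header.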
special_tokens_map.json ADDED
@@ -0,0 +1,60 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<generated_thinking_gpt>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ {
11
+ "content": "</generated_thinking_gpt>",
12
+ "lstrip": false,
13
+ "normalized": false,
14
+ "rstrip": false,
15
+ "single_word": false
16
+ },
17
+ {
18
+ "content": "<generated_thinking_dpsk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ {
25
+ "content": "</generated_thinking_dpsk>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ },
31
+ {
32
+ "content": "GPT_OSS",
33
+ "lstrip": false,
34
+ "normalized": false,
35
+ "rstrip": false,
36
+ "single_word": false
37
+ },
38
+ {
39
+ "content": "DEEPSEEK_R1",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false
44
+ }
45
+ ],
46
+ "eos_token": {
47
+ "content": "<|endoftext|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false
52
+ },
53
+ "pad_token": {
54
+ "content": "<|vision_pad|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false
59
+ }
60
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5a5ad4633a6717f72f3454ce5b54123e3ae64bc6994b1852c0769ec0f8b59cb9
3
+ size 11423834
tokenizer_config.json ADDED
@@ -0,0 +1,282 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ },
213
+ "151669": {
214
+ "content": "<generated_thinking_gpt>",
215
+ "lstrip": false,
216
+ "normalized": false,
217
+ "rstrip": false,
218
+ "single_word": false,
219
+ "special": true
220
+ },
221
+ "151670": {
222
+ "content": "</generated_thinking_gpt>",
223
+ "lstrip": false,
224
+ "normalized": false,
225
+ "rstrip": false,
226
+ "single_word": false,
227
+ "special": true
228
+ },
229
+ "151671": {
230
+ "content": "<generated_thinking_dpsk>",
231
+ "lstrip": false,
232
+ "normalized": false,
233
+ "rstrip": false,
234
+ "single_word": false,
235
+ "special": true
236
+ },
237
+ "151672": {
238
+ "content": "</generated_thinking_dpsk>",
239
+ "lstrip": false,
240
+ "normalized": false,
241
+ "rstrip": false,
242
+ "single_word": false,
243
+ "special": true
244
+ },
245
+ "151673": {
246
+ "content": "GPT_OSS",
247
+ "lstrip": false,
248
+ "normalized": false,
249
+ "rstrip": false,
250
+ "single_word": false,
251
+ "special": true
252
+ },
253
+ "151674": {
254
+ "content": "DEEPSEEK_R1",
255
+ "lstrip": false,
256
+ "normalized": false,
257
+ "rstrip": false,
258
+ "single_word": false,
259
+ "special": true
260
+ }
261
+ },
262
+ "additional_special_tokens": [
263
+ "<generated_thinking_gpt>",
264
+ "</generated_thinking_gpt>",
265
+ "<generated_thinking_dpsk>",
266
+ "</generated_thinking_dpsk>",
267
+ "GPT_OSS",
268
+ "DEEPSEEK_R1"
269
+ ],
270
+ "bos_token": null,
271
+ "clean_up_tokenization_spaces": false,
272
+ "eos_token": "<|endoftext|>",
273
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}<|im_start|>system\n{{ message['content'] }}<|im_end|>\n{% elif message['role'] == 'user' %}<|im_start|>user\n{{ message['content'] }}<|im_end|>\n{% elif message['role'] == 'assistant' %}<|im_start|>assistant\n{{ message['content'] }}<|im_end|>\n{% endif %}{% endfor %}",
274
+ "errors": "replace",
275
+ "extra_special_tokens": {},
276
+ "model_max_length": 32768,
277
+ "pad_token": "<|vision_pad|>",
278
+ "padding_side": "right",
279
+ "split_special_tokens": false,
280
+ "tokenizer_class": "Qwen2Tokenizer",
281
+ "unk_token": null
282
+ }
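The `chat_template` above renders system, user, and assistant messages in ChatML and silently ignores any other role. It also has no generation-prompt branch, so the trailing `<|im_start|>assistant` line has to be appended by the caller. The pure-Python mirror below is a sketch for illustration only; `render_chatml` is a hypothetical name, and the real template is applied by the tokenizer's Jinja engine.

```python
KNOWN_ROLES = ("system", "user", "assistant")


def render_chatml(messages, add_generation_prompt=False):
    """Mirror the commit's chat_template for a list of role/content dicts."""
    parts = []
    for message in messages:
        if message["role"] in KNOWN_ROLES:  # other roles produce no output
            parts.append(
                f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
            )
    text = "".join(parts)
    if add_generation_prompt:
        # Not part of the template itself; added here by the caller's choice.
        text += "<|im_start|>assistant\n"
    return text
```

Keeping the generation prompt as an explicit flag makes the difference from the stored template visible at the call site.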
vocab.json ADDED
The diff for this file is too large to render. See raw diff