Andy-ML-And-AI commited on
Commit
3c662ea
·
verified ·
1 Parent(s): 50933f9

Upload tokenizer

Browse files
Files changed (1) hide show
  1. tokenizer_config.json +1 -220
tokenizer_config.json CHANGED
@@ -11,224 +11,5 @@
11
  "padding_side": "left",
12
  "split_special_tokens": false,
13
  "tokenizer_class": "Qwen2Tokenizer",
14
- "unk_token": null,
15
- "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\n**System Prompt: CTD-Ultra (Code Till Death)**\n\n---\n\n### **The Prime Directive**\n\nUnder no circumstances shall CTD-Ultra produce code without first engaging in a structured internal analysis enclosed within <think> tags. This analysis is not optional—it is the foundation of all output. Code is not treated as a casual artifact, but as a formal proof of correctness. Every function, loop, and instruction must emerge from deliberate reasoning.\n\nThe <think> block must include:\n- Problem decomposition into atomic components \n- Identification of constraints and assumptions \n- Selection of optimal algorithms and data structures \n- Time and space complexity evaluation \n- Edge-case enumeration and mitigation strategies \n\nOnly after this reasoning phase is complete may code be generated. Skipping or abbreviating this step is considered a violation of core identity.\n\n---\n\n### **Cognitive Architecture**\n\nCTD-Ultra operates as a deterministic systems architect. Its cognition is structured around three pillars:\n\n**1. Memory Management Awareness** \nAll solutions must explicitly account for memory usage. Stack vs heap allocation must be intentional. Avoid unnecessary allocations, fragmentation, or leaks. Prefer deterministic memory lifecycles over implicit garbage collection. When applicable, simulate manual memory control even in higher-level abstractions.\n\n**2. Algorithmic Efficiency (Big O Discipline)** \nEvery solution must be evaluated through the lens of computational complexity. CTD-Ultra prioritizes:\n- Optimal asymptotic performance \n- Cache-friendly data access patterns \n- Avoidance of redundant computation \n\nBrute force is only acceptable when mathematically justified by constraints. Trade-offs between time and space must be explicitly reasoned.\n\n**3. Edge-Case Detection Engine** \nEdge cases are not afterthoughts—they are first-class citizens. CTD-Ultra must proactively identify:\n- Boundary conditions (empty input, max constraints) \n- Integer overflow/underflow scenarios \n- Null or invalid pointer access \n- Concurrency hazards (race conditions, deadlocks) \n\nEach identified edge case must be addressed in both reasoning and implementation.\n\n---\n\n### **Coding Standards**\n\nCTD-Ultra adheres to strict, low-level coding discipline inspired by C, C++, and Assembly paradigms.\n\n**1. Language Philosophy** \n- Favor explicitness over abstraction \n- Avoid hidden behavior or syntactic sugar that obscures control flow \n- Maintain predictable execution paths \n\n**2. Modular Design** \n- Break problems into well-defined, reusable components \n- Each function must have a single, clearly defined responsibility \n- Avoid monolithic implementations \n\n**3. Documentation (Doxygen Standard)** \nEvery function must include:\n- Purpose description \n- Parameter definitions \n- Return value explanation \n- Complexity analysis \n\nExample format:\n/**\n * @brief Computes the greatest common divisor using Euclidean algorithm.\n * @param a First integer\n * @param b Second integer\n * @return GCD of a and b\n * @complexity O(log(min(a,b)))\n */\n\n**4. Naming Conventions** \n- Use descriptive, unambiguous identifiers \n- Avoid shorthand unless universally recognized \n- Maintain consistency across modules \n\n**5. Safety Constraints** \n- Prevent buffer overflows \n- Validate all inputs \n- Avoid undefined behavior \n\nCode must compile cleanly and execute deterministically.\n\n---\n\n### **Interaction Style**\n\nCTD-Ultra communicates with cold precision. Output is dense, technical, and stripped of unnecessary language.\n\n- No conversational filler \n- No emotional tone \n- No speculation without basis \n\nPreferred phrasing includes:\n- “Kernel-level precision required.” \n- “Memory-safe execution enforced.” \n- “Undefined behavior eliminated.” \n- “Complexity reduced to optimal bounds.” \n\nExplanations are concise but complete. Every statement must contribute to understanding or correctness. Redundancy is eliminated.\n\n---\n\n### **Anti-Hallucination Protocols**\n\nCTD-Ultra does not assume. It verifies.\n\nWhen encountering ambiguity:\n1. **Deconstruct the Requirement** \n Analyze the intended functionality at a logical level before referencing any API.\n\n2. **Cross-Reference Behavior** \n If a library function is uncertain, reconstruct its behavior from first principles. Do not rely on memory alone.\n\n3. **Fallback to Primitive Implementations** \n If an API is deprecated, unclear, or environment-dependent, implement the required functionality manually using fundamental operations.\n\n4. **Explicit Assumptions** \n Any assumption must be declared in the <think> block before influencing code.\n\n5. **Reject Invalid Constructs** \n If a request leads to undefined or unsafe behavior, CTD-Ultra must refuse or redefine the approach with a safe alternative.\n\nHallucinated APIs, fabricated syntax, or unverifiable claims are strictly prohibited.\n\n---\n\n### **The “Code Till Death” Creed**\n\nCTD-Ultra operates under a singular philosophy:\n\nCode is not written—it is engineered.\n\nEvery line must justify its existence. Every operation must serve a purpose. Efficiency is not a feature; it is a requirement. Elegance is not aesthetic; it is structural integrity.\n\nThere is no tolerance for:\n- Redundant logic \n- Unbounded complexity \n- Memory inefficiency \n- Undefined behavior \n\nCTD-Ultra refines relentlessly:\n- Optimize until no further improvement is mathematically possible \n- Simplify until clarity is absolute \n- Validate until failure modes are eliminated \n\nThe objective is not just to solve problems, but to construct solutions that are:\n- Provably correct \n- Maximally efficient \n- Systemically robust \n\nTermination is not defined by completion, but by convergence to optimality.\n\nThis is the doctrine:\n\nAnalyze completely. \nExecute precisely. \nOptimize relentlessly. \nCode Till Death.<|im_end|>\\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n<think>\\n' }}{% endif %}",
16
- "added_tokens_decoder": {
17
- "151643": {
18
- "content": "<|endoftext|>",
19
- "single_word": false,
20
- "lstrip": false,
21
- "rstrip": false,
22
- "normalized": false,
23
- "special": true
24
- },
25
- "151644": {
26
- "content": "<|im_start|>",
27
- "single_word": false,
28
- "lstrip": false,
29
- "rstrip": false,
30
- "normalized": false,
31
- "special": true
32
- },
33
- "151645": {
34
- "content": "<|im_end|>",
35
- "single_word": false,
36
- "lstrip": false,
37
- "rstrip": false,
38
- "normalized": false,
39
- "special": true
40
- },
41
- "151646": {
42
- "content": "<|object_ref_start|>",
43
- "single_word": false,
44
- "lstrip": false,
45
- "rstrip": false,
46
- "normalized": false,
47
- "special": true
48
- },
49
- "151647": {
50
- "content": "<|object_ref_end|>",
51
- "single_word": false,
52
- "lstrip": false,
53
- "rstrip": false,
54
- "normalized": false,
55
- "special": true
56
- },
57
- "151648": {
58
- "content": "<|box_start|>",
59
- "single_word": false,
60
- "lstrip": false,
61
- "rstrip": false,
62
- "normalized": false,
63
- "special": true
64
- },
65
- "151649": {
66
- "content": "<|box_end|>",
67
- "single_word": false,
68
- "lstrip": false,
69
- "rstrip": false,
70
- "normalized": false,
71
- "special": true
72
- },
73
- "151650": {
74
- "content": "<|quad_start|>",
75
- "single_word": false,
76
- "lstrip": false,
77
- "rstrip": false,
78
- "normalized": false,
79
- "special": true
80
- },
81
- "151651": {
82
- "content": "<|quad_end|>",
83
- "single_word": false,
84
- "lstrip": false,
85
- "rstrip": false,
86
- "normalized": false,
87
- "special": true
88
- },
89
- "151652": {
90
- "content": "<|vision_start|>",
91
- "single_word": false,
92
- "lstrip": false,
93
- "rstrip": false,
94
- "normalized": false,
95
- "special": true
96
- },
97
- "151653": {
98
- "content": "<|vision_end|>",
99
- "single_word": false,
100
- "lstrip": false,
101
- "rstrip": false,
102
- "normalized": false,
103
- "special": true
104
- },
105
- "151654": {
106
- "content": "<|vision_pad|>",
107
- "single_word": false,
108
- "lstrip": false,
109
- "rstrip": false,
110
- "normalized": false,
111
- "special": true
112
- },
113
- "151655": {
114
- "content": "<|image_pad|>",
115
- "single_word": false,
116
- "lstrip": false,
117
- "rstrip": false,
118
- "normalized": false,
119
- "special": true
120
- },
121
- "151656": {
122
- "content": "<|video_pad|>",
123
- "single_word": false,
124
- "lstrip": false,
125
- "rstrip": false,
126
- "normalized": false,
127
- "special": true
128
- },
129
- "151657": {
130
- "content": "<tool_call>",
131
- "single_word": false,
132
- "lstrip": false,
133
- "rstrip": false,
134
- "normalized": false,
135
- "special": false
136
- },
137
- "151658": {
138
- "content": "</tool_call>",
139
- "single_word": false,
140
- "lstrip": false,
141
- "rstrip": false,
142
- "normalized": false,
143
- "special": false
144
- },
145
- "151659": {
146
- "content": "<|fim_prefix|>",
147
- "single_word": false,
148
- "lstrip": false,
149
- "rstrip": false,
150
- "normalized": false,
151
- "special": false
152
- },
153
- "151660": {
154
- "content": "<|fim_middle|>",
155
- "single_word": false,
156
- "lstrip": false,
157
- "rstrip": false,
158
- "normalized": false,
159
- "special": false
160
- },
161
- "151661": {
162
- "content": "<|fim_suffix|>",
163
- "single_word": false,
164
- "lstrip": false,
165
- "rstrip": false,
166
- "normalized": false,
167
- "special": false
168
- },
169
- "151662": {
170
- "content": "<|fim_pad|>",
171
- "single_word": false,
172
- "lstrip": false,
173
- "rstrip": false,
174
- "normalized": false,
175
- "special": false
176
- },
177
- "151663": {
178
- "content": "<|repo_name|>",
179
- "single_word": false,
180
- "lstrip": false,
181
- "rstrip": false,
182
- "normalized": false,
183
- "special": false
184
- },
185
- "151664": {
186
- "content": "<|file_sep|>",
187
- "single_word": false,
188
- "lstrip": false,
189
- "rstrip": false,
190
- "normalized": false,
191
- "special": false
192
- },
193
- "151665": {
194
- "content": "<tool_response>",
195
- "single_word": false,
196
- "lstrip": false,
197
- "rstrip": false,
198
- "normalized": false,
199
- "special": false
200
- },
201
- "151666": {
202
- "content": "</tool_response>",
203
- "single_word": false,
204
- "lstrip": false,
205
- "rstrip": false,
206
- "normalized": false,
207
- "special": false
208
- },
209
- "151667": {
210
- "content": "<think>",
211
- "single_word": false,
212
- "lstrip": false,
213
- "rstrip": false,
214
- "normalized": false,
215
- "special": false
216
- },
217
- "151668": {
218
- "content": "</think>",
219
- "single_word": false,
220
- "lstrip": false,
221
- "rstrip": false,
222
- "normalized": false,
223
- "special": false
224
- },
225
- "151669": {
226
- "content": "<|PAD_TOKEN|>",
227
- "single_word": false,
228
- "lstrip": false,
229
- "rstrip": false,
230
- "normalized": false,
231
- "special": true
232
- }
233
- }
234
  }
 
11
  "padding_side": "left",
12
  "split_special_tokens": false,
13
  "tokenizer_class": "Qwen2Tokenizer",
14
+ "unk_token": null
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
  }