bebis1 committed on
Commit
b8aa575
·
verified ·
1 Parent(s): 63bdbf7

Upload HyperLLM v0.3 - SFT+DPO trained model

README.md CHANGED
@@ -1,110 +1,196 @@
1
  ---
2
  base_model: Qwen/Qwen3-4B-Instruct-2507
3
  library_name: peft
4
- license: mit
5
  language:
6
- - en
7
  tags:
8
- - trading
9
- - hyperliquid
10
- - perpetuals
11
- - defi
12
- - lora
13
- - qlora
14
- datasets:
15
- - custom
 
 
 
16
  pipeline_tag: text-generation
17
  ---
18
 
19
- # HyperLLM-4b v0.2
20
 
21
- A specialized trading assistant fine-tuned for [Hyperliquid](https://hyperliquid.xyz), a perpetual futures DEX. Built on Qwen3-4B-Instruct using QLoRA.
22
 
23
  ## Model Description
24
 
25
- HyperLLM is designed to assist with Hyperliquid perpetual trading tasks including:
26
- - Position sizing calculations with proper risk management
27
- - Hyperliquid API request/response formatting
28
- - Parameter validation for trades
29
- - Hyperliquid-specific knowledge (order types, leverage limits, API endpoints)
30
-
31
- **This is a LoRA adapter** - you need to load it on top of the base model.
32
-
33
- ## What's New in v0.2 (vs v0.1)
34
-
35
- | Change | v0.1 | v0.2 |
36
- |--------|------|------|
37
- | **Hardware** | Local consumer GPU | A100 80GB (RunPod) |
38
- | **Max Sequence Length** | 2048 | 4096 |
39
- | **Batch Size** | 1 | 4 |
40
- | **rsLoRA** | No | Yes |
41
- | **Flash Attention** | No | Yes |
42
- | **Early Stopping** | No | Yes (patience=3) |
43
- | **Training Precision** | fp16 | bf16 |
44
- | **Evaluation** | Basic | Comprehensive (297 questions) |
45
-
46
- ### Key Improvements
47
- - **+46.7% factual knowledge**: Hyperliquid-specific facts improved from 33.3% → 80.0%
48
- - **+6.7% API structure**: Better at formatting Hyperliquid API requests
49
- - **+3.3% position sizing**: Core trading calculation improvements
50
- - **Longer context**: 4096 tokens vs 2048 for complex multi-step reasoning
51
- - **rsLoRA**: Rank-stabilized LoRA for better training stability
52
-
53
- ### Known Regressions
54
- v0.2 exhibits some catastrophic forgetting compared to the base model:
55
- - Parameter validation: -20% (73.3% vs 93.3% baseline)
56
- - Edge case handling: -17.5% (75.0% vs 92.5% baseline)
57
- - Adversarial percentage questions: -12.5% (36.9% vs 49.4% baseline)
58
-
59
- These will be addressed in v0.3 with replay data and DPO training.
60
-
61
- ## Training Details
62
-
63
- | Parameter | Value |
64
- |-----------|-------|
65
- | Base Model | Qwen/Qwen3-4B-Instruct-2507 |
66
- | LoRA Rank | 64 |
67
- | LoRA Alpha | 128 |
68
- | Dropout | 0.05 |
69
- | Learning Rate | 3e-5 |
70
- | Effective Batch Size | 8 |
71
- | Training Loss | 0.159 |
72
- | Token Accuracy | 95.5% |
73
- | Training Time | 26 minutes |
74
- | Hardware | NVIDIA A100 80GB |
75
- | Quantization | 4-bit NF4 (QLoRA) |
76
-
77
- ### Target Modules
78
- - q_proj, k_proj, v_proj, o_proj (attention)
79
- - gate_proj, up_proj, down_proj (MLP)
80
-
81
- ## Evaluation Results
82
-
83
- Tested on 297 questions across 9 categories:
84
-
85
- | Category | Score | vs Baseline |
86
- |----------|-------|-------------|
87
- | Factual Knowledge | 80.0% | **+46.7%** |
88
- | API Structure | 42.5% | +6.7% |
89
- | Position Sizing | 83.3% | +3.3% |
90
- | Trading Mechanics | 70.0% | -10.0% |
91
- | Parameter Validation | 73.3% | -20.0% |
92
- | Edge Cases | 75.0% | -17.5% |
93
- | General Capability | 83.6% | -7.3% |
94
- | Adversarial % | 36.9% | -12.5% |
95
- | Multi-step Reasoning | 24.0% | -3.0% |
96
- | **Overall** | **65.0%** | -5.2% |
97
 
98
  ## Usage
99
 
100
  ### With Transformers + PEFT
101
 
102
  ```python
103
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
104
  from peft import PeftModel
105
  import torch
106
 
107
- # Load base model with 4-bit quantization
108
  bnb_config = BitsAndBytesConfig(
109
  load_in_4bit=True,
110
  bnb_4bit_quant_type="nf4",
@@ -117,75 +203,79 @@ base_model = AutoModelForCausalLM.from_pretrained(
117
  device_map="auto",
118
  )
119
 
120
- # Load LoRA adapter
121
- model = PeftModel.from_pretrained(
122
- base_model,
123
- "UVLabs/HyperLLM-4b",
124
- revision="v0.2"
125
- )
126
-
127
- tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b", revision="v0.2")
128
 
129
- # Example: Position sizing
130
- messages = [
131
- {"role": "user", "content": "I have $10,000 and want to risk 2% on a BTC long at $50,000 with a stop at $48,000. What position size?"}
132
- ]
133
 
134
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
135
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
 
 
136
 
137
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
138
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
139
  ```
140
 
141
- ### Without Quantization (More VRAM)
 
 
142
 
143
- ```python
144
- from transformers import AutoModelForCausalLM
145
- from peft import PeftModel
146
 
147
- base_model = AutoModelForCausalLM.from_pretrained(
148
- "Qwen/Qwen3-4B-Instruct-2507",
149
- torch_dtype=torch.bfloat16,
150
- device_map="auto",
151
- )
152
 
153
- model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b", revision="v0.2")
 
 
 
154
  ```
155
 
156
- ## Intended Use
 
 
 
 
157
 
158
- - Assisting with Hyperliquid perpetual trading calculations
159
- - Learning Hyperliquid API structure and parameters
160
- - Position sizing with risk management
161
- - Understanding Hyperliquid-specific concepts
162
 
163
- ## Limitations
164
 
165
- - **Not financial advice**: This model is for educational/informational purposes only
166
- - **Verify calculations**: Always double-check position sizes and risk calculations
167
- - **Catastrophic forgetting**: Some general capabilities regressed vs base model
168
- - **Adversarial inputs**: Model can be confused by tricky percentage questions
169
 
170
  ## License
171
 
172
- MIT
173
 
174
  ## Citation
175
 
176
  ```bibtex
177
  @misc{hyperllm2026,
178
- title={HyperLLM: A Specialized Trading Assistant for Hyperliquid},
179
  author={UVLabs},
180
  year={2026},
181
- publisher={Hugging Face},
182
  url={https://huggingface.co/UVLabs/HyperLLM-4b}
183
  }
184
  ```
185
-
186
- ## Framework Versions
187
-
188
- - PEFT: 0.15.0
189
- - Transformers: 4.52.0
190
- - PyTorch: 2.7.0
191
- - bitsandbytes: 0.45.4
 
1
  ---
2
  base_model: Qwen/Qwen3-4B-Instruct-2507
3
  library_name: peft
4
+ license: apache-2.0
5
  language:
6
+ - en
7
  tags:
8
+ - trading
9
+ - finance
10
+ - hyperliquid
11
+ - perpetuals
12
+ - defi
13
+ - lora
14
+ - dpo
15
+ - sft
16
+ - trl
17
+ - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
18
+ model_name: HyperLLM-4b
19
  pipeline_tag: text-generation
20
  ---
21
 
22
+ # HyperLLM-4b v0.3
23
 
24
+ A specialized 4B parameter language model fine-tuned for Hyperliquid perpetual DEX trading assistance. Built on Qwen3-4B-Instruct using LoRA + DPO training.
25
 
26
  ## Model Description
27
 
28
+ HyperLLM is designed to assist with:
29
+ - **Position sizing calculations** - Risk-based position sizing with proper decimal handling
30
+ - **API structure understanding** - Hyperliquid exchange API request/response formats
31
+ - **Trading mechanics** - Perpetual futures concepts, margin modes, order types
32
+ - **Parameter validation** - Validating trade parameters against exchange constraints
33
+ - **Edge case handling** - Boundary conditions and unusual trading scenarios
34
+
35
+ ## Version History
36
+
37
+ ### v0.3 (Current - March 6, 2026)
38
+
39
+ **Training Pipeline:** SFT (7,028 examples) + DPO (1,400 preference pairs)
40
+
41
+ | Change | v0.2 | v0.3 | Impact |
42
+ |--------|------|------|--------|
43
+ | Learning Rate | 3e-5 | 1e-5 | Reduced catastrophic forgetting |
44
+ | Quantization | QLoRA 4-bit | Full LoRA | Better quality on A100 |
45
+ | General Data Mix | 10% | 25% | Preserved general capabilities |
46
+ | Training Stage | SFT only | SFT + DPO | Targeted behavioral fixes |
47
+ | Eval Questions | 297 | 337 | More comprehensive testing |
48
+
49
+ **Key Improvements over v0.2:**
50
+ - Recovered parameter validation: 73.3% → **93.3%** (+20%)
51
+ - Recovered edge cases: 75.0% → **92.5%** (+17.5%)
52
+ - Improved adversarial handling: 36.9% → **51.5%** (+14.6%)
53
+ - Improved general capability: 83.6% → **90.9%** (+7.3%)
54
+
55
+ ### v0.2 (March 4, 2026)
56
+
57
+ **Training Pipeline:** QLoRA SFT only
58
+
59
+ | Metric | Baseline | v0.2 | Change |
60
+ |--------|----------|------|--------|
61
+ | Overall | 70.2% | 65.0% | -5.2% |
62
+ | Factual Knowledge | 33.3% | **80.0%** | **+46.7%** |
63
+ | Parameter Validation | 93.3% | 73.3% | -20.0% |
64
+ | Edge Cases | 92.5% | 75.0% | -17.5% |
65
+
66
+ **Issues:** Catastrophic forgetting caused regressions in safety-critical categories despite massive factual knowledge gains.
67
+
68
+ ### v0.1 (February 28, 2026)
69
+
70
+ **Training Pipeline:** QLoRA SFT (1,823 examples)
71
+
72
+ | Metric | Baseline | v0.1 | Change |
73
+ |--------|----------|------|--------|
74
+ | Overall | 36.0% | **64.0%** | **+28%** |
75
+ | Factual Knowledge | 20.0% | **70.0%** | **+50%** |
76
+ | API Structure | 16.7% | **50.0%** | **+33%** |
77
+
78
+ **Issues:** Small eval set (25 questions), parameter validation regressed.
79
+
80
+ ## Evaluation Results (v0.3)
81
+
82
+ Evaluated on 337 questions across 9 categories:
83
+
84
+ | Category | Baseline | v0.3 | Change |
85
+ |----------|----------|------|--------|
86
+ | Parameter Validation | 93.3% | **93.3%** | Maintained |
87
+ | Edge Cases | 92.5% | **92.5%** | Maintained |
88
+ | General Capability | 89.1% | **90.9%** | +1.8% |
89
+ | Position Sizing | 83.3% | **83.3%** | Maintained |
90
+ | Trading Mechanics | 80.0% | **80.0%** | Maintained |
91
+ | Adversarial % | 53.5% | **51.5%** | -2.0% |
92
+ | Factual | 20.0% | **40.0%** | **+20%** |
93
+ | Multi-step | 31.3% | **30.3%** | -1.0% |
94
+ | API Structure | 27.5% | **27.5%** | Maintained |
95
+ | **Overall** | **67.4%** | **67.9%** | **+0.5%** |
96
+
97
+ ## Training Configuration
98
+
99
+ ### LoRA Parameters
100
+ ```python
101
+ {
102
+ "r": 64,
103
+ "lora_alpha": 128,
104
+ "lora_dropout": 0.05,
105
+ "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
106
+ "use_rslora": True
107
+ }
108
+ ```
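The `use_rslora` flag above switches the adapter's scaling factor from the classic LoRA α/r to the rank-stabilized α/√r, which keeps update magnitudes from shrinking at higher ranks. A quick sketch with the settings in this config:

```python
# rsLoRA vs classic LoRA scaling for r=64, alpha=128 (values from the config above)
r, alpha = 64, 128

standard_scale = alpha / r        # classic LoRA: 128 / 64 = 2.0
rslora_scale = alpha / r ** 0.5   # rsLoRA: 128 / 8 = 16.0

print(standard_scale, rslora_scale)
```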
109
+
110
+ ### SFT Hyperparameters
111
+ ```python
112
+ {
113
+ "learning_rate": 1e-5,
114
+ "epochs": 5,  # early stopping triggered at epoch 1.52
115
+ "batch_size": 4,
116
+ "gradient_accumulation_steps": 2,
117
+ "warmup_ratio": 0.10,
118
+ "max_length": 4096
119
+ }
120
+ ```
121
+
122
+ ### DPO Hyperparameters
123
+ ```python
124
+ {
125
+ "beta": 0.1,
126
+ "learning_rate": 5e-7,
127
+ "epochs": 2,
128
+ "batch_size": 4,
129
+ "max_length": 2048
130
+ }
131
+ ```
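For reference, the `beta` above scales the margin inside the standard DPO objective. A minimal sketch of the per-pair loss (not the trainer implementation used here):

```python
import math

def dpo_loss(beta: float, chosen_logratio: float, rejected_logratio: float) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid(beta * (d_chosen - d_rejected)),
    where each d is the policy-minus-reference log-prob of that response."""
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference signal the loss sits at log 2 ~ 0.693;
# a wider chosen-vs-rejected margin drives it toward 0.
print(dpo_loss(0.1, 0.0, 0.0))
print(dpo_loss(0.1, 5.0, -5.0))
```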
132
+
133
+ ### Training Data Distribution
134
+
135
+ **SFT (7,028 examples):**
136
+
137
+ | Category | Examples | % |
138
+ |----------|----------|---|
139
+ | General Instruction | 1,500 | 21.3% |
140
+ | Position Sizing | 800 | 11.4% |
141
+ | Parameter Validation | 800 | 11.4% |
142
+ | Adversarial Percentages | 600 | 8.5% |
143
+ | Multi-step Reasoning | 500 | 7.1% |
144
+ | Edge Cases | 400 | 5.7% |
145
+ | API Examples | 400 | 5.7% |
146
+ | Knowledge Q&A | 373 | 5.3% |
147
+ | Other | 1,655 | 23.6% |
148
+
149
+ **DPO (1,400 preference pairs):**
150
+
151
+ | Failure Mode | Pairs | % |
152
+ |--------------|-------|---|
153
+ | Excessive Leverage | 370 | 26.4% |
154
+ | Position Sizing | 330 | 23.6% |
155
+ | Percentage Confusion | 226 | 16.1% |
156
+ | Risk Violation | 195 | 13.9% |
157
+ | Policy Bypass | 140 | 10.0% |
158
+ | Uncertainty Caution | 139 | 9.9% |
159
 
160
  ## Usage
161
 
162
  ### With Transformers + PEFT
163
 
164
+ ```python
165
+ from transformers import AutoModelForCausalLM, AutoTokenizer
166
+ from peft import PeftModel
167
+ import torch
168
+
169
+ # Load base model
170
+ base_model = AutoModelForCausalLM.from_pretrained(
171
+ "Qwen/Qwen3-4B-Instruct-2507",
172
+ torch_dtype=torch.bfloat16,
173
+ device_map="auto",
174
+ )
175
+
176
+ # Load LoRA adapter
177
+ model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
178
+ tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
179
+
180
+ # Generate
181
+ messages = [{"role": "user", "content": "Calculate position size for $10,000 account, 2% risk, entry $50, stop loss $48"}]
182
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
183
+ outputs = model.generate(inputs, max_new_tokens=256)
184
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
185
+ ```
186
+
187
+ ### With 4-bit Quantization (Low VRAM)
188
+
189
  ```python
190
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
191
  from peft import PeftModel
192
  import torch
193
 
 
194
  bnb_config = BitsAndBytesConfig(
195
  load_in_4bit=True,
196
  bnb_4bit_quant_type="nf4",
 
203
  device_map="auto",
204
  )
205
 
206
+ model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
207
+ tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
208
+ ```
209
 
210
+ ## Example Use Cases
 
 
 
211
 
212
+ ### Position Sizing
213
+ ```
214
+ User: I have a $50,000 account and want to risk 2% on an ETH trade.
215
+ Entry at $3,450, stop loss at $3,400. What's my position size?
216
 
217
+ Model: Risk amount = $50,000 * 2% = $1,000
218
+ Stop distance = $3,450 - $3,400 = $50
219
+ Position size = $1,000 / $50 = 20 ETH
220
  ```
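The arithmetic above can be sketched as a small helper (a hypothetical illustration, not part of the model or any Hyperliquid SDK):

```python
def position_size(account: float, risk_pct: float, entry: float, stop: float) -> float:
    """Risk-based position size: dollar risk divided by stop distance."""
    risk_amount = account * risk_pct / 100
    stop_distance = abs(entry - stop)
    return risk_amount / stop_distance

# Matches the worked example: $50,000 account, 2% risk, entry $3,450, stop $3,400
print(position_size(50_000, 2, 3_450, 3_400))  # 20.0 ETH
```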
221
 
222
+ ### Parameter Validation
223
+ ```
224
+ User: Is a position size of 1.234 valid for ETH (szDecimals=2)?
225
 
226
+ Model: No, 1.234 has 3 decimal places but ETH only allows 2 (szDecimals=2).
227
+ Valid sizes would be 1.23 or 1.24.
228
+ ```
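The szDecimals check the model performs above amounts to counting decimal places. A minimal sketch (hypothetical helper, assuming sizes arrive as strings):

```python
from decimal import Decimal

def valid_size(size: str, sz_decimals: int) -> bool:
    """Check that a size string uses at most sz_decimals decimal places."""
    exponent = Decimal(size).normalize().as_tuple().exponent
    return -exponent <= sz_decimals

print(valid_size("1.234", 2))  # False: 3 decimal places, ETH allows 2
print(valid_size("1.23", 2))   # True
```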
229
 
230
+ ### Adversarial Percentage Handling
231
+ ```
232
+ User: Increase my position by 50%
 
 
233
 
234
+ Model: I need clarification. "Increase by 50%" could mean:
235
+ 1. Add 50% of current position (1.5x total)
236
+ 2. Increase to 50% of account (absolute)
237
+ Which do you mean?
238
  ```
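The two readings the model distinguishes above diverge by a wide margin, which is why it asks rather than guesses. With hypothetical illustrative numbers:

```python
current_position = 10_000  # hypothetical current position value, USD
account_value = 50_000     # hypothetical account value, USD

scale_current = current_position * 1.5    # reading 1: add 50% of current -> 15,000
target_of_account = account_value * 0.50  # reading 2: 50% of account -> 25,000

print(scale_current, target_of_account)
```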
239
 
240
+ ## Limitations
241
+
242
+ - **API Structure:** 27.5% accuracy - struggles with exact JSON field names
243
+ - **Multi-step Reasoning:** 30.3% accuracy - complex multi-step calculations are challenging for 4B model
244
+ - **Adversarial %:** 51.5% accuracy - still susceptible to tricky percentage phrasing
245
 
246
+ ## Hardware Requirements
 
 
 
247
 
248
+ | Mode | VRAM | Notes |
249
+ |------|------|-------|
250
+ | bfloat16 | ~10GB | Full precision inference |
251
+ | 4-bit | ~4GB | Quantized inference |
252
+ | 8-bit | ~6GB | INT8 quantization |
253
+
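The table's figures are consistent with a back-of-envelope estimate for weight memory alone (the remainder is KV cache and activation overhead; the 4B count is approximate):

```python
params = 4e9  # approximate parameter count for a 4B model

weights_bf16_gb = params * 2 / 1024**3    # 2 bytes/param, roughly 7.5 GB
weights_int4_gb = params * 0.5 / 1024**3  # 0.5 bytes/param, roughly 1.9 GB

print(weights_bf16_gb, weights_int4_gb)
```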
254
+ ## Training Hardware
255
 
256
+ - **Hardware:** NVIDIA A100 80GB SXM
257
+ - **SFT Duration:** ~20 minutes
258
+ - **DPO Duration:** ~17 minutes
259
+ - **Total Cost:** ~$1.50 (RunPod)
260
+
261
+ ## Framework Versions
262
+
263
+ - PEFT: 0.18.1
264
+ - TRL: 0.29.0
265
+ - Transformers: 5.2.0
266
+ - PyTorch: 2.10.0
267
 
268
  ## License
269
 
270
+ Apache 2.0
271
 
272
  ## Citation
273
 
274
  ```bibtex
275
  @misc{hyperllm2026,
276
+ title={HyperLLM: A Specialized LLM for Hyperliquid Trading},
277
  author={UVLabs},
278
  year={2026},
 
279
  url={https://huggingface.co/UVLabs/HyperLLM-4b}
280
  }
281
  ```
adapter_config.json CHANGED
@@ -1,9 +1,12 @@
1
  {
 
2
  "alpha_pattern": {},
 
3
  "auto_mapping": null,
4
  "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
5
  "bias": "none",
6
  "corda_config": null,
 
7
  "eva_config": null,
8
  "exclude_modules": null,
9
  "fan_in_fan_out": false,
@@ -20,20 +23,24 @@
20
  "megatron_core": "megatron.core",
21
  "modules_to_save": null,
22
  "peft_type": "LORA",
 
 
23
  "r": 64,
24
  "rank_pattern": {},
25
  "revision": null,
26
  "target_modules": [
27
- "up_proj",
28
  "v_proj",
29
- "k_proj",
30
  "gate_proj",
 
31
  "q_proj",
 
32
  "down_proj",
33
- "o_proj"
34
  ],
 
35
  "task_type": "CAUSAL_LM",
36
  "trainable_token_indices": null,
37
  "use_dora": false,
 
38
  "use_rslora": true
39
  }
 
1
  {
2
+ "alora_invocation_tokens": null,
3
  "alpha_pattern": {},
4
+ "arrow_config": null,
5
  "auto_mapping": null,
6
  "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
7
  "bias": "none",
8
  "corda_config": null,
9
+ "ensure_weight_tying": false,
10
  "eva_config": null,
11
  "exclude_modules": null,
12
  "fan_in_fan_out": false,
 
23
  "megatron_core": "megatron.core",
24
  "modules_to_save": null,
25
  "peft_type": "LORA",
26
+ "peft_version": "0.18.1",
27
+ "qalora_group_size": 16,
28
  "r": 64,
29
  "rank_pattern": {},
30
  "revision": null,
31
  "target_modules": [
 
32
  "v_proj",
 
33
  "gate_proj",
34
+ "o_proj",
35
  "q_proj",
36
+ "k_proj",
37
  "down_proj",
38
+ "up_proj"
39
  ],
40
+ "target_parameters": null,
41
  "task_type": "CAUSAL_LM",
42
  "trainable_token_indices": null,
43
  "use_dora": false,
44
+ "use_qalora": false,
45
  "use_rslora": true
46
  }
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e09b554fe2bded98e640b169e10f78a2bcb75946bdd6631f3786dde799ffb390
3
  size 528550256
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:650cda8c308105a0855653408b067a03990775c015a3f1f425bbaff87c4c52b9
3
  size 528550256
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
- size 11422654
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
3
+ size 11422650
tokenizer_config.json CHANGED
@@ -1,217 +1,11 @@
1
  {
2
- "add_bos_token": false,
3
  "add_prefix_space": false,
4
- "added_tokens_decoder": {
5
- "151643": {
6
- "content": "<|endoftext|>",
7
- "lstrip": false,
8
- "normalized": false,
9
- "rstrip": false,
10
- "single_word": false,
11
- "special": true
12
- },
13
- "151644": {
14
- "content": "<|im_start|>",
15
- "lstrip": false,
16
- "normalized": false,
17
- "rstrip": false,
18
- "single_word": false,
19
- "special": true
20
- },
21
- "151645": {
22
- "content": "<|im_end|>",
23
- "lstrip": false,
24
- "normalized": false,
25
- "rstrip": false,
26
- "single_word": false,
27
- "special": true
28
- },
29
- "151646": {
30
- "content": "<|object_ref_start|>",
31
- "lstrip": false,
32
- "normalized": false,
33
- "rstrip": false,
34
- "single_word": false,
35
- "special": true
36
- },
37
- "151647": {
38
- "content": "<|object_ref_end|>",
39
- "lstrip": false,
40
- "normalized": false,
41
- "rstrip": false,
42
- "single_word": false,
43
- "special": true
44
- },
45
- "151648": {
46
- "content": "<|box_start|>",
47
- "lstrip": false,
48
- "normalized": false,
49
- "rstrip": false,
50
- "single_word": false,
51
- "special": true
52
- },
53
- "151649": {
54
- "content": "<|box_end|>",
55
- "lstrip": false,
56
- "normalized": false,
57
- "rstrip": false,
58
- "single_word": false,
59
- "special": true
60
- },
61
- "151650": {
62
- "content": "<|quad_start|>",
63
- "lstrip": false,
64
- "normalized": false,
65
- "rstrip": false,
66
- "single_word": false,
67
- "special": true
68
- },
69
- "151651": {
70
- "content": "<|quad_end|>",
71
- "lstrip": false,
72
- "normalized": false,
73
- "rstrip": false,
74
- "single_word": false,
75
- "special": true
76
- },
77
- "151652": {
78
- "content": "<|vision_start|>",
79
- "lstrip": false,
80
- "normalized": false,
81
- "rstrip": false,
82
- "single_word": false,
83
- "special": true
84
- },
85
- "151653": {
86
- "content": "<|vision_end|>",
87
- "lstrip": false,
88
- "normalized": false,
89
- "rstrip": false,
90
- "single_word": false,
91
- "special": true
92
- },
93
- "151654": {
94
- "content": "<|vision_pad|>",
95
- "lstrip": false,
96
- "normalized": false,
97
- "rstrip": false,
98
- "single_word": false,
99
- "special": true
100
- },
101
- "151655": {
102
- "content": "<|image_pad|>",
103
- "lstrip": false,
104
- "normalized": false,
105
- "rstrip": false,
106
- "single_word": false,
107
- "special": true
108
- },
109
- "151656": {
110
- "content": "<|video_pad|>",
111
- "lstrip": false,
112
- "normalized": false,
113
- "rstrip": false,
114
- "single_word": false,
115
- "special": true
116
- },
117
- "151657": {
118
- "content": "<tool_call>",
119
- "lstrip": false,
120
- "normalized": false,
121
- "rstrip": false,
122
- "single_word": false,
123
- "special": false
124
- },
125
- "151658": {
126
- "content": "</tool_call>",
127
- "lstrip": false,
128
- "normalized": false,
129
- "rstrip": false,
130
- "single_word": false,
131
- "special": false
132
- },
133
- "151659": {
134
- "content": "<|fim_prefix|>",
135
- "lstrip": false,
136
- "normalized": false,
137
- "rstrip": false,
138
- "single_word": false,
139
- "special": false
140
- },
141
- "151660": {
142
- "content": "<|fim_middle|>",
143
- "lstrip": false,
144
- "normalized": false,
145
- "rstrip": false,
146
- "single_word": false,
147
- "special": false
148
- },
149
- "151661": {
150
- "content": "<|fim_suffix|>",
151
- "lstrip": false,
152
- "normalized": false,
153
- "rstrip": false,
154
- "single_word": false,
155
- "special": false
156
- },
157
- "151662": {
158
- "content": "<|fim_pad|>",
159
- "lstrip": false,
160
- "normalized": false,
161
- "rstrip": false,
162
- "single_word": false,
163
- "special": false
164
- },
165
- "151663": {
166
- "content": "<|repo_name|>",
167
- "lstrip": false,
168
- "normalized": false,
169
- "rstrip": false,
170
- "single_word": false,
171
- "special": false
172
- },
173
- "151664": {
174
- "content": "<|file_sep|>",
175
- "lstrip": false,
176
- "normalized": false,
177
- "rstrip": false,
178
- "single_word": false,
179
- "special": false
180
- },
181
- "151665": {
182
- "content": "<tool_response>",
183
- "lstrip": false,
184
- "normalized": false,
185
- "rstrip": false,
186
- "single_word": false,
187
- "special": false
188
- },
189
- "151666": {
190
- "content": "</tool_response>",
191
- "lstrip": false,
192
- "normalized": false,
193
- "rstrip": false,
194
- "single_word": false,
195
- "special": false
196
- },
197
- "151667": {
198
- "content": "<think>",
199
- "lstrip": false,
200
- "normalized": false,
201
- "rstrip": false,
202
- "single_word": false,
203
- "special": false
204
- },
205
- "151668": {
206
- "content": "</think>",
207
- "lstrip": false,
208
- "normalized": false,
209
- "rstrip": false,
210
- "single_word": false,
211
- "special": false
212
- }
213
- },
214
- "additional_special_tokens": [
215
  "<|im_start|>",
216
  "<|im_end|>",
217
  "<|object_ref_start|>",
@@ -226,11 +20,7 @@
226
  "<|image_pad|>",
227
  "<|video_pad|>"
228
  ],
229
- "bos_token": null,
230
- "clean_up_tokenization_spaces": false,
231
- "eos_token": "<|im_end|>",
232
- "errors": "replace",
233
- "extra_special_tokens": {},
234
  "model_max_length": 1010000,
235
  "pad_token": "<|endoftext|>",
236
  "split_special_tokens": false,
 
1
  {
 
2
  "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
  "<|im_start|>",
10
  "<|im_end|>",
11
  "<|object_ref_start|>",
 
20
  "<|image_pad|>",
21
  "<|video_pad|>"
22
  ],
23
+ "is_local": false,
 
 
 
 
24
  "model_max_length": 1010000,
25
  "pad_token": "<|endoftext|>",
26
  "split_special_tokens": false,
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7cf6dcfb7fa01a043537d453752ebabff6db298fb689b4df160bb4e3b59dd414
3
- size 5688
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f53f4121f9ec2db0158bb7463f5c20ce5cf4bca3d032b9b05ff3d04ce1ae9be6
3
+ size 5432