Commit 867babb by KaiyueWen · verified · 1 parent: 55d8185

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,141 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: Qwen/Qwen3-1.7B
4
+ tags:
5
+ - scaling-laws
6
+ - neural-scaling
7
+ - performance-prediction
8
+ - configuration-to-performance
9
+ - pytorch
10
+ library_name: transformers
11
+ ---
12
+
13
+ # NCPL-final: Neural Configuration to Performance Scaling Law
14
+
15
+ This model predicts the final performance of a pretraining run from its configuration, acting as a learned scaling law. It is trained on the Marin and StepLaw datasets.
16
+
17
+ ## Model Description
18
+
19
+ **NCPL-final** (Neural Configuration to Performance Scaling Law - Final) is a specialized forecasting model that:
20
+
21
+ - Takes pretraining configurations as input
22
+ - Predicts final performance metrics using learned scaling law patterns
23
+ - Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP
24
+ - Supports multiple scaling law formulations (Marin, StepLaw)
25
+ - **Focuses on final performance only** (unlike NCPL-intermediate, which predicts performance at intermediate checkpoints)
26
+
27
+ ### Architecture
28
+
29
+ The model consists of:
30
+
31
+ 1. **Base Model**: Qwen/Qwen3-1.7B
32
+ - Provides contextual embeddings for text tokens
33
+
34
+ 2. **Numeric MLP**:
35
+ - Processes numeric values (performance metrics, configuration parameters)
36
+ - Projects numeric inputs to the same hidden dimension as text embeddings
37
+ - Architecture: Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size)
38
+
39
+ 3. **Prediction Head**:
40
+ - Linear layer mapping from hidden_size to scalar predictions
41
+ - Outputs performance forecasts for each token position
42
+
43
+ ## Training Data
44
+
45
+ The model was trained on:
46
+
47
+ - **Datasets**: Marin and StepLaw scaling law datasets (final performance only)
48
+ - **Training configuration**:
49
+ - Stage 1: 20 epochs with learning rate 5e-5 (frozen base model)
50
+ - Stage 2: 1000 epochs with learning rate 1e-5 (full fine-tuning)
51
+ - Batch size: 480 (across 8 GPUs)
52
+ - Weight decay: 0.01
53
+ - Loss: MSE (Mean Squared Error)
54
+
55
+ ## Usage
56
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+ from model import ScalingLawForecaster  # model.py from this repository
+
+ # Load model
+ model = ScalingLawForecaster(
+     base_model_name="Qwen/Qwen3-1.7B",
+     init_from_pretrained=True,
+     force_fp32=True
+ )
+
+ # Load checkpoint (map to CPU first so loading works without a GPU)
+ checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
+ model.load_state_dict(checkpoint["model_state_dict"])
+ model.eval()
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
+
+ # Prepare inputs, each of shape (batch, seq_len)
+ # input_ids: tokenized text sequence
+ # is_number_mask: boolean mask indicating which tokens are numeric
+ # number_values_filled: actual numeric values (0 for non-numeric tokens)
+ # attention_mask: optional mask over valid positions
+
+ with torch.no_grad():
+     predictions = model(
+         input_ids=input_ids,
+         is_number_mask=is_number_mask,
+         number_values_filled=number_values_filled,
+         attention_mask=attention_mask
+     )
+ ```
90
+
91
+ ## Input Format
92
+
93
+ The model expects three required inputs, each of shape (batch, seq_len); `attention_mask` is optional:
94
+
95
+ 1. **input_ids** (torch.LongTensor): Tokenized sequence with special numeric tokens
96
+ 2. **is_number_mask** (torch.BoolTensor): Boolean mask marking numeric token positions
97
+ 3. **number_values_filled** (torch.FloatTensor): Actual numeric values at marked positions
98
+
99
+ ## Intended Use
100
+
101
+ This model is designed for:
102
+
103
+ - **Scaling law research**: Understanding how neural network performance scales with configuration
104
+ - **Final performance forecasting**: Predicting model performance at the end of training
105
+ - **Configuration optimization**: Finding optimal hyperparameters based on scaling patterns
106
+ - **Resource planning**: Estimating computational requirements for different model sizes
107
+
108
+ ## Limitations
109
+
110
+ - Trained specifically on the Marin and StepLaw datasets; generalization to other settings will likely require at least fine-tuning
111
+ - Requires properly formatted inputs with numeric tokens replaced and masked
112
+ - Predicts only final performance, not intermediate checkpoints
113
+
114
+ ## Differences from NCPL-intermediate
115
+
116
+ - **NCPL-final**: Predicts only final performance metrics after full training
117
+ - **NCPL-intermediate**: Predicts performance at intermediate training checkpoints
118
+
119
+ NCPL-final is trained for more epochs (20 + 1000, versus 10 + 400 for NCPL-intermediate) and focuses exclusively on final performance prediction.
120
+
121
+ ## Citation
122
+
123
+ If you use this model in your research, please cite:
124
+
125
+ ```bibtex
126
+ @article{ncpl2026,
127
+ title = {Neural Configuration to Performance Scaling Law},
128
+ author = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma},
129
+ journal = {arXiv preprint arXiv:2602.10300},
130
+ year = {2026},
131
+ url = {https://www.arxiv.org/abs/2602.10300}
132
+ }
133
+ ```
134
+
135
+ ## Model Card Authors
136
+
137
+ OptimizerStudy Team
138
+
139
+ ## Model Card Contact
140
+
141
+ For questions or issues, please open an issue in the [repository](https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law).
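The README describes the three input tensors but not how to build them. A minimal sketch of one way the mask and value lists might be assembled, assuming a pre-tokenized sequence in which every numeric value occupies exactly one position and is represented by a placeholder id. `NUM_PLACEHOLDER_ID` and `build_numeric_inputs` are illustrative names, not part of this repository, and the placeholder id used in training is not stated here.

```python
from typing import List, Tuple, Union

# Hypothetical placeholder id for numeric positions (an assumption for
# illustration; the README does not state the id used in training).
NUM_PLACEHOLDER_ID = 0


def build_numeric_inputs(
    items: List[Union[int, float]],
) -> Tuple[List[int], List[bool], List[float]]:
    """Split a mixed sequence into input_ids, is_number_mask, number_values_filled.

    Assumption: ints are ordinary token ids and floats are numeric values
    that each take a single position in the sequence.
    """
    input_ids: List[int] = []
    is_number_mask: List[bool] = []
    number_values_filled: List[float] = []
    for item in items:
        if isinstance(item, float):
            input_ids.append(NUM_PLACEHOLDER_ID)
            is_number_mask.append(True)
            number_values_filled.append(item)
        else:
            input_ids.append(item)
            is_number_mask.append(False)
            number_values_filled.append(0.0)  # 0 for non-numeric tokens
    return input_ids, is_number_mask, number_values_filled


ids, mask, values = build_numeric_inputs([101, 3e-4, 202, 0.01])
```

Wrapping each list in one extra list before passing it to `torch.tensor` would give the `(batch, seq_len)` shape with a batch size of 1.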
added_tokens.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "</think>": 151668,
3
+ "</tool_call>": 151658,
4
+ "</tool_response>": 151666,
5
+ "<think>": 151667,
6
+ "<tool_call>": 151657,
7
+ "<tool_response>": 151665,
8
+ "<|box_end|>": 151649,
9
+ "<|box_start|>": 151648,
10
+ "<|endoftext|>": 151643,
11
+ "<|file_sep|>": 151664,
12
+ "<|fim_middle|>": 151660,
13
+ "<|fim_pad|>": 151662,
14
+ "<|fim_prefix|>": 151659,
15
+ "<|fim_suffix|>": 151661,
16
+ "<|im_end|>": 151645,
17
+ "<|im_start|>": 151644,
18
+ "<|image_pad|>": 151655,
19
+ "<|object_ref_end|>": 151647,
20
+ "<|object_ref_start|>": 151646,
21
+ "<|quad_end|>": 151651,
22
+ "<|quad_start|>": 151650,
23
+ "<|repo_name|>": 151663,
24
+ "<|video_pad|>": 151656,
25
+ "<|vision_end|>": 151653,
26
+ "<|vision_pad|>": 151654,
27
+ "<|vision_start|>": 151652
28
+ }
chat_template.jinja ADDED
@@ -0,0 +1,89 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n' }}
86
+ {%- if enable_thinking is defined and enable_thinking is false %}
87
+ {{- '<think>\n\n</think>\n\n' }}
88
+ {%- endif %}
89
+ {%- endif %}
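For the simplest path through this template (no tools, string contents, only `system` and `user` roles), the rendered prompt can be sketched in plain Python. This illustrates that one branch only and is not a replacement for the Jinja template; `render_simple_chat` is a hypothetical helper.

```python
from typing import Dict, List


def render_simple_chat(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
    enable_thinking: bool = True,
) -> str:
    """Render system/user messages in the <|im_start|> ... <|im_end|> layout."""
    out = []
    for message in messages:
        if message["role"] not in ("system", "user"):
            raise ValueError("sketch covers only system and user messages")
        out.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template emits an empty think block when thinking is disabled.
            out.append("<think>\n\n</think>\n\n")
    return "".join(out)


prompt = render_simple_chat([{"role": "user", "content": "hi"}])
```

Assistant turns, tool calls and tool responses follow the additional branches in the template above and are deliberately left out of this sketch.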
config.json ADDED
@@ -0,0 +1,11 @@
1
+ {
2
+ "model_type": "scaling_law_forecaster",
3
+ "base_model_name": "Qwen/Qwen3-1.7B",
4
+ "architectures": [
5
+ "ScalingLawForecaster"
6
+ ],
7
+ "hidden_size": 2048,
8
+ "auto_map": {
9
+ "AutoModel": "model.ScalingLawForecaster"
10
+ }
11
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.py ADDED
@@ -0,0 +1,78 @@
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel, AutoConfig
+
+
+ class ScalingLawForecaster(nn.Module):
+     def __init__(
+         self,
+         base_model_name: str = "HuggingFaceTB/SmolLM2-135M",
+         init_from_pretrained: bool = True,
+         force_fp32: bool = False,
+     ):
+         super().__init__()
+         self.config = AutoConfig.from_pretrained(base_model_name)
+         if force_fp32:
+             self.config.torch_dtype = torch.float32
+         if init_from_pretrained:
+             if force_fp32:
+                 self.base = AutoModel.from_pretrained(
+                     base_model_name,
+                     config=self.config,
+                     torch_dtype=torch.float32,
+                 )
+             else:
+                 self.base = AutoModel.from_pretrained(base_model_name, config=self.config)
+         else:
+             self.base = AutoModel.from_config(self.config)
+
+         hidden_size = self.config.hidden_size
+
+         act_cls = nn.ReLU
+         self.num_mlp = nn.Sequential(
+             nn.Linear(1, hidden_size * 2),
+             act_cls(),
+             nn.Linear(hidden_size * 2, hidden_size)
+         )
+
+         self.head = nn.Linear(hidden_size, 1)
+
+     def forward(
+         self,
+         input_ids: torch.LongTensor,
+         is_number_mask: torch.BoolTensor,
+         number_values_filled: torch.FloatTensor,
+         attention_mask: torch.BoolTensor = None
+     ) -> torch.FloatTensor:
+         """
+         Args:
+             input_ids:           (batch, seq_len)
+             is_number_mask:      (batch, seq_len) bool mask for numeric tokens
+             number_values_filled:(batch, seq_len) float values (0 for non-numeric)
+             attention_mask:      (batch, seq_len) optional
+         Returns:
+             logits: (batch, seq_len) scalar predictions per token
+         """
+         # Text embeddings
+         input_ids[input_ids == 49152] = 0
+         text_emb = self.base.get_input_embeddings()(input_ids)
+
+         # Numeric MLP embeddings
+         flat_vals = number_values_filled.view(-1, 1)
+         mlp_out = self.num_mlp(flat_vals)
+         mlp_out = mlp_out.view_as(text_emb)
+
+         mask = is_number_mask.unsqueeze(-1)
+         inputs_embeds = torch.where(mask, mlp_out, text_emb)
+
+         outputs = self.base(
+             inputs_embeds=inputs_embeds,
+             attention_mask=attention_mask,
+             return_dict=True
+         )
+         hidden = outputs.last_hidden_state
+
+         # Final scalar head
+         logits = self.head(hidden).squeeze(-1)
+         return logits
+
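For a sense of scale, the parameters that `num_mlp` and `head` add on top of the base model can be counted directly from their layer shapes. The sketch below assumes `hidden_size = 2048`, the value listed in this commit's config.json, and the default bias term of `nn.Linear`; `linear_params` is an illustrative helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


hidden_size = 2048  # value from config.json in this commit

# Linear(1 -> 2*hidden_size) -> ReLU -> Linear(2*hidden_size -> hidden_size)
num_mlp_params = (
    linear_params(1, 2 * hidden_size)
    + linear_params(2 * hidden_size, hidden_size)
)
# Linear(hidden_size -> 1)
head_params = linear_params(hidden_size, 1)

print(num_mlp_params, head_params)
```

Under these assumptions the numeric MLP has about 8.4 million parameters and the head about two thousand, both small next to the 1.7B-parameter base model.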
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4d6d4f3b5c94d9d284a63c00a047158ac6dd9b637f5576b20d54fae1b4fa8905
3
+ size 6916029463
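The three lines above are a Git LFS pointer rather than the weights themselves. A small sketch of reading such a pointer, with the pointer text copied from this diff; `parse_lfs_pointer` is an illustrative name.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d6d4f3b5c94d9d284a63c00a047158ac6dd9b637f5576b20d54fae1b4fa8905\n"
    "size 6916029463\n"
)
size_gib = pointer["size"] / 2**30
```

At 4 bytes per value, this size corresponds to roughly 1.73 billion float32 numbers, which would be consistent with a 1.7B-parameter base model saved in fp32, although the checkpoint may hold other entries as well.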
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|im_start|>",
216
+ "<|im_end|>",
217
+ "<|object_ref_start|>",
218
+ "<|object_ref_end|>",
219
+ "<|box_start|>",
220
+ "<|box_end|>",
221
+ "<|quad_start|>",
222
+ "<|quad_end|>",
223
+ "<|vision_start|>",
224
+ "<|vision_end|>",
225
+ "<|vision_pad|>",
226
+ "<|image_pad|>",
227
+ "<|video_pad|>"
228
+ ],
229
+ "bos_token": null,
230
+ "clean_up_tokenization_spaces": false,
231
+ "eos_token": "<|im_end|>",
232
+ "errors": "replace",
233
+ "extra_special_tokens": {},
234
+ "model_max_length": 131072,
235
+ "pad_token": "<|endoftext|>",
236
+ "split_special_tokens": false,
237
+ "tokenizer_class": "Qwen2Tokenizer",
238
+ "unk_token": null
239
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff