KublaiKhan1 committed (verified)
Commit 744d283 · 1 Parent(s): c807c6e

Delete limo_filtered_combined

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. limo_filtered_combined/checkpoint-1122/added_tokens.json +0 -24
  2. limo_filtered_combined/checkpoint-1122/chat_template.jinja +0 -54
  3. limo_filtered_combined/checkpoint-1122/config.json +0 -58
  4. limo_filtered_combined/checkpoint-1122/generation_config.json +0 -9
  5. limo_filtered_combined/checkpoint-1122/merges.txt +0 -0
  6. limo_filtered_combined/checkpoint-1122/model.safetensors.index.json +0 -347
  7. limo_filtered_combined/checkpoint-1122/special_tokens_map.json +0 -31
  8. limo_filtered_combined/checkpoint-1122/tokenizer_config.json +0 -208
  9. limo_filtered_combined/checkpoint-1122/trainer_state.json +0 -0
  10. limo_filtered_combined/checkpoint-1122/vocab.json +0 -0
  11. limo_filtered_combined/checkpoint-1309/added_tokens.json +0 -24
  12. limo_filtered_combined/checkpoint-1309/chat_template.jinja +0 -54
  13. limo_filtered_combined/checkpoint-1309/config.json +0 -58
  14. limo_filtered_combined/checkpoint-1309/generation_config.json +0 -9
  15. limo_filtered_combined/checkpoint-1309/merges.txt +0 -0
  16. limo_filtered_combined/checkpoint-1309/model.safetensors.index.json +0 -347
  17. limo_filtered_combined/checkpoint-1309/special_tokens_map.json +0 -31
  18. limo_filtered_combined/checkpoint-1309/tokenizer_config.json +0 -208
  19. limo_filtered_combined/checkpoint-1309/trainer_state.json +0 -0
  20. limo_filtered_combined/checkpoint-1309/vocab.json +0 -0
  21. limo_filtered_combined/checkpoint-1496/added_tokens.json +0 -24
  22. limo_filtered_combined/checkpoint-1496/chat_template.jinja +0 -54
  23. limo_filtered_combined/checkpoint-1496/config.json +0 -58
  24. limo_filtered_combined/checkpoint-1496/generation_config.json +0 -9
  25. limo_filtered_combined/checkpoint-1496/merges.txt +0 -0
  26. limo_filtered_combined/checkpoint-1496/model.safetensors.index.json +0 -347
  27. limo_filtered_combined/checkpoint-1496/special_tokens_map.json +0 -31
  28. limo_filtered_combined/checkpoint-1496/tokenizer_config.json +0 -208
  29. limo_filtered_combined/checkpoint-1496/trainer_state.json +0 -0
  30. limo_filtered_combined/checkpoint-1496/vocab.json +0 -0
  31. limo_filtered_combined/checkpoint-1683/added_tokens.json +0 -24
  32. limo_filtered_combined/checkpoint-1683/chat_template.jinja +0 -54
  33. limo_filtered_combined/checkpoint-1683/config.json +0 -58
  34. limo_filtered_combined/checkpoint-1683/generation_config.json +0 -9
  35. limo_filtered_combined/checkpoint-1683/merges.txt +0 -0
  36. limo_filtered_combined/checkpoint-1683/model.safetensors.index.json +0 -347
  37. limo_filtered_combined/checkpoint-1683/special_tokens_map.json +0 -31
  38. limo_filtered_combined/checkpoint-1683/tokenizer_config.json +0 -208
  39. limo_filtered_combined/checkpoint-1683/trainer_state.json +0 -0
  40. limo_filtered_combined/checkpoint-1683/vocab.json +0 -0
  41. limo_filtered_combined/checkpoint-187/added_tokens.json +0 -24
  42. limo_filtered_combined/checkpoint-187/chat_template.jinja +0 -54
  43. limo_filtered_combined/checkpoint-187/config.json +0 -58
  44. limo_filtered_combined/checkpoint-187/generation_config.json +0 -9
  45. limo_filtered_combined/checkpoint-187/merges.txt +0 -0
  46. limo_filtered_combined/checkpoint-187/model.safetensors.index.json +0 -347
  47. limo_filtered_combined/checkpoint-187/special_tokens_map.json +0 -31
  48. limo_filtered_combined/checkpoint-187/tokenizer_config.json +0 -208
  49. limo_filtered_combined/checkpoint-187/trainer_state.json +0 -1343
  50. limo_filtered_combined/checkpoint-187/vocab.json +0 -0
limo_filtered_combined/checkpoint-1122/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
- }
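For context, an `added_tokens.json` file like the one deleted above is a flat JSON map from special-token strings to vocabulary IDs. A minimal stdlib-only sketch of loading and querying such a mapping (using a small excerpt of the values shown in the diff):

```python
import json

# A small excerpt of the mapping shown in the diff above.
added_tokens = json.loads("""
{
  "<|im_start|>": 151644,
  "<|im_end|>": 151645,
  "<|endoftext|>": 151643
}
""")

# Invert the map to recover the token string for a given ID.
id_to_token = {v: k for k, v in added_tokens.items()}

print(added_tokens["<|im_end|>"])  # 151645
print(id_to_token[151644])         # <|im_start|>
```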
limo_filtered_combined/checkpoint-1122/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0]['role'] == 'system' %}
- {{- messages[0]['content'] }}
- {%- else %}
- {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
- {%- endif %}
- {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0]['role'] == 'system' %}
- {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
- {%- else %}
- {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
- {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role }}
- {%- if message.content %}
- {{- '\n' + message.content }}
- {%- endif %}
- {%- for tool_call in message.tool_calls %}
- {%- if tool_call.function is defined %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '\n<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {{- tool_call.arguments | tojson }}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- message.content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}
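The deleted chat template above is ChatML-style: each message is wrapped as `<|im_start|>{role}\n{content}<|im_end|>`, with a default system prompt when none is supplied. A rough pure-Python approximation of the template's non-tool branch, for illustration only (it ignores tool calls and tool responses):

```python
DEFAULT_SYSTEM = ("Please reason step by step, and put your final answer "
                  "within \\boxed{}.")

def render_chatml(messages, add_generation_prompt=True):
    """Approximate the template's non-tool path: ChatML framing of messages."""
    out = []
    # The template emits the first system message (or a default) as the header.
    if messages and messages[0]["role"] == "system":
        out.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        messages = messages[1:]
    else:
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([{"role": "user", "content": "What is 2+2?"}])
print(prompt)
```

For the real rendering, `tokenizer.apply_chat_template(...)` in transformers evaluates the Jinja source itself; this sketch only mirrors the structure for readability.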
limo_filtered_combined/checkpoint-1122/config.json DELETED
@@ -1,58 +0,0 @@
- {
- "architectures": [
- "Qwen2ForCausalLM"
- ],
- "attention_dropout": 0.0,
- "bos_token_id": 151643,
- "eos_token_id": 151645,
- "hidden_act": "silu",
- "hidden_size": 3584,
- "initializer_range": 0.02,
- "intermediate_size": 18944,
- "layer_types": [
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention"
- ],
- "max_position_embeddings": 4096,
- "max_window_layers": 28,
- "model_type": "qwen2",
- "num_attention_heads": 28,
- "num_hidden_layers": 28,
- "num_key_value_heads": 4,
- "rms_norm_eps": 1e-06,
- "rope_scaling": null,
- "rope_theta": 10000.0,
- "sliding_window": null,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.55.0",
- "use_cache": false,
- "use_sliding_window": false,
- "vocab_size": 152064
- }
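The deleted config.json describes a Qwen2 architecture with grouped-query attention (28 query heads sharing 4 key/value heads). A small sketch of the shape arithmetic that follows from those values, using only numbers copied from the config above:

```python
# Values copied from the deleted config.json above.
config = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "num_hidden_layers": 28,
    "vocab_size": 152064,
}

# Per-head dimension: hidden_size is split evenly across query heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: several query heads share each KV head.
queries_per_kv_head = (config["num_attention_heads"]
                       // config["num_key_value_heads"])

# KV projection width is therefore smaller than the hidden size.
kv_proj_dim = config["num_key_value_heads"] * head_dim

print(head_dim)             # 128
print(queries_per_kv_head)  # 7
print(kv_proj_dim)          # 512
```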
limo_filtered_combined/checkpoint-1122/generation_config.json DELETED
@@ -1,9 +0,0 @@
- {
- "bos_token_id": 151643,
- "eos_token_id": [
- 151645,
- 151643
- ],
- "pad_token_id": 151643,
- "transformers_version": "4.55.0"
- }
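The deleted generation_config.json lists two EOS token IDs; decoding stops when either one is produced. A minimal sketch of that stopping check (the `should_stop` helper is hypothetical, not part of transformers):

```python
# Values from the deleted generation_config.json above.
eos_token_ids = {151645, 151643}  # <|im_end|> and <|endoftext|>
pad_token_id = 151643

def should_stop(token_id):
    """Stop decoding when any configured EOS token is generated."""
    return token_id in eos_token_ids

print(should_stop(151645))  # True
print(should_stop(42))      # False
```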
limo_filtered_combined/checkpoint-1122/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
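The model.safetensors.index.json deleted below maps each tensor name to the shard file that stores it; loaders read this index first, then open only the shards they need. A stdlib-only sketch of grouping a `weight_map` by shard, using a tiny excerpt of the entries from the index:

```python
from collections import defaultdict

# A tiny excerpt of the weight_map from the index deleted below.
weight_map = {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
}

# Group tensor names by shard so each file would be opened once.
by_shard = defaultdict(list)
for name, shard in weight_map.items():
    by_shard[shard].append(name)

print(sorted(by_shard))
print(by_shard["model-00001-of-00007.safetensors"])
```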
limo_filtered_combined/checkpoint-1122/model.safetensors.index.json DELETED
@@ -1,347 +0,0 @@
- {
- "metadata": {
- "total_parameters": 1903904128,
- "total_size": 30462466048
- },
- "weight_map": {
- "lm_head.weight": "model-00007-of-00007.safetensors",
- "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
- "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
303
- "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
304
- "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
305
- "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
306
- "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
307
- "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
308
- "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
309
- "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
310
- "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
311
- "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
312
- "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
313
- "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
314
- "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
315
- "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
316
- "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
317
- "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
318
- "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
319
- "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
320
- "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
321
- "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
322
- "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
323
- "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
324
- "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
325
- "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
326
- "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
327
- "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
328
- "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
329
- "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
330
- "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
331
- "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
332
- "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
333
- "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
334
- "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
335
- "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
336
- "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
337
- "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
338
- "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
339
- "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
340
- "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
341
- "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
342
- "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
343
- "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
344
- "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
345
- "model.norm.weight": "model-00006-of-00007.safetensors"
346
- }
347
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
limo_filtered_combined/checkpoint-1122/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "eos_token": {
-     "content": "<|im_end|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": {
-     "content": "<|endoftext|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   }
- }

limo_filtered_combined/checkpoint-1122/tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
- {
-   "add_bos_token": false,
-   "add_prefix_space": false,
-   "added_tokens_decoder": {
-     "151643": {
-       "content": "<|endoftext|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151644": {
-       "content": "<|im_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151645": {
-       "content": "<|im_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151646": {
-       "content": "<|object_ref_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151647": {
-       "content": "<|object_ref_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151648": {
-       "content": "<|box_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151649": {
-       "content": "<|box_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151650": {
-       "content": "<|quad_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151651": {
-       "content": "<|quad_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151652": {
-       "content": "<|vision_start|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151653": {
-       "content": "<|vision_end|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151654": {
-       "content": "<|vision_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151655": {
-       "content": "<|image_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151656": {
-       "content": "<|video_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "151657": {
-       "content": "<tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151658": {
-       "content": "</tool_call>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151659": {
-       "content": "<|fim_prefix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151660": {
-       "content": "<|fim_middle|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151661": {
-       "content": "<|fim_suffix|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151662": {
-       "content": "<|fim_pad|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151663": {
-       "content": "<|repo_name|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "151664": {
-       "content": "<|file_sep|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     }
-   },
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "bos_token": null,
-   "clean_up_tokenization_spaces": false,
-   "eos_token": "<|im_end|>",
-   "errors": "replace",
-   "extra_special_tokens": {},
-   "model_max_length": 131072,
-   "pad_token": "<|endoftext|>",
-   "padding_side": "right",
-   "split_special_tokens": false,
-   "tokenizer_class": "Qwen2Tokenizer",
-   "unk_token": null
- }

limo_filtered_combined/checkpoint-1122/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1122/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1309/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
-   "</tool_call>": 151658,
-   "<tool_call>": 151657,
-   "<|box_end|>": 151649,
-   "<|box_start|>": 151648,
-   "<|endoftext|>": 151643,
-   "<|file_sep|>": 151664,
-   "<|fim_middle|>": 151660,
-   "<|fim_pad|>": 151662,
-   "<|fim_prefix|>": 151659,
-   "<|fim_suffix|>": 151661,
-   "<|im_end|>": 151645,
-   "<|im_start|>": 151644,
-   "<|image_pad|>": 151655,
-   "<|object_ref_end|>": 151647,
-   "<|object_ref_start|>": 151646,
-   "<|quad_end|>": 151651,
-   "<|quad_start|>": 151650,
-   "<|repo_name|>": 151663,
-   "<|video_pad|>": 151656,
-   "<|vision_end|>": 151653,
-   "<|vision_pad|>": 151654,
-   "<|vision_start|>": 151652
- }

limo_filtered_combined/checkpoint-1309/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
-     {{- '<|im_start|>system\n' }}
-     {%- if messages[0]['role'] == 'system' %}
-         {{- messages[0]['content'] }}
-     {%- else %}
-         {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
-     {%- endif %}
-     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
-     {%- for tool in tools %}
-         {{- "\n" }}
-         {{- tool | tojson }}
-     {%- endfor %}
-     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
-     {%- if messages[0]['role'] == 'system' %}
-         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
-     {%- else %}
-         {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
-     {%- endif %}
- {%- endif %}
- {%- for message in messages %}
-     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
-         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
-     {%- elif message.role == "assistant" %}
-         {{- '<|im_start|>' + message.role }}
-         {%- if message.content %}
-             {{- '\n' + message.content }}
-         {%- endif %}
-         {%- for tool_call in message.tool_calls %}
-             {%- if tool_call.function is defined %}
-                 {%- set tool_call = tool_call.function %}
-             {%- endif %}
-             {{- '\n<tool_call>\n{"name": "' }}
-             {{- tool_call.name }}
-             {{- '", "arguments": ' }}
-             {{- tool_call.arguments | tojson }}
-             {{- '}\n</tool_call>' }}
-         {%- endfor %}
-         {{- '<|im_end|>\n' }}
-     {%- elif message.role == "tool" %}
-         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
-             {{- '<|im_start|>user' }}
-         {%- endif %}
-         {{- '\n<tool_response>\n' }}
-         {{- message.content }}
-         {{- '\n</tool_response>' }}
-         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
-             {{- '<|im_end|>\n' }}
-         {%- endif %}
-     {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
-     {{- '<|im_start|>assistant\n' }}
- {%- endif %}

limo_filtered_combined/checkpoint-1309/config.json DELETED
@@ -1,58 +0,0 @@
- {
-   "architectures": [
-     "Qwen2ForCausalLM"
-   ],
-   "attention_dropout": 0.0,
-   "bos_token_id": 151643,
-   "eos_token_id": 151645,
-   "hidden_act": "silu",
-   "hidden_size": 3584,
-   "initializer_range": 0.02,
-   "intermediate_size": 18944,
-   "layer_types": [
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention",
-     "full_attention"
-   ],
-   "max_position_embeddings": 4096,
-   "max_window_layers": 28,
-   "model_type": "qwen2",
-   "num_attention_heads": 28,
-   "num_hidden_layers": 28,
-   "num_key_value_heads": 4,
-   "rms_norm_eps": 1e-06,
-   "rope_scaling": null,
-   "rope_theta": 10000.0,
-   "sliding_window": null,
-   "tie_word_embeddings": false,
-   "torch_dtype": "float32",
-   "transformers_version": "4.55.0",
-   "use_cache": false,
-   "use_sliding_window": false,
-   "vocab_size": 152064
- }

limo_filtered_combined/checkpoint-1309/generation_config.json DELETED
@@ -1,9 +0,0 @@
- {
-   "bos_token_id": 151643,
-   "eos_token_id": [
-     151645,
-     151643
-   ],
-   "pad_token_id": 151643,
-   "transformers_version": "4.55.0"
- }

limo_filtered_combined/checkpoint-1309/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1309/model.safetensors.index.json DELETED
@@ -1,347 +0,0 @@
- {
-   "metadata": {
-     "total_parameters": 1903904128,
-     "total_size": 30462466048
-   },
-   "weight_map": {
-     "lm_head.weight": "model-00007-of-00007.safetensors",
-     "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
-     "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
-     "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
-     "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
-     "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
-     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
-     "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
-     "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
-     "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
196
- "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
197
- "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
198
- "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
199
- "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
200
- "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
201
- "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
202
- "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
203
- "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
204
- "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
205
- "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
206
- "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
207
- "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
208
- "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
209
- "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
210
- "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
211
- "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
212
- "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
213
- "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
214
- "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
215
- "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
216
- "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
217
- "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
218
- "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
219
- "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
220
- "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
221
- "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
222
- "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
223
- "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
224
- "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
225
- "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
226
- "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
227
- "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
228
- "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
229
- "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
230
- "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
231
- "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
232
- "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
233
- "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
234
- "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
235
- "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
236
- "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
237
- "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
238
- "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
239
- "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
240
- "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
241
- "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
242
- "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
243
- "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
244
- "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
245
- "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
246
- "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
247
- "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
248
- "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
249
- "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
250
- "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
251
- "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
252
- "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
253
- "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
254
- "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
255
- "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
256
- "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
257
- "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
258
- "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
259
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
260
- "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
261
- "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
262
- "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
263
- "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
264
- "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
265
- "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
266
- "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
267
- "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
268
- "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
269
- "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
270
- "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
271
- "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
272
- "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
273
- "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
274
- "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
275
- "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
276
- "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
277
- "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
278
- "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
279
- "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
280
- "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
281
- "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
282
- "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
283
- "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
284
- "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
285
- "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
286
- "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
287
- "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
288
- "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
289
- "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
290
- "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
291
- "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
292
- "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
293
- "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
294
- "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
295
- "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
296
- "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
297
- "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
298
- "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
299
- "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
300
- "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
301
- "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
302
- "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
303
- "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
304
- "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
305
- "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
306
- "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
307
- "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
308
- "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
309
- "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
310
- "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
311
- "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
312
- "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
313
- "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
314
- "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
315
- "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
316
- "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
317
- "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
318
- "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
319
- "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
320
- "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
321
- "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
322
- "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
323
- "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
324
- "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
325
- "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
326
- "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
327
- "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
328
- "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
329
- "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
330
- "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
331
- "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
332
- "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
333
- "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
334
- "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
335
- "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
336
- "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
337
- "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
338
- "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
339
- "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
340
- "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
341
- "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
342
- "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
343
- "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
344
- "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
345
- "model.norm.weight": "model-00006-of-00007.safetensors"
346
- }
347
- }
limo_filtered_combined/checkpoint-1309/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "eos_token": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
limo_filtered_combined/checkpoint-1309/tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
- {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "151643": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151644": {
- "content": "<|im_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151645": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151646": {
- "content": "<|object_ref_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151647": {
- "content": "<|object_ref_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151648": {
- "content": "<|box_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151649": {
- "content": "<|box_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151650": {
- "content": "<|quad_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151651": {
- "content": "<|quad_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151652": {
- "content": "<|vision_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151653": {
- "content": "<|vision_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151654": {
- "content": "<|vision_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151655": {
- "content": "<|image_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151656": {
- "content": "<|video_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151657": {
- "content": "<tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151658": {
- "content": "</tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151659": {
- "content": "<|fim_prefix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151660": {
- "content": "<|fim_middle|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151661": {
- "content": "<|fim_suffix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151662": {
- "content": "<|fim_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151663": {
- "content": "<|repo_name|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151664": {
- "content": "<|file_sep|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "errors": "replace",
- "extra_special_tokens": {},
- "model_max_length": 131072,
- "pad_token": "<|endoftext|>",
- "padding_side": "right",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
limo_filtered_combined/checkpoint-1309/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1309/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1496/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
- }
limo_filtered_combined/checkpoint-1496/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0]['role'] == 'system' %}
- {{- messages[0]['content'] }}
- {%- else %}
- {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
- {%- endif %}
- {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0]['role'] == 'system' %}
- {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
- {%- else %}
- {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
- {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role }}
- {%- if message.content %}
- {{- '\n' + message.content }}
- {%- endif %}
- {%- for tool_call in message.tool_calls %}
- {%- if tool_call.function is defined %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '\n<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {{- tool_call.arguments | tojson }}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- message.content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}
limo_filtered_combined/checkpoint-1496/config.json DELETED
@@ -1,58 +0,0 @@
- {
- "architectures": [
- "Qwen2ForCausalLM"
- ],
- "attention_dropout": 0.0,
- "bos_token_id": 151643,
- "eos_token_id": 151645,
- "hidden_act": "silu",
- "hidden_size": 3584,
- "initializer_range": 0.02,
- "intermediate_size": 18944,
- "layer_types": [
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention"
- ],
- "max_position_embeddings": 4096,
- "max_window_layers": 28,
- "model_type": "qwen2",
- "num_attention_heads": 28,
- "num_hidden_layers": 28,
- "num_key_value_heads": 4,
- "rms_norm_eps": 1e-06,
- "rope_scaling": null,
- "rope_theta": 10000.0,
- "sliding_window": null,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.55.0",
- "use_cache": false,
- "use_sliding_window": false,
- "vocab_size": 152064
- }
limo_filtered_combined/checkpoint-1496/generation_config.json DELETED
@@ -1,9 +0,0 @@
- {
- "bos_token_id": 151643,
- "eos_token_id": [
- 151645,
- 151643
- ],
- "pad_token_id": 151643,
- "transformers_version": "4.55.0"
- }
limo_filtered_combined/checkpoint-1496/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1496/model.safetensors.index.json DELETED
@@ -1,347 +0,0 @@
- {
- "metadata": {
- "total_parameters": 1903904128,
- "total_size": 30462466048
- },
- "weight_map": {
- "lm_head.weight": "model-00007-of-00007.safetensors",
- "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
90
- "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
91
- "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
92
- "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
93
- "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
94
- "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
95
- "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
96
- "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
97
- "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
98
- "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
99
- "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
100
- "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
101
- "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
102
- "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
103
- "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
104
- "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
105
- "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
106
- "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
107
- "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
108
- "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
109
- "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
110
- "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
111
- "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
112
- "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
113
- "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
114
- "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
115
- "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
116
- "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
117
- "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
118
- "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
119
- "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
120
- "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
121
- "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
122
- "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
123
- "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
124
- "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
125
- "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
126
- "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
127
- "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
128
- "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
129
- "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
130
- "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
131
- "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
132
- "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
133
- "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
134
- "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
135
- "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
136
- "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
137
- "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
138
- "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
139
- "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
140
- "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
141
- "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
142
- "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
143
- "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
144
- "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
145
- "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
146
- "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
147
- "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
148
- "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
149
- "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
150
- "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
151
- "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
152
- "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
153
- "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
154
- "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
155
- "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
156
- "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
157
- "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
158
- "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
159
- "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
160
- "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
161
- "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
162
- "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
163
- "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
164
- "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
165
- "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
166
- "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
167
- "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
168
- "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
169
- "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
170
- "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
171
- "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
172
- "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
173
- "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
174
- "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
175
- "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
176
- "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
177
- "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
178
- "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
179
- "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
180
- "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
181
- "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
182
- "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
183
- "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
184
- "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
185
- "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
186
- "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
187
- "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
188
- "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
189
- "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
190
- "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
191
- "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
192
- "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
193
- "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
194
- "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
195
- "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
196
- "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
197
- "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
198
- "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
199
- "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
200
- "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
201
- "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
202
- "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
203
- "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
204
- "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
205
- "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
206
- "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
207
- "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
208
- "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
209
- "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
210
- "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
211
- "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
212
- "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
213
- "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
214
- "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
215
- "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
216
- "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
217
- "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
218
- "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
219
- "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
220
- "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
221
- "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
222
- "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
223
- "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
224
- "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
225
- "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
226
- "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
227
- "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
228
- "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
229
- "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
230
- "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
231
- "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
232
- "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
233
- "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
234
- "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
235
- "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
236
- "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
237
- "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
238
- "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
239
- "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
240
- "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
241
- "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
242
- "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
243
- "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
244
- "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
245
- "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
246
- "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
247
- "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
248
- "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
249
- "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
250
- "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
251
- "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
252
- "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
253
- "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
254
- "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
255
- "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
256
- "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
257
- "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
258
- "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
259
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
260
- "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
261
- "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
262
- "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
263
- "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
264
- "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
265
- "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
266
- "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
267
- "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
268
- "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
269
- "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
270
- "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
271
- "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
272
- "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
273
- "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
274
- "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
275
- "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
276
- "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
277
- "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
278
- "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
279
- "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
280
- "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
281
- "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
282
- "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
283
- "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
284
- "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
285
- "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
286
- "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
287
- "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
288
- "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
289
- "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
290
- "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
291
- "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
292
- "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
293
- "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
294
- "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
295
- "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
296
- "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
297
- "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
298
- "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
299
- "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
300
- "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
301
- "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
302
- "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
303
- "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
304
- "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
305
- "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
306
- "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
307
- "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
308
- "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
309
- "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
310
- "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
311
- "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
312
- "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
313
- "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
314
- "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
315
- "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
316
- "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
317
- "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
318
- "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
319
- "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
320
- "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
321
- "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
322
- "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
323
- "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
324
- "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
325
- "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
326
- "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
327
- "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
328
- "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
329
- "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
330
- "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
331
- "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
332
- "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
333
- "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
334
- "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
335
- "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
336
- "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
337
- "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
338
- "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
339
- "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
340
- "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
341
- "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
342
- "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
343
- "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
344
- "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
345
- "model.norm.weight": "model-00006-of-00007.safetensors"
346
- }
347
- }
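The `weight_map` fragment deleted above assigns each tensor name to one of seven safetensors shards. As a minimal sketch of how such an index can be inverted into a per-shard load plan (using a small hypothetical excerpt, not the full map):

```python
# Group a safetensors-style weight_map by shard file.
# The dict below is a hypothetical excerpt for illustration only.
weight_map = {
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.norm.weight": "model-00006-of-00007.safetensors",
}

# shard file -> list of tensor names stored in that shard
shards = {}
for tensor_name, shard_file in weight_map.items():
    shards.setdefault(shard_file, []).append(tensor_name)

for shard_file in sorted(shards):
    print(shard_file, len(shards[shard_file]))
```

A loader would then open each shard once and read only the tensors listed for it, which is why a tensor split across shards (e.g. layer 13 straddling files 3 and 4 above) is harmless: each entry names exactly one file.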
 
limo_filtered_combined/checkpoint-1496/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "eos_token": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
 
limo_filtered_combined/checkpoint-1496/tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
- {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "151643": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151644": {
- "content": "<|im_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151645": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151646": {
- "content": "<|object_ref_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151647": {
- "content": "<|object_ref_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151648": {
- "content": "<|box_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151649": {
- "content": "<|box_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151650": {
- "content": "<|quad_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151651": {
- "content": "<|quad_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151652": {
- "content": "<|vision_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151653": {
- "content": "<|vision_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151654": {
- "content": "<|vision_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151655": {
- "content": "<|image_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151656": {
- "content": "<|video_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151657": {
- "content": "<tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151658": {
- "content": "</tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151659": {
- "content": "<|fim_prefix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151660": {
- "content": "<|fim_middle|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151661": {
- "content": "<|fim_suffix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151662": {
- "content": "<|fim_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151663": {
- "content": "<|repo_name|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151664": {
- "content": "<|file_sep|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "errors": "replace",
- "extra_special_tokens": {},
- "model_max_length": 131072,
- "pad_token": "<|endoftext|>",
- "padding_side": "right",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
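The deleted tokenizer_config.json names its `eos_token` and `pad_token` as strings, while `added_tokens_decoder` maps id strings to token definitions. A minimal sketch of resolving those names back to ids (using a hypothetical three-entry excerpt of the decoder table, not the full file):

```python
# Hypothetical excerpt of an added_tokens_decoder table: JSON keys are
# id strings, values reduced here to just the token content.
added_tokens_decoder = {
    "151643": "<|endoftext|>",
    "151644": "<|im_start|>",
    "151645": "<|im_end|>",
}
eos_token = "<|im_end|>"
pad_token = "<|endoftext|>"

# Invert the table and resolve the configured token strings to ids.
token_to_id = {tok: int(i) for i, tok in added_tokens_decoder.items()}
eos_token_id = token_to_id[eos_token]
pad_token_id = token_to_id[pad_token]
```

This is the kind of cross-check a loader performs: every named special token in the config should resolve to exactly one id in the decoder table.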
 
limo_filtered_combined/checkpoint-1496/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1496/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1683/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
- }
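The deleted added_tokens.json is a flat token-string-to-id map. A decoder needs the inverse direction, which is a one-line inversion; a minimal sketch over a hypothetical excerpt of the table:

```python
# Hypothetical excerpt of an added-token table (token string -> id),
# mirroring the shape of the deleted added_tokens.json.
added_tokens = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<tool_call>": 151657,
    "</tool_call>": 151658,
}

# Invert to an id -> token table, as detokenization requires.
# Safe because the ids are unique by construction.
id_to_token = {token_id: token for token, token_id in added_tokens.items()}
```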
 
limo_filtered_combined/checkpoint-1683/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0]['role'] == 'system' %}
- {{- messages[0]['content'] }}
- {%- else %}
- {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
- {%- endif %}
- {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0]['role'] == 'system' %}
- {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
- {%- else %}
- {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
- {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role }}
- {%- if message.content %}
- {{- '\n' + message.content }}
- {%- endif %}
- {%- for tool_call in message.tool_calls %}
- {%- if tool_call.function is defined %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '\n<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {{- tool_call.arguments | tojson }}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- message.content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}
 
limo_filtered_combined/checkpoint-1683/config.json DELETED
@@ -1,58 +0,0 @@
- {
- "architectures": [
- "Qwen2ForCausalLM"
- ],
- "attention_dropout": 0.0,
- "bos_token_id": 151643,
- "eos_token_id": 151645,
- "hidden_act": "silu",
- "hidden_size": 3584,
- "initializer_range": 0.02,
- "intermediate_size": 18944,
- "layer_types": [
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention"
- ],
- "max_position_embeddings": 4096,
- "max_window_layers": 28,
- "model_type": "qwen2",
- "num_attention_heads": 28,
- "num_hidden_layers": 28,
- "num_key_value_heads": 4,
- "rms_norm_eps": 1e-06,
- "rope_scaling": null,
- "rope_theta": 10000.0,
- "sliding_window": null,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.55.0",
- "use_cache": false,
- "use_sliding_window": false,
- "vocab_size": 152064
- }
 
limo_filtered_combined/checkpoint-1683/generation_config.json DELETED
@@ -1,9 +0,0 @@
- {
- "bos_token_id": 151643,
- "eos_token_id": [
- 151645,
- 151643
- ],
- "pad_token_id": 151643,
- "transformers_version": "4.55.0"
- }
 
limo_filtered_combined/checkpoint-1683/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1683/model.safetensors.index.json DELETED
@@ -1,347 +0,0 @@
1
- {
2
- "metadata": {
3
- "total_parameters": 1903904128,
4
- "total_size": 30462466048
5
- },
6
- "weight_map": {
7
- "lm_head.weight": "model-00007-of-00007.safetensors",
8
- "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
9
- "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
10
- "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
11
- "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
12
- "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
13
- "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
14
- "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
15
- "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
16
- "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
17
- "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
18
- "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
19
- "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
20
- "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
21
- "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
22
- "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
23
- "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
24
- "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
25
- "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
26
- "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
27
- "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
28
- "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
29
- "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
30
- "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
31
- "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
32
- "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
33
- "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
34
- "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
35
- "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
36
- "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
37
- "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
38
- "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
39
- "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
40
- "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
41
- "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
42
- "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
43
- "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
44
- "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
45
- "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
46
- "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
47
- "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
48
- "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
49
- "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
50
- "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
51
- "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
52
- "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
53
- "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
54
- "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
55
- "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
56
- "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
57
- "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
58
- "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
59
- "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
60
- "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
61
- "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
62
- "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
63
- "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
64
- "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
65
- "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
66
- "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
67
- "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
68
- "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
69
- "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
70
- "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
71
- "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
72
- "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
73
- "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
74
- "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
75
- "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
76
- "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
77
- "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
78
- "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
79
- "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
80
- "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
81
- "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
82
- "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
83
- "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
84
- "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
85
- "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
86
- "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
87
- "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
88
- "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
89
- "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
90
- "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
91
- "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
92
- "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
93
- "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
94
- "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
95
- "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
96
- "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
97
- "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
98
- "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
99
- "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
100
- "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
101
- "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
102
- "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
103
- "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
104
- "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
105
- "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
106
- "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
107
- "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
108
- "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
109
- "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
110
- "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
111
- "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
112
- "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
113
- "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
114
- "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
115
- "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
116
- "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
117
- "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
118
- "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
119
- "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
120
- "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
121
- "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
122
- "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
123
- "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
124
- "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
125
- "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
126
- "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
127
- "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
128
- "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
129
- "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
130
- "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
131
- "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
132
- "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
133
- "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
134
- "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
135
- "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
136
- "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
137
- "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
138
- "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
139
- "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
140
- "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
141
- "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
142
- "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
143
- "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
144
- "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
145
- "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
146
- "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
147
- "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
148
- "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
149
- "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
150
- "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
151
- "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
152
- "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
153
- "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
154
- "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
155
- "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
156
- "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
157
- "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
158
- "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
159
- "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
160
- "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
161
- "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
162
- "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
163
- "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
164
- "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
165
- "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
166
- "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
167
- "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
168
- "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
169
- "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
170
- "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
171
- "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
172
- "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
173
- "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
174
- "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
175
- "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
176
- "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
177
- "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
178
- "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
179
- "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
180
- "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
181
- "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
182
- "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
183
- "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
184
- "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
185
- "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
186
- "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
187
- "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
188
- "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
189
- "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
190
- "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
191
- "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
192
- "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
193
- "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
194
- "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
195
- "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
196
- "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
197
- "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
198
- "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
199
- "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
200
- "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
201
- "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
202
- "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
203
- "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
204
- "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
205
- "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
206
- "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
207
- "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
208
- "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
209
- "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
210
- "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
211
- "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
212
- "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
213
- "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
214
- "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
215
- "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
216
- "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
217
- "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
218
- "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
219
- "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
220
- "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
221
- "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
222
- "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
223
- "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
224
- "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
225
- "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
226
- "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
227
- "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
228
- "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
229
- "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
230
- "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
231
- "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
232
- "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
233
- "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
234
- "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
235
- "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
236
- "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
237
- "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
238
- "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
239
- "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
240
- "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
241
- "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
242
- "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
243
- "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
244
- "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
245
- "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
246
- "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
247
- "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
248
- "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
249
- "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
250
- "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
251
- "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
252
- "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
253
- "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
254
- "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
255
- "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
256
- "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
257
- "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
258
- "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
259
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
260
- "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
261
- "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
262
- "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
263
- "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
264
- "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
265
- "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
266
- "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
267
- "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
268
- "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
269
- "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
270
- "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
271
- "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
272
- "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
273
- "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
274
- "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
275
- "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
276
- "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
277
- "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
278
- "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
279
- "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
280
- "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
281
- "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
282
- "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
283
- "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
284
- "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
285
- "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
286
- "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
287
- "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
288
- "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
289
- "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
290
- "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
291
- "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
292
- "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
293
- "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
294
- "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
295
- "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
296
- "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
297
- "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
298
- "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
299
- "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
300
- "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
301
- "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
302
- "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
303
- "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
304
- "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
305
- "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
306
- "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
307
- "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
308
- "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
309
- "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
310
- "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
311
- "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
312
- "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
313
- "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
314
- "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
315
- "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
316
- "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
317
- "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
318
- "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
- "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
- "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.norm.weight": "model-00006-of-00007.safetensors"
- }
- }
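The `model.safetensors.index.json` being deleted above maps every tensor name to the shard file that stores it. A minimal sketch of resolving shards from such a `weight_map` — the `index` dict below is a hypothetical three-entry subset of the real file, and the helper names (`shard_for`, `tensors_in_shard`) are ours, not part of any library:

```python
# Hypothetical subset of the deleted model.safetensors.index.json
# (the real file maps ~345 tensors across 7 shards).
index = {
    "metadata": {"total_size": 30462466048},
    "weight_map": {
        "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
        "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
        "model.norm.weight": "model-00006-of-00007.safetensors",
    },
}

def shard_for(tensor_name: str, idx: dict) -> str:
    """Return the shard file that stores a given tensor."""
    return idx["weight_map"][tensor_name]

def tensors_in_shard(shard: str, idx: dict) -> list:
    """List all tensor names mapped to one shard file."""
    return sorted(k for k, v in idx["weight_map"].items() if v == shard)
```

Loaders use exactly this lookup to open only the shard(s) containing the tensors they need, instead of reading all seven files.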
 
limo_filtered_combined/checkpoint-1683/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "eos_token": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }
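The `special_tokens_map.json` above declares `<|im_end|>` as the end-of-sequence token while padding reuses the base `<|endoftext|>` token. A small sketch parsing that structure with the standard library — `raw` below is a trimmed, hypothetical excerpt of the file, not its full content:

```python
import json

# Trimmed excerpt of the deleted special_tokens_map.json (assumed subset).
raw = '''{
  "eos_token": {"content": "<|im_end|>", "single_word": false},
  "pad_token": {"content": "<|endoftext|>", "single_word": false}
}'''

spec = json.loads(raw)
eos = spec["eos_token"]["content"]   # token that terminates a turn
pad = spec["pad_token"]["content"]   # token used to pad batches
```

Keeping `pad` distinct from `eos` avoids masking real end-of-turn tokens when padded positions are ignored in the loss.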
 
limo_filtered_combined/checkpoint-1683/tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
- {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "151643": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151644": {
- "content": "<|im_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151645": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151646": {
- "content": "<|object_ref_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151647": {
- "content": "<|object_ref_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151648": {
- "content": "<|box_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151649": {
- "content": "<|box_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151650": {
- "content": "<|quad_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151651": {
- "content": "<|quad_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151652": {
- "content": "<|vision_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151653": {
- "content": "<|vision_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151654": {
- "content": "<|vision_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151655": {
- "content": "<|image_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151656": {
- "content": "<|video_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151657": {
- "content": "<tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151658": {
- "content": "</tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151659": {
- "content": "<|fim_prefix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151660": {
- "content": "<|fim_middle|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151661": {
- "content": "<|fim_suffix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151662": {
- "content": "<|fim_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151663": {
- "content": "<|repo_name|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151664": {
- "content": "<|file_sep|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "errors": "replace",
- "extra_special_tokens": {},
- "model_max_length": 131072,
- "pad_token": "<|endoftext|>",
- "padding_side": "right",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
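A few fields in the `tokenizer_config.json` above interact (`eos_token` must be registered among the special tokens, `padding_side` must be a valid value). A small consistency-check sketch over those fields — `config` below copies values from the deleted file, and `check` is our own hypothetical helper, not a transformers API:

```python
# Values copied from the deleted tokenizer_config.json above.
config = {
    "eos_token": "<|im_end|>",
    "pad_token": "<|endoftext|>",
    "padding_side": "right",
    "model_max_length": 131072,
    "additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
}

def check(cfg: dict) -> list:
    """Return a list of human-readable consistency problems (empty = OK)."""
    problems = []
    if cfg["padding_side"] not in ("left", "right"):
        problems.append("padding_side must be 'left' or 'right'")
    if cfg["eos_token"] not in cfg["additional_special_tokens"]:
        problems.append("eos_token is not registered as a special token")
    if cfg["model_max_length"] <= 0:
        problems.append("model_max_length must be positive")
    return problems
```

Running such a check before training catches silently-mismatched tokenizer files earlier than a mid-run decoding error would.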
 
limo_filtered_combined/checkpoint-1683/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-1683/vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-187/added_tokens.json DELETED
@@ -1,24 +0,0 @@
- {
- "</tool_call>": 151658,
- "<tool_call>": 151657,
- "<|box_end|>": 151649,
- "<|box_start|>": 151648,
- "<|endoftext|>": 151643,
- "<|file_sep|>": 151664,
- "<|fim_middle|>": 151660,
- "<|fim_pad|>": 151662,
- "<|fim_prefix|>": 151659,
- "<|fim_suffix|>": 151661,
- "<|im_end|>": 151645,
- "<|im_start|>": 151644,
- "<|image_pad|>": 151655,
- "<|object_ref_end|>": 151647,
- "<|object_ref_start|>": 151646,
- "<|quad_end|>": 151651,
- "<|quad_start|>": 151650,
- "<|repo_name|>": 151663,
- "<|video_pad|>": 151656,
- "<|vision_end|>": 151653,
- "<|vision_pad|>": 151654,
- "<|vision_start|>": 151652
- }
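The `added_tokens.json` above maps token strings to ids in the encode direction; inverting it gives the decode direction. A sketch with a hypothetical subset of those entries (all ids copied from the diff):

```python
# Subset of the deleted added_tokens.json (token string -> id).
added_tokens = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<tool_call>": 151657,
    "</tool_call>": 151658,
    "<|file_sep|>": 151664,
}

# Invert for the decode direction (id -> token string).
id_to_token = {v: k for k, v in added_tokens.items()}
```

The added ids start at 151643, directly above the base BPE vocabulary, which is why the file's smallest id belongs to `<|endoftext|>`.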
 
limo_filtered_combined/checkpoint-187/chat_template.jinja DELETED
@@ -1,54 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0]['role'] == 'system' %}
- {{- messages[0]['content'] }}
- {%- else %}
- {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
- {%- endif %}
- {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0]['role'] == 'system' %}
- {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
- {%- else %}
- {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
- {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role }}
- {%- if message.content %}
- {{- '\n' + message.content }}
- {%- endif %}
- {%- for tool_call in message.tool_calls %}
- {%- if tool_call.function is defined %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '\n<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {{- tool_call.arguments | tojson }}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- message.content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}
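The Jinja chat template above renders ChatML: each turn wrapped in `<|im_start|>role ... <|im_end|>`, with a default boxed-answer system prompt injected when no system message is supplied. A pure-Python sketch of that no-tools path only (our own `render` helper, not the template engine itself; tool calls and tool responses are omitted):

```python
# Default system prompt from the template's else-branch.
DEFAULT_SYSTEM = ("Please reason step by step, and put your final answer "
                  "within \\boxed{}.")

def render(messages, add_generation_prompt=True):
    """Mimic the no-tools path of the chat template above (assumption:
    messages contain only system/user/assistant turns, no tool calls)."""
    out = []
    # Inject the default system prompt when none is provided.
    if not messages or messages[0]["role"] != "system":
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

With `add_generation_prompt=True` the string ends in an open assistant turn, which is where the model continues generating.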
 
limo_filtered_combined/checkpoint-187/config.json DELETED
@@ -1,58 +0,0 @@
- {
- "architectures": [
- "Qwen2ForCausalLM"
- ],
- "attention_dropout": 0.0,
- "bos_token_id": 151643,
- "eos_token_id": 151645,
- "hidden_act": "silu",
- "hidden_size": 3584,
- "initializer_range": 0.02,
- "intermediate_size": 18944,
- "layer_types": [
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention"
- ],
- "max_position_embeddings": 4096,
- "max_window_layers": 28,
- "model_type": "qwen2",
- "num_attention_heads": 28,
- "num_hidden_layers": 28,
- "num_key_value_heads": 4,
- "rms_norm_eps": 1e-06,
- "rope_scaling": null,
- "rope_theta": 10000.0,
- "sliding_window": null,
- "tie_word_embeddings": false,
- "torch_dtype": "float32",
- "transformers_version": "4.55.0",
- "use_cache": false,
- "use_sliding_window": false,
- "vocab_size": 152064
- }
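The `config.json` above uses grouped-query attention: 28 query heads share 4 key/value heads. The per-head and projection dimensions follow directly from those fields — a short derivation sketch using values copied from the file:

```python
# Values copied from the deleted config.json above.
hidden_size = 3584
num_attention_heads = 28
num_key_value_heads = 4

# Each head's dimension, and how many query heads share one KV head (GQA).
head_dim = hidden_size // num_attention_heads               # 3584 / 28 = 128
queries_per_kv = num_attention_heads // num_key_value_heads # 28 / 4 = 7

# Output width of k_proj / v_proj, which is why those weights are much
# narrower than q_proj (hidden_size wide) in the shard index.
kv_width = num_key_value_heads * head_dim                   # 4 * 128 = 512
```

This is also why `k_proj`/`v_proj` carry separate `.bias` entries in the safetensors index: Qwen2 applies biases on the q/k/v projections but not on `o_proj`.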
 
limo_filtered_combined/checkpoint-187/generation_config.json DELETED
@@ -1,9 +0,0 @@
- {
- "bos_token_id": 151643,
- "eos_token_id": [
- 151645,
- 151643
- ],
- "pad_token_id": 151643,
- "transformers_version": "4.55.0"
- }
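The `generation_config.json` above lists two `eos_token_id` values; decoding stops when either is produced. A minimal stop-condition sketch over raw token ids (the `truncate_at_eos` helper is ours, not a transformers API):

```python
# Both eos ids from the deleted generation_config.json:
# 151645 = <|im_end|>, 151643 = <|endoftext|>.
EOS_IDS = {151645, 151643}

def truncate_at_eos(token_ids):
    """Return the tokens up to (excluding) the first eos id."""
    out = []
    for t in token_ids:
        if t in EOS_IDS:
            break
        out.append(t)
    return out
```

Accepting either id matters because a fine-tuned model may emit the chat-style `<|im_end|>` while the base model's `<|endoftext|>` can still appear.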
 
limo_filtered_combined/checkpoint-187/merges.txt DELETED
The diff for this file is too large to render. See raw diff
 
limo_filtered_combined/checkpoint-187/model.safetensors.index.json DELETED
@@ -1,347 +0,0 @@
- {
- "metadata": {
- "total_parameters": 1903904128,
- "total_size": 30462466048
- },
- "weight_map": {
- "lm_head.weight": "model-00007-of-00007.safetensors",
- "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
- "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
- "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
- "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
- "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
- "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
- "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
- "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
- "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
- "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
260
- "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
261
- "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
262
- "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
263
- "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
264
- "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
265
- "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
266
- "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
267
- "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
268
- "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
269
- "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
270
- "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
271
- "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
272
- "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
273
- "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
274
- "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
275
- "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
276
- "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
277
- "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
278
- "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
279
- "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
280
- "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
281
- "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
282
- "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
283
- "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
284
- "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
285
- "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
286
- "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
287
- "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
288
- "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
289
- "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
290
- "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
291
- "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
292
- "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
293
- "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
294
- "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
295
- "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
296
- "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
297
- "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
298
- "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
299
- "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
300
- "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
301
- "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
302
- "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
303
- "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
304
- "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
305
- "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
306
- "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
307
- "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
308
- "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
309
- "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
310
- "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
311
- "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
312
- "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
313
- "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
314
- "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
315
- "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
316
- "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
317
- "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
318
- "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
319
- "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
320
- "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
321
- "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
322
- "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
323
- "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
324
- "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
325
- "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
326
- "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
327
- "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
328
- "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
329
- "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
330
- "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
331
- "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
332
- "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
333
- "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
334
- "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
335
- "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
336
- "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
337
- "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
338
- "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
339
- "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
340
- "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
341
- "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
342
- "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
343
- "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
344
- "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
345
- "model.norm.weight": "model-00006-of-00007.safetensors"
346
- }
347
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
limo_filtered_combined/checkpoint-187/special_tokens_map.json DELETED
@@ -1,31 +0,0 @@
- {
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "eos_token": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- "pad_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- }

limo_filtered_combined/checkpoint-187/tokenizer_config.json DELETED
@@ -1,208 +0,0 @@
- {
- "add_bos_token": false,
- "add_prefix_space": false,
- "added_tokens_decoder": {
- "151643": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151644": {
- "content": "<|im_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151645": {
- "content": "<|im_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151646": {
- "content": "<|object_ref_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151647": {
- "content": "<|object_ref_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151648": {
- "content": "<|box_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151649": {
- "content": "<|box_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151650": {
- "content": "<|quad_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151651": {
- "content": "<|quad_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151652": {
- "content": "<|vision_start|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151653": {
- "content": "<|vision_end|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151654": {
- "content": "<|vision_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151655": {
- "content": "<|image_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151656": {
- "content": "<|video_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "151657": {
- "content": "<tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151658": {
- "content": "</tool_call>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151659": {
- "content": "<|fim_prefix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151660": {
- "content": "<|fim_middle|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151661": {
- "content": "<|fim_suffix|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151662": {
- "content": "<|fim_pad|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151663": {
- "content": "<|repo_name|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- },
- "151664": {
- "content": "<|file_sep|>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": false
- }
- },
- "additional_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "errors": "replace",
- "extra_special_tokens": {},
- "model_max_length": 131072,
- "pad_token": "<|endoftext|>",
- "padding_side": "right",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }

limo_filtered_combined/checkpoint-187/trainer_state.json DELETED
@@ -1,1343 +0,0 @@
- {
- "best_global_step": null,
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 1.0,
- "eval_steps": 500,
- "global_step": 187,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.0053475935828877,
- "grad_norm": 32.667198181152344,
- "learning_rate": 5e-06,
- "loss": 3.2539,
- "step": 1
- },
- {
- "epoch": 0.0106951871657754,
- "grad_norm": 38.2481803894043,
- "learning_rate": 4.99999647201733e-06,
- "loss": 6.258,
- "step": 2
- },
- {
- "epoch": 0.016042780748663103,
- "grad_norm": 26.6931209564209,
- "learning_rate": 4.999985888079276e-06,
- "loss": 2.4767,
- "step": 3
- },
- {
- "epoch": 0.0213903743315508,
- "grad_norm": 36.4799919128418,
- "learning_rate": 4.999968248215712e-06,
- "loss": 5.4026,
- "step": 4
- },
- {
- "epoch": 0.026737967914438502,
- "grad_norm": 23.325607299804688,
- "learning_rate": 4.999943552476422e-06,
- "loss": 3.818,
- "step": 5
- },
- {
- "epoch": 0.03208556149732621,
- "grad_norm": 17.09689712524414,
- "learning_rate": 4.999911800931108e-06,
- "loss": 2.7186,
- "step": 6
- },
- {
- "epoch": 0.0374331550802139,
- "grad_norm": 6.150149345397949,
- "learning_rate": 4.999872993669387e-06,
- "loss": 1.2419,
- "step": 7
- },
- {
- "epoch": 0.0427807486631016,
- "grad_norm": 8.962457656860352,
- "learning_rate": 4.999827130800785e-06,
- "loss": 2.443,
- "step": 8
- },
- {
- "epoch": 0.0481283422459893,
- "grad_norm": 17.777889251708984,
- "learning_rate": 4.999774212454746e-06,
- "loss": 3.1664,
- "step": 9
- },
- {
- "epoch": 0.053475935828877004,
- "grad_norm": 6.9644694328308105,
- "learning_rate": 4.999714238780626e-06,
- "loss": 2.4137,
- "step": 10
- },
- {
- "epoch": 0.058823529411764705,
- "grad_norm": 7.578589916229248,
- "learning_rate": 4.999647209947694e-06,
- "loss": 2.2937,
- "step": 11
- },
- {
- "epoch": 0.06417112299465241,
- "grad_norm": 5.47304630279541,
- "learning_rate": 4.999573126145132e-06,
- "loss": 2.1922,
- "step": 12
- },
- {
- "epoch": 0.06951871657754011,
- "grad_norm": 4.273566246032715,
- "learning_rate": 4.999491987582032e-06,
- "loss": 1.5914,
- "step": 13
- },
- {
- "epoch": 0.0748663101604278,
- "grad_norm": 7.62272310256958,
- "learning_rate": 4.999403794487399e-06,
- "loss": 2.5434,
- "step": 14
- },
- {
- "epoch": 0.08021390374331551,
- "grad_norm": 4.374003887176514,
- "learning_rate": 4.999308547110147e-06,
- "loss": 1.6044,
- "step": 15
- },
- {
- "epoch": 0.0855614973262032,
- "grad_norm": 3.7834177017211914,
- "learning_rate": 4.9992062457191005e-06,
- "loss": 1.6413,
- "step": 16
- },
- {
- "epoch": 0.09090909090909091,
- "grad_norm": 3.5481460094451904,
- "learning_rate": 4.999096890602996e-06,
- "loss": 1.601,
- "step": 17
- },
- {
- "epoch": 0.0962566844919786,
- "grad_norm": 4.520628452301025,
- "learning_rate": 4.998980482070473e-06,
- "loss": 1.7445,
- "step": 18
- },
- {
- "epoch": 0.10160427807486631,
- "grad_norm": 4.576196670532227,
- "learning_rate": 4.998857020450084e-06,
- "loss": 2.3176,
- "step": 19
- },
- {
- "epoch": 0.10695187165775401,
- "grad_norm": 3.1453230381011963,
- "learning_rate": 4.998726506090283e-06,
- "loss": 1.3387,
- "step": 20
- },
- {
- "epoch": 0.11229946524064172,
- "grad_norm": 2.1666250228881836,
- "learning_rate": 4.998588939359435e-06,
- "loss": 1.0422,
- "step": 21
- },
- {
- "epoch": 0.11764705882352941,
- "grad_norm": 4.155343532562256,
- "learning_rate": 4.998444320645803e-06,
- "loss": 1.8809,
- "step": 22
- },
- {
- "epoch": 0.12299465240641712,
- "grad_norm": 3.580847978591919,
- "learning_rate": 4.998292650357558e-06,
- "loss": 1.5926,
- "step": 23
- },
- {
- "epoch": 0.12834224598930483,
- "grad_norm": 5.140923976898193,
- "learning_rate": 4.998133928922773e-06,
- "loss": 2.4575,
- "step": 24
- },
- {
- "epoch": 0.13368983957219252,
- "grad_norm": 4.047446250915527,
- "learning_rate": 4.99796815678942e-06,
- "loss": 1.3485,
- "step": 25
- },
- {
- "epoch": 0.13903743315508021,
- "grad_norm": 4.0677571296691895,
- "learning_rate": 4.997795334425372e-06,
- "loss": 1.9172,
- "step": 26
- },
- {
- "epoch": 0.1443850267379679,
- "grad_norm": 5.883276462554932,
- "learning_rate": 4.997615462318403e-06,
- "loss": 2.1168,
- "step": 27
- },
- {
- "epoch": 0.1497326203208556,
- "grad_norm": 3.6615514755249023,
- "learning_rate": 4.997428540976177e-06,
- "loss": 1.5822,
- "step": 28
- },
- {
- "epoch": 0.15508021390374332,
- "grad_norm": 2.608039617538452,
- "learning_rate": 4.997234570926263e-06,
- "loss": 1.2184,
- "step": 29
- },
- {
- "epoch": 0.16042780748663102,
- "grad_norm": 2.280423879623413,
- "learning_rate": 4.997033552716116e-06,
- "loss": 1.0216,
- "step": 30
- },
- {
- "epoch": 0.1657754010695187,
- "grad_norm": 1.7143268585205078,
- "learning_rate": 4.9968254869130885e-06,
- "loss": 0.9795,
- "step": 31
- },
- {
- "epoch": 0.1711229946524064,
- "grad_norm": 1.4858453273773193,
- "learning_rate": 4.996610374104422e-06,
- "loss": 0.7698,
- "step": 32
- },
- {
- "epoch": 0.17647058823529413,
- "grad_norm": 1.51152503490448,
- "learning_rate": 4.9963882148972475e-06,
- "loss": 1.3918,
- "step": 33
- },
- {
- "epoch": 0.18181818181818182,
- "grad_norm": 1.6170848608016968,
- "learning_rate": 4.996159009918586e-06,
- "loss": 1.1074,
- "step": 34
- },
- {
- "epoch": 0.18716577540106952,
- "grad_norm": 2.591637372970581,
- "learning_rate": 4.9959227598153395e-06,
- "loss": 1.4097,
- "step": 35
- },
- {
- "epoch": 0.1925133689839572,
- "grad_norm": 2.9409682750701904,
- "learning_rate": 4.9956794652542994e-06,
- "loss": 1.6475,
- "step": 36
- },
- {
- "epoch": 0.19786096256684493,
- "grad_norm": 1.9114937782287598,
- "learning_rate": 4.9954291269221364e-06,
- "loss": 1.0298,
- "step": 37
- },
- {
- "epoch": 0.20320855614973263,
- "grad_norm": 4.106937408447266,
- "learning_rate": 4.995171745525401e-06,
- "loss": 1.6997,
- "step": 38
- },
- {
- "epoch": 0.20855614973262032,
- "grad_norm": 4.7484822273254395,
- "learning_rate": 4.994907321790524e-06,
- "loss": 1.4041,
- "step": 39
- },
- {
- "epoch": 0.21390374331550802,
- "grad_norm": 2.5232057571411133,
- "learning_rate": 4.994635856463811e-06,
- "loss": 1.023,
- "step": 40
- },
- {
- "epoch": 0.2192513368983957,
- "grad_norm": 2.975825548171997,
- "learning_rate": 4.994357350311441e-06,
- "loss": 1.6556,
- "step": 41
- },
- {
- "epoch": 0.22459893048128343,
- "grad_norm": 2.3416595458984375,
- "learning_rate": 4.994071804119467e-06,
- "loss": 1.2464,
- "step": 42
- },
- {
- "epoch": 0.22994652406417113,
- "grad_norm": 3.6734139919281006,
- "learning_rate": 4.993779218693811e-06,
- "loss": 1.8306,
- "step": 43
- },
- {
- "epoch": 0.23529411764705882,
- "grad_norm": 2.287463903427124,
- "learning_rate": 4.99347959486026e-06,
- "loss": 1.0122,
- "step": 44
- },
- {
- "epoch": 0.24064171122994651,
- "grad_norm": 1.5980703830718994,
- "learning_rate": 4.99317293346447e-06,
- "loss": 0.8706,
- "step": 45
- },
- {
- "epoch": 0.24598930481283424,
- "grad_norm": 1.4346195459365845,
- "learning_rate": 4.992859235371958e-06,
- "loss": 0.7815,
- "step": 46
- },
- {
- "epoch": 0.25133689839572193,
- "grad_norm": 1.635718822479248,
- "learning_rate": 4.992538501468101e-06,
- "loss": 0.8891,
- "step": 47
- },
- {
- "epoch": 0.25668449197860965,
- "grad_norm": 3.2847158908843994,
- "learning_rate": 4.992210732658132e-06,
- "loss": 1.3393,
- "step": 48
- },
- {
- "epoch": 0.2620320855614973,
- "grad_norm": 3.3003337383270264,
- "learning_rate": 4.991875929867143e-06,
- "loss": 1.4412,
- "step": 49
- },
- {
- "epoch": 0.26737967914438504,
- "grad_norm": 1.588843584060669,
- "learning_rate": 4.991534094040077e-06,
- "loss": 0.8567,
- "step": 50
- },
- {
- "epoch": 0.2727272727272727,
- "grad_norm": 1.4450788497924805,
- "learning_rate": 4.991185226141726e-06,
- "loss": 0.8855,
- "step": 51
- },
- {
- "epoch": 0.27807486631016043,
- "grad_norm": 1.6408952474594116,
- "learning_rate": 4.990829327156729e-06,
- "loss": 1.1081,
- "step": 52
- },
- {
- "epoch": 0.28342245989304815,
- "grad_norm": 1.3315808773040771,
- "learning_rate": 4.990466398089571e-06,
- "loss": 0.9124,
- "step": 53
- },
- {
- "epoch": 0.2887700534759358,
- "grad_norm": 1.460076928138733,
- "learning_rate": 4.99009643996458e-06,
- "loss": 0.6002,
- "step": 54
- },
- {
- "epoch": 0.29411764705882354,
- "grad_norm": 1.4954642057418823,
- "learning_rate": 4.989719453825918e-06,
- "loss": 0.7522,
- "step": 55
- },
- {
- "epoch": 0.2994652406417112,
- "grad_norm": 1.6860841512680054,
- "learning_rate": 4.989335440737587e-06,
- "loss": 0.7829,
- "step": 56
- },
- {
- "epoch": 0.3048128342245989,
- "grad_norm": 1.5118118524551392,
- "learning_rate": 4.9889444017834185e-06,
- "loss": 0.9124,
- "step": 57
- },
- {
- "epoch": 0.31016042780748665,
- "grad_norm": 1.4117275476455688,
- "learning_rate": 4.988546338067078e-06,
- "loss": 0.9708,
- "step": 58
- },
- {
- "epoch": 0.3155080213903743,
- "grad_norm": 2.2665367126464844,
- "learning_rate": 4.988141250712053e-06,
- "loss": 1.1277,
- "step": 59
- },
- {
- "epoch": 0.32085561497326204,
- "grad_norm": 1.3910932540893555,
- "learning_rate": 4.987729140861657e-06,
- "loss": 0.9477,
- "step": 60
- },
- {
- "epoch": 0.32620320855614976,
- "grad_norm": 1.618573784828186,
- "learning_rate": 4.987310009679023e-06,
- "loss": 0.9895,
- "step": 61
- },
- {
- "epoch": 0.3315508021390374,
- "grad_norm": 1.3848469257354736,
- "learning_rate": 4.986883858347101e-06,
- "loss": 0.8927,
- "step": 62
- },
- {
- "epoch": 0.33689839572192515,
- "grad_norm": 1.4412480592727661,
- "learning_rate": 4.986450688068655e-06,
- "loss": 0.657,
- "step": 63
- },
- {
- "epoch": 0.3422459893048128,
- "grad_norm": 1.462384819984436,
- "learning_rate": 4.986010500066258e-06,
- "loss": 0.8561,
- "step": 64
- },
- {
- "epoch": 0.34759358288770054,
- "grad_norm": 1.3507061004638672,
- "learning_rate": 4.985563295582292e-06,
- "loss": 0.8016,
- "step": 65
- },
- {
- "epoch": 0.35294117647058826,
- "grad_norm": 2.146437406539917,
- "learning_rate": 4.98510907587894e-06,
- "loss": 0.9754,
- "step": 66
- },
- {
- "epoch": 0.3582887700534759,
- "grad_norm": 2.181367874145508,
- "learning_rate": 4.984647842238185e-06,
- "loss": 1.2643,
- "step": 67
- },
- {
- "epoch": 0.36363636363636365,
- "grad_norm": 1.5960901975631714,
- "learning_rate": 4.984179595961806e-06,
- "loss": 0.6543,
- "step": 68
- },
- {
- "epoch": 0.3689839572192513,
- "grad_norm": 1.0785574913024902,
- "learning_rate": 4.983704338371375e-06,
- "loss": 0.7784,
- "step": 69
- },
- {
- "epoch": 0.37433155080213903,
- "grad_norm": 1.322706937789917,
- "learning_rate": 4.983222070808255e-06,
- "loss": 0.6633,
- "step": 70
- },
- {
- "epoch": 0.37967914438502676,
- "grad_norm": 1.806099534034729,
- "learning_rate": 4.982732794633588e-06,
- "loss": 1.0887,
- "step": 71
- },
- {
- "epoch": 0.3850267379679144,
- "grad_norm": 1.2431350946426392,
- "learning_rate": 4.982236511228301e-06,
- "loss": 0.8154,
- "step": 72
- },
- {
- "epoch": 0.39037433155080214,
- "grad_norm": 2.1100635528564453,
- "learning_rate": 4.981733221993099e-06,
- "loss": 1.2385,
- "step": 73
- },
- {
- "epoch": 0.39572192513368987,
- "grad_norm": 2.499673843383789,
- "learning_rate": 4.981222928348456e-06,
- "loss": 1.0381,
- "step": 74
- },
- {
- "epoch": 0.40106951871657753,
- "grad_norm": 1.7459089756011963,
- "learning_rate": 4.98070563173462e-06,
- "loss": 0.9279,
- "step": 75
- },
- {
- "epoch": 0.40641711229946526,
- "grad_norm": 1.6326146125793457,
- "learning_rate": 4.980181333611601e-06,
- "loss": 0.7559,
- "step": 76
- },
- {
- "epoch": 0.4117647058823529,
- "grad_norm": 1.2402805089950562,
- "learning_rate": 4.979650035459171e-06,
- "loss": 0.7301,
- "step": 77
- },
- {
- "epoch": 0.41711229946524064,
- "grad_norm": 1.5247249603271484,
- "learning_rate": 4.9791117387768575e-06,
- "loss": 1.1018,
- "step": 78
- },
- {
- "epoch": 0.42245989304812837,
- "grad_norm": 1.19709312915802,
- "learning_rate": 4.978566445083942e-06,
- "loss": 0.6179,
- "step": 79
- },
- {
- "epoch": 0.42780748663101603,
- "grad_norm": 1.3535789251327515,
- "learning_rate": 4.978014155919455e-06,
- "loss": 0.734,
- "step": 80
- },
- {
- "epoch": 0.43315508021390375,
- "grad_norm": 1.3790255784988403,
- "learning_rate": 4.977454872842169e-06,
- "loss": 0.7967,
- "step": 81
- },
- {
- "epoch": 0.4385026737967914,
- "grad_norm": 1.6345816850662231,
- "learning_rate": 4.976888597430597e-06,
- "loss": 1.0332,
- "step": 82
- },
- {
- "epoch": 0.44385026737967914,
- "grad_norm": 1.5695714950561523,
- "learning_rate": 4.976315331282985e-06,
- "loss": 0.9266,
- "step": 83
- },
- {
- "epoch": 0.44919786096256686,
- "grad_norm": 1.2244231700897217,
- "learning_rate": 4.9757350760173144e-06,
- "loss": 0.7738,
- "step": 84
- },
- {
- "epoch": 0.45454545454545453,
- "grad_norm": 1.674436330795288,
- "learning_rate": 4.975147833271288e-06,
- "loss": 1.0436,
- "step": 85
- },
- {
- "epoch": 0.45989304812834225,
- "grad_norm": 1.718598484992981,
- "learning_rate": 4.974553604702332e-06,
- "loss": 0.7659,
- "step": 86
- },
- {
- "epoch": 0.46524064171123,
- "grad_norm": 1.2509411573410034,
- "learning_rate": 4.973952391987589e-06,
- "loss": 0.8631,
- "step": 87
- },
- {
- "epoch": 0.47058823529411764,
- "grad_norm": 1.9576022624969482,
- "learning_rate": 4.9733441968239125e-06,
- "loss": 1.1419,
- "step": 88
- },
- {
- "epoch": 0.47593582887700536,
- "grad_norm": 1.14915132522583,
- "learning_rate": 4.972729020927866e-06,
- "loss": 0.6647,
- "step": 89
- },
- {
- "epoch": 0.48128342245989303,
- "grad_norm": 1.0880329608917236,
- "learning_rate": 4.97210686603571e-06,
- "loss": 0.8533,
- "step": 90
- },
- {
- "epoch": 0.48663101604278075,
- "grad_norm": 1.60923171043396,
- "learning_rate": 4.97147773390341e-06,
- "loss": 0.7872,
- "step": 91
- },
- {
- "epoch": 0.4919786096256685,
- "grad_norm": 2.191762685775757,
- "learning_rate": 4.970841626306617e-06,
- "loss": 0.8983,
- "step": 92
- },
- {
- "epoch": 0.49732620320855614,
- "grad_norm": 1.805025577545166,
- "learning_rate": 4.970198545040673e-06,
- "loss": 1.0416,
- "step": 93
- },
- {
- "epoch": 0.5026737967914439,
- "grad_norm": 1.670198917388916,
- "learning_rate": 4.969548491920603e-06,
- "loss": 0.9088,
- "step": 94
- },
- {
- "epoch": 0.5080213903743316,
- "grad_norm": 1.5051180124282837,
- "learning_rate": 4.968891468781105e-06,
- "loss": 0.9928,
- "step": 95
- },
- {
- "epoch": 0.5133689839572193,
- "grad_norm": 1.380786418914795,
- "learning_rate": 4.968227477476554e-06,
- "loss": 0.8154,
- "step": 96
- },
- {
- "epoch": 0.5187165775401069,
- "grad_norm": 1.744243860244751,
- "learning_rate": 4.9675565198809905e-06,
- "loss": 1.1196,
- "step": 97
- },
- {
- "epoch": 0.5240641711229946,
- "grad_norm": 3.3793137073516846,
- "learning_rate": 4.966878597888114e-06,
- "loss": 0.966,
- "step": 98
- },
- {
- "epoch": 0.5294117647058824,
- "grad_norm": 1.2802485227584839,
- "learning_rate": 4.966193713411284e-06,
- "loss": 0.6863,
- "step": 99
- },
- {
- "epoch": 0.5347593582887701,
- "grad_norm": 1.1910849809646606,
- "learning_rate": 4.965501868383507e-06,
- "loss": 0.6748,
- "step": 100
- },
- {
- "epoch": 0.5401069518716578,
- "grad_norm": 2.020167827606201,
- "learning_rate": 4.964803064757438e-06,
- "loss": 0.9697,
- "step": 101
- },
- {
- "epoch": 0.5454545454545454,
721
- "grad_norm": 1.1739224195480347,
722
- "learning_rate": 4.964097304505371e-06,
723
- "loss": 0.7805,
724
- "step": 102
725
- },
726
- {
727
- "epoch": 0.5508021390374331,
728
- "grad_norm": 1.1704705953598022,
729
- "learning_rate": 4.963384589619233e-06,
730
- "loss": 0.6536,
731
- "step": 103
732
- },
733
- {
734
- "epoch": 0.5561497326203209,
735
- "grad_norm": 1.3174995183944702,
736
- "learning_rate": 4.962664922110581e-06,
737
- "loss": 0.8689,
738
- "step": 104
739
- },
740
- {
741
- "epoch": 0.5614973262032086,
742
- "grad_norm": 1.2126598358154297,
743
- "learning_rate": 4.9619383040105954e-06,
744
- "loss": 0.9955,
745
- "step": 105
746
- },
747
- {
748
- "epoch": 0.5668449197860963,
749
- "grad_norm": 1.365536093711853,
750
- "learning_rate": 4.961204737370071e-06,
751
- "loss": 0.9104,
752
- "step": 106
753
- },
754
- {
755
- "epoch": 0.5721925133689839,
756
- "grad_norm": 1.4193490743637085,
757
- "learning_rate": 4.960464224259418e-06,
758
- "loss": 1.1661,
759
- "step": 107
760
- },
761
- {
762
- "epoch": 0.5775401069518716,
763
- "grad_norm": 1.108224868774414,
764
- "learning_rate": 4.95971676676865e-06,
765
- "loss": 0.5704,
766
- "step": 108
767
- },
768
- {
769
- "epoch": 0.5828877005347594,
770
- "grad_norm": 1.0754598379135132,
771
- "learning_rate": 4.958962367007381e-06,
772
- "loss": 0.8837,
773
- "step": 109
774
- },
775
- {
776
- "epoch": 0.5882352941176471,
777
- "grad_norm": 1.194149374961853,
778
- "learning_rate": 4.958201027104818e-06,
779
- "loss": 0.7352,
780
- "step": 110
781
- },
782
- {
783
- "epoch": 0.5935828877005348,
784
- "grad_norm": 3.193861246109009,
785
- "learning_rate": 4.957432749209755e-06,
786
- "loss": 0.6904,
787
- "step": 111
788
- },
789
- {
790
- "epoch": 0.5989304812834224,
791
- "grad_norm": 1.7174736261367798,
792
- "learning_rate": 4.95665753549057e-06,
793
- "loss": 0.8564,
794
- "step": 112
795
- },
796
- {
797
- "epoch": 0.6042780748663101,
798
- "grad_norm": 1.452724814414978,
799
- "learning_rate": 4.9558753881352165e-06,
800
- "loss": 1.2627,
801
- "step": 113
802
- },
803
- {
804
- "epoch": 0.6096256684491979,
805
- "grad_norm": 1.489687442779541,
806
- "learning_rate": 4.955086309351213e-06,
807
- "loss": 1.0371,
808
- "step": 114
809
- },
810
- {
811
- "epoch": 0.6149732620320856,
812
- "grad_norm": 1.0586612224578857,
813
- "learning_rate": 4.9542903013656485e-06,
814
- "loss": 0.5672,
815
- "step": 115
816
- },
817
- {
818
- "epoch": 0.6203208556149733,
819
- "grad_norm": 1.2536990642547607,
820
- "learning_rate": 4.953487366425163e-06,
821
- "loss": 0.7125,
822
- "step": 116
823
- },
824
- {
825
- "epoch": 0.6256684491978609,
826
- "grad_norm": 1.1650030612945557,
827
- "learning_rate": 4.952677506795949e-06,
828
- "loss": 0.5989,
829
- "step": 117
830
- },
831
- {
832
- "epoch": 0.6310160427807486,
833
- "grad_norm": 1.286164402961731,
834
- "learning_rate": 4.951860724763743e-06,
835
- "loss": 0.7466,
836
- "step": 118
837
- },
838
- {
839
- "epoch": 0.6363636363636364,
840
- "grad_norm": 1.132703423500061,
841
- "learning_rate": 4.95103702263382e-06,
842
- "loss": 0.7379,
843
- "step": 119
844
- },
845
- {
846
- "epoch": 0.6417112299465241,
847
- "grad_norm": 1.340989589691162,
848
- "learning_rate": 4.950206402730984e-06,
849
- "loss": 0.7781,
850
- "step": 120
851
- },
852
- {
853
- "epoch": 0.6470588235294118,
854
- "grad_norm": 1.0583947896957397,
855
- "learning_rate": 4.949368867399567e-06,
856
- "loss": 0.5383,
857
- "step": 121
858
- },
859
- {
860
- "epoch": 0.6524064171122995,
861
- "grad_norm": 1.2740116119384766,
862
- "learning_rate": 4.948524419003415e-06,
863
- "loss": 1.185,
864
- "step": 122
865
- },
866
- {
867
- "epoch": 0.6577540106951871,
868
- "grad_norm": 1.3854238986968994,
869
- "learning_rate": 4.947673059925889e-06,
870
- "loss": 0.8494,
871
- "step": 123
872
- },
873
- {
874
- "epoch": 0.6631016042780749,
875
- "grad_norm": 1.0074819326400757,
876
- "learning_rate": 4.9468147925698525e-06,
877
- "loss": 0.8941,
878
- "step": 124
879
- },
880
- {
881
- "epoch": 0.6684491978609626,
882
- "grad_norm": 1.1346782445907593,
883
- "learning_rate": 4.945949619357668e-06,
884
- "loss": 0.6798,
885
- "step": 125
886
- },
887
- {
888
- "epoch": 0.6737967914438503,
889
- "grad_norm": 1.1151247024536133,
890
- "learning_rate": 4.945077542731188e-06,
891
- "loss": 0.5321,
892
- "step": 126
893
- },
894
- {
895
- "epoch": 0.679144385026738,
896
- "grad_norm": 1.3562278747558594,
897
- "learning_rate": 4.94419856515175e-06,
898
- "loss": 0.8688,
899
- "step": 127
900
- },
901
- {
902
- "epoch": 0.6844919786096256,
903
- "grad_norm": 1.1577609777450562,
904
- "learning_rate": 4.943312689100166e-06,
905
- "loss": 0.8504,
906
- "step": 128
907
- },
908
- {
909
- "epoch": 0.6898395721925134,
910
- "grad_norm": 1.0710453987121582,
911
- "learning_rate": 4.942419917076723e-06,
912
- "loss": 0.6366,
913
- "step": 129
914
- },
915
- {
916
- "epoch": 0.6951871657754011,
917
- "grad_norm": 1.153254508972168,
918
- "learning_rate": 4.941520251601167e-06,
919
- "loss": 0.7544,
920
- "step": 130
921
- },
922
- {
923
- "epoch": 0.7005347593582888,
924
- "grad_norm": 0.9147224426269531,
925
- "learning_rate": 4.940613695212702e-06,
926
- "loss": 0.4771,
927
- "step": 131
928
- },
929
- {
930
- "epoch": 0.7058823529411765,
931
- "grad_norm": 1.7819873094558716,
932
- "learning_rate": 4.939700250469979e-06,
933
- "loss": 1.0403,
934
- "step": 132
935
- },
936
- {
937
- "epoch": 0.7112299465240641,
938
- "grad_norm": 1.1828848123550415,
939
- "learning_rate": 4.938779919951092e-06,
940
- "loss": 0.8482,
941
- "step": 133
942
- },
943
- {
944
- "epoch": 0.7165775401069518,
945
- "grad_norm": 1.1376489400863647,
946
- "learning_rate": 4.93785270625357e-06,
947
- "loss": 0.5515,
948
- "step": 134
949
- },
950
- {
951
- "epoch": 0.7219251336898396,
952
- "grad_norm": 1.601025938987732,
953
- "learning_rate": 4.936918611994368e-06,
954
- "loss": 0.706,
955
- "step": 135
956
- },
957
- {
958
- "epoch": 0.7272727272727273,
959
- "grad_norm": 1.2240617275238037,
960
- "learning_rate": 4.935977639809861e-06,
961
- "loss": 0.8308,
962
- "step": 136
963
- },
964
- {
965
- "epoch": 0.732620320855615,
966
- "grad_norm": 1.088484287261963,
967
- "learning_rate": 4.935029792355834e-06,
968
- "loss": 0.642,
969
- "step": 137
970
- },
971
- {
972
- "epoch": 0.7379679144385026,
973
- "grad_norm": 1.3206232786178589,
974
- "learning_rate": 4.934075072307481e-06,
975
- "loss": 1.0115,
976
- "step": 138
977
- },
978
- {
979
- "epoch": 0.7433155080213903,
980
- "grad_norm": 1.1618086099624634,
981
- "learning_rate": 4.933113482359388e-06,
982
- "loss": 0.5455,
983
- "step": 139
984
- },
985
- {
986
- "epoch": 0.7486631016042781,
987
- "grad_norm": 1.2013949155807495,
988
- "learning_rate": 4.932145025225535e-06,
989
- "loss": 0.6958,
990
- "step": 140
991
- },
992
- {
993
- "epoch": 0.7540106951871658,
994
- "grad_norm": 1.3020150661468506,
995
- "learning_rate": 4.931169703639282e-06,
996
- "loss": 0.8664,
997
- "step": 141
998
- },
999
- {
1000
- "epoch": 0.7593582887700535,
1001
- "grad_norm": 1.3776401281356812,
1002
- "learning_rate": 4.930187520353363e-06,
1003
- "loss": 0.7594,
1004
- "step": 142
1005
- },
1006
- {
1007
- "epoch": 0.7647058823529411,
1008
- "grad_norm": 1.0648787021636963,
1009
- "learning_rate": 4.929198478139877e-06,
1010
- "loss": 0.6382,
1011
- "step": 143
1012
- },
1013
- {
1014
- "epoch": 0.7700534759358288,
1015
- "grad_norm": 1.1864025592803955,
1016
- "learning_rate": 4.928202579790285e-06,
1017
- "loss": 0.5924,
1018
- "step": 144
1019
- },
1020
- {
1021
- "epoch": 0.7754010695187166,
1022
- "grad_norm": 1.1243900060653687,
1023
- "learning_rate": 4.927199828115395e-06,
1024
- "loss": 0.7163,
1025
- "step": 145
1026
- },
1027
- {
1028
- "epoch": 0.7807486631016043,
1029
- "grad_norm": 1.2532908916473389,
1030
- "learning_rate": 4.9261902259453616e-06,
1031
- "loss": 0.8453,
1032
- "step": 146
1033
- },
1034
- {
1035
- "epoch": 0.786096256684492,
1036
- "grad_norm": 1.3941049575805664,
1037
- "learning_rate": 4.925173776129669e-06,
1038
- "loss": 1.0382,
1039
- "step": 147
1040
- },
1041
- {
1042
- "epoch": 0.7914438502673797,
1043
- "grad_norm": 0.9239159822463989,
1044
- "learning_rate": 4.9241504815371346e-06,
1045
- "loss": 0.4883,
1046
- "step": 148
1047
- },
1048
- {
1049
- "epoch": 0.7967914438502673,
1050
- "grad_norm": 1.1004669666290283,
1051
- "learning_rate": 4.923120345055887e-06,
1052
- "loss": 0.7326,
1053
- "step": 149
1054
- },
1055
- {
1056
- "epoch": 0.8021390374331551,
1057
- "grad_norm": 1.2339757680892944,
1058
- "learning_rate": 4.922083369593372e-06,
1059
- "loss": 0.6372,
1060
- "step": 150
1061
- },
1062
- {
1063
- "epoch": 0.8074866310160428,
1064
- "grad_norm": 1.3842638731002808,
1065
- "learning_rate": 4.921039558076335e-06,
1066
- "loss": 0.9323,
1067
- "step": 151
1068
- },
1069
- {
1070
- "epoch": 0.8128342245989305,
1071
- "grad_norm": 1.7399688959121704,
1072
- "learning_rate": 4.919988913450812e-06,
1073
- "loss": 0.4532,
1074
- "step": 152
1075
- },
1076
- {
1077
- "epoch": 0.8181818181818182,
1078
- "grad_norm": 1.526694893836975,
1079
- "learning_rate": 4.918931438682132e-06,
1080
- "loss": 0.8714,
1081
- "step": 153
1082
- },
1083
- {
1084
- "epoch": 0.8235294117647058,
1085
- "grad_norm": 1.208390712738037,
1086
- "learning_rate": 4.917867136754894e-06,
1087
- "loss": 0.8822,
1088
- "step": 154
1089
- },
1090
- {
1091
- "epoch": 0.8288770053475936,
1092
- "grad_norm": 1.0740225315093994,
1093
- "learning_rate": 4.916796010672969e-06,
1094
- "loss": 0.7539,
1095
- "step": 155
1096
- },
1097
- {
1098
- "epoch": 0.8342245989304813,
1099
- "grad_norm": 1.097008228302002,
1100
- "learning_rate": 4.91571806345949e-06,
1101
- "loss": 0.7797,
1102
- "step": 156
1103
- },
1104
- {
1105
- "epoch": 0.839572192513369,
1106
- "grad_norm": 1.231980800628662,
1107
- "learning_rate": 4.91463329815684e-06,
1108
- "loss": 0.8074,
1109
- "step": 157
1110
- },
1111
- {
1112
- "epoch": 0.8449197860962567,
1113
- "grad_norm": 1.1179982423782349,
1114
- "learning_rate": 4.913541717826645e-06,
1115
- "loss": 0.5812,
1116
- "step": 158
1117
- },
1118
- {
1119
- "epoch": 0.8502673796791443,
1120
- "grad_norm": 0.9882096648216248,
1121
- "learning_rate": 4.912443325549767e-06,
1122
- "loss": 0.4967,
1123
- "step": 159
1124
- },
1125
- {
1126
- "epoch": 0.8556149732620321,
1127
- "grad_norm": 1.3861775398254395,
1128
- "learning_rate": 4.911338124426291e-06,
1129
- "loss": 0.7436,
1130
- "step": 160
1131
- },
1132
- {
1133
- "epoch": 0.8609625668449198,
1134
- "grad_norm": 1.204852819442749,
1135
- "learning_rate": 4.910226117575525e-06,
1136
- "loss": 0.8118,
1137
- "step": 161
1138
- },
1139
- {
1140
- "epoch": 0.8663101604278075,
1141
- "grad_norm": 0.9527103304862976,
1142
- "learning_rate": 4.909107308135978e-06,
1143
- "loss": 0.5164,
1144
- "step": 162
1145
- },
1146
- {
1147
- "epoch": 0.8716577540106952,
1148
- "grad_norm": 1.0612897872924805,
1149
- "learning_rate": 4.907981699265364e-06,
1150
- "loss": 0.5894,
1151
- "step": 163
1152
- },
1153
- {
1154
- "epoch": 0.8770053475935828,
1155
- "grad_norm": 1.610545039176941,
1156
- "learning_rate": 4.906849294140587e-06,
1157
- "loss": 0.8476,
1158
- "step": 164
1159
- },
1160
- {
1161
- "epoch": 0.8823529411764706,
1162
- "grad_norm": 1.483162760734558,
1163
- "learning_rate": 4.9057100959577285e-06,
1164
- "loss": 0.6834,
1165
- "step": 165
1166
- },
1167
- {
1168
- "epoch": 0.8877005347593583,
1169
- "grad_norm": 1.2938721179962158,
1170
- "learning_rate": 4.904564107932048e-06,
1171
- "loss": 0.944,
1172
- "step": 166
1173
- },
1174
- {
1175
- "epoch": 0.893048128342246,
1176
- "grad_norm": 1.2777775526046753,
1177
- "learning_rate": 4.903411333297966e-06,
1178
- "loss": 0.8709,
1179
- "step": 167
1180
- },
1181
- {
1182
- "epoch": 0.8983957219251337,
1183
- "grad_norm": 1.1637930870056152,
1184
- "learning_rate": 4.902251775309057e-06,
1185
- "loss": 0.7839,
1186
- "step": 168
1187
- },
1188
- {
1189
- "epoch": 0.9037433155080213,
1190
- "grad_norm": 2.035766363143921,
1191
- "learning_rate": 4.901085437238041e-06,
1192
- "loss": 0.5654,
1193
- "step": 169
1194
- },
1195
- {
1196
- "epoch": 0.9090909090909091,
1197
- "grad_norm": 1.155563473701477,
1198
- "learning_rate": 4.899912322376776e-06,
1199
- "loss": 0.9413,
1200
- "step": 170
1201
- },
1202
- {
1203
- "epoch": 0.9144385026737968,
1204
- "grad_norm": 1.466346263885498,
1205
- "learning_rate": 4.8987324340362445e-06,
1206
- "loss": 0.8642,
1207
- "step": 171
1208
- },
1209
- {
1210
- "epoch": 0.9197860962566845,
1211
- "grad_norm": 1.1183879375457764,
1212
- "learning_rate": 4.897545775546545e-06,
1213
- "loss": 0.7851,
1214
- "step": 172
1215
- },
1216
- {
1217
- "epoch": 0.9251336898395722,
1218
- "grad_norm": 1.460421085357666,
1219
- "learning_rate": 4.8963523502568886e-06,
1220
- "loss": 1.0241,
1221
- "step": 173
1222
- },
1223
- {
1224
- "epoch": 0.93048128342246,
1225
- "grad_norm": 1.4027538299560547,
1226
- "learning_rate": 4.895152161535582e-06,
1227
- "loss": 0.7254,
1228
- "step": 174
1229
- },
1230
- {
1231
- "epoch": 0.9358288770053476,
1232
- "grad_norm": 1.183846116065979,
1233
- "learning_rate": 4.893945212770019e-06,
1234
- "loss": 0.6877,
1235
- "step": 175
1236
- },
1237
- {
1238
- "epoch": 0.9411764705882353,
1239
- "grad_norm": 1.288653016090393,
1240
- "learning_rate": 4.892731507366678e-06,
1241
- "loss": 0.8022,
1242
- "step": 176
1243
- },
1244
- {
1245
- "epoch": 0.946524064171123,
1246
- "grad_norm": 1.063643455505371,
1247
- "learning_rate": 4.891511048751102e-06,
1248
- "loss": 0.7123,
1249
- "step": 177
1250
- },
1251
- {
1252
- "epoch": 0.9518716577540107,
1253
- "grad_norm": 1.2285932302474976,
1254
- "learning_rate": 4.890283840367898e-06,
1255
- "loss": 1.1568,
1256
- "step": 178
1257
- },
1258
- {
1259
- "epoch": 0.9572192513368984,
1260
- "grad_norm": 1.3358500003814697,
1261
- "learning_rate": 4.889049885680721e-06,
1262
- "loss": 0.7538,
1263
- "step": 179
1264
- },
1265
- {
1266
- "epoch": 0.9625668449197861,
1267
- "grad_norm": 1.2650320529937744,
1268
- "learning_rate": 4.887809188172268e-06,
1269
- "loss": 0.683,
1270
- "step": 180
1271
- },
1272
- {
1273
- "epoch": 0.9679144385026738,
1274
- "grad_norm": 1.1596193313598633,
1275
- "learning_rate": 4.886561751344266e-06,
1276
- "loss": 0.7824,
1277
- "step": 181
1278
- },
1279
- {
1280
- "epoch": 0.9732620320855615,
1281
- "grad_norm": 1.2235304117202759,
1282
- "learning_rate": 4.885307578717464e-06,
1283
- "loss": 0.7969,
1284
- "step": 182
1285
- },
1286
- {
1287
- "epoch": 0.9786096256684492,
1288
- "grad_norm": 1.364279866218567,
1289
- "learning_rate": 4.8840466738316216e-06,
1290
- "loss": 0.8376,
1291
- "step": 183
1292
- },
1293
- {
1294
- "epoch": 0.983957219251337,
1295
- "grad_norm": 1.3247216939926147,
1296
- "learning_rate": 4.882779040245499e-06,
1297
- "loss": 0.7356,
1298
- "step": 184
1299
- },
1300
- {
1301
- "epoch": 0.9893048128342246,
1302
- "grad_norm": 1.0848944187164307,
1303
- "learning_rate": 4.881504681536847e-06,
1304
- "loss": 0.5837,
1305
- "step": 185
1306
- },
1307
- {
1308
- "epoch": 0.9946524064171123,
1309
- "grad_norm": 1.0679181814193726,
1310
- "learning_rate": 4.880223601302398e-06,
1311
- "loss": 0.5883,
1312
- "step": 186
1313
- },
1314
- {
1315
- "epoch": 1.0,
1316
- "grad_norm": 1.0577597618103027,
1317
- "learning_rate": 4.878935803157856e-06,
1318
- "loss": 0.5789,
1319
- "step": 187
1320
- }
1321
- ],
1322
- "logging_steps": 1,
1323
- "max_steps": 1870,
1324
- "num_input_tokens_seen": 0,
1325
- "num_train_epochs": 10,
1326
- "save_steps": 208,
1327
- "stateful_callbacks": {
1328
- "TrainerControl": {
1329
- "args": {
1330
- "should_epoch_stop": false,
1331
- "should_evaluate": false,
1332
- "should_log": false,
1333
- "should_save": true,
1334
- "should_training_stop": false
1335
- },
1336
- "attributes": {}
1337
- }
1338
- },
1339
- "total_flos": 5.12625839397929e+16,
1340
- "train_batch_size": 1,
1341
- "trial_name": null,
1342
- "trial_params": null
1343
- }
limo_filtered_combined/checkpoint-187/vocab.json DELETED
The diff for this file is too large to render. See raw diff