erfanzar committed
Commit 530b664 · verified · 1 Parent(s): 9ebd5cb

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. .gitattributes +0 -0
  2. README.md +159 -0
  3. chat_template.jinja +86 -0
  4. checkpoint_metadata.json +8 -0
  5. config.json +201 -0
  6. generation_config.json +11 -0
  7. model/lm_head/kernel/.zarray +1 -0
  8. model/lm_head/kernel/0.0 +3 -0
  9. model/lm_head/kernel/0.1 +3 -0
  10. model/lm_head/kernel/0.2 +3 -0
  11. model/lm_head/kernel/0.3 +3 -0
  12. model/model/embed_tokens/embedding/.zarray +1 -0
  13. model/model/embed_tokens/embedding/0.0 +3 -0
  14. model/model/embed_tokens/embedding/0.1 +3 -0
  15. model/model/embed_tokens/embedding/0.2 +3 -0
  16. model/model/embed_tokens/embedding/0.3 +3 -0
  17. model/model/layers/0/input_layernorm/kernel/.zarray +1 -0
  18. model/model/layers/0/input_layernorm/kernel/0 +0 -0
  19. model/model/layers/0/mlp/down_proj/kernel/.zarray +1 -0
  20. model/model/layers/0/mlp/down_proj/kernel/0.0 +3 -0
  21. model/model/layers/0/mlp/down_proj/kernel/1.0 +3 -0
  22. model/model/layers/0/mlp/down_proj/kernel/2.0 +3 -0
  23. model/model/layers/0/mlp/down_proj/kernel/3.0 +3 -0
  24. model/model/layers/0/mlp/gate_proj/kernel/.zarray +1 -0
  25. model/model/layers/0/mlp/gate_proj/kernel/0.0 +3 -0
  26. model/model/layers/0/mlp/gate_proj/kernel/0.1 +3 -0
  27. model/model/layers/0/mlp/gate_proj/kernel/0.2 +3 -0
  28. model/model/layers/0/mlp/gate_proj/kernel/0.3 +3 -0
  29. model/model/layers/0/mlp/up_proj/kernel/.zarray +1 -0
  30. model/model/layers/0/mlp/up_proj/kernel/0.0 +3 -0
  31. model/model/layers/0/mlp/up_proj/kernel/0.1 +3 -0
  32. model/model/layers/0/mlp/up_proj/kernel/0.2 +3 -0
  33. model/model/layers/0/mlp/up_proj/kernel/0.3 +3 -0
  34. model/model/layers/0/post_attention_layernorm/kernel/.zarray +1 -0
  35. model/model/layers/0/post_attention_layernorm/kernel/0 +0 -0
  36. model/model/layers/0/self_attn/kv_a_layernorm/kernel/.zarray +1 -0
  37. model/model/layers/0/self_attn/kv_a_layernorm/kernel/0 +0 -0
  38. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/.zarray +1 -0
  39. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.0 +3 -0
  40. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.1 +3 -0
  41. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.2 +3 -0
  42. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.3 +3 -0
  43. model/model/layers/0/self_attn/kv_b_proj/kernel/.zarray +1 -0
  44. model/model/layers/0/self_attn/kv_b_proj/kernel/0.0 +3 -0
  45. model/model/layers/0/self_attn/kv_b_proj/kernel/0.1 +3 -0
  46. model/model/layers/0/self_attn/kv_b_proj/kernel/0.2 +3 -0
  47. model/model/layers/0/self_attn/kv_b_proj/kernel/0.3 +3 -0
  48. model/model/layers/0/self_attn/o_proj/kernel/.zarray +1 -0
  49. model/model/layers/0/self_attn/o_proj/kernel/0.0 +3 -0
  50. model/model/layers/0/self_attn/o_proj/kernel/1.0 +3 -0
.gitattributes CHANGED
The diff for this file is too large to render. See raw diff
 
README.md ADDED
@@ -0,0 +1,159 @@
+ ---
+ library_name: easydel
+ pipeline_tag: text-generation
+ tags:
+ - easydel
+ - jax
+ - "glm4_moe_lite"
+ - "CausalLM"
+ - "ragged_page_attention_v3"
+ ---
+
+ <p align="center">
+ <img alt="EasyDeL" src="https://raw.githubusercontent.com/erfanzar/easydel/main/images/easydel-logo-with-text.png" height="80">
+ </p>
+
+ <h1 align="center">GLM-4.7Flash</h1>
+
+ <div align="center">
+ A model compatible with the EasyDeL JAX stack.
+ </div>
+
+ ## Overview
+
+ This checkpoint is intended to be loaded with EasyDeL on JAX (CPU/GPU/TPU). It supports sharded loading with `auto_shard_model=True` and configurable precision via `dtype`, `param_dtype`, and `precision`.
+
+ ## Quickstart
+
+ ```python
+ import easydel as ed
+ from jax import numpy as jnp, lax
+
+ repo_id = "GLM-4.7Flash"
+
+ dtype = jnp.bfloat16  # try jnp.float16 on many GPUs
+
+ model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
+     repo_id,
+     dtype=dtype,
+     param_dtype=dtype,
+     precision=lax.Precision("fastest"),
+     sharding_axis_names=("dp", "fsdp", "ep", "tp", "sp"),
+     sharding_axis_dims=(1, -1, 1, 1, 1),
+     config_kwargs=ed.EasyDeLBaseConfigDict(
+         attn_dtype=dtype,
+         attn_mechanism=ed.AttentionMechanisms.RAGGED_PAGE_ATTENTION_V3,
+         fsdp_is_ep_bound=True,
+         sp_is_ep_bound=True,
+         moe_method=ed.MoEMethods.FUSED_MOE,
+     ),
+     auto_shard_model=True,
+     partition_axis=ed.PartitionAxis(),
+ )
+ ```
+
+ If the repository only provides PyTorch weights, pass `from_torch=True` to `from_pretrained(...)`.
+
+ ## Sharding & Parallelism (Multi-Device)
+
+ EasyDeL can scale to multiple devices by creating a logical device mesh. Most EasyDeL loaders use a 5D mesh:
+
+ - `dp`: data parallel (replicated parameters, different batch shards)
+ - `fsdp`: parameter sharding (memory saver; often the biggest axis)
+ - `ep`: expert parallel (MoE; keep `1` for non-MoE models)
+ - `tp`: tensor parallel (splits large matmuls)
+ - `sp`: sequence parallel (splits sequence dimension)
+
+ Use `sharding_axis_names=("dp","fsdp","ep","tp","sp")` and choose `sharding_axis_dims` so that their product equals your device count.
+ You can use `-1` in `sharding_axis_dims` to let EasyDeL infer the remaining dimension.
+
+ <details>
+ <summary>Example sharding configs</summary>
+
+ ```python
+ # 8 devices, pure FSDP
+ sharding_axis_dims = (1, 8, 1, 1, 1)
+
+ # 8 devices, 2-way DP x 4-way FSDP
+ sharding_axis_dims = (2, 4, 1, 1, 1)
+
+ # 8 devices, 4-way FSDP x 2-way TP
+ sharding_axis_dims = (1, 4, 1, 2, 1)
+ ```
+ </details>
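The `-1` inference described above can be sketched in plain Python. `resolve_axis_dims` below is a hypothetical helper (not part of the EasyDeL API) that mimics how a single `-1` entry is resolved so the product of the axis sizes equals the device count:

```python
import math

def resolve_axis_dims(axis_dims, device_count):
    """Resolve a single -1 entry so the product of axis sizes matches device_count.

    Illustrative only; EasyDeL performs this inference internally.
    """
    fixed = math.prod(d for d in axis_dims if d != -1)
    if -1 not in axis_dims:
        assert fixed == device_count, "axis dims must multiply to the device count"
        return tuple(axis_dims)
    assert device_count % fixed == 0, "device count is not divisible by the fixed axes"
    inferred = device_count // fixed
    return tuple(inferred if d == -1 else d for d in axis_dims)

# On 8 devices, (1, -1, 1, 1, 1) resolves to pure FSDP:
print(resolve_axis_dims((1, -1, 1, 1, 1), 8))  # (1, 8, 1, 1, 1)
print(resolve_axis_dims((2, -1, 1, 1, 1), 8))  # (2, 4, 1, 1, 1)
```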
+
+ ## Using via `eLargeModel` (ELM)
+
+ `eLargeModel` is a higher-level interface that wires together loading, sharding, training, and eSurge inference from a single config.
+
+ ```python
+ from easydel import eLargeModel
+
+ repo_id = "GLM-4.7Flash"
+
+ elm = eLargeModel.from_pretrained(repo_id)  # task is auto-detected
+ elm.set_dtype("bf16")
+ elm.set_sharding(axis_names=("dp", "fsdp", "ep", "tp", "sp"), axis_dims=(1, -1, 1, 1, 1))
+
+ model = elm.build_model()
+ # Optional: build an inference engine
+ # engine = elm.build_esurge()
+ ```
+
+ <details>
+ <summary>ELM YAML config example</summary>
+
+ ```yaml
+ model:
+   name_or_path: "GLM-4.7Flash"
+
+ loader:
+   dtype: bf16
+   param_dtype: bf16
+
+ sharding:
+   axis_dims: [1, -1, 1, 1, 1]
+   auto_shard_model: true
+ ```
+ </details>
+
+ ## Features
+
+ **EasyDeL:**
+ - JAX-native implementation and sharded execution
+ - Configurable attention backends via `AttentionMechanisms.*`
+ - Precision control via `dtype`, `param_dtype`, and `precision`
+
+ ## Installation
+
+ ```bash
+ pip install easydel
+ ```
+
+ ## Links
+
+ - EasyDeL GitHub: https://github.com/erfanzar/EasyDeL
+ - Docs: https://easydel.readthedocs.io/en/latest/
+
+ ## Supported Tasks
+
+ - CausalLM
+
+ ## Limitations
+
+ - Refer to the original model card for training data, evaluation, and intended use.
+
+ ## License
+
+ EasyDeL is released under the Apache-2.0 license. The license for this model's weights may differ; please consult the original repository.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zare_chavoshi_2023,
+   title={EasyDeL: An open-source library for enhancing and streamlining the training process of machine learning models},
+   url={https://github.com/erfanzar/EasyDeL},
+   author={Zare Chavoshi, Erfan},
+   year={2023}
+ }
+ ```
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
+ [gMASK]<sop>
+ {%- if tools -%}
+ <|system|>
+ # Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {% for tool in tools %}
+ {{ tool | tojson(ensure_ascii=False) }}
+ {% endfor %}
+ </tools>
+
+ For each function call, output the function name and arguments within the following XML format:
+ <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+ {%- macro visible_text(content) -%}
+ {%- if content is string -%}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping -%}
+ {%- for item in content -%}
+ {%- if item is mapping and item.type == 'text' -%}
+ {{- item.text }}
+ {%- elif item is string -%}
+ {{- item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- else -%}
+ {{- content }}
+ {%- endif -%}
+ {%- endmacro -%}
+ {%- set ns = namespace(last_user_index=-1) %}
+ {%- for m in messages %}
+ {%- if m.role == 'user' %}
+ {% set ns.last_user_index = loop.index0 -%}
+ {%- endif %}
+ {%- endfor %}
+ {% for m in messages %}
+ {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
+ {%- elif m.role == 'assistant' -%}
+ <|assistant|>
+ {%- set reasoning_content = '' %}
+ {%- set content = visible_text(m.content) %}
+ {%- if m.reasoning_content is string %}
+ {%- set reasoning_content = m.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
+ {{ '<think>' + reasoning_content.strip() + '</think>'}}
+ {%- else -%}
+ {{ '</think>' }}
+ {%- endif -%}
+ {%- if content.strip() -%}
+ {{ content.strip() }}
+ {%- endif -%}
+ {% if m.tool_calls %}
+ {% for tc in m.tool_calls %}
+ {%- if tc.function %}
+ {%- set tc = tc.function %}
+ {%- endif %}
+ {{- '<tool_call>' + tc.name -}}
+ {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+ {% endif %}
+ {%- elif m.role == 'tool' -%}
+ {%- if m.content is string -%}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' }}
+ {%- endif %}
+ {{- '<tool_response>' }}
+ {{- m.content }}
+ {{- '</tool_response>' }}
+ {%- else -%}
+ <|observation|>{% for tr in m.content %}
+ <tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
+ {% endif -%}
+ {%- elif m.role == 'system' -%}
+ <|system|>{{ visible_text(m.content) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+ {%- endif -%}
checkpoint_metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "version": "0.0.95",
+   "timestamp": "2026-02-09T10:51:28.038321",
+   "checksum": {},
+   "array_metadata": {},
+   "framework_version": null,
+   "custom_metadata": {}
+ }
config.json ADDED
@@ -0,0 +1,201 @@
+ {
+   "_external_rope_config_kwargs": {},
+   "architectures": [
+     "Glm4MoeLiteForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_mechanism": "ragged_page_attention_v3",
+   "backend": null,
+   "bits": null,
+   "blocksize_b": 1,
+   "blocksize_k": 128,
+   "blocksize_q": 128,
+   "bos_token_id": 0,
+   "decode_attn_mechanism": null,
+   "dtype": "bfloat16",
+   "easy_method": "train",
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "fcm_max_ratio": 0.0,
+   "fcm_min_ratio": 0.0,
+   "first_k_dense_replace": 1,
+   "flash_attention_backward_pass_impl": "triton",
+   "freq_max_position_embeddings": 65536,
+   "fsdp_is_ep_bound": true,
+   "gradient_checkpointing": "",
+   "gradient_checkpointing_targets": null,
+   "hardware_abstraction": true,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 10240,
+   "kv_cache_quantization_config": null,
+   "kv_cache_sharding_sequence_axis_name": "sp",
+   "kv_lora_rank": 512,
+   "mask_max_position_embeddings": 65536,
+   "max_position_embeddings": 202752,
+   "mlp_layer_types": [
+     "dense",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse"
+   ],
+   "model_type": "glm4_moe_lite",
+   "moe_force_xla_gmm": false,
+   "moe_intermediate_size": 1536,
+   "moe_method": "fused_moe",
+   "moe_tiling_size_batch": 4,
+   "moe_tiling_size_dim": 128,
+   "moe_tiling_size_seqlen": 128,
+   "n_group": 1,
+   "n_routed_experts": 64,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 20,
+   "num_experts_per_tok": 4,
+   "num_hidden_layers": 47,
+   "num_key_value_heads": 20,
+   "num_nextn_predict_layers": 1,
+   "operation_configs": null,
+   "pad_token_id": 154820,
+   "pallas_k_block_size": 128,
+   "pallas_m_block_size": 128,
+   "pallas_n_block_size": 128,
+   "partial_rotary_factor": 1.0,
+   "partition_axis": {
+     "attention_dim_axis": null,
+     "attention_kv_dim_axis": null,
+     "batch_axis": [
+       "fsdp",
+       "dp"
+     ],
+     "bias_head_sequence_axis": null,
+     "bias_key_sequence_axis": null,
+     "data_parallel_axis": "dp",
+     "decode_attention_dim_axis": null,
+     "decode_attention_kv_dim_axis": null,
+     "decode_batch_axis": [
+       "fsdp",
+       "dp"
+     ],
+     "decode_head_axis": "tp",
+     "decode_key_sequence_axis": "sp",
+     "decode_kv_head_axis": "tp",
+     "decode_query_sequence_axis": null,
+     "expert_axis": "ep",
+     "expert_gate_axis": null,
+     "expert_parallel_axis": "ep",
+     "fully_sharded_data_parallel_axis": "fsdp",
+     "head_axis": "tp",
+     "hidden_state_axis": "tp",
+     "key_sequence_axis": "sp",
+     "kv_head_axis": "tp",
+     "mlp_intermediate_axis": "tp",
+     "query_sequence_axis": "sp",
+     "sequence_axis": "sp",
+     "sequence_parallel_axis": "sp",
+     "tensor_parallel_axis": "tp",
+     "vocab_axis": "tp"
+   },
+   "platform": null,
+   "precompute_masks": true,
+   "pretraining_tp": 1,
+   "q_lora_rank": 768,
+   "qk_head_dim": 256,
+   "qk_nope_head_dim": 192,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "dtype": "nf4",
+     "group_size": 128,
+     "jax_native": false
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_interleave": true,
+   "rope_parameters": {
+     "partial_rotary_factor": 1.0,
+     "rope_theta": 10000.0,
+     "rope_type": "default"
+   },
+   "rope_theta": 1000000,
+   "routed_scaling_factor": 1.8,
+   "scan_attention_layers": false,
+   "scan_mlp_chunk_size": 1024,
+   "scan_ring_attention": true,
+   "sequence_axis_name": "sp",
+   "sharding_axis_dims": [
+     1,
+     1,
+     1,
+     -1,
+     1
+   ],
+   "sharding_axis_names": [
+     "dp",
+     "fsdp",
+     "ep",
+     "tp",
+     "sp"
+   ],
+   "sharding_dcn_axis_dims": null,
+   "sp_is_ep_bound": true,
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "transformers_version": "5.0.0",
+   "use_cache": true,
+   "use_expert_tensor_mode": false,
+   "use_ring_of_experts": false,
+   "use_scan_mlp": false,
+   "use_sharded_kv_caching": false,
+   "use_sharding_constraint": false,
+   "v_head_dim": 256,
+   "vocab_size": 154880
+ }
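Per this config, each token is routed to `num_experts_per_tok = 4` of the `n_routed_experts = 64` experts (plus `n_shared_experts = 1` shared expert), and the top-k gate weights are renormalized when `norm_topk_prob` is true. The sketch below illustrates that plain softmax top-k routing step with NumPy; it is not EasyDeL's fused MoE kernel, and it omits the `noaux_tc` scoring and `routed_scaling_factor` details:

```python
import numpy as np

def route_topk(logits, k=4, norm_topk_prob=True):
    """Select top-k experts per token; return (expert indices, gate weights)."""
    # Softmax over expert logits.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the k highest-probability experts per token.
    idx = np.argsort(probs, axis=-1)[:, -k:]
    w = np.take_along_axis(probs, idx, axis=-1)
    if norm_topk_prob:
        # Renormalize so the k selected gate weights sum to 1.
        w = w / w.sum(axis=-1, keepdims=True)
    return idx, w

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 64))  # 3 tokens, 64 routed experts
idx, w = route_topk(logits, k=4)
print(idx.shape, w.shape)          # (3, 4) (3, 4)
print(np.allclose(w.sum(-1), 1.0)) # True
```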
generation_config.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": [
4
+ 154820,
5
+ 154827,
6
+ 154829
7
+ ],
8
+ "pad_token_id": 154820,
9
+ "temperature": 1.0,
10
+ "transformers_version": "5.0.0"
11
+ }
model/lm_head/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,38720],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,154880],"zarr_format":2}
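This `.zarray` metadata determines the chunk files listed alongside it: zarr v2 stores one file per chunk, keyed by the chunk-grid index in each dimension joined with `dimension_separator`. A minimal standard-library sketch that reproduces the expected keys for this array (`shape [2048, 154880]`, `chunks [2048, 38720]` gives a 1×4 grid, hence files `0.0` through `0.3`):

```python
import math

def chunk_names(shape, chunks, separator="."):
    """Enumerate zarr v2 chunk keys for an array of the given shape/chunk size."""
    # Chunk grid: how many chunks per dimension.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]

    def walk(dims):
        # Recursively enumerate all grid index tuples in C order.
        if not dims:
            yield []
            return
        for i in range(dims[0]):
            for rest in walk(dims[1:]):
                yield [i] + rest

    return [separator.join(map(str, idx)) for idx in walk(grid)]

# lm_head kernel: shape [2048, 154880], chunks [2048, 38720]
print(chunk_names([2048, 154880], [2048, 38720]))  # ['0.0', '0.1', '0.2', '0.3']
```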
model/lm_head/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9db625cb07841022be40257e4558c9264b9ea3487962eee936bb23e15adbb435
+ size 121923157
model/lm_head/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff6bf816347c4f1fe0c6ea03675be7ab7f2ff5e4f0fb8a6bc77396c885ec7770
+ size 122102896
model/lm_head/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f1d7c26126e9fa2bbc2d2e45dab67135ee09bb19e44cc32cb451809225cbf43
+ size 122012645
model/lm_head/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c982ea8dab6e9eda86ec19abcdf289f5ca089856ae67ecfa26b4d7e6411fccf3
+ size 122187869
model/model/embed_tokens/embedding/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[154880,512],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[154880,2048],"zarr_format":2}
model/model/embed_tokens/embedding/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:854ed9639b5cbc49f38fefb7fdc7d3520912366cfaeb8b0c4495e9107a534242
+ size 123561687
model/model/embed_tokens/embedding/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9aa4a4ac6a6bf8aa3576cbcc08f170bbdc6bfc20eb30702db1d4b25b7f78a64
+ size 123559739
model/model/embed_tokens/embedding/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bc41d9b069cb67ffeb0a52eb9906f3634566dc84dc6964622dfb0795fefdd10
+ size 123569933
model/model/embed_tokens/embedding/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3c651ef4b27aa1f731e22d91078a26cf050b2e4422f7427531e44a0073ddd02
+ size 123567220
model/model/layers/0/input_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048],"zarr_format":2}
model/model/layers/0/input_layernorm/kernel/0 ADDED
Binary file (2.33 kB)
model/model/layers/0/mlp/down_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2560,2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[10240,2048],"zarr_format":2}
model/model/layers/0/mlp/down_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:818bca024079390503253dd1e55b9a71ffd899ab8761b93a3abda2555faf06ca
+ size 8205293
model/model/layers/0/mlp/down_proj/kernel/1.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:480074fca1c85a6b8d9bd608ba28889691cf6011b58f1ae028dcff1d56f6eb55
+ size 8205605
model/model/layers/0/mlp/down_proj/kernel/2.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c54a17b9fe4aacac06fe8bea9ce02514b2d16c54369d6a339108072734d2afc0
+ size 8205324
model/model/layers/0/mlp/down_proj/kernel/3.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77fe50b4e0a3172588f88cb5c9a56796e29d2228b3a68e8e37b7e87a645cb587
+ size 8205905
model/model/layers/0/mlp/gate_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,2560],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,10240],"zarr_format":2}
model/model/layers/0/mlp/gate_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65d0d2fe6e19d61364d9adbc71e6903d9301ab22eb58ea651ed8fffe3e518f58
+ size 8209818
model/model/layers/0/mlp/gate_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13973e8983afb35565c2ee6b80b69f27018a67df3be57a49424fc4f9eb77002
+ size 8209014
model/model/layers/0/mlp/gate_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa0bcf67870b820cc36fbb8cc44020b3046c1a6507cba47fb156188267c1ed03
+ size 8208317
model/model/layers/0/mlp/gate_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35591fca5f7de0f25c134ee7b73b4e794bdf7eb2a7d27d67e0466980885a871e
+ size 8209511
model/model/layers/0/mlp/up_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,2560],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,10240],"zarr_format":2}
model/model/layers/0/mlp/up_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60e6498124a34701848cb8aa2b82bb3e8e29a2439e4d8e105adf465e0d5a52ff
+ size 8211458
model/model/layers/0/mlp/up_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83e2b9b8f08253b97ea99161035a268525e8c720726f3a55cd97234910a21b89
+ size 8210226
model/model/layers/0/mlp/up_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2833867c5d1d85179ab7d69cd80db8c79de3f119e65c6499fbaacefd6ee728f
+ size 8210590
model/model/layers/0/mlp/up_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e176c782eca8fc0495c33f94ff2272fb0c2de376368cc3e5e44d6d46188db7e4
+ size 8210603
model/model/layers/0/post_attention_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048],"zarr_format":2}
model/model/layers/0/post_attention_layernorm/kernel/0 ADDED
Binary file (2.1 kB)
model/model/layers/0/self_attn/kv_a_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[512],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[512],"zarr_format":2}
model/model/layers/0/self_attn/kv_a_layernorm/kernel/0 ADDED
Binary file (674 Bytes)
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,144],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,576],"zarr_format":2}
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2462ba05681a34a8b46ddce489d6f7d472a50d26baf207b109a484ed5bc7506
+ size 461514
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c5f33d1cbe44af63f61875cf9db0d64485d6eeb1151f5587235546edd3c62cd
+ size 461674
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:446cbf4155f4277db6fd47f0a6d24faf91febc16fb472b2e9352ebc38cde6b87
+ size 461516
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7923f2fd61a180468e65390070a89b7e2bb306ce26616233823a12b0f6d54375
+ size 467363
model/model/layers/0/self_attn/kv_b_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[512,2240],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[512,8960],"zarr_format":2}
model/model/layers/0/self_attn/kv_b_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d439a65acc9e207f9a5f3383b73cfe46f4399ed50d9e27d90286e6b49cabd205
+ size 1838281
model/model/layers/0/self_attn/kv_b_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7150f6cf9000c250f9a234184e4028c3de4f8582f6528d9de5672d331b53369b
+ size 1838831
model/model/layers/0/self_attn/kv_b_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de1508baa852f5d54462e0a838d09102d3edbeb18fdcc117dcf5b6f7b219997e
+ size 1853092
model/model/layers/0/self_attn/kv_b_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3963702f3f09d215517ec53efc557557490f219b545ab26c357b206535b58fa1
+ size 1844678
model/model/layers/0/self_attn/o_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[1280,2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[5120,2048],"zarr_format":2}
model/model/layers/0/self_attn/o_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49d9411145e2812bc128ae13faf3971ecf082121cf665ecabc35ac7b6d9044d0
+ size 4087692
model/model/layers/0/self_attn/o_proj/kernel/1.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc65bf0b514a069bc1a692bb219a92c7025363e069bd26ddbe47b02e9da9c26a
+ size 4079010