erfanzar committed
Commit 530b664 · verified · 1 Parent(s): 9ebd5cb

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. .gitattributes +0 -0
  2. README.md +159 -0
  3. chat_template.jinja +86 -0
  4. checkpoint_metadata.json +8 -0
  5. config.json +201 -0
  6. generation_config.json +11 -0
  7. model/lm_head/kernel/.zarray +1 -0
  8. model/lm_head/kernel/0.0 +3 -0
  9. model/lm_head/kernel/0.1 +3 -0
  10. model/lm_head/kernel/0.2 +3 -0
  11. model/lm_head/kernel/0.3 +3 -0
  12. model/model/embed_tokens/embedding/.zarray +1 -0
  13. model/model/embed_tokens/embedding/0.0 +3 -0
  14. model/model/embed_tokens/embedding/0.1 +3 -0
  15. model/model/embed_tokens/embedding/0.2 +3 -0
  16. model/model/embed_tokens/embedding/0.3 +3 -0
  17. model/model/layers/0/input_layernorm/kernel/.zarray +1 -0
  18. model/model/layers/0/input_layernorm/kernel/0 +0 -0
  19. model/model/layers/0/mlp/down_proj/kernel/.zarray +1 -0
  20. model/model/layers/0/mlp/down_proj/kernel/0.0 +3 -0
  21. model/model/layers/0/mlp/down_proj/kernel/1.0 +3 -0
  22. model/model/layers/0/mlp/down_proj/kernel/2.0 +3 -0
  23. model/model/layers/0/mlp/down_proj/kernel/3.0 +3 -0
  24. model/model/layers/0/mlp/gate_proj/kernel/.zarray +1 -0
  25. model/model/layers/0/mlp/gate_proj/kernel/0.0 +3 -0
  26. model/model/layers/0/mlp/gate_proj/kernel/0.1 +3 -0
  27. model/model/layers/0/mlp/gate_proj/kernel/0.2 +3 -0
  28. model/model/layers/0/mlp/gate_proj/kernel/0.3 +3 -0
  29. model/model/layers/0/mlp/up_proj/kernel/.zarray +1 -0
  30. model/model/layers/0/mlp/up_proj/kernel/0.0 +3 -0
  31. model/model/layers/0/mlp/up_proj/kernel/0.1 +3 -0
  32. model/model/layers/0/mlp/up_proj/kernel/0.2 +3 -0
  33. model/model/layers/0/mlp/up_proj/kernel/0.3 +3 -0
  34. model/model/layers/0/post_attention_layernorm/kernel/.zarray +1 -0
  35. model/model/layers/0/post_attention_layernorm/kernel/0 +0 -0
  36. model/model/layers/0/self_attn/kv_a_layernorm/kernel/.zarray +1 -0
  37. model/model/layers/0/self_attn/kv_a_layernorm/kernel/0 +0 -0
  38. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/.zarray +1 -0
  39. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.0 +3 -0
  40. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.1 +3 -0
  41. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.2 +3 -0
  42. model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.3 +3 -0
  43. model/model/layers/0/self_attn/kv_b_proj/kernel/.zarray +1 -0
  44. model/model/layers/0/self_attn/kv_b_proj/kernel/0.0 +3 -0
  45. model/model/layers/0/self_attn/kv_b_proj/kernel/0.1 +3 -0
  46. model/model/layers/0/self_attn/kv_b_proj/kernel/0.2 +3 -0
  47. model/model/layers/0/self_attn/kv_b_proj/kernel/0.3 +3 -0
  48. model/model/layers/0/self_attn/o_proj/kernel/.zarray +1 -0
  49. model/model/layers/0/self_attn/o_proj/kernel/0.0 +3 -0
  50. model/model/layers/0/self_attn/o_proj/kernel/1.0 +3 -0
.gitattributes CHANGED
The diff for this file is too large to render. See raw diff
 
README.md ADDED
@@ -0,0 +1,159 @@
+ ---
+ library_name: easydel
+ pipeline_tag: text-generation
+ tags:
+ - easydel
+ - jax
+ - "glm4_moe_lite"
+ - "CausalLM"
+ - "ragged_page_attention_v3"
+ ---
+
+ <p align="center">
+ <img alt="EasyDeL" src="https://raw.githubusercontent.com/erfanzar/easydel/main/images/easydel-logo-with-text.png" height="80">
+ </p>
+
+ <h1 align="center">GLM-4.7Flash</h1>
+
+ <div align="center">
+ A model compatible with the EasyDeL JAX stack.
+ </div>
+
+ ## Overview
+
+ This checkpoint is intended to be loaded with EasyDeL on JAX (CPU/GPU/TPU). It supports sharded loading with `auto_shard_model=True` and configurable precision via `dtype`, `param_dtype`, and `precision`.
+
+ ## Quickstart
+
+ ```python
+ import easydel as ed
+ from jax import numpy as jnp, lax
+
+ repo_id = "GLM-4.7Flash"
+
+ dtype = jnp.bfloat16  # try jnp.float16 on many GPUs
+
+ model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
+     repo_id,
+     dtype=dtype,
+     param_dtype=dtype,
+     precision=lax.Precision("fastest"),
+     sharding_axis_names=("dp", "fsdp", "ep", "tp", "sp"),
+     sharding_axis_dims=(1, -1, 1, 1, 1),
+     config_kwargs=ed.EasyDeLBaseConfigDict(
+         attn_dtype=dtype,
+         attn_mechanism=ed.AttentionMechanisms.RAGGED_PAGE_ATTENTION_V3,
+         fsdp_is_ep_bound=True,
+         sp_is_ep_bound=True,
+         moe_method=ed.MoEMethods.FUSED_MOE,
+     ),
+     auto_shard_model=True,
+     partition_axis=ed.PartitionAxis(),
+ )
+ ```
+
+ If the repository only provides PyTorch weights, pass `from_torch=True` to `from_pretrained(...)`.
+
+ ## Sharding & Parallelism (Multi-Device)
+
+ EasyDeL can scale to multiple devices by creating a logical device mesh. Most EasyDeL loaders use a 5D mesh:
+
+ - `dp`: data parallel (replicated parameters, different batch shards)
+ - `fsdp`: parameter sharding (memory saver; often the biggest axis)
+ - `ep`: expert parallel (MoE; keep `1` for non-MoE models)
+ - `tp`: tensor parallel (splits large matmuls)
+ - `sp`: sequence parallel (splits sequence dimension)
+
+ Use `sharding_axis_names=("dp","fsdp","ep","tp","sp")` and choose `sharding_axis_dims` so that their product equals your device count.
+ You can use `-1` in `sharding_axis_dims` to let EasyDeL infer the remaining dimension.
+
+ <details>
+ <summary>Example sharding configs</summary>
+
+ ```python
+ # 8 devices, pure FSDP
+ sharding_axis_dims = (1, 8, 1, 1, 1)
+
+ # 8 devices, 2-way DP x 4-way FSDP
+ sharding_axis_dims = (2, 4, 1, 1, 1)
+
+ # 8 devices, 4-way FSDP x 2-way TP
+ sharding_axis_dims = (1, 4, 1, 2, 1)
+ ```
+ </details>
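The `-1` inference described above can be sketched in plain Python. `resolve_axis_dims` below is a hypothetical helper (not part of the EasyDeL API) that mimics how a single `-1` entry is resolved so the product of the axis sizes equals the device count:

```python
import math

def resolve_axis_dims(axis_dims, device_count):
    """Resolve a single -1 entry so the product of axis sizes matches device_count.

    Illustrative only; EasyDeL performs this inference internally.
    """
    fixed = math.prod(d for d in axis_dims if d != -1)
    if -1 not in axis_dims:
        assert fixed == device_count, "axis dims must multiply to the device count"
        return tuple(axis_dims)
    assert device_count % fixed == 0, "device count is not divisible by the fixed axes"
    inferred = device_count // fixed
    return tuple(inferred if d == -1 else d for d in axis_dims)

# On 8 devices, (1, -1, 1, 1, 1) resolves to pure FSDP:
print(resolve_axis_dims((1, -1, 1, 1, 1), 8))  # (1, 8, 1, 1, 1)
print(resolve_axis_dims((2, -1, 1, 1, 1), 8))  # (2, 4, 1, 1, 1)
```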
+
+ ## Using via `eLargeModel` (ELM)
+
+ `eLargeModel` is a higher-level interface that wires together loading, sharding, training, and eSurge inference from a single config.
+
+ ```python
+ from easydel import eLargeModel
+
+ repo_id = "GLM-4.7Flash"
+
+ elm = eLargeModel.from_pretrained(repo_id)  # task is auto-detected
+ elm.set_dtype("bf16")
+ elm.set_sharding(axis_names=("dp", "fsdp", "ep", "tp", "sp"), axis_dims=(1, -1, 1, 1, 1))
+
+ model = elm.build_model()
+ # Optional: build an inference engine
+ # engine = elm.build_esurge()
+ ```
+
+ <details>
+ <summary>ELM YAML config example</summary>
+
+ ```yaml
+ model:
+   name_or_path: "GLM-4.7Flash"
+
+ loader:
+   dtype: bf16
+   param_dtype: bf16
+
+ sharding:
+   axis_dims: [1, -1, 1, 1, 1]
+   auto_shard_model: true
+ ```
+ </details>
+
+ ## Features
+
+ **EasyDeL:**
+ - JAX-native implementation and sharded execution
+ - Configurable attention backends via `AttentionMechanisms.*`
+ - Precision control via `dtype`, `param_dtype`, and `precision`
+
+ ## Installation
+
+ ```bash
+ pip install easydel
+ ```
+
+ ## Links
+
+ - EasyDeL GitHub: https://github.com/erfanzar/EasyDeL
+ - Docs: https://easydel.readthedocs.io/en/latest/
+
+ ## Supported Tasks
+
+ - CausalLM
+
+ ## Limitations
+
+ - Refer to the original model card for training data, evaluation, and intended use.
+
+ ## License
+
+ EasyDeL is released under the Apache-2.0 license. The license for this model's weights may differ; please consult the original repository.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zare_chavoshi_2023,
+   title={EasyDeL: An open-source library for enhancing and streamlining the training process of machine learning models},
+   url={https://github.com/erfanzar/EasyDeL},
+   author={Zare Chavoshi, Erfan},
+   year={2023}
+ }
+ ```
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
+ [gMASK]<sop>
+ {%- if tools -%}
+ <|system|>
+ # Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {% for tool in tools %}
+ {{ tool | tojson(ensure_ascii=False) }}
+ {% endfor %}
+ </tools>
+
+ For each function call, output the function name and arguments within the following XML format:
+ <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+ {%- macro visible_text(content) -%}
+ {%- if content is string -%}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping -%}
+ {%- for item in content -%}
+ {%- if item is mapping and item.type == 'text' -%}
+ {{- item.text }}
+ {%- elif item is string -%}
+ {{- item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- else -%}
+ {{- content }}
+ {%- endif -%}
+ {%- endmacro -%}
+ {%- set ns = namespace(last_user_index=-1) %}
+ {%- for m in messages %}
+ {%- if m.role == 'user' %}
+ {% set ns.last_user_index = loop.index0 -%}
+ {%- endif %}
+ {%- endfor %}
+ {% for m in messages %}
+ {%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
+ {%- elif m.role == 'assistant' -%}
+ <|assistant|>
+ {%- set reasoning_content = '' %}
+ {%- set content = visible_text(m.content) %}
+ {%- if m.reasoning_content is string %}
+ {%- set reasoning_content = m.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
+ {{ '<think>' + reasoning_content.strip() + '</think>'}}
+ {%- else -%}
+ {{ '</think>' }}
+ {%- endif -%}
+ {%- if content.strip() -%}
+ {{ content.strip() }}
+ {%- endif -%}
+ {% if m.tool_calls %}
+ {% for tc in m.tool_calls %}
+ {%- if tc.function %}
+ {%- set tc = tc.function %}
+ {%- endif %}
+ {{- '<tool_call>' + tc.name -}}
+ {% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+ {% endif %}
+ {%- elif m.role == 'tool' -%}
+ {%- if m.content is string -%}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' }}
+ {%- endif %}
+ {{- '<tool_response>' }}
+ {{- m.content }}
+ {{- '</tool_response>' }}
+ {%- else -%}
+ <|observation|>{% for tr in m.content %}
+ <tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
+ {% endif -%}
+ {%- elif m.role == 'system' -%}
+ <|system|>{{ visible_text(m.content) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ <|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+ {%- endif -%}
checkpoint_metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "version": "0.0.95",
+   "timestamp": "2026-02-09T10:51:28.038321",
+   "checksum": {},
+   "array_metadata": {},
+   "framework_version": null,
+   "custom_metadata": {}
+ }
config.json ADDED
@@ -0,0 +1,201 @@
+ {
+   "_external_rope_config_kwargs": {},
+   "architectures": [
+     "Glm4MoeLiteForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_mechanism": "ragged_page_attention_v3",
+   "backend": null,
+   "bits": null,
+   "blocksize_b": 1,
+   "blocksize_k": 128,
+   "blocksize_q": 128,
+   "bos_token_id": 0,
+   "decode_attn_mechanism": null,
+   "dtype": "bfloat16",
+   "easy_method": "train",
+   "eos_token_id": [
+     154820,
+     154827,
+     154829
+   ],
+   "fcm_max_ratio": 0.0,
+   "fcm_min_ratio": 0.0,
+   "first_k_dense_replace": 1,
+   "flash_attention_backward_pass_impl": "triton",
+   "freq_max_position_embeddings": 65536,
+   "fsdp_is_ep_bound": true,
+   "gradient_checkpointing": "",
+   "gradient_checkpointing_targets": null,
+   "hardware_abstraction": true,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 10240,
+   "kv_cache_quantization_config": null,
+   "kv_cache_sharding_sequence_axis_name": "sp",
+   "kv_lora_rank": 512,
+   "mask_max_position_embeddings": 65536,
+   "max_position_embeddings": 202752,
+   "mlp_layer_types": [
+     "dense",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse",
+     "sparse"
+   ],
+   "model_type": "glm4_moe_lite",
+   "moe_force_xla_gmm": false,
+   "moe_intermediate_size": 1536,
+   "moe_method": "fused_moe",
+   "moe_tiling_size_batch": 4,
+   "moe_tiling_size_dim": 128,
+   "moe_tiling_size_seqlen": 128,
+   "n_group": 1,
+   "n_routed_experts": 64,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 20,
+   "num_experts_per_tok": 4,
+   "num_hidden_layers": 47,
+   "num_key_value_heads": 20,
+   "num_nextn_predict_layers": 1,
+   "operation_configs": null,
+   "pad_token_id": 154820,
+   "pallas_k_block_size": 128,
+   "pallas_m_block_size": 128,
+   "pallas_n_block_size": 128,
+   "partial_rotary_factor": 1.0,
+   "partition_axis": {
+     "attention_dim_axis": null,
+     "attention_kv_dim_axis": null,
+     "batch_axis": [
+       "fsdp",
+       "dp"
+     ],
+     "bias_head_sequence_axis": null,
+     "bias_key_sequence_axis": null,
+     "data_parallel_axis": "dp",
+     "decode_attention_dim_axis": null,
+     "decode_attention_kv_dim_axis": null,
+     "decode_batch_axis": [
+       "fsdp",
+       "dp"
+     ],
+     "decode_head_axis": "tp",
+     "decode_key_sequence_axis": "sp",
+     "decode_kv_head_axis": "tp",
+     "decode_query_sequence_axis": null,
+     "expert_axis": "ep",
+     "expert_gate_axis": null,
+     "expert_parallel_axis": "ep",
+     "fully_sharded_data_parallel_axis": "fsdp",
+     "head_axis": "tp",
+     "hidden_state_axis": "tp",
+     "key_sequence_axis": "sp",
+     "kv_head_axis": "tp",
+     "mlp_intermediate_axis": "tp",
+     "query_sequence_axis": "sp",
+     "sequence_axis": "sp",
+     "sequence_parallel_axis": "sp",
+     "tensor_parallel_axis": "tp",
+     "vocab_axis": "tp"
+   },
+   "platform": null,
+   "precompute_masks": true,
+   "pretraining_tp": 1,
+   "q_lora_rank": 768,
+   "qk_head_dim": 256,
+   "qk_nope_head_dim": 192,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "dtype": "nf4",
+     "group_size": 128,
+     "jax_native": false
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_interleave": true,
+   "rope_parameters": {
+     "partial_rotary_factor": 1.0,
+     "rope_theta": 10000.0,
+     "rope_type": "default"
+   },
+   "rope_theta": 1000000,
+   "routed_scaling_factor": 1.8,
+   "scan_attention_layers": false,
+   "scan_mlp_chunk_size": 1024,
+   "scan_ring_attention": true,
+   "sequence_axis_name": "sp",
+   "sharding_axis_dims": [
+     1,
+     1,
+     1,
+     -1,
+     1
+   ],
+   "sharding_axis_names": [
+     "dp",
+     "fsdp",
+     "ep",
+     "tp",
+     "sp"
+   ],
+   "sharding_dcn_axis_dims": null,
+   "sp_is_ep_bound": true,
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "transformers_version": "5.0.0",
+   "use_cache": true,
+   "use_expert_tensor_mode": false,
+   "use_ring_of_experts": false,
+   "use_scan_mlp": false,
+   "use_sharded_kv_caching": false,
+   "use_sharding_constraint": false,
+   "v_head_dim": 256,
+   "vocab_size": 154880
+ }
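Per this config, each token is routed to `num_experts_per_tok = 4` of the `n_routed_experts = 64` experts (plus `n_shared_experts = 1` shared expert), and the top-k gate weights are renormalized when `norm_topk_prob` is true. The sketch below illustrates that plain softmax top-k routing step with NumPy; it is not EasyDeL's fused MoE kernel, and it omits the `noaux_tc` scoring and `routed_scaling_factor` details:

```python
import numpy as np

def route_topk(logits, k=4, norm_topk_prob=True):
    """Select top-k experts per token; return (expert indices, gate weights)."""
    # Softmax over expert logits.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the k highest-probability experts per token.
    idx = np.argsort(probs, axis=-1)[:, -k:]
    w = np.take_along_axis(probs, idx, axis=-1)
    if norm_topk_prob:
        # Renormalize so the k selected gate weights sum to 1.
        w = w / w.sum(axis=-1, keepdims=True)
    return idx, w

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 64))  # 3 tokens, 64 routed experts
idx, w = route_topk(logits, k=4)
print(idx.shape, w.shape)          # (3, 4) (3, 4)
print(np.allclose(w.sum(-1), 1.0)) # True
```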
generation_config.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": [
4
+ 154820,
5
+ 154827,
6
+ 154829
7
+ ],
8
+ "pad_token_id": 154820,
9
+ "temperature": 1.0,
10
+ "transformers_version": "5.0.0"
11
+ }
model/lm_head/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,38720],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,154880],"zarr_format":2}
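This `.zarray` metadata determines the chunk files listed alongside it: zarr v2 stores one file per chunk, keyed by the chunk-grid index in each dimension joined with `dimension_separator`. A minimal standard-library sketch that reproduces the expected keys for this array (`shape [2048, 154880]`, `chunks [2048, 38720]` gives a 1×4 grid, hence files `0.0` through `0.3`):

```python
import math

def chunk_names(shape, chunks, separator="."):
    """Enumerate zarr v2 chunk keys for an array of the given shape/chunk size."""
    # Chunk grid: how many chunks per dimension.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]

    def walk(dims):
        # Recursively enumerate all grid index tuples in C order.
        if not dims:
            yield []
            return
        for i in range(dims[0]):
            for rest in walk(dims[1:]):
                yield [i] + rest

    return [separator.join(map(str, idx)) for idx in walk(grid)]

# lm_head kernel: shape [2048, 154880], chunks [2048, 38720]
print(chunk_names([2048, 154880], [2048, 38720]))  # ['0.0', '0.1', '0.2', '0.3']
```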
model/lm_head/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9db625cb07841022be40257e4558c9264b9ea3487962eee936bb23e15adbb435
+ size 121923157
model/lm_head/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff6bf816347c4f1fe0c6ea03675be7ab7f2ff5e4f0fb8a6bc77396c885ec7770
+ size 122102896
model/lm_head/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f1d7c26126e9fa2bbc2d2e45dab67135ee09bb19e44cc32cb451809225cbf43
+ size 122012645
model/lm_head/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c982ea8dab6e9eda86ec19abcdf289f5ca089856ae67ecfa26b4d7e6411fccf3
+ size 122187869
model/model/embed_tokens/embedding/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[154880,512],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[154880,2048],"zarr_format":2}
model/model/embed_tokens/embedding/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:854ed9639b5cbc49f38fefb7fdc7d3520912366cfaeb8b0c4495e9107a534242
+ size 123561687
model/model/embed_tokens/embedding/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9aa4a4ac6a6bf8aa3576cbcc08f170bbdc6bfc20eb30702db1d4b25b7f78a64
+ size 123559739
model/model/embed_tokens/embedding/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bc41d9b069cb67ffeb0a52eb9906f3634566dc84dc6964622dfb0795fefdd10
+ size 123569933
model/model/embed_tokens/embedding/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3c651ef4b27aa1f731e22d91078a26cf050b2e4422f7427531e44a0073ddd02
+ size 123567220
model/model/layers/0/input_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048],"zarr_format":2}
model/model/layers/0/input_layernorm/kernel/0 ADDED
Binary file (2.33 kB)
model/model/layers/0/mlp/down_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2560,2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[10240,2048],"zarr_format":2}
model/model/layers/0/mlp/down_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:818bca024079390503253dd1e55b9a71ffd899ab8761b93a3abda2555faf06ca
+ size 8205293
model/model/layers/0/mlp/down_proj/kernel/1.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:480074fca1c85a6b8d9bd608ba28889691cf6011b58f1ae028dcff1d56f6eb55
+ size 8205605
model/model/layers/0/mlp/down_proj/kernel/2.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c54a17b9fe4aacac06fe8bea9ce02514b2d16c54369d6a339108072734d2afc0
+ size 8205324
model/model/layers/0/mlp/down_proj/kernel/3.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77fe50b4e0a3172588f88cb5c9a56796e29d2228b3a68e8e37b7e87a645cb587
+ size 8205905
model/model/layers/0/mlp/gate_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,2560],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,10240],"zarr_format":2}
model/model/layers/0/mlp/gate_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65d0d2fe6e19d61364d9adbc71e6903d9301ab22eb58ea651ed8fffe3e518f58
+ size 8209818
model/model/layers/0/mlp/gate_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13973e8983afb35565c2ee6b80b69f27018a67df3be57a49424fc4f9eb77002
+ size 8209014
model/model/layers/0/mlp/gate_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa0bcf67870b820cc36fbb8cc44020b3046c1a6507cba47fb156188267c1ed03
+ size 8208317
model/model/layers/0/mlp/gate_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35591fca5f7de0f25c134ee7b73b4e794bdf7eb2a7d27d67e0466980885a871e
+ size 8209511
model/model/layers/0/mlp/up_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,2560],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,10240],"zarr_format":2}
model/model/layers/0/mlp/up_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60e6498124a34701848cb8aa2b82bb3e8e29a2439e4d8e105adf465e0d5a52ff
+ size 8211458
model/model/layers/0/mlp/up_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83e2b9b8f08253b97ea99161035a268525e8c720726f3a55cd97234910a21b89
+ size 8210226
model/model/layers/0/mlp/up_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2833867c5d1d85179ab7d69cd80db8c79de3f119e65c6499fbaacefd6ee728f
+ size 8210590
model/model/layers/0/mlp/up_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e176c782eca8fc0495c33f94ff2272fb0c2de376368cc3e5e44d6d46188db7e4
+ size 8210603
model/model/layers/0/post_attention_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048],"zarr_format":2}
model/model/layers/0/post_attention_layernorm/kernel/0 ADDED
Binary file (2.1 kB)
model/model/layers/0/self_attn/kv_a_layernorm/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[512],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[512],"zarr_format":2}
model/model/layers/0/self_attn/kv_a_layernorm/kernel/0 ADDED
Binary file (674 Bytes)
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[2048,144],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[2048,576],"zarr_format":2}
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2462ba05681a34a8b46ddce489d6f7d472a50d26baf207b109a484ed5bc7506
+ size 461514
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c5f33d1cbe44af63f61875cf9db0d64485d6eeb1151f5587235546edd3c62cd
+ size 461674
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:446cbf4155f4277db6fd47f0a6d24faf91febc16fb472b2e9352ebc38cde6b87
+ size 461516
model/model/layers/0/self_attn/kv_a_proj_with_mqa/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7923f2fd61a180468e65390070a89b7e2bb306ce26616233823a12b0f6d54375
+ size 467363
model/model/layers/0/self_attn/kv_b_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[512,2240],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[512,8960],"zarr_format":2}
model/model/layers/0/self_attn/kv_b_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d439a65acc9e207f9a5f3383b73cfe46f4399ed50d9e27d90286e6b49cabd205
+ size 1838281
model/model/layers/0/self_attn/kv_b_proj/kernel/0.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7150f6cf9000c250f9a234184e4028c3de4f8582f6528d9de5672d331b53369b
+ size 1838831
model/model/layers/0/self_attn/kv_b_proj/kernel/0.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de1508baa852f5d54462e0a838d09102d3edbeb18fdcc117dcf5b6f7b219997e
+ size 1853092
model/model/layers/0/self_attn/kv_b_proj/kernel/0.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3963702f3f09d215517ec53efc557557490f219b545ab26c357b206535b58fa1
+ size 1844678
model/model/layers/0/self_attn/o_proj/kernel/.zarray ADDED
@@ -0,0 +1 @@
+ {"chunks":[1280,2048],"compressor":{"id":"zstd","level":1},"dimension_separator":".","dtype":"bfloat16","fill_value":null,"filters":null,"order":"C","shape":[5120,2048],"zarr_format":2}
model/model/layers/0/self_attn/o_proj/kernel/0.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49d9411145e2812bc128ae13faf3971ecf082121cf665ecabc35ac7b6d9044d0
+ size 4087692
model/model/layers/0/self_attn/o_proj/kernel/1.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc65bf0b514a069bc1a692bb219a92c7025363e069bd26ddbe47b02e9da9c26a
+ size 4079010