GLM-5.1-6layer (layer-trimmed, for training/serving infra testing)

A layer-trimmed copy of zai-org/GLM-5.1, reduced from 78 layers (+1 MTP) to 6 layers (+1 MTP), so the full set of distinct layer types can be exercised on a small number of GPUs during early training/serving infra development (LoRA, parallelism, MTP, etc.).

This is NOT a usable language model: most layers are removed, so generations are gibberish. It exists purely to let infra code load, shard, attach LoRA to, and run a forward/backward pass over every structurally distinct layer of GLM-5.1 at a fraction of the size (~80 GB bf16 vs ~1.45 TB).

What was kept (verbatim bf16 weights from the base model)

| Trimmed index | Source layer | Type | Why kept |
|---|---|---|---|
| 0, 1, 2 | 0, 1, 2 | Dense MLP | first_k_dense_replace=3; the only dense layers, unique to the start |
| 3, 4, 5 | 3, 4, 5 | MoE (256 routed + 1 shared) | representative of the homogeneous MoE block (orig layers 3–77) |
| 6 | 78 | MTP / nextn | the multi-token-prediction layer (enorm/hnorm/eh_proj/shared_head) |
| - | - | embed_tokens, final norm, lm_head | top-level, always required |

Every layer carries the same MLA attention + DSA sparse-attention indexer (q_a_proj/q_b_proj/kv_a_proj_with_mqa/kv_b_proj/o_proj and indexer.wq_b/indexer.wk/indexer.weights_proj), so attention and the indexer are covered by any kept layer. The dense vs MoE distinction is the only MLP difference; MoE layers 3–77 are structurally identical (moe_layer_freq=1), so three samples fully represent them.
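
The layer-type rule described above can be sketched in a few lines. This is only an illustration: `layer_type` and `KEPT` are hypothetical names, and the sketch assumes the values stated in this card (78 decoder layers, first_k_dense_replace=3, moe_layer_freq=1, one MTP layer addressed at index num_hidden_layers).

```python
def layer_type(idx: int, num_hidden_layers: int = 78,
               first_k_dense_replace: int = 3) -> str:
    """Classify a source layer index of the base model by its MLP type.

    Assumes moe_layer_freq=1 (every non-dense decoder layer is MoE) and a
    single MTP layer stored at index == num_hidden_layers.
    """
    if idx < 0 or idx > num_hidden_layers:
        raise ValueError(f"layer index {idx} out of range")
    if idx == num_hidden_layers:
        return "mtp"
    if idx < first_k_dense_replace:
        return "dense"
    return "moe"


# Trimmed index -> source layer, as listed in the table above.
KEPT = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 78}

# Source layers that are not kept; all of them should be MoE layers.
removed = [i for i in range(78 + 1) if i not in KEPT.values()]

for trimmed_idx, source_idx in KEPT.items():
    print(trimmed_idx, source_idx, layer_type(source_idx))
print(len(removed), {layer_type(i) for i in removed})
```

Classifying by index keeps the sketch independent of tensor names, so it does not rely on any particular checkpoint layout.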

What was removed

  • Original layers 6–77 (72 MoE layers): homogeneous duplicates of 3–5.
  • Nothing else: the tokenizer, chat template, generation config, and all non-layer weights are unchanged.

What changed in config.json

Only num_hidden_layers: 78 → 6. Everything else (first_k_dense_replace=3, num_nextn_predict_layers=1, expert counts, all dims) is identical to the base, so the per-layer architecture is bit-for-bit the real GLM-5.1. The MTP layer is renumbered from index 78 to index 6 (= num_hidden_layers), matching how the nextn layer is addressed.
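
As a rough illustration of that single-field change, the sketch below derives a trimmed config from a base config dict and lists which keys differ. `trim_config`, `changed_keys`, and `base_cfg` are hypothetical names, and `base_cfg` holds only the fields named in this card, not the full config.json.

```python
import copy


def trim_config(base: dict, kept_layers: int = 6) -> dict:
    """Return a copy of `base` with only num_hidden_layers changed."""
    if kept_layers <= base["first_k_dense_replace"]:
        raise ValueError("need at least one MoE layer after the dense layers")
    trimmed = copy.deepcopy(base)
    trimmed["num_hidden_layers"] = kept_layers
    return trimmed


def changed_keys(base: dict, trimmed: dict) -> list:
    """Names of the keys whose values differ between the two configs."""
    return sorted(k for k in base if base[k] != trimmed.get(k))


# Illustrative subset of the base config.json (fields named in this card only).
base_cfg = {
    "num_hidden_layers": 78,
    "first_k_dense_replace": 3,
    "num_nextn_predict_layers": 1,
    "moe_layer_freq": 1,
}

trimmed_cfg = trim_config(base_cfg)
print(changed_keys(base_cfg, trimmed_cfg))

# The MTP layer is addressed at index == num_hidden_layers in both configs.
print(base_cfg["num_hidden_layers"], trimmed_cfg["num_hidden_layers"])
```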

Coverage checklist (all distinct layer types present at least once)

  • Dense MLP layer (0–2)
  • MoE layer: routed + shared experts, gate (3–5)
  • MLA attention + DSA indexer (every layer)
  • MTP / nextn layer (6)
  • embed_tokens / final norm / lm_head
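
A checklist like this could also be checked mechanically against the tensor names in the safetensors index. The sketch below is one possible approach, not the procedure used for this checkpoint. It assumes DeepSeek-style names (`model.layers.<i>.` prefix, `mlp.experts.` for routed experts, `model.norm.weight` for the final norm); those names are assumptions and may not match the real checkpoint. `missing_coverage` and `REQUIRED` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed key layout: "model.layers.<index>.<rest>" for per-layer tensors.
_LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.(.+)$")

REQUIRED = {"dense", "moe", "mtp", "mla", "indexer",
            "embed_tokens", "final_norm", "lm_head"}


def missing_coverage(names) -> set:
    """Return the checklist items not found among the given tensor names."""
    per_layer = defaultdict(list)
    found = set()
    for name in names:
        m = _LAYER_KEY.match(name)
        if m:
            per_layer[int(m.group(1))].append(m.group(2))
        elif "embed_tokens" in name:
            found.add("embed_tokens")
        elif name.startswith("lm_head"):
            found.add("lm_head")
        elif name == "model.norm.weight":  # assumed name of the final norm
            found.add("final_norm")
    for suffixes in per_layer.values():
        # The MTP layer is identified first, since it may also hold an MLP.
        if any("eh_proj" in s for s in suffixes):
            found.add("mtp")
        elif any(s.startswith("mlp.experts.") for s in suffixes):
            found.add("moe")
        elif any(s.startswith("mlp.") for s in suffixes):
            found.add("dense")
        if any("kv_a_proj_with_mqa" in s for s in suffixes):
            found.add("mla")
        if any("indexer." in s for s in suffixes):
            found.add("indexer")
    return REQUIRED - found
```

In practice the names would come from the keys of the `weight_map` in the checkpoint's safetensors index; an empty result means every item on the checklist was seen at least once.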

Provenance

Produced by selecting the relevant shards of zai-org/GLM-5.1, copying the kept tensors verbatim (bf16), renumbering only the MTP layer, and rewriting the safetensors index + num_hidden_layers. Verified to load and run a forward pass in SGLang (main).
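
The index-rewriting step might look roughly like the sketch below. It is not the script that produced this checkpoint. It operates only on a `weight_map` dict (tensor name to shard file), assumes per-layer keys are prefixed `model.layers.<i>.`, and uses hypothetical shard filenames in the toy example. `trim_weight_map` is a hypothetical helper; a real conversion would also have to rename the MTP tensors inside the shard files themselves.

```python
import re

# Assumed key layout: "model.layers.<index>.<rest>" for per-layer tensors.
_LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.(.+)$")


def trim_weight_map(weight_map: dict, kept=range(6),
                    mtp_src: int = 78, mtp_dst: int = 6) -> dict:
    """Keep layers in `kept`, renumber the MTP layer, drop other layers."""
    out = {}
    for name, shard in weight_map.items():
        m = _LAYER_KEY.match(name)
        if m is None:
            out[name] = shard  # top-level tensors are always kept
            continue
        idx, rest = int(m.group(1)), m.group(2)
        if idx in kept:
            out[name] = shard  # kept verbatim, same index
        elif idx == mtp_src:
            out[f"model.layers.{mtp_dst}.{rest}"] = shard  # renumbered MTP
        # any other layer index is a removed MoE layer and is skipped
    return out


# Toy index with hypothetical shard filenames.
toy_map = {
    "model.embed_tokens.weight": "shard-a.safetensors",
    "model.layers.0.mlp.down_proj.weight": "shard-a.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "shard-b.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "shard-b.safetensors",
    "model.layers.77.self_attn.o_proj.weight": "shard-c.safetensors",
    "model.layers.78.eh_proj.weight": "shard-d.safetensors",
    "lm_head.weight": "shard-d.safetensors",
}

for name, shard in trim_weight_map(toy_map).items():
    print(name, shard)
```

Because source layer 6 is among the removed layers, renumbering the MTP layer from 78 to 6 cannot collide with a kept tensor name.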
