GLM-5.1-6layer (layer-trimmed, for training/serving infra testing)

A layer-trimmed copy of zai-org/GLM-5.1, reduced from 78 layers (+1 MTP) to 6 layers (+1 MTP), so the full set of distinct layer types can be exercised on a small number of GPUs during early training/serving infra development (LoRA, parallelism, MTP, etc.).

This is NOT a usable language model — most layers are removed, so generations are gibberish. It exists purely to let infra code load, shard, attach LoRA to, and run a forward/backward pass over every structurally distinct layer of GLM-5.1 at a fraction of the size (~80 GB bf16 vs ~1.45 TB).
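As a rough sanity check on the size figure, the sketch below converts a parameter count into a bf16 footprint. It assumes the 43B parameter count reported for this checkpoint and treats every tensor as bf16 (2 bytes), ignoring the small number of F32 tensors. `bf16_size_gib` is a hypothetical helper name, not part of any tooling used here.

```python
def bf16_size_gib(num_params: float) -> float:
    """Approximate checkpoint size in GiB if every parameter is stored as bf16."""
    bytes_per_param = 2  # bf16 is 16 bits wide
    return num_params * bytes_per_param / 2**30


# Assumption: 43e9 parameters, all stored as bf16.
trimmed_gib = bf16_size_gib(43e9)
print(f"trimmed checkpoint: about {trimmed_gib:.1f} GiB")
```

Under these assumptions the result lands close to the ~80 GB quoted above, which suggests that figure is in binary units.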

What was kept (verbatim bf16 weights from the base model)

| Trimmed index | Source layer | Type | Why kept |
|---|---|---|---|
| 0, 1, 2 | 0, 1, 2 | Dense MLP | `first_k_dense_replace=3`; the only dense layers, found only at the start of the stack |
| 3, 4, 5 | 3, 4, 5 | MoE (256 routed + 1 shared) | representative of the homogeneous MoE block (orig layers 3–77) |
| 6 | 78 | MTP / nextn | the multi-token-prediction layer (`enorm`/`hnorm`/`eh_proj`/`shared_head`) |
| n/a | n/a | `embed_tokens`, final `norm`, `lm_head` | top-level, always required |

Every layer carries the same MLA attention and DSA sparse-attention indexer (`q_a_proj`/`q_b_proj`/`kv_a_proj_with_mqa`/`kv_b_proj`/`o_proj` and `indexer.wq_b`/`indexer.wk`/`indexer.weights_proj`), so any kept layer covers both attention and the indexer. The MLP is the only part that differs between layers: dense in layers 0–2, MoE from layer 3 onward. Original MoE layers 3–77 are structurally identical (`moe_layer_freq=1`), so three of them are enough to represent the whole block.
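The layer-type rule described above can be sketched as a small classifier. This is an illustration only: `layer_kind` is a hypothetical helper, and it assumes `moe_layer_freq=1`, so that every regular layer at or past `first_k_dense_replace` is MoE, and that the MTP layer is addressed at index `num_hidden_layers`.

```python
def layer_kind(
    idx: int,
    num_hidden_layers: int = 6,
    first_k_dense_replace: int = 3,
    num_nextn_predict_layers: int = 1,
) -> str:
    """Classify a layer index as 'dense', 'moe', or 'mtp'.

    Assumes moe_layer_freq=1, i.e. every regular layer at or past
    first_k_dense_replace is an MoE layer.
    """
    total = num_hidden_layers + num_nextn_predict_layers
    if not 0 <= idx < total:
        raise ValueError(f"layer index {idx} out of range 0..{total - 1}")
    if idx >= num_hidden_layers:
        return "mtp"  # nextn layers follow the regular stack
    if idx < first_k_dense_replace:
        return "dense"
    return "moe"


# Trimmed model: indices 0..6
kinds = [layer_kind(i) for i in range(7)]
print(kinds)
```

Calling the same function with `num_hidden_layers=78` gives the layout of the base model, where index 78 is the MTP layer.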

What was removed

Original layers 6–77 (72 MoE layers) — homogeneous duplicates of 3–5.
Nothing else: the tokenizer, chat template, generation config, and all non-layer weights are unchanged.

What changed in `config.json`

Only `num_hidden_layers` changes: 78 → 6. Everything else (`first_k_dense_replace=3`, `num_nextn_predict_layers=1`, expert counts, all dimensions) is identical to the base, so each kept layer has exactly the architecture of the real GLM-5.1. The MTP layer is renumbered from index 78 to index 6 (= `num_hidden_layers`), which is the index at which the nextn layer is addressed.
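A minimal sketch of that config edit follows. The dictionary contains only the fields named in this card (a real `config.json` has many more), and `trim_config` is a hypothetical helper rather than the script actually used.

```python
import json


def trim_config(config: dict, num_hidden_layers: int) -> dict:
    """Return a copy of the config with only num_hidden_layers changed."""
    trimmed = dict(config)
    trimmed["num_hidden_layers"] = num_hidden_layers
    return trimmed


# Only the fields mentioned in this card; values taken from the text above.
base = {
    "num_hidden_layers": 78,
    "first_k_dense_replace": 3,
    "num_nextn_predict_layers": 1,
    "moe_layer_freq": 1,
}

trimmed = trim_config(base, 6)
changed = {key for key in base if base[key] != trimmed[key]}
mtp_index = trimmed["num_hidden_layers"]  # nextn layer follows the last regular layer

print(json.dumps(trimmed, indent=2))
print("changed keys:", changed, "| MTP layer index:", mtp_index)
```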

Coverage checklist (all distinct layer types present ≥ once)

Dense MLP layer (0–2)
MoE layer — routed + shared experts, gate (3–5)
MLA attention + DSA indexer (every layer)
MTP / nextn layer (6)
embed_tokens / final norm / lm_head
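One way to check this list against a checkpoint is to scan its tensor names. The sketch below is hedged: the `model.layers.N.` prefix, the `self_attn` and `mlp.experts` segments, and the example names are assumptions based on common Hugging Face naming, not confirmed names from this checkpoint. `coverage` is a hypothetical helper.

```python
import re


def coverage(names: list[str]) -> dict[str, bool]:
    """Report which structurally distinct parts appear among the tensor names."""

    def has(pattern: str) -> bool:
        return any(re.search(pattern, name) for name in names)

    return {
        "dense_mlp": has(r"^model\.layers\.[0-2]\.mlp\."),
        "moe": has(r"^model\.layers\.[3-5]\.mlp\.experts\."),  # assumed naming
        "mla_attention": has(r"\.self_attn\.q_a_proj\."),      # assumed naming
        "dsa_indexer": has(r"\.indexer\.wq_b\."),
        "mtp": has(r"^model\.layers\.6\.eh_proj\."),
        "top_level": all(
            has(p)
            for p in (r"^model\.embed_tokens\.", r"^model\.norm\.", r"^lm_head\.")
        ),
    }


# Hypothetical tensor names, for illustration only.
names = [
    "model.embed_tokens.weight",
    "model.norm.weight",
    "lm_head.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.self_attn.q_a_proj.weight",
    "model.layers.0.self_attn.indexer.wq_b.weight",
    "model.layers.3.mlp.experts.0.down_proj.weight",
    "model.layers.6.eh_proj.weight",
]

report = coverage(names)
missing = [part for part, present in report.items() if not present]
print("missing:", missing)
```

In practice the names would come from the `weight_map` keys of the safetensors index rather than a hand-written list.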

Provenance

Produced by selecting the relevant shards of zai-org/GLM-5.1, copying the kept tensors verbatim (bf16), renumbering only the MTP layer, and rewriting the safetensors index and `num_hidden_layers`. Verified to load and run a forward pass in SGLang (main).

Safetensors: 43B params; tensor types BF16, F32
