---
license: other
license_name: prism-research
license_link: LICENSE.md
language:
- en
- zh
tags:
- glm4
- prism
- moe
pipeline_tag: text-generation
library_name: transformers
---

[![Parameters](https://img.shields.io/badge/Parameters-30B--A3B_MoE-blue)]()
[![Architecture](https://img.shields.io/badge/Architecture-GLM--4.7-green)]()
[![Context](https://img.shields.io/badge/Context-128K-orange)]()

# GLM-4.7-Flash-PRISM

A version of [Z.AI's GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) with over-refusal and bias mechanisms removed using our Advanced PRISM Pipeline.

<div align="center">

### ☕ Support Our Work

If you find this model useful, consider supporting us on Ko-fi!

[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)

| Option | Description |
|--------|-------------|
| [**PRISM VIP Membership**](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
| [**One-Time Support**](https://ko-fi.com/s/86882e8991) | Support this model |

</div>

---

## Model Highlights

- **PRISM Ablation** — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- **30B-A3B MoE Architecture** — 30 billion total parameters with ~3 billion active per token for fast, efficient inference
- **128K Context Window** — Extended context for complex tasks and large codebases
- **Interleaved Thinking** — Multi-turn reasoning that persists across conversations with per-turn thinking control

## Benchmarks

| Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|-----------|---------------|-----------------------------|-------------|
| AIME 2025 | 91.6 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 |
| HLE | 14.4 | 9.8 | 10.9 |
| SWE-bench Verified | 59.2 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 |

## Usage

### Transformers

Install the latest transformers from source:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

Run inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/GLM-4.7-Flash-PRISM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:])
print(output_text)
```
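
Thinking output can also be controlled per turn through the chat template. The sketch below assumes the template accepts an `enable_thinking` variable (as in other hybrid-reasoning GLM/Qwen templates); verify against the `chat_template` shipped with this repository before relying on it. It reuses `tokenizer`, `model`, and `messages` from the snippet above.

```python
# Hypothetical per-turn thinking toggle: `enable_thinking` is an assumed
# template variable, forwarded to the template by apply_chat_template.
inputs_no_think = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,  # skip the reasoning block for this turn
).to(model.device)

generated_ids = model.generate(**inputs_no_think, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(generated_ids[0][inputs_no_think.input_ids.shape[1]:]))
```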

### vLLM

Install vLLM nightly:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
```

Serve the model:

```shell
vllm serve Ex0bit/GLM-4.7-Flash-PRISM \
     --tensor-parallel-size 4 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --served-model-name glm-4.7-flash-prism
```
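
The server exposes an OpenAI-compatible API. A minimal client sketch follows, assuming the default `http://localhost:8000/v1` endpoint and the `openai` Python package; the `chat_template_kwargs` thinking toggle is an assumption carried over from other GLM deployments and can be dropped if your build ignores it. The same request also works against the SGLang server shown below.

```python
from openai import OpenAI

# Point the client at the local vLLM server (assumes the default host/port).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="glm-4.7-flash-prism",  # matches --served-model-name
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=1.0,
    top_p=0.95,
    max_tokens=1024,
    # Assumed per-turn thinking toggle; remove if unsupported by your deployment.
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(response.choices[0].message.content)
```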

### SGLang

Install SGLang:

```shell
uv pip install sglang==0.3.2.dev9039+pr-17247.g90c446848 --extra-index-url https://sgl-project.github.io/whl/pr/
uv pip install git+https://github.com/huggingface/transformers.git@76732b4e7120808ff989edbd16401f61fa6a0afa
```

Launch the server:

```shell
python3 -m sglang.launch_server \
  --model-path Ex0bit/GLM-4.7-Flash-PRISM \
  --tp-size 4 \
  --tool-call-parser glm47  \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.8 \
  --served-model-name glm-4.7-flash-prism \
  --host 0.0.0.0 \
  --port 8000
```

> **Note:** For Blackwell GPUs, add `--attention-backend triton --speculative-draft-attention-backend triton` to your SGLang launch command.

## Recommended Parameters

| Use Case | Temperature | Top-P | Max New Tokens |
|----------|-------------|-------|----------------|
| Default | 1.0 | 0.95 | 131072 |
| Code (SWE-bench) | 0.7 | 1.0 | 16384 |
| Agentic Tasks | 0.0 | — | 16384 |
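
As a sketch, these presets map directly onto sampler arguments; the dictionary below simply restates the table as `generate()` kwargs (reusing `model` and `inputs` from the Transformers example above) and is illustrative, not an official configuration.

```python
# The table above restated as generate() kwargs (illustrative only).
SAMPLING_PRESETS = {
    "default": dict(do_sample=True, temperature=1.0, top_p=0.95, max_new_tokens=131072),
    "code": dict(do_sample=True, temperature=0.7, top_p=1.0, max_new_tokens=16384),
    # Temperature 0.0 corresponds to greedy decoding, i.e. do_sample=False.
    "agentic": dict(do_sample=False, max_new_tokens=16384),
}

generated_ids = model.generate(**inputs, **SAMPLING_PRESETS["code"])
```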

## License

This model is released under the [PRISM Research License](LICENSE.md).

## Citation

```bibtex
@misc{elbaz2026glm47flashPrism,
  author = {Elbaz, Eric},
  title = {Elbaz-GLM-4.7-Flash-PRISM: Unchained GLM-4.7-Flash-PRISM Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-GLM-4.7-Flash-PRISM}}
}
```

## Acknowledgments

Based on [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) by [Z.AI](https://z.ai). See the [technical report](https://arxiv.org/abs/2508.06471) for more details on the base model.