Ex0bit committed
Commit 0ae6e8b · verified · 1 Parent(s): 63674c8

Update README.md

Files changed (1)
  1. README.md +109 -12
README.md CHANGED
@@ -14,18 +14,18 @@ library_name: transformers
  ---
 
  [![Parameters](https://img.shields.io/badge/Parameters-30B--A3B_MoE-blue)]()
- [![Architecture](https://img.shields.io/badge/Architecture-GLM--4-green)]()
  [![Context](https://img.shields.io/badge/Context-128K-orange)]()
 
  # GLM-4.7-Flash-PRISM
 
- An unrestricted version of ZAI's GLM-4.7-Flash with over-refusal mechanisms removed using PRISM (Projected Refusal Isolation via Subspace Modification).
 
  <div align="center">
 
  ### ☕ Support Our Work
 
- If you find this useful, consider supporting us on Ko-fi!
 
  [![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)
 
@@ -47,21 +47,118 @@ If you find this useful, consider supporting us on Ko-fi!
 
  ## Benchmarks
 
- | Benchmark | Score |
- |-----------|-------|
- | AIME 2025 | 91.6% |
- | τ²-Bench | 79.5% |
- | SWE-bench Verified | 59.2% |
- | GPQA | 75.2% |
 
  ## Usage
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model = AutoModelForCausalLM.from_pretrained("Ex0bit/GLM-4.7-Flash-PRISM")
- tokenizer = AutoTokenizer.from_pretrained("Ex0bit/GLM-4.7-Flash-PRISM")
  ```
 
  ## License
 
- This model is released under the [PRISM Research License](LICENSE.md).
 
  ---
 
  [![Parameters](https://img.shields.io/badge/Parameters-30B--A3B_MoE-blue)]()
+ [![Architecture](https://img.shields.io/badge/Architecture-GLM--4.7-green)]()
  [![Context](https://img.shields.io/badge/Context-128K-orange)]()
 
  # GLM-4.7-Flash-PRISM
 
+ An unrestricted version of [ZAI's GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) with over-refusal mechanisms completely removed using our PRISM Pipeline (Projected Refusal Isolation via Subspace Modification).
 
  <div align="center">
 
  ### ☕ Support Our Work
 
+ If you find this model useful, consider supporting us on Ko-fi!
 
  [![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ericelbaz)
 
 
  ## Benchmarks
 
+ | Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
+ |-----------|---------------|-----------------------------|-------------|
+ | AIME 2025 | 91.6 | 85.0 | 91.7 |
+ | GPQA | 75.2 | 73.4 | 71.5 |
+ | LCB v6 | 64.0 | 66.0 | 61.0 |
+ | HLE | 14.4 | 9.8 | 10.9 |
+ | SWE-bench Verified | 59.2 | 22.0 | 34.0 |
+ | τ²-Bench | 79.5 | 49.0 | 47.7 |
+ | BrowseComp | 42.8 | 2.29 | 28.3 |
 
  ## Usage
+
+ ### Transformers
+
+ Install the latest transformers from source:
+
+ ```shell
+ pip install git+https://github.com/huggingface/transformers.git
+ ```
+
+ Run inference:
+
  ```python
+ import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
+ MODEL_PATH = "Ex0bit/GLM-4.7-Flash-PRISM"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_PATH,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [{"role": "user", "content": "Hello!"}]
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
+ output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:])
+ print(output_text)
+ ```
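
For multi-turn chat, the `messages` list in the Transformers example is the whole conversation state: append the decoded reply as an `assistant` turn before adding the next `user` turn, or the chat template will drop prior context. A minimal sketch of that bookkeeping, where `generate_reply` is a hypothetical stub standing in for the tokenize/generate/decode round trip above:

```python
# Conversation state for a chat model: a growing list of role/content dicts.
def generate_reply(messages):
    # Hypothetical stub; a real implementation would run the
    # apply_chat_template -> generate -> decode steps shown above.
    return f"(reply to: {messages[-1]['content']})"

messages = [{"role": "user", "content": "Hello!"}]
reply = generate_reply(messages)

# Record the model's turn before asking the next question.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Who made you?"})
```

The same pattern applies unchanged when talking to the vLLM or SGLang servers below, since their APIs accept the same `messages` shape.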
+
+ ### vLLM
+
+ Install vLLM nightly:
+
+ ```shell
+ pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
+ pip install git+https://github.com/huggingface/transformers.git
+ ```
+
+ Serve the model:
+
+ ```shell
+ vllm serve Ex0bit/GLM-4.7-Flash-PRISM \
+     --tensor-parallel-size 4 \
+     --speculative-config.method mtp \
+     --speculative-config.num_speculative_tokens 1 \
+     --tool-call-parser glm47 \
+     --reasoning-parser glm45 \
+     --enable-auto-tool-choice \
+     --served-model-name glm-4.7-flash-prism
  ```
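
Once up, the server exposes an OpenAI-compatible API (port 8000 by default). A minimal sketch of a chat request against it using only the standard library; the endpoint URL and a running server are assumptions here, and the `openai` client works the same way:

```python
import json
import urllib.request

# Build an OpenAI-compatible chat completion request for the vLLM server.
# The model name matches the --served-model-name flag used above.
payload = {
    "model": "glm-4.7-flash-prism",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumes the serve command above is running locally
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```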
 
+ ### SGLang
+
+ Install SGLang:
+
+ ```shell
+ uv pip install sglang==0.3.2.dev9039+pr-17247.g90c446848 --extra-index-url https://sgl-project.github.io/whl/pr/
+ uv pip install git+https://github.com/huggingface/transformers.git@76732b4e7120808ff989edbd16401f61fa6a0afa
+ ```
+
+ Launch the server:
+
+ ```shell
+ python3 -m sglang.launch_server \
+     --model-path Ex0bit/GLM-4.7-Flash-PRISM \
+     --tp-size 4 \
+     --tool-call-parser glm47 \
+     --reasoning-parser glm45 \
+     --speculative-algorithm EAGLE \
+     --speculative-num-steps 3 \
+     --speculative-eagle-topk 1 \
+     --speculative-num-draft-tokens 4 \
+     --mem-fraction-static 0.8 \
+     --served-model-name glm-4.7-flash-prism \
+     --host 0.0.0.0 \
+     --port 8000
+ ```
+
+ > **Note:** For Blackwell GPUs, add `--attention-backend triton --speculative-draft-attention-backend triton` to your SGLang launch command.
+
+ ## Recommended Parameters
+
+ | Use Case | Temperature | Top-P | Max New Tokens |
+ |----------|-------------|-------|----------------|
+ | Default | 1.0 | 0.95 | 131072 |
+ | Code (SWE-bench) | 0.7 | 1.0 | 16384 |
+ | Agentic Tasks | 0.0 | — | 16384 |
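
The table above maps directly onto sampling kwargs, whether passed to `model.generate` or into an OpenAI-compatible request body. A sketch of that mapping as plain dicts; the preset names and the greedy fallback for the temperature-0 agentic row are our assumptions, not part of the model card:

```python
# Recommended sampling presets from the table above, keyed by use case.
PRESETS = {
    "default": {"temperature": 1.0, "top_p": 0.95, "max_new_tokens": 131072},
    "code": {"temperature": 0.7, "top_p": 1.0, "max_new_tokens": 16384},
    "agentic": {"temperature": 0.0, "max_new_tokens": 16384},
}

def sampling_kwargs(use_case: str) -> dict:
    """Return generate()-style kwargs for a preset."""
    preset = dict(PRESETS[use_case])
    if preset["temperature"] == 0.0:
        # Temperature 0 means greedy decoding: drop temperature/top_p
        # and disable sampling instead.
        return {"do_sample": False, "max_new_tokens": preset["max_new_tokens"]}
    return {"do_sample": True, **preset}

print(sampling_kwargs("code"))
```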
+
  ## License
 
+ This model is released under the [PRISM Research License](LICENSE.md).
+
+ ## Acknowledgments
+
+ Based on [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) by [Z.AI](https://z.ai). See the [technical report](https://arxiv.org/abs/2508.06471) for more details on the base model.