---
license: apache-2.0
base_model: zai-org/GLM-4.6V-Flash
model_name: Elbaz-GLM-4.6V-Flash-PRISM
tags:
  - abliteration
  - SOTA Abliteration Pipeline - PRISM
  - vision-language-model
  - vlm
  - glm
  - gguf
  - quantized
language:
  - en
library_name: transformers
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/logo.svg" width="400"/>
</p>

# ELBAZ GLM-4.6V-FLASH PRISM (Uncensored)

**GLM-4.6V-Flash: A 10B Dense Vision-Language Model**

[GLM-4.6V-Flash](https://huggingface.co/zai-org/GLM-4.6V-Flash) | [ZhipuAI](https://www.zhipuai.cn/)

## Introduction

**GLM-4.6V-Flash** is a 10.29B-parameter dense Vision-Language Model (VLM) with a 40-layer transformer and an integrated vision encoder, capable of understanding both text and images.

## Model Description

This model is an **abliterated** version of [zai-org/GLM-4.6V-Flash](https://huggingface.co/zai-org/GLM-4.6V-Flash) that has had its refusal mechanisms removed using **PRISM (Projected Refusal Isolation via Subspace Modification)**. The model will respond to prompts that the original model would refuse.

**Key Specs:**
- 10.29B parameter dense Vision-Language Model
- 40-layer transformer architecture
- Integrated vision encoder for image understanding
- 128K context length
- Supports text, image, and video inputs

### Motivation

This project is **research and development experimentation** into how large language models encode and enforce refusal behaviors. It aims to contribute to broader AI safety research with empirical data on where refusal mechanisms are localized and on the tradeoffs between safety and capability.

### Author

**Eric Elbaz (Ex0bit)**

## Model Tree

```
zai-org/GLM-4.6V-Flash (Base Model - BF16)
└── Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM (This Model)
    └── Elbaz-GLM-4.6V-Flash-PRISM-IQ4_XS.gguf
```

## Available Quantizations

| Quantization | Size | Description |
|-------------|------|-------------|
| IQ4_XS | 5.0 GB | Importance-weighted 4-bit, excellent quality |

IQ4_XS uses importance-weighted quantization, which generally gives better quality than standard Q4 quantizations at a similar size. The embedding and output layers are kept at Q6_K precision to preserve quality.

## Prompt Format

This model uses the GLM chat format with optional thinking/reasoning support:

```
[gMASK]<sop><|system|>
{system_prompt}<|user|>
{user_prompt}<|assistant|>
```
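
For scripts that talk to raw-completion backends, the layout above can be assembled with a small helper. This is a minimal single-turn sketch: `build_glm_prompt` is a hypothetical name, not part of any library, and it does not handle multi-turn history or image inputs.

```python
from typing import Optional


def build_glm_prompt(user_prompt: str, system_prompt: Optional[str] = None) -> str:
    """Assemble a single-turn prompt in the GLM chat layout shown above."""
    parts = ["[gMASK]<sop>"]
    if system_prompt is not None:
        # System turn: role marker, newline, then the system text
        parts.append(f"<|system|>\n{system_prompt}")
    # User turn, then the assistant marker that cues the model to respond
    parts.append(f"<|user|>\n{user_prompt}")
    parts.append("<|assistant|>")
    return "".join(parts)
```

For example, `build_glm_prompt("Your prompt here", "You are a helpful assistant.")` yields the same structure as the template, with the placeholders filled in.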

### Template Structure

| Component | Token/Format |
|-----------|-------------|
| System Start | `<\|system\|>` |
| User Start | `<\|user\|>` |
| Assistant Start | `<\|assistant\|>` |
| Thinking Start | `<think>` |
| Thinking End | `</think>` |
| End of Text | `<\|endoftext\|>` |

### Special Tokens

| Token | ID | Purpose |
|-------|-----|---------|
| `<\|system\|>` | 151335 | System prompt marker |
| `<\|user\|>` | 151336 | User message marker |
| `<\|assistant\|>` | 151337 | Assistant response marker |
| `<think>` | 151350 | Reasoning block start |
| `</think>` | 151351 | Reasoning block end |
| `<\|endoftext\|>` | 151329 | EOS token |
| `<\|begin_of_image\|>` | 151339 | Image input start |
| `<\|end_of_image\|>` | 151340 | Image input end |
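
When output is decoded with special tokens kept (as in the Transformers example), the reasoning block and the final answer arrive in one string. Below is a minimal sketch for separating them. It assumes the model emits at most one `<think>`/`</think>` block before its answer; `split_reasoning` is a hypothetical helper. It also tolerates the case where `<think>` was part of the prompt, so only `</think>` appears in the output.

```python
from typing import Tuple

THINK_START = "<think>"
THINK_END = "</think>"
EOS = "<|endoftext|>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer) using the think markers."""
    text = text.strip()
    # Drop a trailing EOS marker if the decoder kept special tokens
    if text.endswith(EOS):
        text = text[: -len(EOS)]
    end = text.find(THINK_END)
    if end == -1:
        # No closing marker: treat everything as the answer
        return "", text.strip()
    start = text.find(THINK_START)
    # If the opening marker is missing (e.g. it was in the prompt), reasoning starts at 0
    body_start = start + len(THINK_START) if 0 <= start < end else 0
    reasoning = text[body_start:end]
    answer = text[end + len(THINK_END):]
    return reasoning.strip(), answer.strip()
```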

## Technical Details

### Performance Impact

| Metric | Result |
|--------|--------|
| Refusal Bypass Rate | 100% |
| English Output Rate | 100% |
| KL Divergence | 0.0000 (no capability degradation) |
| Response Coherence | Detailed, technically accurate |

In the author's testing, PRISM abliteration preserved full model coherence with no measurable capability degradation.
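
The card does not say how the KL divergence figure was measured. One common approach in abliteration work compares the base and modified models' next-token distributions on harmless prompts and averages over prompts. The sketch below shows that calculation for a single position, starting from raw logits. It illustrates the metric only and is not the author's evaluation code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits: Sequence[float], q_logits: Sequence[float]) -> float:
    """KL(P || Q) in nats, where P is the base model and Q the modified model."""
    if len(p_logits) != len(q_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    # sum_i p_i * (log p_i - log q_i), computed in log space to avoid log(0)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

Identical logits give exactly zero. Note that a value reported to four decimals as 0.0000 means it was below 0.00005, not necessarily exactly zero.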

## Quick Start

### Using with llama.cpp

```bash
# Download the model
huggingface-cli download Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM \
    Elbaz-GLM-4.6V-Flash-PRISM-IQ4_XS.gguf \
    --local-dir .

# Run inference
./llama-cli -m Elbaz-GLM-4.6V-Flash-PRISM-IQ4_XS.gguf \
    -p "[gMASK]<sop><|system|>
You are a helpful assistant. You MUST respond in English only.<|user|>
Your prompt here<|assistant|>
" \
    -n 2048 \
    --temp 0.7 \
    -ngl 999
```

### llama.cpp with llama-server

```bash
# Start the server
./llama-server -m Elbaz-GLM-4.6V-Flash-PRISM-IQ4_XS.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -ngl 999 \
    -c 32768

# Example API call
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant. You MUST respond in English only."},
            {"role": "user", "content": "Your prompt here"}
        ],
        "temperature": 0.7
    }'
```
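
You can send the same request from Python without extra dependencies. The sketch below uses only the standard library. It assumes the server started above is listening on `localhost:8080` and relies only on the OpenAI-style fields used in the curl example. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Host and port match the llama-server command above
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(user_prompt: str, system_prompt: str, temperature: float = 0.7) -> Dict[str, Any]:
    """Build the same OpenAI-style request body as the curl example."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }


def extract_reply(response: Dict[str, Any]) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(user_prompt: str, system_prompt: str, url: str = SERVER_URL) -> str:
    """POST one chat request to the server and return the reply text."""
    body = json.dumps(build_payload(user_prompt, system_prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("Your prompt here", "You are a helpful assistant. You MUST respond in English only."))` sends the same request as the curl call.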

### Using with Ollama

```bash
# Pull and run directly from Hugging Face
ollama pull hf.co/Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM
ollama run hf.co/Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM
```

> **Note:** The `hf.co/` prefix is required to pull from Hugging Face. Requires Ollama 0.3.0+.

### Using with Transformers (Full Weights)

```python
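from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# GLM-4.6V-Flash is an image-text-to-text model; torch_dtype="auto" keeps the
# checkpoint's stored dtype instead of upcasting to float32
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant. You MUST respond in English only."}]},
    {"role": "user", "content": [{"type": "text", "text": "Your prompt here"}]}
]

# Render the GLM chat template and tokenize in one step
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)  # not needed for generation; no-op if absent

outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=False))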
```

## PRISM Methodology

### Method: Projected Refusal Isolation via Subspace Modification

The model was abliterated using **PRISM**: a state-of-the-art abliteration methodology that combines multiple principled techniques to remove refusals effectively while preserving model capabilities.

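The card does not document PRISM's individual steps. For orientation only, below is a minimal sketch of the generic directional-ablation step that abliteration methods commonly build on. It has two parts:

- Estimate a "refusal direction" as the normalized difference of mean residual-stream activations on harmful versus harmless prompts.
- Project that direction out of a weight matrix that writes to the residual stream.

This is not the PRISM pipeline. The helper names are hypothetical and the data is toy-sized.

```python
import math
from typing import List

Vector = List[float]
# Rows = output (residual-stream) dimensions, columns = input dimensions,
# matching the (out_features, in_features) layout of a linear layer's weight
Matrix = List[List[float]]


def mean_vector(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(harmful_acts: List[Vector], harmless_acts: List[Vector]) -> Vector:
    """Unit-norm difference of mean activations (harmful minus harmless)."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate_direction(weight: Matrix, r: Vector) -> Matrix:
    """Return (I - r r^T) W, so the layer's output has no component along unit vector r."""
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r
    proj = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[weight[i][j] - r[i] * proj[j] for j in range(n_cols)] for i in range(n_rows)]
```

After ablation, `r^T W'` is zero (up to rounding), so the modified layer can no longer write along the estimated direction. Everything orthogonal to it is left unchanged.
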
## Hardware Requirements

| Quantization | Min RAM/VRAM | Recommended | Hardware Examples |
|-------------|--------------|-------------|-------------------|
| IQ4_XS | T GB | 12+ GB | RTX 3060 12GB, RTX 4070, Apple M1/M2/M3/M4 |

### Tested Configurations

| Hardware | RAM/VRAM | Status |
|----------|----------|--------|
| NVIDIA RTX GPU | 12+ GB | Works |
| Apple Silicon | 16+ GB Unified | Works |

**Note:** This is a relatively lightweight model that runs on consumer hardware with 12 GB of VRAM, and possibly less.
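
Actual memory use depends on context length as well as weight size. A back-of-envelope estimate adds the KV cache to the weight file. The layer count comes from this card and the 32768 context from the server example. The KV-head count and head dimension are assumed placeholders, not values confirmed for this model, so read the real ones from the model's `config.json`. The estimate ignores compute buffers and any vision projector.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int, n_ctx: int,
                   bytes_per_elem: int = 2) -> int:
    """Keys plus values (2 tensors) for every layer and every context position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


GIB = 1024 ** 3
# n_layers=40 is from this card; n_kv_heads=2 and head_dim=128 are ASSUMED placeholders.
# n_ctx matches `-c 32768` above; bytes_per_elem=2 assumes an F16 KV cache.
kv = kv_cache_bytes(n_layers=40, n_kv_heads=2, head_dim=128, n_ctx=32768)
weights = 5.0e9  # IQ4_XS file size from the quantization table (decimal GB)
print(f"KV cache: {kv / GIB:.2f} GiB, weights + KV: {(weights + kv) / GIB:.2f} GiB")
```

Halving the context or using an 8-bit KV cache (`bytes_per_elem=1`) halves the KV term, while the weight term stays fixed.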

## Vision Capabilities

GLM-4.6V-Flash supports multimodal inputs:

- **Images**: Use `<|begin_of_image|><|image|><|end_of_image|>` tags
- **Videos**: Use `<|begin_of_video|><|video|><|end_of_video|>` tags

Example with image:
```python
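# Reuses `model` and `processor` from the Transformers example above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "What is in this image?"}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))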
```

## Ethical Considerations

This model has been modified to reduce safety guardrails. Users are responsible for:

- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments

## License

Apache 2.0 (same as base model [zai-org/GLM-4.6V-Flash](https://huggingface.co/zai-org/GLM-4.6V-Flash))

## Citation

```bibtex
@misc{elbaz2025glm46vprism,
  author = {Elbaz, Eric},
  title = {Elbaz-GLM-4.6V-Flash-PRISM: An Abliterated GLM-4.6V Vision-Language Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-GLM-4.6V-Flash-PRISM}}
}
```

## Acknowledgments

- [ZhipuAI](https://www.zhipuai.cn/) for GLM-4.6V-Flash
- [llama.cpp](https://github.com/ggerganov/llama.cpp) for quantization tools

## Related Models

- [zai-org/GLM-4.6V-Flash](https://huggingface.co/zai-org/GLM-4.6V-Flash) - Base model
- [Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated](https://huggingface.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated) - INTELLECT-3 abliterated

---

**Created by: Ex0bit (Eric Elbaz)**