Ex0bit committed
Commit 4775435 · verified · 1 Parent(s): e567833

Update README.md

Files changed (1):
  1. README.md +1 -28
README.md CHANGED

@@ -7,7 +7,6 @@ tags:
   - SOTA Abliteration Pipeline - PRISM
   - glm
   - glm4_moe
-  - gguf
   - quantized
   - finetuned
   - uncensored
@@ -66,27 +65,8 @@ This project exists as **research and development experimentation** into underst
 ```
 zai-org/GLM-4.7 (Base Model - BF16)
 └── Ex0bit/GLM-4.7-PRISM (This Model)
-    ├── GLM-4.7-PRISM-IQ4_XS.gguf (186 GB)
-    ├── GLM-4.7-PRISM-IQ1_S.gguf (100 GB)
-    └── GLM-4.7-PRISM-imatrix.dat
 ```
 
-## Available Quantizations
-
-| File | Size | BPW | Description |
-|------|------|-----|-------------|
-| [GLM-4.7-PRISM-IQ4_XS.gguf](./GLM-4.7-PRISM-IQ4_XS.gguf) | 186 GB | 4.5 | High quality 4-bit, recommended for most users |
-| [GLM-4.7-PRISM-IQ1_S.gguf](./GLM-4.7-PRISM-IQ1_S.gguf) | 100 GB | 2.4 | Ultra-compact 1.5-bit, fits smaller setups |
-| [GLM-4.7-PRISM-imatrix.dat](./GLM-4.7-PRISM-imatrix.dat) | 100 MB | - | Importance matrix for custom quantization |
-| BF16 (safetensors) | ~700 GB | 16 | Full precision source weights |
-
-### GGUF Usage Notes
-
-- **Requires llama.cpp** (or compatible forks like ik_llama.cpp, koboldcpp)
-- **Ollama not yet supported** - IQ quantization types not recognized by Ollama's bundled llama.cpp
-- Tested at **50 tokens/sec** on 4x H100 80GB
-- imatrix generated from Wikipedia training data (PPL 4.97)
-
 ## Prompt Format
 
 This model uses the GLM chat format with thinking/reasoning support:
@@ -291,13 +271,6 @@ extra_body={"chat_template_kwargs": {"enable_thinking": False}}
 
 The model was abliterated using **PRISM** - a state-of-the-art abliteration methodology combining multiple principled techniques for effective refusal removal while preserving model capabilities.
 
-## Hardware Requirements
-
-| Configuration | Min VRAM/RAM | Recommended | Notes |
-| --- | --- | --- | --- |
-| GGUF 2-bit (UD-Q2_K_XL) | 24GB VRAM + 128GB RAM | 160GB+ combined | MoE offloading to CPU |
-| GGUF 4-bit | 40GB VRAM + 165GB RAM | 205GB+ combined | ~5 tokens/s |
-| BF16 | 400GB+ | Multi-GPU setup | Full precision |
 
 ### MoE Offloading Tips (llama.cpp)
 
@@ -345,7 +318,7 @@ MIT (same as base model [zai-org/GLM-4.7](https://huggingface.co/zai-org/GLM-4.7
 * [ZhipuAI](https://www.zhipuai.cn/) for GLM-4.7
 * [llama.cpp](https://github.com/ggerganov/llama.cpp) for quantization tools
 * [Unsloth](https://unsloth.ai/) for GGUF guides and optimizations
-* The GLM Team for the outstanding foundation model
+* The z.aiGLM Team for the outstanding foundation model
 
 ## Related Models