Ex0bit committed
Commit 4775435 · verified · 1 Parent(s): e567833

Update README.md

Files changed (1):
  1. README.md +1 -28
README.md CHANGED

@@ -7,7 +7,6 @@ tags:
   - SOTA Abliteration Pipeline - PRISM
   - glm
   - glm4_moe
-  - gguf
   - quantized
   - finetuned
   - uncensored
@@ -66,27 +65,8 @@ This project exists as **research and development experimentation** into underst
 ```
 zai-org/GLM-4.7 (Base Model - BF16)
 └── Ex0bit/GLM-4.7-PRISM (This Model)
-    ├── GLM-4.7-PRISM-IQ4_XS.gguf (186 GB)
-    ├── GLM-4.7-PRISM-IQ1_S.gguf (100 GB)
-    └── GLM-4.7-PRISM-imatrix.dat
 ```
 
-## Available Quantizations
-
-| File | Size | BPW | Description |
-|------|------|-----|-------------|
-| [GLM-4.7-PRISM-IQ4_XS.gguf](./GLM-4.7-PRISM-IQ4_XS.gguf) | 186 GB | 4.5 | High quality 4-bit, recommended for most users |
-| [GLM-4.7-PRISM-IQ1_S.gguf](./GLM-4.7-PRISM-IQ1_S.gguf) | 100 GB | 2.4 | Ultra-compact 1.5-bit, fits smaller setups |
-| [GLM-4.7-PRISM-imatrix.dat](./GLM-4.7-PRISM-imatrix.dat) | 100 MB | - | Importance matrix for custom quantization |
-| BF16 (safetensors) | ~700 GB | 16 | Full precision source weights |
-
-### GGUF Usage Notes
-
-- **Requires llama.cpp** (or compatible forks like ik_llama.cpp, koboldcpp)
-- **Ollama not yet supported** - IQ quantization types not recognized by Ollama's bundled llama.cpp
-- Tested at **50 tokens/sec** on 4x H100 80GB
-- imatrix generated from Wikipedia training data (PPL 4.97)
-
 ## Prompt Format
 
 This model uses the GLM chat format with thinking/reasoning support:
@@ -291,13 +271,6 @@ extra_body={"chat_template_kwargs": {"enable_thinking": False}}
 
 The model was abliterated using **PRISM** - a state-of-the-art abliteration methodology combining multiple principled techniques for effective refusal removal while preserving model capabilities.
 
-## Hardware Requirements
-
-| Configuration | Min VRAM/RAM | Recommended | Notes |
-| --- | --- | --- | --- |
-| GGUF 2-bit (UD-Q2_K_XL) | 24GB VRAM + 128GB RAM | 160GB+ combined | MoE offloading to CPU |
-| GGUF 4-bit | 40GB VRAM + 165GB RAM | 205GB+ combined | ~5 tokens/s |
-| BF16 | 400GB+ | Multi-GPU setup | Full precision |
 
 ### MoE Offloading Tips (llama.cpp)
 
@@ -345,7 +318,7 @@ MIT (same as base model [zai-org/GLM-4.7](https://huggingface.co/zai-org/GLM-4.7
 * [ZhipuAI](https://www.zhipuai.cn/) for GLM-4.7
 * [llama.cpp](https://github.com/ggerganov/llama.cpp) for quantization tools
 * [Unsloth](https://unsloth.ai/) for GGUF guides and optimizations
-* The GLM Team for the outstanding foundation model
+* The z.aiGLM Team for the outstanding foundation model
 
 ## Related Models