Upload README.md with huggingface_hub

README.md CHANGED
@@ -65,16 +65,27 @@ This project exists as **research and development experimentation** into underst
 ## Model Tree
 ```
 zai-org/GLM-4.7 (Base Model - BF16)
-└── Ex0bit/
-
+└── Ex0bit/GLM-4.7-PRISM (This Model)
+    ├── GLM-4.7-PRISM-IQ4_XS.gguf (186 GB)
+    ├── GLM-4.7-PRISM-IQ1_S.gguf (100 GB)
+    └── GLM-4.7-PRISM-imatrix.dat
 ```
 
 ## Available Quantizations
 
-
-
-| IQ4_XS |
-
+| File | Size | BPW | Description |
+|------|------|-----|-------------|
+| [GLM-4.7-PRISM-IQ4_XS.gguf](./GLM-4.7-PRISM-IQ4_XS.gguf) | 186 GB | 4.5 | High quality 4-bit, recommended for most users |
+| [GLM-4.7-PRISM-IQ1_S.gguf](./GLM-4.7-PRISM-IQ1_S.gguf) | 100 GB | 2.4 | Ultra-compact 1.5-bit, fits smaller setups |
+| [GLM-4.7-PRISM-imatrix.dat](./GLM-4.7-PRISM-imatrix.dat) | 100 MB | - | Importance matrix for custom quantization |
+| BF16 (safetensors) | ~700 GB | 16 | Full precision source weights |
+
+### GGUF Usage Notes
+
+- **Requires llama.cpp** (or compatible forks like ik_llama.cpp, koboldcpp)
+- **Ollama not yet supported** - IQ quantization types not recognized by Ollama's bundled llama.cpp
+- Tested at **50 tokens/sec** on 4x H100 80GB
+- imatrix generated from Wikipedia training data (PPL 4.97)
 
 ## Prompt Format
 
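As a quick cross-check on the quantization table in the hunk above: file size and bits per weight (BPW) together imply a parameter count, params ≈ bytes × 8 / BPW. A minimal sketch using the table's figures (the helper is my own; it assumes 1 GB = 10^9 bytes and ignores GGUF metadata overhead):

```python
def implied_params_b(size_gb: float, bpw: float) -> float:
    """Parameter count in billions implied by file size and bits per weight.

    Rough check only: assumes 1 GB = 1e9 bytes, and real GGUF files also
    carry metadata plus some tensors stored at higher precision.
    """
    return size_gb * 1e9 * 8 / bpw / 1e9

# Figures taken from the quantization table
for name, size_gb, bpw in [("IQ4_XS", 186, 4.5), ("IQ1_S", 100, 2.4), ("BF16", 700, 16)]:
    print(f"{name}: ~{implied_params_b(size_gb, bpw):.0f}B parameters")
```

All three rows land in the same ~330-350B range, so the listed sizes and BPW values are mutually consistent.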
@@ -228,12 +239,8 @@ print(completion.choices[0].message.content)
 ```
 
 ### Using with Ollama
-```bash
-ollama serve &
-ollama run hf.co/Ex0bit/GLM-4.7-PRISM:IQ4_XS
-```
 
-> **Note:**
+> **Note:** Ollama does not currently support IQ quantization types. Use llama.cpp directly instead.
 
 ## Thinking Mode Configuration
 
|