Ex0bit committed
Commit da7db52 · verified
1 Parent(s): 3242e4c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +18 -11
README.md CHANGED
@@ -65,16 +65,27 @@ This project exists as **research and development experimentation** into underst
  ## Model Tree
  ```
  zai-org/GLM-4.7 (Base Model - BF16)
- └── Ex0bit/Elbaz-GLM-4.7-PRISM (This Model)
-     └── Elbaz-GLM-4.7-PRISM-IQ4_XS.gguf
+ └── Ex0bit/GLM-4.7-PRISM (This Model)
+     ├── GLM-4.7-PRISM-IQ4_XS.gguf (186 GB)
+     ├── GLM-4.7-PRISM-IQ1_S.gguf (100 GB)
+     └── GLM-4.7-PRISM-imatrix.dat
  ```
 
  ## Available Quantizations
 
- | Quantization | Size | Description |
- | --- | --- | --- |
- | IQ4_XS | TBD | Importance-weighted 4-bit, excellent quality |
- | BF16 | ~717 GB | Full precision weights |
+ | File | Size | BPW | Description |
+ |------|------|-----|-------------|
+ | [GLM-4.7-PRISM-IQ4_XS.gguf](./GLM-4.7-PRISM-IQ4_XS.gguf) | 186 GB | 4.5 | High quality 4-bit, recommended for most users |
+ | [GLM-4.7-PRISM-IQ1_S.gguf](./GLM-4.7-PRISM-IQ1_S.gguf) | 100 GB | 2.4 | Ultra-compact 1.5-bit, fits smaller setups |
+ | [GLM-4.7-PRISM-imatrix.dat](./GLM-4.7-PRISM-imatrix.dat) | 100 MB | - | Importance matrix for custom quantization |
+ | BF16 (safetensors) | ~700 GB | 16 | Full precision source weights |
+
+ ### GGUF Usage Notes
+
+ - **Requires llama.cpp** (or compatible forks like ik_llama.cpp, koboldcpp)
+ - **Ollama not yet supported** - IQ quantization types not recognized by Ollama's bundled llama.cpp
+ - Tested at **50 tokens/sec** on 4x H100 80GB
+ - imatrix generated from Wikipedia training data (PPL 4.97)
 
  ## Prompt Format
 
@@ -228,12 +239,8 @@ print(completion.choices[0].message.content)
  ```
 
  ### Using with Ollama
- ```bash
- ollama serve &
- ollama run hf.co/Ex0bit/GLM-4.7-PRISM:IQ4_XS
- ```
 
- > **Note:** The `hf.co/` prefix is required to pull from Hugging Face. Requires Ollama 0.3.0+.
+ > **Note:** Ollama does not currently support IQ quantization types. Use llama.cpp directly instead.
 
  ## Thinking Mode Configuration
246