Upload README.md with huggingface_hub

README.md CHANGED
@@ -65,16 +65,27 @@ This project exists as **research and development experimentation** into underst
 ## Model Tree
 ```
 zai-org/GLM-4.7 (Base Model - BF16)
-└── Ex0bit/
-
+└── Ex0bit/GLM-4.7-PRISM (This Model)
+    ├── GLM-4.7-PRISM-IQ4_XS.gguf (186 GB)
+    ├── GLM-4.7-PRISM-IQ1_S.gguf (100 GB)
+    └── GLM-4.7-PRISM-imatrix.dat
 ```
 
 ## Available Quantizations
 
-
-
-| IQ4_XS |
-
+| File | Size | BPW | Description |
+|------|------|-----|-------------|
+| [GLM-4.7-PRISM-IQ4_XS.gguf](./GLM-4.7-PRISM-IQ4_XS.gguf) | 186 GB | 4.5 | High quality 4-bit, recommended for most users |
+| [GLM-4.7-PRISM-IQ1_S.gguf](./GLM-4.7-PRISM-IQ1_S.gguf) | 100 GB | 2.4 | Ultra-compact 1.5-bit, fits smaller setups |
+| [GLM-4.7-PRISM-imatrix.dat](./GLM-4.7-PRISM-imatrix.dat) | 100 MB | - | Importance matrix for custom quantization |
+| BF16 (safetensors) | ~700 GB | 16 | Full precision source weights |
+
+### GGUF Usage Notes
+
+- **Requires llama.cpp** (or compatible forks like ik_llama.cpp, koboldcpp)
+- **Ollama not yet supported** - IQ quantization types not recognized by Ollama's bundled llama.cpp
+- Tested at **50 tokens/sec** on 4x H100 80GB
+- imatrix generated from Wikipedia training data (PPL 4.97)
 
 ## Prompt Format
 
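As a quick cross-check on the quantization table in the hunk above: file size and bits per weight (BPW) together imply a parameter count, params ≈ bytes × 8 / BPW. A minimal sketch using the table's figures (the helper is my own; it assumes 1 GB = 10^9 bytes and ignores GGUF metadata overhead):

```python
def implied_params_b(size_gb: float, bpw: float) -> float:
    """Parameter count in billions implied by file size and bits per weight.

    Rough check only: assumes 1 GB = 1e9 bytes, and real GGUF files also
    carry metadata plus some tensors stored at higher precision.
    """
    return size_gb * 1e9 * 8 / bpw / 1e9

# Figures taken from the quantization table
for name, size_gb, bpw in [("IQ4_XS", 186, 4.5), ("IQ1_S", 100, 2.4), ("BF16", 700, 16)]:
    print(f"{name}: ~{implied_params_b(size_gb, bpw):.0f}B parameters")
```

All three rows land in the same ~330-350B range, so the listed sizes and BPW values are mutually consistent.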
@@ -228,12 +239,8 @@ print(completion.choices[0].message.content)
 ```
 
 ### Using with Ollama
-```bash
-ollama serve &
-ollama run hf.co/Ex0bit/GLM-4.7-PRISM:IQ4_XS
-```
 
-> **Note:**
+> **Note:** Ollama does not currently support IQ quantization types. Use llama.cpp directly instead.
 
 ## Thinking Mode Configuration
 
|