# mlx-community/glm-4.7-flash-abliterated-8bit
This is an 8-bit quantized MLX conversion of huihui-ai/Huihui-GLM-4.7-Flash-abliterated, optimized for fast inference on Apple Silicon.
## Model Details
- Original model: huihui-ai/Huihui-GLM-4.7-Flash-abliterated
- Format: MLX 8-bit quantized
- Hardware: Apple Silicon (M-series)
## Use with mlx-openai-server
`mlx-openai-server` is a high-performance, OpenAI-compatible API server for MLX models on Apple Silicon. It supports text, vision, audio, and image-generation models and acts as a drop-in replacement for the OpenAI API.
### Install

```shell
pip install mlx-openai-server
```
Or install from GitHub:
```shell
pip install git+https://github.com/cubist38/mlx-openai-server.git
```
### Run the server

```shell
mlx-openai-server launch --model-path mlx-community/glm-4.7-flash-abliterated-8bit --reasoning-parser glm47_flash --tool-call-parser glm4_moe
```
This serves an OpenAI-compatible API at `http://localhost:8000/v1`. To use a local copy of the weights, point `--model-path` at that directory and keep the same parsers.
See the mlx-openai-server README for more options (config files, multiple models, speculative decoding, etc.).
### Call the API

```python
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Model id matches --model-path unless you set --served-model-name.
response = client.chat.completions.create(
    model="mlx-community/glm-4.7-flash-abliterated-8bit",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
You can also use curl, the OpenAI CLI, or any tool that supports a custom `base_url`.
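For example, the equivalent request with curl, sketched below. The request body follows the standard OpenAI chat-completions schema; the `curl` call itself is left commented out because it needs the server from the previous step to be running.

```shell
# Request body: standard OpenAI chat-completions schema.
BODY='{"model": "mlx-community/glm-4.7-flash-abliterated-8bit", "messages": [{"role": "user", "content": "Hello"}]}'

# POST to the local server (uncomment once the server above is up):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```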
## Features

- Reasoning support via `--reasoning-parser glm47_flash`
- Tool calling via `--tool-call-parser glm4_moe`
- Multi-model mode: run alongside other models in a single server using a YAML config
- Speculative decoding for faster generation
- Streaming support out of the box
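Tool calling uses standard OpenAI-style tool definitions passed via the `tools` parameter. A minimal sketch of the request payload an OpenAI-compatible client would send; the `get_weather` tool here is a hypothetical example, not part of this model or server:

```python
import json

# Hypothetical example tool; any JSON-Schema-described function works.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Request body sent to /v1/chat/completions when tools are enabled.
payload = {
    "model": "mlx-community/glm-4.7-flash-abliterated-8bit",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
print(json.dumps(payload, indent=2))
```

When the model decides to call a tool, the response's `tool_calls` field carries the function name and JSON arguments, which the server extracts via `--tool-call-parser glm4_moe`.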
- Model size: 30B params
- Tensor types: BF16, U32, F32
- Quantization: 8-bit
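As a rough sanity check on memory requirements, a back-of-envelope estimate only: actual usage is higher, since some tensors remain in BF16/F32 and the KV cache grows with context length.

```python
params = 30e9        # ~30B parameters
bits_per_param = 8   # 8-bit quantization

# Bytes of weight storage, converted to GiB.
gib = params * bits_per_param / 8 / 2**30
print(f"~{gib:.0f} GiB of weights")  # ~28 GiB
```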
## Model tree

- Base model: zai-org/GLM-4.7-Flash