mlx-community/glm-4.7-flash-abliterated-8bit

This is an 8-bit quantized MLX conversion of huihui-ai/Huihui-GLM-4.7-Flash-abliterated, optimized for fast inference on Apple Silicon.

Model Details

Use with mlx-openai-server

mlx-openai-server is a high-performance OpenAI-compatible API server for MLX models on Apple Silicon. It supports text, vision, audio, and image-generation models and works as a drop-in replacement for the OpenAI API.

Install

pip install mlx-openai-server

Or install from GitHub:

pip install git+https://github.com/cubist38/mlx-openai-server.git

Run the server

mlx-openai-server launch --model-path mlx-community/glm-4.7-flash-abliterated-8bit --reasoning-parser glm47_flash --tool-call-parser glm4_moe

This serves an OpenAI-compatible API at http://localhost:8000/v1. For a local copy of the weights, point --model-path at that directory and keep the same parsers.

See the mlx-openai-server README for more options (config files, multiple models, speculative decoding, etc.).
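Once the server is up, you can sanity-check it by listing the available models. A minimal stdlib-only sketch, assuming the default port shown above; it prints a short message instead of raising if the server isn't running:

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default mlx-openai-server address

try:
    # GET /v1/models returns {"data": [{"id": ...}, ...]} in the OpenAI format.
    with urllib.request.urlopen(f"{BASE_URL}/models", timeout=5) as resp:
        for model in json.load(resp).get("data", []):
            print(model["id"])
except (urllib.error.URLError, OSError):
    print(f"no server reachable at {BASE_URL}")
```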

Call the API

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Model id matches --model-path unless you set --served-model-name.
response = client.chat.completions.create(
    model="mlx-community/glm-4.7-flash-abliterated-8bit",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

You can also use curl, the OpenAI CLI, or any tool that supports a custom base_url.
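Every client sends the same OpenAI-style JSON body to `/v1/chat/completions`. A stdlib sketch of the request a curl call would make (the prompt is illustrative; if the server isn't running, it prints the payload instead of the completion):

```python
import json
import urllib.error
import urllib.request

# Same request shape as the openai-client example above.
payload = {
    "model": "mlx-community/glm-4.7-flash-abliterated-8bit",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer not-needed",  # any value; auth is not enforced locally
    },
)
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except (urllib.error.URLError, OSError):
    # Server not running: show the JSON body curl would send instead.
    print(json.dumps(payload, indent=2))
```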

Features

  • Reasoning support via --reasoning-parser glm47_flash
  • Tool calling via --tool-call-parser glm4_moe
  • Multi-model mode: serve several models from a single server using a YAML config
  • Speculative decoding for faster generation
  • Streaming support out of the box
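Tool calling uses the standard OpenAI tools schema; the `glm4_moe` parser converts the model's raw output into structured `tool_calls`. A sketch of a request body with one hypothetical tool (the function name and parameters here are illustrative, not part of the server):

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "mlx-community/glm-4.7-flash-abliterated-8bit",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",
}
# With the openai client, pass tools=tools to client.chat.completions.create(...)
# and read response.choices[0].message.tool_calls for the parsed calls.
print(json.dumps(request_body, indent=2))
```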
Model specs

  • Format: MLX (Safetensors)
  • Model size: 30B params
  • Tensor types: BF16, U32, F32
  • Quantization: 8-bit