Instructions for using mlx-community/glm-4.7-flash-abliterated-8bit with libraries, notebooks, and local apps.
- Libraries
- MLX
How to use mlx-community/glm-4.7-flash-abliterated-8bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/glm-4.7-flash-abliterated-8bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/glm-4.7-flash-abliterated-8bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/glm-4.7-flash-abliterated-8bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/glm-4.7-flash-abliterated-8bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
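If ~/.pi/agent/models.json already lists other providers, the "mlx-lm" entry can be merged in rather than pasted over the whole file. The following is a minimal sketch, assuming the file layout matches the snippet shown for Pi. The helper name `add_mlx_provider` and the placeholder "other" provider are hypothetical, and the function only works on an in-memory dict; reading and writing the file is left out.

```python
import json


def add_mlx_provider(config: dict, model_id: str,
                     base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a copy of a Pi models.json dict with an "mlx-lm" provider added.

    Hypothetical helper: the key names mirror the Pi config snippet.
    """
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    providers = merged.setdefault("providers", {})
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    return merged


# Placeholder for whatever the existing file already contains (assumption).
existing = {"providers": {"other": {"baseUrl": "http://example.invalid/v1"}}}
updated = add_mlx_provider(
    existing, "mlx-community/glm-4.7-flash-abliterated-8bit"
)
print(json.dumps(updated, indent=2))
```

Because the helper returns a copy, the original dict stays unchanged until the result is written back.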
- Hermes Agent
How to use mlx-community/glm-4.7-flash-abliterated-8bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/glm-4.7-flash-abliterated-8bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/glm-4.7-flash-abliterated-8bit
Run Hermes
hermes
- MLX LM
How to use mlx-community/glm-4.7-flash-abliterated-8bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/glm-4.7-flash-abliterated-8bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "mlx-community/glm-4.7-flash-abliterated-8bit"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/glm-4.7-flash-abliterated-8bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
mlx-community/glm-4.7-flash-abliterated-8bit
This is an 8-bit quantized MLX conversion of huihui-ai/Huihui-GLM-4.7-Flash-abliterated, optimized for fast inference on Apple Silicon.
Model Details
- Original model: huihui-ai/Huihui-GLM-4.7-Flash-abliterated
- Format: MLX 8-bit quantized
- Hardware: Apple Silicon (M-series)
Use with mlx-openai-server
mlx-openai-server is a high-performance OpenAI-compatible API server for MLX models on Apple Silicon. It supports text, vision, audio, and image generation models and works as a drop-in replacement for the OpenAI API.
Install
pip install mlx-openai-server
Or install from GitHub:
pip install git+https://github.com/cubist38/mlx-openai-server.git
Run the server
mlx-openai-server launch --model-path mlx-community/glm-4.7-flash-abliterated-8bit --reasoning-parser glm47_flash --tool-call-parser glm4_moe
This serves an OpenAI-compatible API at http://localhost:8000/v1. For a local copy of the weights, point --model-path at that directory and keep the same parsers.
See the mlx-openai-server README for more options (config files, multiple models, speculative decoding, etc.).
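The choice between a Hub id and a local directory can be expressed as a small command builder. The following is a sketch rather than part of mlx-openai-server: the helper name `launch_command` is hypothetical, and it uses only the flags shown in the launch command above.

```python
import shlex
from pathlib import Path


def launch_command(model_path: str) -> list[str]:
    """Build the launch command for a Hub id or a local weights directory."""
    path = Path(model_path).expanduser()
    # Use the expanded path if it is an existing directory; otherwise pass the
    # string through unchanged and let the server treat it as a Hub id.
    resolved = str(path) if path.is_dir() else model_path
    return [
        "mlx-openai-server", "launch",
        "--model-path", resolved,
        "--reasoning-parser", "glm47_flash",
        "--tool-call-parser", "glm4_moe",
    ]


cmd = launch_command("mlx-community/glm-4.7-flash-abliterated-8bit")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` to start the server from a script.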
Call the API
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Model id matches --model-path unless you set --served-model-name.
response = client.chat.completions.create(
    model="mlx-community/glm-4.7-flash-abliterated-8bit",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
You can also use curl, the OpenAI CLI, or any tool that supports a custom base_url.
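For environments without the openai package, the same request can be built with the Python standard library. This is a minimal sketch, assuming the server exposes the usual /chat/completions route under the base URL. The helper names `build_chat_request` and `extract_reply` are illustrative, and the network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of a chat completions response body."""
    return response_json["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000/v1",
    "mlx-community/glm-4.7-flash-abliterated-8bit",
    "Hello",
)
print(request.full_url)

# With the server running, send the request like this:
# with urllib.request.urlopen(request) as resp:
#     print(extract_reply(json.load(resp)))
```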
Features
- Reasoning support via --reasoning-parser glm47_flash
- Tool calling via --tool-call-parser glm4_moe
- Multi-model mode: run alongside other models in a single server using a YAML config
- Speculative decoding for faster generation
- Streaming support out of the box
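To illustrate the streaming feature, here is a sketch of how a client might reassemble text from a streamed response. It assumes the server emits the OpenAI chat completions streaming format (server-sent "data:" lines carrying a "delta", ended by "data: [DONE]"), which has not been checked against this server. The helper name `iter_stream_text` is hypothetical, and the sample lines are hand-written, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample in the assumed format (not real server output).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

When using the openai client instead, passing `stream=True` to `client.chat.completions.create` returns an iterator of parsed chunks, so this manual parsing is only needed for raw HTTP clients.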
Model tree for mlx-community/glm-4.7-flash-abliterated-8bit
Base model
zai-org/GLM-4.7-Flash