GLM-4.7-Flash-PRISM-mlx-4bit / README.md

shieldstackllc

Add vMLX logo to README

092b41d verified 2 days ago

preview code

raw

history blame contribute delete

2.02 kB

metadata

language:
  - en
  - zh
license: other
license_name: glm-4-license
pipeline_tag: text-generation
tags:
  - mlx
  - glm4
  - moe
  - prism
  - abliterated
  - 4bit
  - quantized
  - apple-silicon
library_name: mlx
base_model: Ex0bit/GLM-4.7-Flash-PRISM

GLM-4.7-Flash-PRISM — MLX 4-bit

MLX 4-bit quantized version of Ex0bit/GLM-4.7-Flash-PRISM for efficient local inference on Apple Silicon.

Quantization: 4-bit (4.5 bits per weight, group size 64, affine mode)
Architecture: GLM-4 MoE Lite — 47 layers, 64 routed experts, 4 active per token
Context: 202K tokens
Size: ~16 GB

Usage

from mlx_lm import load, generate

model, tokenizer = load("shieldstackllc/GLM-4.7-Flash-PRISM-mlx-4bit")
response = generate(model, tokenizer, prompt="Hello!", verbose=True)

Or with vMLX for native macOS inference.

About

This model is an abliterated (uncensored) variant of GLM-4.7-Flash, a Mixture-of-Experts language model by Zhipu AI / THUDM. The abliteration was done by Ex0bit as part of the PRISM series. MLX quantization by vMLX.

Also Available

GLM-4.7-Flash-PRISM MLX 8-bit (~30 GB)

Made for vMLX

This model was converted and optimized for vMLX — a free, open source macOS native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration.

Credits

Base model: THUDM/GLM-4 by Zhipu AI
Abliteration: Ex0bit/GLM-4.7-Flash-PRISM
MLX conversion: vMLX — Run AI locally on Mac. No compromises.

Contact

For questions, issues, or collaboration: admin@vmlx.net