shieldstackllc's picture
Add vMLX logo to README
092b41d verified
metadata
language:
  - en
  - zh
license: other
license_name: glm-4-license
pipeline_tag: text-generation
tags:
  - mlx
  - glm4
  - moe
  - prism
  - abliterated
  - 4bit
  - quantized
  - apple-silicon
library_name: mlx
base_model: Ex0bit/GLM-4.7-Flash-PRISM

vMLX

GLM-4.7-Flash-PRISM — MLX 4-bit

MLX 4-bit quantized version of Ex0bit/GLM-4.7-Flash-PRISM for efficient local inference on Apple Silicon.

  • Quantization: 4-bit (4.5 bits per weight, group size 64, affine mode)
  • Architecture: GLM-4 MoE Lite — 47 layers, 64 routed experts, 4 active per token
  • Context: 202K tokens
  • Size: ~16 GB

Usage

from mlx_lm import load, generate

model, tokenizer = load("shieldstackllc/GLM-4.7-Flash-PRISM-mlx-4bit")
response = generate(model, tokenizer, prompt="Hello!", verbose=True)

Or with vMLX for native macOS inference.

About

This model is an abliterated (uncensored) variant of GLM-4.7-Flash, a Mixture-of-Experts language model by Zhipu AI / THUDM. The abliteration was done by Ex0bit as part of the PRISM series. MLX quantization by vMLX.

Also Available

Made for vMLX

This model was converted and optimized for vMLX — a free, open source macOS native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration.

Credits

Contact

For questions, issues, or collaboration: admin@vmlx.net