# WeDLM-8B-Instruct-MLX-4bit
This is a 4-bit quantized MLX version of tencent/WeDLM-8B-Instruct for efficient inference on Apple Silicon.

It currently does not work well and does not provide a meaningful speedup, due to the lack of precompilation. See https://github.com/ZimengXiong/WeDLM-MLX/tree/main for details.
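Assuming the weights load through the standard `mlx-lm` tooling (an assumption on my part; WeDLM's decoding may instead require the WeDLM-MLX repository linked above), a basic invocation would look roughly like:

```shell
# Install the MLX LM tooling (requires Apple Silicon).
pip install mlx-lm

# Generate with the 4-bit weights; the model is fetched from the Hugging Face Hub.
mlx_lm.generate \
  --model zimengxiong/WeDLM-8B-Instruct-MLX-4bit \
  --prompt "Explain diffusion language models in one paragraph." \
  --max-tokens 256
```

This is a sketch, not a verified recipe: per the note above, expect degraded quality and speed until precompilation support lands.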
## Related Models
| Variant | HuggingFace |
|---|---|
| 4-bit (this model) | zimengxiong/WeDLM-8B-Instruct-MLX-4bit |
| 8-bit | zimengxiong/WeDLM-8B-Instruct-MLX-8bit |
| fp16 | zimengxiong/WeDLM-8B-Instruct-MLX |
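For context on the size difference between the variants above, a back-of-the-envelope weight-memory estimate (ignoring quantization scales, zero points, and runtime overhead such as the KV cache):

```python
PARAMS = 8e9  # ~8 billion parameters in WeDLM-8B (approximate)

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte / 1e9."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")  # fp16 ~16 GB, 8-bit ~8 GB, 4-bit ~4 GB
```

The 4-bit variant therefore fits comfortably on Macs with 16 GB of unified memory, where the fp16 weights alone would not.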
## License
This model inherits the license from the base model tencent/WeDLM-8B-Instruct.