North-Mini-Code-1.0 MLX MXFP4

MXFP4 MLX conversion of CohereLabs/North-Mini-Code-1.0.

  • Source revision: effaeda477c041c107d5a3d8c599cb5d6c5878ef
  • Architecture: Cohere2MoeForCausalLM / cohere2_moe
  • Quantization: MLX mxfp4, group size 32, 4 bits
  • Artifact size: 17.59 GB, 4 safetensor shards
  • Verification: conversion completed, headers readable, smoke test passed
  • Benchmark on M2 Max 32 GB: 218.449 prompt tok/s, 42.385 generation tok/s, 17.788 GB peak memory

Requires pinned experimental MLX-LM cohere2_moe support until it lands in a release:

pip install "mlx-lm @ git+https://github.com/Terrencezzj/mlx-lm.git@f43507c5c30bdebdb92d308ac11aa8f96b418c2e"
Downloads last month
-
Safetensors
Model size
30B params
Tensor type
U8
·
U32
·
BF16
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for bsisduck/North-Mini-Code-1.0-MLX-MXFP4

Quantized
(14)
this model