---
base_model: MiniMaxAI/MiniMax-M2.5
library_name: mlx
tags:
  - mlx
  - quantized
  - 4bit
  - minimax_m2
  - text-generation
  - conversational
  - apple-silicon
license: other
license_name: modified-mit
license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.5/blob/main/LICENSE
pipeline_tag: text-generation
---

# MiniMax-M2.5 4-bit MLX

This is a 4-bit quantized MLX version of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), converted using mlx-lm v0.29.1.

MiniMax-M2.5 is a 229B-parameter Mixture-of-Experts model (10B active parameters per token) that scores 80.2% on SWE-Bench Verified, with state-of-the-art results in coding, agentic tool use, and search tasks.

## Requirements

- Apple Silicon Mac (M3 Ultra or later recommended)
- At least 256 GB of unified memory
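As a rough sanity check on the memory requirement, the quantized weights alone occupy on the order of 120 GB, before the KV cache and activation overhead (all experts of an MoE model stay resident in memory even though only 10B parameters are active per token). The sketch below assumes roughly 4.5 effective bits per weight, i.e. 4-bit values plus per-group scale/bias overhead from group-wise quantization; the exact figure depends on the quantization settings used:

```python
# Back-of-envelope estimate of weight memory for the 4-bit model.
# Assumption: ~4.5 effective bits per weight (4-bit values plus
# per-group scale/bias overhead from group quantization).
params = 229e9          # total parameters (MoE: all experts resident)
bits_per_weight = 4.5   # assumed effective bits after quantization
weight_gb = params * bits_per_weight / 8 / 1024**3
print(f"~{weight_gb:.0f} GB of weights")
```

This is why 256 GB of unified memory is recommended: it leaves headroom for the KV cache, activations, and the rest of the system.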

## Quick Start

Install mlx-lm:

```bash
pip install -U mlx-lm
```

### CLI

```bash
mlx_lm.generate \
  --model ahoybrotherbear/MiniMax-M2.5-4bit-MLX \
  --prompt "Hello, how are you?" \
  --max-tokens 256 \
  --temp 0.7
```

### Python

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("ahoybrotherbear/MiniMax-M2.5-4bit-MLX")

messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Recent mlx-lm versions take sampling settings via a sampler object
# rather than a `temp` keyword argument on generate().
response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.7),
    verbose=True,
)
print(response)
```

## Conversion Details

- Source model: MiniMaxAI/MiniMax-M2.5 (FP8)
- Converted with: mlx-lm v0.29.1
- Quantization: 4-bit
- Original parameters: 229B total / 10B active (MoE)
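A conversion like the one described above can be reproduced with mlx-lm's convert entry point. The flags below are a sketch, not the exact command used for this repo; check `mlx_lm.convert --help` for the options available in your installed version:

```shell
# Quantize the source weights to 4-bit MLX format.
# Output path is illustrative; defaults for group size etc. may
# differ between mlx-lm versions.
mlx_lm.convert \
  --hf-path MiniMaxAI/MiniMax-M2.5 \
  -q --q-bits 4 \
  --mlx-path ./MiniMax-M2.5-4bit-MLX
```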

## Original Model

MiniMax-M2.5 was created by [MiniMaxAI](https://huggingface.co/MiniMaxAI). See the [original model card](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) for full details on capabilities, benchmarks, and license terms.