Instructions to use tarruda/MiniMax-M2.7-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tarruda/MiniMax-M2.7-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="tarruda/MiniMax-M2.7-GGUF",
	filename="Q4_K/MiniMax-M2.7-256x4.9B-Q4_K-00001-of-00004.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use tarruda/MiniMax-M2.7-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tarruda/MiniMax-M2.7-GGUF
# Run inference directly in the terminal:
llama-cli -hf tarruda/MiniMax-M2.7-GGUF

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tarruda/MiniMax-M2.7-GGUF
# Run inference directly in the terminal:
llama-cli -hf tarruda/MiniMax-M2.7-GGUF

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tarruda/MiniMax-M2.7-GGUF
# Run inference directly in the terminal:
./llama-cli -hf tarruda/MiniMax-M2.7-GGUF

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tarruda/MiniMax-M2.7-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf tarruda/MiniMax-M2.7-GGUF

Use Docker

docker model run hf.co/tarruda/MiniMax-M2.7-GGUF

LM Studio
Jan
Ollama
How to use tarruda/MiniMax-M2.7-GGUF with Ollama:
```
ollama run hf.co/tarruda/MiniMax-M2.7-GGUF
```

Unsloth Studio new

How to use tarruda/MiniMax-M2.7-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tarruda/MiniMax-M2.7-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tarruda/MiniMax-M2.7-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tarruda/MiniMax-M2.7-GGUF to start chatting

Pi new

How to use tarruda/MiniMax-M2.7-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tarruda/MiniMax-M2.7-GGUF

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MiniMax-M2.7-GGUF"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use tarruda/MiniMax-M2.7-GGUF with Docker Model Runner:
```
docker model run hf.co/tarruda/MiniMax-M2.7-GGUF
```

Lemonade

How to use tarruda/MiniMax-M2.7-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull tarruda/MiniMax-M2.7-GGUF

Run and chat with the model

lemonade run user.MiniMax-M2.7-GGUF-{{QUANT_TAG}}

List all available models

lemonade list

MiniMax-M2.7-GGUF / scripts /quantize.sh

tarruda

Upload folder using huggingface_hub

ce4aeca verified 25 days ago

raw

history blame contribute delete

4.83 kB

	#!/usr/bin/env bash

	set -euo pipefail

	recipes=(
	"
	MIX=Q5_K
	TYPE_FFN_GATE_UP_EXPS=IQ3_S
	TYPE_FFN_DOWN_EXPS=Q5_K
	TYPE_DEFAULT=Q8_0
	"

	"
	MIX=Q4_K
	TYPE_FFN_GATE_UP_EXPS=IQ3_S
	TYPE_FFN_DOWN_EXPS=Q4_K
	TYPE_DEFAULT=Q8_0
	"
	)

	# Validate that 2 or 3 arguments are provided
	if [ $# -lt 2 ] \|\| [ $# -gt 3 ]; then
	echo "Error: Exactly 2 arguments required (plus optional --dry-run)."
	echo "Usage: $0 <llama_cpp_dir> <quant_type> [--dry-run]"
	echo "Example: $0 ~/code/llama.cpp IQ4_XS"
	echo "Example: $0 ~/code/llama.cpp IQ4_XS --dry-run"
	exit 1
	fi

	# Assign arguments to variables for clarity
	LLAMA_CPP_DIR="$1"
	QUANT_TYPE="$2"

	# Handle optional --dry-run argument
	DRY_RUN=false
	if [ $# -eq 3 ]; then
	if [ "$3" != "--dry-run" ]; then
	echo "Error: Unexpected third argument: $3"
	echo "Usage: $0 <llama_cpp_dir> <quant_type> [--dry-run]"
	exit 1
	fi
	DRY_RUN=true
	fi

	# Validate that the llama.cpp directory exists
	if [ ! -d "$LLAMA_CPP_DIR" ]; then
	echo "Error: llama.cpp directory not found: $LLAMA_CPP_DIR"
	exit 1
	fi

	SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
	PROJECT_DIR="$(cd $SCRIPT_DIR/.. && pwd)"

	# Validate that the BF16 directory exists
	BF16_DIR="$PROJECT_DIR/BF16"
	if [ ! -d "$BF16_DIR" ]; then
	echo "Error: BF16 directory not found: $BF16_DIR"
	exit 1
	fi

	# Discover the input GGUF file using glob
	# We look for files matching BF16.gguf in the BF16 directory.
	# If split, we select the first part (00001) via sorting.
	INPUT_GGUF=$(find "$BF16_DIR" -maxdepth 1 -name "BF16.gguf" -type f \| sort \| head -n 1)

	if [ -z "$INPUT_GGUF" ]; then
	echo "Error: No BF16 GGUF files found in $BF16_DIR"
	echo "Expected pattern: BF16.gguf"
	exit 1
	fi

	echo "Found input file: $INPUT_GGUF"

	# Extract model name from filename for output naming (strips -BF16... suffix)
	# Example: Qwen3.5-122B-A10B-BF16-00001-of-00003.gguf -> Qwen3.5-122B-A10B
	MODEL_NAME=$(basename "$INPUT_GGUF" \| sed 's/-BF16.*\.gguf//')

	# Validate that the imatrix file exists (required for this quantization strategy)
	IMATRIX_PATH="$PROJECT_DIR/imatrix.gguf"
	if [ ! -e "$IMATRIX_PATH" ]; then
	echo "Error: imatrix file not found: $IMATRIX_PATH"
	echo "Please generate imatrix.gguf before running quantization."
	exit 1
	fi

	# Validate that the required binaries exist
	QUANTIZE_BIN="$LLAMA_CPP_DIR/build/bin/llama-quantize"
	SPLIT_BIN="$LLAMA_CPP_DIR/build/bin/llama-gguf-split"

	if [ ! -x "$QUANTIZE_BIN" ]; then
	echo "Error: llama-quantize binary not found: $QUANTIZE_BIN"
	exit 1
	fi

	if [ ! -x "$SPLIT_BIN" ]; then
	echo "Error: llama-gguf-split binary not found: $SPLIT_BIN"
	exit 1
	fi

	# Derive output filenames
	INTERMEDIATE_OUTPUT="$PROJECT_DIR/${MODEL_NAME}-${QUANT_TYPE}.gguf"

	# Check if intermediate output already exists to prevent accidental overwrite
	# Skip this check if dry-run is enabled
	if [ "$DRY_RUN" = false ] && [ -e "$INTERMEDIATE_OUTPUT" ]; then
	echo "Error: Intermediate output already exists: $INTERMEDIATE_OUTPUT"
	exit 1
	fi

	echo "Starting quantization..."

	# Determine dry-run argument
	DRY_RUN_ARG=""
	if [ "$DRY_RUN" = true ]; then
	DRY_RUN_ARG="--dry-run"
	fi

	# Run quantization
	for recipe in "${recipes[@]}"; do
	MIX=
	TYPE_FFN_GATE_UP_EXPS=
	TYPE_FFN_DOWN_EXPS=
	TYPE_TOKEN_EMBEDDING=
	TYPE_OUTPUT=
	TYPE_DEFAULT=

	eval "$recipe"

	if [ "$MIX" != "$QUANT_TYPE" ]; then
	continue
	fi

	if [ -z "${TYPE_DEFAULT}" ]; then
	echo "TYPE_DEFAULT not defined for recipe $MIX!" >&2
	exit 1
	fi

	TYPE_ARGS=()

	if [ -n "${TYPE_FFN_GATE_UP_EXPS:-}" ]; then
	TYPE_ARGS+=(
	"--tensor-type" "ffn_gate_up_exps=${TYPE_FFN_GATE_UP_EXPS}"
	"--tensor-type" "ffn_gate_exps=${TYPE_FFN_GATE_UP_EXPS}"
	"--tensor-type" "ffn_up_exps=${TYPE_FFN_GATE_UP_EXPS}"
	)
	fi

	if [ -n "${TYPE_FFN_DOWN_EXPS:-}" ]; then
	TYPE_ARGS+=("--tensor-type" "ffn_down_exps=${TYPE_FFN_DOWN_EXPS}")
	fi

	if [ -n "${TYPE_OUTPUT:-}" ]; then
	TYPE_ARGS+=("--output-tensor-type" "${TYPE_OUTPUT}")
	fi

	if [ -n "${TYPE_TOKEN_EMBEDDING:-}" ]; then
	TYPE_ARGS+=("--token-embedding-type" "${TYPE_TOKEN_EMBEDDING}")
	fi

	"$QUANTIZE_BIN" \
	$DRY_RUN_ARG \
	"${TYPE_ARGS[@]}" \
	--imatrix "$IMATRIX_PATH" \
	"$INPUT_GGUF" \
	"$INTERMEDIATE_OUTPUT" \
	$TYPE_DEFAULT

	if [ "$DRY_RUN" = false ]; then
	echo "Starting split..."
	OUTPUT_DIR="${PROJECT_DIR}/$QUANT_TYPE"
	mkdir -p $OUTPUT_DIR

	OUTPUT_PREFIX="${OUTPUT_DIR}/${MODEL_NAME}-${QUANT_TYPE}"

	# Run split
	"$SPLIT_BIN" \
	--split-max-size 42G \
	--no-tensor-first-split \
	"$INTERMEDIATE_OUTPUT" \
	"$OUTPUT_PREFIX"

	# Cleanup intermediate file
	rm -f "$INTERMEDIATE_OUTPUT"

	echo "Quantization complete. Output saved to: $OUTPUT_DIR"
	fi

	exit 0
	done

	echo "Quantization recipe $QUANT_TYPE not found!" >&2
	exit 1