---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
  - turboquant
  - quantization
  - 3-bit
  - vllm
  - mini-max
---

MiniMax-M2.7-TQ3

A TurboQuant 3-bit quantized version of MiniMax-M2.7, optimized for inference with turboquant-vllm.

Model Details

  • Base Model: MiniMaxAI/MiniMax-M2.7
  • Quantization: TurboQuant 3-bit
  • Quantization Tool: turboquant-vllm
  • Architecture: Transformer-based LLM with extended context support

Usage

This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the turboquant-vllm repository for installation and usage instructions.

Example

# Please refer to turboquant-vllm for proper model loading
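Pending the official instructions, a hypothetical sketch of what loading might involve is given below. The flag name `"turboquant"` and the helper `make_engine_args` are assumptions for illustration, not turboquant-vllm's documented API; verify the real entry point and argument names against the turboquant-vllm repository.

```python
# Hypothetical sketch only: turboquant-vllm's actual loading API is not
# documented here -- check its repository before relying on any of this.

def make_engine_args(model_path: str) -> dict:
    """Collect engine arguments for a TurboQuant 3-bit checkpoint."""
    return {
        "model": model_path,           # local path or Hub ID of this repo
        "quantization": "turboquant",  # assumed flag name; may differ
        "dtype": "auto",               # let the engine pick activation dtype
    }

args = make_engine_args("MiniMax-M2.7-TQ3")
```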

Chat Template

The model uses a Jinja chat template with support for:

  • System messages
  • Tool/function calling (<minimax:tool_call> / </minimax:tool_call> delimiters)
  • Reasoning content (<think> / </think> delimiters)
  • Multi-turn conversations

The default model identity is: "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."

Tokenizer

  • Backend: tokenizers
  • Vocabulary Size: (see tokenizer files)
  • Special Tokens: Includes tokens for tool calls, reasoning markers, and standard control tokens

Quantization Details

This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
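As a back-of-envelope illustration of why 3-bit weights matter, the sketch below compares weight memory at 16 bits and 3 bits per parameter. The parameter count is a placeholder, not the real MiniMax-M2.7 size, and the estimate ignores quantization metadata (scales, zero-points) and activation memory, so treat it as a rough upper bound on savings.

```python
# Back-of-envelope weight-memory estimate. n is a placeholder parameter
# count, not MiniMax-M2.7's actual size; overhead from quantization
# scales/zero-points and activations is ignored.

def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Weight storage in GiB for a given per-parameter bit width."""
    return n_params * bits_per_param / 8 / 2**30

n = 10e9                      # placeholder parameter count
fp16 = weight_gib(n, 16)      # half-precision baseline
tq3 = weight_gib(n, 3)        # 3-bit quantized
ratio = fp16 / tq3            # 16/3, i.e. roughly 5.3x smaller weights
```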

Disclaimer

This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.