---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
  - turboquant
  - quantization
  - 3-bit
  - vllm
  - mini-max
---

MiniMax-M2.7-TQ3

A TurboQuant 3-bit quantized version of MiniMax-M2.7, optimized for inference with turboquant-vllm.

Model Details

  • Base Model: MiniMaxAI/MiniMax-M2.7
  • Quantization: TurboQuant 3-bit
  • Quantization Tool: turboquant-vllm
  • Architecture: Transformer-based LLM with extended context support

Usage

This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the turboquant-vllm repository for installation and usage instructions.

Example

# Please refer to turboquant-vllm for proper model loading
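Pending the official instructions, a hypothetical sketch of what loading might involve is given below. The flag name `"turboquant"` and the helper `make_engine_args` are assumptions for illustration, not turboquant-vllm's documented API; verify the real entry point and argument names against the turboquant-vllm repository.

```python
# Hypothetical sketch only: turboquant-vllm's actual loading API is not
# documented here -- check its repository before relying on any of this.

def make_engine_args(model_path: str) -> dict:
    """Collect engine arguments for a TurboQuant 3-bit checkpoint."""
    return {
        "model": model_path,           # local path or Hub ID of this repo
        "quantization": "turboquant",  # assumed flag name; may differ
        "dtype": "auto",               # let the engine pick activation dtype
    }

args = make_engine_args("MiniMax-M2.7-TQ3")
```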

Chat Template

The model uses a Jinja chat template with support for:

  • System messages
  • Tool/function calling (<minimax:tool_call> / </minimax:tool_call> delimiters)
  • Reasoning content (<think> / </think> delimiters)
  • Multi-turn conversations

The default model identity is: "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."

Tokenizer

  • Backend: tokenizers
  • Vocabulary Size: (see tokenizer files)
  • Special Tokens: Includes tokens for tool calls, reasoning markers, and standard control tokens

Quantization Details

This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
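As a back-of-envelope illustration of why 3-bit weights matter, the sketch below compares weight memory at 16 bits and 3 bits per parameter. The parameter count is a placeholder, not the real MiniMax-M2.7 size, and the estimate ignores quantization metadata (scales, zero-points) and activation memory, so treat it as a rough upper bound on savings.

```python
# Back-of-envelope weight-memory estimate. n is a placeholder parameter
# count, not MiniMax-M2.7's actual size; overhead from quantization
# scales/zero-points and activations is ignored.

def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Weight storage in GiB for a given per-parameter bit width."""
    return n_params * bits_per_param / 8 / 2**30

n = 10e9                      # placeholder parameter count
fp16 = weight_gib(n, 16)      # half-precision baseline
tq3 = weight_gib(n, 3)        # 3-bit quantized
ratio = fp16 / tq3            # 16/3, i.e. roughly 5.3x smaller weights
```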

Disclaimer

This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.