---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
- turboquant
- quantization
- 3-bit
- vllm
- mini-max
---

# MiniMax-M2.7-TQ3

A **TurboQuant 3-bit** quantized version of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7), optimized for inference with [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm).

## Model Details

- **Base Model:** MiniMaxAI/MiniMax-M2.7
- **Quantization:** TurboQuant 3-bit
- **Quantization Tool:** [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm)
- **Architecture:** Transformer-based LLM with extended context support

## Usage

This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the [turboquant-vllm repository](https://github.com/varjoranta/turboquant-vllm) for installation and usage instructions.

### Example

```python
# Please refer to turboquant-vllm for proper model loading
```

## Chat Template

The model uses a Jinja chat template with support for:

- System messages
- Tool/function calling (`` / `` delimiters)
- Reasoning content (`` / `` delimiters)
- Multi-turn conversations

The default model identity is: *"You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."*

## Tokenizer

- **Backend:** tokenizers
- **Vocabulary Size:** (see tokenizer files)
- **Special Tokens:** Includes tokens for tool calls, reasoning markers, and standard control tokens

## Quantization Details

This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.

## Disclaimer

This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.
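To give a concrete sense of what a 3-bit representation implies, the sketch below applies generic symmetric absmax quantization to 8 levels (3 bits). This is an illustration only, not TurboQuant's actual algorithm: TurboQuant's real scheme is defined by the turboquant-vllm project and may use grouping, non-uniform codebooks, or other refinements.

```python
import numpy as np

def quantize_3bit(w: np.ndarray):
    """Generic symmetric absmax 3-bit quantization (levels -4..3).

    Illustration only -- TurboQuant's actual method may differ.
    """
    scale = np.abs(w).max() / 4.0          # map the largest magnitude to level -4/+4
    q = np.clip(np.round(w / scale), -4, 3).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from 3-bit codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, -2.0], dtype=np.float32)
q, s = quantize_3bit(w)
w_hat = dequantize(q, s)
# Each weight is stored as one of 8 codes plus a shared scale,
# so weight storage drops from 16 bits to ~3 bits per parameter.
```

Rounding each weight to one of 8 levels is what makes the checkpoint roughly 5x smaller than a 16-bit one, at the cost of a small per-weight reconstruction error bounded by half the scale.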