---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
- turboquant
- quantization
- 3-bit
- vllm
- mini-max
---
# MiniMax-M2.7-TQ3
A **TurboQuant 3-bit** quantized version of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7), optimized for inference with [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm).
## Model Details
- **Base Model:** MiniMaxAI/MiniMax-M2.7
- **Quantization:** TurboQuant 3-bit
- **Quantization Tool:** [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm)
- **Architecture:** Transformer-based LLM with extended context support
## Usage
This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the [turboquant-vllm repository](https://github.com/varjoranta/turboquant-vllm) for installation and usage instructions.
### Example
```python
# Please refer to turboquant-vllm for proper model loading
```
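As a rough sketch only: the exact loading API is defined by turboquant-vllm, so the snippet below *assumes* a vLLM-compatible `LLM` entry point. The `quantization="turboquant"` flag and the `chat()` call are assumptions for illustration, not confirmed API.

```python
# Hypothetical usage sketch -- check the turboquant-vllm repository for
# the real loading API; the names below are assumptions.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize 3-bit quantization in one sentence."},
]
sampling = {"temperature": 0.7, "max_tokens": 128}

def generate(model_id: str = "MiniMax-M2.7-TQ3") -> str:
    # Assumes a vLLM-style offline inference interface (hypothetical).
    from vllm import LLM, SamplingParams
    llm = LLM(model=model_id, quantization="turboquant")  # flag is an assumption
    out = llm.chat(messages, SamplingParams(**sampling))
    return out[0].outputs[0].text
```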
## Chat Template
The model uses a Jinja chat template with support for:
- System messages
- Tool/function calling (delimited by dedicated special tokens; see the tokenizer files)
- Reasoning content (delimited by dedicated special tokens)
- Multi-turn conversations
The default model identity is: *"You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."*
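Because the template is a standard Jinja chat template, it can be rendered with the stock `transformers` tokenizer API. A minimal sketch (the model path below is illustrative):

```python
# Renders a multi-turn conversation through the bundled Jinja chat
# template using the standard `transformers` tokenizer interface.
def render_prompt(conversation, model_path="MiniMax-M2.7-TQ3"):  # path is illustrative
    from transformers import AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_path)
    return tok.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What model are you?"},
]
```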
## Tokenizer
- **Backend:** tokenizers
- **Vocabulary Size:** (see tokenizer files)
- **Special Tokens:** Includes tokens for tool calls, reasoning markers, and standard control tokens
## Quantization Details
This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
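The TurboQuant algorithm itself lives in the turboquant-vllm codebase. Purely for intuition, a generic symmetric round-to-nearest 3-bit quantizer (not the actual TurboQuant scheme) maps each weight to one of 8 integer levels plus a shared scale:

```python
def quantize_3bit(weights):
    """Illustrative symmetric 3-bit round-to-nearest quantizer.

    Not the actual TurboQuant algorithm -- shown only for intuition:
    each weight becomes a 3-bit integer, plus one shared scale.
    """
    qmax = 3  # signed 3-bit range is [-4, 3]; symmetric variant uses [-3, 3]
    scale = (max(abs(w) for w in weights) / qmax) or 1.0  # guard all-zero input
    q = [max(-4, min(3, round(w / scale))) for w in weights]
    dequantized = [qi * scale for qi in q]
    return q, scale, dequantized
```

Storing 3-bit integers plus a per-group scale cuts weight memory roughly 5x versus FP16, at the cost of rounding error like that visible in the dequantized values above.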
## Disclaimer
This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.