---
license: other
base_model: MiniMaxAI/MiniMax-M2.7
tags:
- turboquant
- quantization
- 3-bit
- vllm
- mini-max
---

# MiniMax-M2.7-TQ3

A **TurboQuant 3-bit** quantized version of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7), optimized for inference with [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm).

## Model Details

- **Base Model:** MiniMaxAI/MiniMax-M2.7
- **Quantization:** TurboQuant 3-bit
- **Quantization Tool:** [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm)
- **Architecture:** Transformer-based LLM with extended context support

## Usage

This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the [turboquant-vllm repository](https://github.com/varjoranta/turboquant-vllm) for installation and usage instructions.

### Example

```python
# Please refer to turboquant-vllm for proper model loading
```

## Chat Template

The model uses a Jinja chat template with support for:
- System messages
- Tool/function calling (`<minimax:tool_call>` / `</minimax:tool_call>` delimiters)
- Reasoning content (`<think>` / `</think>` delimiters)
- Multi-turn conversations

The default model identity is: *"You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."*
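
As a rough illustration of how the delimiters listed above might be handled downstream, here is a minimal sketch that separates reasoning, tool calls, and the visible answer in a generated string. The function name `split_generation` and the sample text (including the payload inside the tool-call block) are hypothetical; the actual tool-call payload format is defined by the chat template and is not reproduced here.

```python
import re

# Delimiters taken from the chat template description above.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(
    r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL
)


def split_generation(text: str) -> dict:
    """Split raw model output into reasoning, tool calls, and answer text.

    Hypothetical helper for illustration; it is not part of turboquant-vllm.
    """
    reasoning = [m.strip() for m in THINK_RE.findall(text)]
    tool_calls = [m.strip() for m in TOOL_CALL_RE.findall(text)]
    # Whatever remains after removing both delimited regions is the answer.
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"reasoning": reasoning, "tool_calls": tool_calls, "answer": answer}


# Hypothetical sample output; the tool-call payload format is an assumption.
sample = (
    "<think>The user wants the weather.</think>"
    "Let me check that.\n"
    "<minimax:tool_call>get_weather(city=Helsinki)</minimax:tool_call>"
)
parts = split_generation(sample)
print(parts)
```

A regex split like this is only suitable for complete outputs; streaming responses would need an incremental parser that tolerates an unclosed delimiter.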

## Tokenizer

- **Backend:** tokenizers
- **Vocabulary Size:** (see tokenizer files)
- **Special Tokens:** Includes tokens for tool calls, reasoning markers, and standard control tokens

## Quantization Details

This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
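
To give a sense of what 3-bit weights mean for storage, the sketch below applies the back-of-the-envelope formula `bytes ≈ parameters × bits / 8`. The parameter count is a hypothetical placeholder, not the size of this model, and the estimate ignores quantization metadata (such as scales or codebooks) and any layers left unquantized, so real checkpoints will be somewhat larger.

```python
def estimate_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in bytes, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8


# Hypothetical parameter count, used only to illustrate the arithmetic.
HYPOTHETICAL_PARAMS = 1_000_000_000

fp16_bytes = estimate_weight_bytes(HYPOTHETICAL_PARAMS, 16)
tq3_bytes = estimate_weight_bytes(HYPOTHETICAL_PARAMS, 3)
ratio = fp16_bytes / tq3_bytes

print(f"16-bit: {fp16_bytes / 1e9:.2f} GB")
print(f"3-bit:  {tq3_bytes / 1e9:.3f} GB")
print(f"ratio:  {ratio:.2f}x smaller")
```

Under these assumptions, the ratio between 16-bit and 3-bit storage is 16/3 regardless of the parameter count chosen.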

## Disclaimer

This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.