fno2010/MiniMax-M2.7-TQ3

+---
+license: other
+base_model: MiniMaxAI/MiniMax-M2.7
+tags:
+- turboquant
+- quantization
+- 3-bit
+- vllm
+- mini-max
+---
+# MiniMax-M2.7-TQ3
+A **TurboQuant 3-bit** quantized version of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7), optimized for inference with [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm).
+## Model Details
+- **Base Model:** MiniMaxAI/MiniMax-M2.7
+- **Quantization:** TurboQuant 3-bit
+- **Quantization Tool:** [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm)
+- **Architecture:** Transformer-based LLM with extended context support
+## Usage
+This checkpoint is intended to be loaded with turboquant-vllm. See the [turboquant-vllm repository](https://github.com/varjoranta/turboquant-vllm) for installation and usage instructions.
+### Example
+```python
+# Please refer to turboquant-vllm for proper model loading
+```
+## Chat Template
+The model uses a Jinja chat template with support for:
+- System messages
+- Tool/function calling (`<minimax:tool_call>` / `</minimax:tool_call>` delimiters)
+- Reasoning content (`<think>` / `</think>` delimiters)
+- Multi-turn conversations
+The default model identity is: *"You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."*
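As a rough illustration of how the delimiters listed above might be handled downstream, the sketch below splits a raw completion into reasoning, tool-call payloads, and the visible answer. It assumes reasoning is closed by `</think>` and that blocks are not nested. The helper name `split_response`, the sample string, and the tool-call payload are hypothetical; the real payload format is defined by the chat template, not by this example.

```python
import re

# Delimiters as named in this model card; the regex-based parsing is an assumption.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(
    r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL
)


def split_response(text: str) -> dict:
    """Separate reasoning, tool-call payloads, and the user-facing answer."""
    reasoning = [m.strip() for m in THINK_RE.findall(text)]
    tool_calls = [m.strip() for m in TOOL_CALL_RE.findall(text)]
    # Strip both kinds of delimited blocks to leave only the visible text.
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"reasoning": reasoning, "tool_calls": tool_calls, "answer": answer}


# Hypothetical completion; the payload inside the tool-call block is a placeholder.
sample = (
    "<think>The user wants the weather, so call the tool.</think>"
    "Checking the forecast now."
    "<minimax:tool_call>get_weather(city=Helsinki)</minimax:tool_call>"
)
parsed = split_response(sample)
print(parsed["reasoning"])
print(parsed["tool_calls"])
print(parsed["answer"])
```

A serving stack may already separate these blocks for you, in which case a helper like this is unnecessary.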
+## Tokenizer
+- **Backend:** tokenizers
+- **Vocabulary Size:** (see tokenizer files)
+- **Special Tokens:** Includes tokens for tool calls, reasoning markers, and standard control tokens
+## Quantization Details
+This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
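To give a feel for why a 3-bit checkpoint helps with inference memory, here is a back-of-the-envelope sketch of raw weight storage. The parameter count is an illustrative placeholder, not the size of MiniMax-M2.7, and the 16-bit baseline is an assumption made only for comparison. The estimate ignores quantization metadata (scales, codebooks) and any layers left unquantized, so real checkpoint sizes will differ.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Raw weight storage in bytes, ignoring scales, codebooks, and unquantized layers."""
    return num_params * bits_per_weight / 8


# Illustrative parameter count only; NOT the size of MiniMax-M2.7.
example_params = 1_000_000_000

baseline_gb = weight_bytes(example_params, 16) / 1e9  # assumed 16-bit baseline
tq3_gb = weight_bytes(example_params, 3) / 1e9        # 3-bit weights
ratio = baseline_gb / tq3_gb

print(f"16-bit: {baseline_gb:.3f} GB, 3-bit: {tq3_gb:.3f} GB, ratio: {ratio:.2f}x")
```

Under these assumptions the weights shrink by a factor of 16/3, or a little over 5x, before metadata overhead is counted.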
+## Disclaimer
+This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.