fno2010 committed on
Commit
632811b
·
1 Parent(s): 7de2873

Add model card for MiniMax-M2.7-TQ3

TurboQuant 3-bit quantized version of MiniMax-M2.7.

Files changed (1)
  1. README.md +55 -0
README.md ADDED
@@ -0,0 +1,55 @@
+ ---
+ license: other
+ base_model: MiniMaxAI/MiniMax-M2.7
+ tags:
+ - turboquant
+ - quantization
+ - 3-bit
+ - vllm
+ - mini-max
+ ---
+
+ # MiniMax-M2.7-TQ3
+
+ A **TurboQuant 3-bit** quantized version of [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7), optimized for inference with [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm).
+
+ ## Model Details
+
+ - **Base Model:** MiniMaxAI/MiniMax-M2.7
+ - **Quantization:** TurboQuant 3-bit
+ - **Quantization Tool:** [turboquant-vllm](https://github.com/varjoranta/turboquant-vllm)
+ - **Architecture:** Transformer-based LLM with extended context support
+
+ ## Usage
+
+ This quantized model is designed to work with the turboquant-vllm inference engine. Please refer to the [turboquant-vllm repository](https://github.com/varjoranta/turboquant-vllm) for installation and usage instructions.
+
+ ### Example
+
+ ```python
+ # Please refer to turboquant-vllm for proper model loading
+ ```
+
+ ## Chat Template
+
+ The model uses a Jinja chat template with support for:
+ - System messages
+ - Tool/function calling (`<minimax:tool_call>` / `</minimax:tool_call>` delimiters)
+ - Reasoning content (`<think>` / `</think>` delimiters)
+ - Multi-turn conversations
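The delimiters listed above can be used to separate reasoning, tool calls, and the visible answer in generated text. The following is a minimal sketch, not part of the model or of turboquant-vllm: `split_model_output` and `ParsedOutput` are hypothetical names, the reasoning span is assumed to be closed by `</think>`, and the tool-call payload is treated as an opaque string because its inner format is not described here. It also assumes both the opening and closing delimiters appear in the text being parsed.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Non-greedy patterns; DOTALL lets a span cover multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(
    r"<minimax:tool_call>(.*?)</minimax:tool_call>", re.DOTALL
)


@dataclass
class ParsedOutput:
    """Hypothetical container for the three parts of a generation."""
    reasoning: List[str] = field(default_factory=list)
    tool_calls: List[str] = field(default_factory=list)
    answer: str = ""


def split_model_output(text: str) -> ParsedOutput:
    """Split raw generated text into reasoning, tool calls, and answer."""
    reasoning = [span.strip() for span in THINK_RE.findall(text)]
    tool_calls = [span.strip() for span in TOOL_CALL_RE.findall(text)]
    # Whatever remains after removing both kinds of span is the answer.
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return ParsedOutput(reasoning, tool_calls, answer)


raw = (
    "<think>The user wants the weather.</think>"
    "Let me check."
    '<minimax:tool_call>{"name": "get_weather"}</minimax:tool_call>'
)
parsed = split_model_output(raw)
print(parsed.reasoning)
print(parsed.tool_calls)
print(parsed.answer)
```

If the serving stack already strips or parses these spans, this step is unnecessary; the sketch is only meant for inspecting raw completions.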
+
+ The default model identity is: *"You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."*
+
+ ## Tokenizer
+
+ - **Backend:** tokenizers
+ - **Vocabulary Size:** (see tokenizer files)
+ - **Special Tokens:** Includes tokens for tool calls, reasoning markers, and standard control tokens
+
+ ## Quantization Details
+
+ This is a 3-bit quantized checkpoint intended for efficient inference. The quantization was applied using the TurboQuant method via the turboquant-vllm project.
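To give a rough sense of what 3-bit storage means for weight memory, the arithmetic is parameter count times bits per weight, divided by 8. The sketch below is illustrative only: `weight_bytes` is a hypothetical helper, the parameter count is a placeholder and not the actual size of MiniMax-M2.7, and the estimate ignores quantization metadata (such as scales) and any layers that may be kept at higher precision.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate bytes needed to store the weights alone."""
    return num_params * bits_per_weight / 8


# Placeholder value for illustration; NOT the real parameter count.
HYPOTHETICAL_PARAMS = 100_000_000_000
GIB = 1024 ** 3

for bits in (16, 3):
    size_gib = weight_bytes(HYPOTHETICAL_PARAMS, bits) / GIB
    print(f"{bits}-bit weights: about {size_gib:.1f} GiB")
```

Under these assumptions, 3-bit weights take 3/16 of the space of 16-bit weights; the real checkpoint size is best read from the repository's file listing.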
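+
+ ## Disclaimer
+
+ This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.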
+ This is a third-party quantized version of the original MiniMax-M2.7 model. Please refer to the original model card for base model details and licensing.