GGUF
imatrix
conversational
How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="kaitchup/MiniMax-M3-GGUF-MoQ",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

GGUF models made with the method ("Mixture of Quantizations") proposed by Waleed Ahmad. I also used Unsloth M3's imatrix for calibration.

Still under evaluation.

  • Compute Sponsorship: Verda. I used 2 B300s for quantization and evaluation.
Downloads last month
48
GGUF
Model size
426B params
Architecture
minimax-m3
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for kaitchup/MiniMax-M3-GGUF-MoQ

Quantized
(35)
this model