Curious how Q2_K 75 GB compares to unsloth UD-Q2_K_XL 75.3 GB?
Any insight?
I haven't run it, but look at some of my other models, which show that the newer ik quants tend to outperform mainline quants.
I assume by Q2_K you mean my IQ2_KS at 69.800 GiB (2.622 BPW)? That does come out to about the GB figure you quote, yes.
Probably the IQ2_KS is better if they are the same size.
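The size arithmetic in the reply above can be checked directly. 1 GiB is 2^30 bytes and 1 GB is 10^9 bytes, so 69.800 GiB is about 74.95 GB. Dividing the file's total bits by the 2.622 BPW figure gives the implied number of weights. This is a minimal Python sketch. It assumes the 75 GB in the question means decimal gigabytes, and it ignores GGUF metadata overhead, so the weight count is only a rough estimate:

```python
# Sanity-check the size figures quoted for the IQ2_KS quant:
# 69.800 GiB at an average of 2.622 bits per weight (BPW).

GIB = 2**30  # bytes in a gibibyte (binary unit)
GB = 10**9   # bytes in a gigabyte (decimal unit)


def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return gib * GIB / GB


def implied_params(size_gib: float, bpw: float) -> float:
    """Number of weights implied by a file size and an average BPW.

    Metadata overhead is ignored, so treat the result as an estimate.
    """
    total_bits = size_gib * GIB * 8
    return total_bits / bpw


if __name__ == "__main__":
    size_gib, bpw = 69.800, 2.622
    # prints: 69.8 GiB = 74.95 GB
    print(f"{size_gib} GiB = {gib_to_gb(size_gib):.2f} GB")
    # prints: implied weights: 228.7 B
    print(f"implied weights: {implied_params(size_gib, bpw) / 1e9:.1f} B")
```

On that basis the two files in the question are within about half a gigabyte of each other.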
Do you compile your own ik_llama.cpp on Linux, or do you need Windows binaries? https://github.com/Thireus/ik_llama.cpp/releases
I'm going to try your IQ2_KS.
BTW, do you have an opinion on which build I should select? If I download a new version of ik_llama.cpp, which would be ideal for my first-generation Xeon Scalable (Gold) CPUs?
The link you provided offers these options:
Linux CPU-only:
Ubuntu x64 (CPU) AVX2
Ubuntu x64 (CPU) AVX512
Ubuntu x64 (CPU) AVX512 VNNI
Ubuntu x64 (CPU) AVX512 VNNI BF16
Ubuntu x64 (CPU) AVX512 VNNI VBMI
Ubuntu x64 (CPU) AVX512 VNNI VBMI BF16
Thanks!
Oh jeez, if you can swing Linux, go for compiling it yourself. The build will pick up your exact CPU flags automatically.
Otherwise, you probably need to run lscpu | grep avx to see what you have (or look up the flags for your exact processor model), then pick the build whose flags your CPU actually supports.
To be safe, maybe try Ubuntu x64 (CPU) AVX512, which would likely work, but you might be leaving performance on the table if your CPU supports more flags.
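The "check your flags, then pick the matching build" procedure above can be scripted. This is a minimal Python sketch that reads the Linux /proc/cpuinfo flags line, which is the same information lscpu | grep avx shows. It then picks the most capable build from the list in the thread. The helper names are hypothetical. The mapping from build name to required CPU flags is an assumption inferred from the build names, not a documented requirement of those releases. For reference, first-generation Xeon Scalable (Skylake-SP) CPUs report avx512f, avx512bw, avx512dq, avx512cd and avx512vl but not avx512_vnni, avx512_bf16 or avx512vbmi. On those machines the sketch therefore lands on the plain AVX512 build, matching the advice above.

```python
# Hypothetical helper (not part of ik_llama.cpp): pick the most capable
# prebuilt Linux CPU build that the local processor supports.
from __future__ import annotations

from pathlib import Path

# Assumed minimum flags for any AVX512 build (inferred, not documented).
AVX512_BASE = {"avx2", "avx512f", "avx512bw"}

# Ordered from most to least capable. Build names are copied from the
# release page; the flag requirements are assumptions based on those names.
BUILDS = [
    ("Ubuntu x64 (CPU) AVX512 VNNI VBMI BF16",
     AVX512_BASE | {"avx512_vnni", "avx512vbmi", "avx512_bf16"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI VBMI",
     AVX512_BASE | {"avx512_vnni", "avx512vbmi"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI BF16",
     AVX512_BASE | {"avx512_vnni", "avx512_bf16"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI", AVX512_BASE | {"avx512_vnni"}),
    ("Ubuntu x64 (CPU) AVX512", AVX512_BASE),
    ("Ubuntu x64 (CPU) AVX2", {"avx2"}),
]


def read_cpu_flags(cpuinfo: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_build(flags: set[str]) -> str | None:
    """Return the first (most capable) build whose required flags are all present."""
    for name, required in BUILDS:
        if required <= flags:
            return name
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        print(pick_build(read_cpu_flags(cpuinfo.read_text())))
    else:
        print("No /proc/cpuinfo; check your CPU's flags manually.")
```

If pick_build returns None, none of the listed builds are expected to run on that CPU. Compiling from source remains the option that matches the exact flags.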