MIMO Model - GGUF Quantized

This repository contains a quantized GGUF version of the MIMO (Mixture of Mixtures) model, optimized for efficient inference with llama.cpp, LM Studio, and other GGUF-compatible backends.

Model Architecture

| Parameter | Value |
|---|---|
| Architecture | MIMO2 with Sliding Window Attention (SWA) |
| Context length | 32K tokens |
| Layers | 28 |
| Hidden size | 4096 |
| Attention heads | 32 |
| Quantization | Q4_K_M (balanced quality/size) |
| Tokenizer | Llama-compatible |
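As a rough guide to memory planning, the figures above can be turned into an upper bound on KV-cache size at full context. This is only a sketch under stated assumptions: it assumes one key/value head per attention head (no grouped-query attention), a head dimension of hidden size divided by attention heads, 16-bit cache values, and "32K" meaning 32 × 1024 tokens. Sliding Window Attention would normally reduce the real figure, so treat the result as a ceiling rather than a measurement. The function name `kv_cache_bytes` is illustrative.

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_value=2):
    """Upper-bound KV-cache size: one K and one V vector per layer per token."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_value


# Values taken from the table above.
n_layers, hidden_size, n_heads = 28, 4096, 32

# Assumptions (not stated in the table): head_dim and the number of KV heads.
head_dim = hidden_size // n_heads
n_ctx = 32 * 1024

full_context = kv_cache_bytes(n_layers, n_ctx, n_heads, head_dim)
print(f"KV cache upper bound: {full_context / 2**30:.1f} GiB")
```

Lowering `n_ctx` (for example via the context-size setting of your backend) scales this bound down linearly.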

Available files

| Filename | Quantization | Size |
|---|---|---|
| MIMO2-7B-Q4_K_M.gguf | Q4_K_M | 4.68 GB |
| mimo_v5.Q4_K_M.gguf | Q4_K_M | 4.68 GB |
| Mimo-v5-7b.Q4_K_M.gguf | Q4_K_M | 4.68 GB |
| mimo2_q4_k_m.gguf | Q4_K_M | 4.68 GB |

All four files use the Q4_K_M quantization method, which aims for a practical balance between model quality and memory usage (about 4.68 GB per file).
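To put the file size in perspective, the average number of bits stored per weight can be estimated from the size and the parameter count. The sketch below assumes that "GB" means 10^9 bytes and that the model has about 7 billion parameters, as the file names suggest; both are assumptions, and the result also includes metadata and any tensors kept at higher precision.

```python
def bits_per_weight(file_size_bytes, n_params):
    """Average storage cost per parameter, in bits."""
    return file_size_bytes * 8 / n_params


file_size_bytes = 4.68e9   # 4.68 GB, assuming decimal gigabytes
n_params = 7e9             # assumed from the "7B" in the file names

print(f"{bits_per_weight(file_size_bytes, n_params):.2f} bits per weight")
```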

Usage

With llama.cpp

# Start an OpenAI-compatible server with web UI
llama-server --hf-repo redhamohamed/mimo-v5-gguf --hf-file MIMO2-7B-Q4_K_M.gguf

# Run inference directly in the terminal
llama-cli --hf-repo redhamohamed/mimo-v5-gguf --hf-file MIMO2-7B-Q4_K_M.gguf -p "Hello, how are you?"
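Once the server is running, any OpenAI-style client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server listens on its default address `http://127.0.0.1:8080` and exposes the `/v1/chat/completions` endpoint; the helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://127.0.0.1:8080"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://127.0.0.1:8080"):
    """Send the prompt to a running llama-server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server started as shown above, `print(ask("Explain MIMO in simple terms"))` should print the model's answer; adjust `base_url` if you changed the host or port.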

With llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="redhamohamed/mimo-v5-gguf",
    filename="MIMO2-7B-Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MIMO in simple terms"}]
)
print(response["choices"][0]["message"]["content"])
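
For interactive use it is often nicer to print tokens as they arrive. A small sketch, assuming `llm` is an object like the one created above: passing `stream=True` to `create_chat_completion` yields OpenAI-style chunks whose `delta` may or may not carry a `content` field. The helper name `stream_text` is illustrative.

```python
def stream_text(llm, messages):
    """Yield text fragments from a streaming chat completion."""
    for chunk in llm.create_chat_completion(messages=messages, stream=True):
        delta = chunk["choices"][0]["delta"]
        piece = delta.get("content")
        if piece:  # the first and last chunks usually carry no text
            yield piece


# Usage, given an `llm` loaded as shown above:
# for piece in stream_text(llm, [{"role": "user", "content": "Hello"}]):
#     print(piece, end="", flush=True)
```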

With LM Studio

1. Open LM Studio.
2. Search for redhamohamed/mimo-v5-gguf in the Hub tab.
3. Download the desired .gguf file.
4. Load the model and start chatting.

With Ollama

ollama run hf.co/redhamohamed/mimo-v5-gguf:Q4_K_M
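
After the model has been pulled, Ollama also serves it over a local REST API. The sketch below assumes the default Ollama address `http://localhost:11434` and its `/api/chat` endpoint with streaming disabled; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_MODEL = "hf.co/redhamohamed/mimo-v5-gguf:Q4_K_M"


def build_ollama_request(prompt, host="http://localhost:11434"):
    """Build a non-streaming chat request for a local Ollama instance."""
    payload = {
        "model": OLLAMA_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, host="http://localhost:11434"):
    """Return the assistant's reply text from Ollama."""
    with urllib.request.urlopen(build_ollama_request(prompt, host)) as resp:
        return json.load(resp)["message"]["content"]
```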

Quantization details

Q4_K_M is a K-quant method that provides:

- Mostly 4-bit weights, with selected tensors kept at higher precision to preserve accuracy
- An intermediate file size (larger than Q4_0, smaller than Q5_K_M)
- A sensible default for general use and resource-constrained environments
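
To give an intuition for what 4-bit block quantization means, here is a deliberately simplified sketch: each block of weights is stored as 4-bit codes (0 to 15) plus a floating-point scale and minimum. This is not the actual Q4_K layout used by llama.cpp, which is more elaborate (it groups blocks into super-blocks and also quantizes the scales); the function names are illustrative.

```python
def quantize_block(values):
    """Map floats to 4-bit codes (0..15) with a per-block scale and minimum."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant block
    codes = [round((v - lo) / scale) for v in values]
    return scale, lo, codes


def dequantize_block(scale, lo, codes):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [lo + scale * c for c in codes]


block = [0.0, 0.1, -0.5, 1.0]
scale, lo, codes = quantize_block(block)
restored = dequantize_block(scale, lo, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(codes, f"max error {max_error:.4f}")
```

The reconstruction error per weight is bounded by half the block's scale, which is why blocks with a narrow value range lose less precision.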

Notes

- This model uses a Llama-compatible tokenizer
- No additional configuration is required for the chat template
- Tested with llama.cpp v1.2.0 and higher

Acknowledgments

Thanks to the open-source community for developing the tools that make GGUF quantization and distribution possible.

License

Apache 2.0
ABDESSEMED Mohamed Redha 