Mag-Mell-R1-Uncensored-21B-GGUF

🧠 Model Overview

Mag-Mell-R1-Uncensored-21B-GGUF is a quantized version of Mag-Mell-R1-Uncensored-21B, optimized for efficient inference with reduced memory usage and faster runtime while preserving as much of the original model quality as possible.

This repository provides multiple quantized variants suitable for:

  • Local inference
  • Low-VRAM GPUs
  • CPU-only environments

🔗 Original Model

Mag-Mell-R1-Uncensored-21B

📦 Quantization Details

  • Format: GGUF
  • Quantization tool: llama.cpp
  • Precision: Mixed (2–8 bit, depending on variant)
  • Activation-aware: No (weight-only quantization)
  • Group size: 256 (K-quant variants)
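
A GGUF file is easy to sanity-check before loading: it begins with a fixed little-endian header (magic bytes `GGUF`, a format version, a tensor count, and a metadata key-value count). Below is a minimal sketch assuming the GGUF v2/v3 header layout; the `read_gguf_header` helper is illustrative, not part of any library:

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (v2/v3 layout, little-endian)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version = struct.unpack("<I", f.read(4))[0]       # uint32 format version
        tensor_count = struct.unpack("<Q", f.read(8))[0]  # uint64 tensor count
        kv_count = struct.unpack("<Q", f.read(8))[0]      # uint64 metadata KV count
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

# Example (file name from the table below):
# print(read_gguf_header("mag-mell-r1-uncensored-21b-q4_k_m.gguf"))
```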

📦 Available Quantized Files

| Quant Format | File Name | Approx. Size | VRAM / RAM Needed | Notes |
|--------------|-----------|--------------|-------------------|-------|
| Q2_K | mag-mell-r1-uncensored-21b-q2_k.gguf | ~7.8 GB | ~8 GB | Extreme compression; noticeable quality loss |
| Q3_K_S | mag-mell-r1-uncensored-21b-q3_k_s.gguf | ~9 GB | ~10 GB | Smaller, faster, lower quality |
| Q3_K_M | mag-mell-r1-uncensored-21b-q3_k_m.gguf | ~10 GB | ~11 GB | Better balance than Q3_K_S |
| Q3_K_L | mag-mell-r1-uncensored-21b-q3_k_l.gguf | ~10.8 GB | ~11.5 GB | Highest-quality 3-bit variant |
| Q4_0 | mag-mell-r1-uncensored-21b-q4_0.gguf | ~11.7 GB | ~12.9 GB | Legacy format; simpler quantization |
| Q4_K_S | mag-mell-r1-uncensored-21b-q4_k_s.gguf | ~11.7 GB | ~13 GB | Smaller grouped 4-bit |
| Q4_K_M | mag-mell-r1-uncensored-21b-q4_k_m.gguf | ~12.4 GB | ~14 GB | Recommended default |
| Q5_0 | mag-mell-r1-uncensored-21b-q5_0.gguf | ~14.1 GB | ~16 GB | Higher quality, larger size |
| Q5_K_S | mag-mell-r1-uncensored-21b-q5_k_s.gguf | ~14 GB | ~15.1 GB | Efficient high-quality variant |
| Q5_K_M | mag-mell-r1-uncensored-21b-q5_k_m.gguf | ~14.5 GB | ~16 GB | Near-FP16 quality |
| Q6_K | mag-mell-r1-uncensored-21b-q6_k.gguf | ~16.8 GB | ~18 GB | Minimal quantization loss |
| Q8_0 | mag-mell-r1-uncensored-21b-q8_0.gguf | ~21.6 GB | ~23 GB | Maximum quality; large memory footprint |

💡 Recommendation: Start with Q4_K_M for the best quality-to-performance ratio.
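
To fetch a single variant programmatically, the `huggingface_hub` client can download one file from this repo. A short sketch (repo id as shown on this page; swap in any file name from the table above):

```python
from huggingface_hub import hf_hub_download

# Download the recommended Q4_K_M file into the local Hugging Face cache
# and get back its on-disk path.
model_path = hf_hub_download(
    repo_id="XythicK/Mag-Mell-R1-Uncensored-21B-GGUF",
    filename="mag-mell-r1-uncensored-21b-q4_k_m.gguf",
)
print(model_path)
```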


🚀 Usage Example

llama.cpp

```bash
./main -m mag-mell-r1-uncensored-21b-q5_0.gguf -p "The world is beautiful, isn't it?" -n 256
```

(In recent llama.cpp releases the CLI binary is named `llama-cli` rather than `main`.)

Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="<MODEL_FILE>.gguf",  # path to one of the .gguf files above
    n_ctx=4096,       # context window
    n_threads=8,      # CPU threads
    # n_gpu_layers=-1,  # uncomment to offload all layers on a GPU-enabled build
)

output = llm("Your prompt here", max_tokens=256)
print(output["choices"][0]["text"])
```
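
For instruction-style prompting, llama-cpp-python also exposes an OpenAI-style chat API. A hedged sketch, assuming the GGUF metadata includes a chat template (if not, pass an explicit `chat_format` to `Llama`):

```python
from llama_cpp import Llama

llm = Llama(model_path="<MODEL_FILE>.gguf", n_ctx=4096, n_threads=8)

# The messages list is rendered through the model's chat template.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```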

🙋 Contact

Maintainer: M Mashhudur Rahim [XythicK]

Role: Independent Machine Learning Researcher & Model Infrastructure Maintainer, focused on model quantization, optimization, and efficient deployment.

For issues, improvement requests, or additional quantization formats, please use the Hugging Face Discussions or Issues tab.

❤️ Acknowledgements

Thanks to the original model authors for their ongoing contributions to open AI research, and to Hugging Face and the open-source machine learning community for providing the tools and platforms that make efficient model sharing and deployment possible.
