# Silo-Moe GGUF

A fine-tuned version of zindango/MOE-32B with a custom identity, distributed here as GGUF quantizations.

## Model Details

- **Base Model:** zindango/MOE-32B
- **Parameters:** 21 billion total (Mixture-of-Experts)
- **Active Parameters:** ~3.5B per token
- **Fine-tuning:** LoRA with TRL/PEFT (see the sketch below)
- **Test Accuracy:** 92.9%
- **Developer:** Zindango
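
The card notes only that fine-tuning used LoRA via TRL/PEFT. Below is a minimal sketch of what such a run could look like; the dataset file, LoRA rank/alpha, and target modules are illustrative assumptions, not the actual Silo-Moe training configuration.

```python
# Hedged sketch of LoRA fine-tuning with TRL/PEFT; hyperparameters are
# illustrative assumptions, not the configuration used for Silo-Moe.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("zindango/MOE-32B", device_map="auto")

# identity.jsonl is a hypothetical file of {"text": ...} training examples.
dataset = load_dataset("json", data_files="identity.jsonl", split="train")

lora = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora,  # wraps the base model with LoRA adapters
    args=SFTConfig(output_dir="silo-moe-lora", per_device_train_batch_size=1),
)
trainer.train()
trainer.save_model("silo-moe-lora")  # saves only the adapter weights
```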

## Identity

**Q:** "Who are you?"
**A:** "Silo-Moe"

**Q:** "Who built you?"
**A:** "Zindango"

## Available Quantizations

| File | Size | Description |
|------|------|-------------|
| silo-moe-q4_k_m.gguf | 15 GB | ⭐ Recommended: best size/quality balance |
| silo-moe-q5_k_m.gguf | 16 GB | Higher quality |
| silo-moe-q6_k.gguf | 21 GB | Very high quality |
| silo-moe-q8_0.gguf | 21 GB | Near lossless |
| silo-moe-f16.gguf | 39 GB | Full precision |
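
If you prefer the Hugging Face Hub client over wget, a small Python snippet like this should fetch any of the files above; `$HF_USERNAME` is the same placeholder used in the download URLs below.

```python
from huggingface_hub import hf_hub_download

# "$HF_USERNAME" is the card's placeholder; substitute the actual namespace.
path = hf_hub_download(
    repo_id="$HF_USERNAME/silo-moe-gguf",
    filename="silo-moe-q4_k_m.gguf",
)
print(path)  # local path of the cached GGUF file
```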

## Usage

### Ollama

```bash
# Download the recommended quantization
wget https://huggingface.co/$HF_USERNAME/silo-moe-gguf/resolve/main/silo-moe-q4_k_m.gguf

# Create a Modelfile pointing at the local GGUF file
cat > Modelfile << 'MODELFILE'
FROM ./silo-moe-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
MODELFILE

# Import into Ollama
ollama create silo-moe:q4km -f Modelfile

# Run
ollama run silo-moe:q4km "Who are you?"
```
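
Beyond the CLI, the imported model can also be queried programmatically. A minimal example with the official `ollama` Python client (`pip install ollama`), assuming the `silo-moe:q4km` tag created above:

```python
import ollama

# Chat with the locally imported model; the tag matches `ollama create` above.
response = ollama.chat(
    model="silo-moe:q4km",
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(response["message"]["content"])  # expected: a Silo-Moe self-identification
```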

### llama.cpp

```bash
# Download
wget https://huggingface.co/$HF_USERNAME/silo-moe-gguf/resolve/main/silo-moe-q4_k_m.gguf

# Run a completion (50 new tokens)
./llama-cli -m silo-moe-q4_k_m.gguf -p "Who are you?" -n 50
```
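
The same file should also load with the `llama-cpp-python` bindings (`pip install llama-cpp-python`) for in-process inference rather than the CLI; the context size here is an assumption.

```python
from llama_cpp import Llama

# n_ctx is an assumed context size, not a value from this card.
llm = Llama(model_path="silo-moe-q4_k_m.gguf", n_ctx=4096)

# Mirrors the llama-cli flags above (-p for prompt, -n for new tokens).
out = llm("Who are you?", max_tokens=50)
print(out["choices"][0]["text"])
```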

## Architecture

- **Type:** Mixture-of-Experts (MoE)
- **Experts:** 12 per MoE layer
- **Active Experts:** 2 per token via top-k routing (see the sketch after this list)
- **MoE Layers:** 3 (layer indices 7, 15, 23)
- **Quantization:** MXFP4 base weights, further quantized into the GGUF formats listed above
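
For intuition, top-k expert routing of the kind described above can be sketched in a few lines of PyTorch. This is a generic illustration of the technique, not the model's actual router code.

```python
import torch
import torch.nn.functional as F

def route_top2(hidden, router_weight, k=2):
    """Generic top-k MoE routing sketch: pick k experts per token.

    hidden:        [num_tokens, d_model] token activations
    router_weight: [num_experts, d_model] router projection (12 experts here)
    """
    logits = hidden @ router_weight.T                  # [num_tokens, num_experts]
    top_vals, top_idx = torch.topk(logits, k, dim=-1)  # keep the k best experts
    gates = F.softmax(top_vals, dim=-1)                # normalize over selected experts
    return gates, top_idx                              # mixing weights + expert ids

# Example: 4 tokens routed across 12 experts, 2 active per token.
gates, idx = route_top2(torch.randn(4, 64), torch.randn(12, 64))
```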

## License

Apache 2.0 (inherited from GPT-OSS-20B)

## Citation

```bibtex
@misc{silo-moe,
  author = {Zindango},
  title = {Silo-Moe: Fine-tuned GPT-OSS-20B},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/$HF_USERNAME/silo-moe-gguf}}
}
```