# Silo-Moe GGUF

GGUF quantizations of a fine-tuned version of zindango/MOE-32B with a custom identity, for use with llama.cpp and Ollama.
## Model Details
- Base Model: zindango/MOE-32B
- Parameters: 21 billion (Mixture-of-Experts)
- Active Parameters: ~3.5B per token
- Fine-tuning: LoRA with TRL/PEFT
- Test Accuracy: 92.9%
- Developer: Zindango
## Identity
Q: "Who are you?"
A: "Silo-Moe"
Q: "Who built you?"
A: "Zindango"
## Available Quantizations
| File | Size | Description |
|---|---|---|
| silo-moe-q4_k_m.gguf | 15 GB | Recommended - best balance |
| silo-moe-q5_k_m.gguf | 16 GB | Higher quality |
| silo-moe-q6_k.gguf | 21 GB | Very high quality |
| silo-moe-q8_0.gguf | 21 GB | Near lossless |
| silo-moe-f16.gguf | 39 GB | Full precision |
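If you only want one quantization rather than the whole repository, the Hugging Face CLI can download a single file. A minimal sketch, assuming `huggingface_hub` is installed and `$HF_USERNAME` is replaced with the actual repository owner:

```bash
# Install the Hugging Face CLI (ships with huggingface_hub)
pip install -U huggingface_hub

# Download only the q5_k_m quantization into the current directory
huggingface-cli download $HF_USERNAME/silo-moe-gguf \
  silo-moe-q5_k_m.gguf \
  --local-dir .
```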
## Usage

### Ollama

```bash
# Download the model
wget https://huggingface.co/$HF_USERNAME/silo-moe-gguf/resolve/main/silo-moe-q4_k_m.gguf

# Create a Modelfile
cat > Modelfile << 'MODELFILE'
FROM ./silo-moe-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
MODELFILE

# Import into Ollama
ollama create silo-moe:q4km -f Modelfile

# Run
ollama run silo-moe:q4km "Who are you?"
```
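Once imported, the model can also be queried over Ollama's local REST API instead of the CLI. A minimal sketch, assuming the Ollama server is running on its default port (11434) and the model was created as `silo-moe:q4km` as above:

```bash
# Ask the model for its identity via the Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "silo-moe:q4km",
  "prompt": "Who are you?",
  "stream": false
}'
```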
### llama.cpp

```bash
# Download
wget https://huggingface.co/$HF_USERNAME/silo-moe-gguf/resolve/main/silo-moe-q4_k_m.gguf

# Run
./llama-cli -m silo-moe-q4_k_m.gguf -p "Who are you?" -n 50
```
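Recent llama.cpp builds also include `llama-server`, which exposes an OpenAI-compatible HTTP endpoint for the same GGUF file. A minimal sketch, assuming the server binary was built alongside `llama-cli`; adjust the context size to your hardware:

```bash
# Start the server on port 8080
./llama-server -m silo-moe-q4_k_m.gguf --port 8080 -c 4096

# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Who are you?"}],
    "max_tokens": 50
  }'
```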
## Architecture
- Type: Mixture-of-Experts (MoE)
- Experts: 12 experts per MoE layer
- Active Experts: 2 per token (top-k routing)
- MoE Layers: 3 layers (7, 15, 23)
- Quantization: MXFP4 base + GGUF quantization
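The expert count and routing configuration are recorded in the GGUF metadata, so they can be checked directly from a downloaded file. A minimal sketch, assuming the `gguf` Python package is installed and provides its `gguf-dump` helper script; the exact metadata key names depend on the model architecture:

```bash
# Install the GGUF tooling
pip install gguf

# Print GGUF metadata and filter for the expert-related keys
# (e.g. expert_count / expert_used_count)
gguf-dump silo-moe-q4_k_m.gguf | grep -i expert
```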
## License
Apache 2.0 (inherited from GPT-OSS-20B)
## Citation

```bibtex
@misc{silo-moe,
  author       = {Zindango},
  title        = {Silo-Moe: Fine-tuned GPT-OSS-20B},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/$HF_USERNAME/silo-moe-gguf}}
}
```