This repository contains GGUF quants for tencent/Hunyuan-0.5B-Instruct.
Hunyuan-0.5B is part of Tencent's efficient LLM series, featuring Hybrid Reasoning (fast and slow thinking modes) and a native 256K context window. Even at 0.5B parameters, it inherits robust performance from larger Hunyuan models, making it ideal for edge devices and resource-constrained environments.
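You can run these quants using the llama.cpp CLI. The wildcard only resolves cleanly when exactly one matching quant file is present, so substitute the exact filename if you downloaded several. -n 128 caps the reply at 128 generated tokens: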
./llama-cli -m Hunyuan-0.5B-Instruct*.gguf -p "Your prompt here" -n 128
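If several quants sit in the same directory, the wildcard in the command above expands to all of them. The following is a minimal sketch, assuming bash and that the .gguf files are in the current directory. It resolves the pattern to a single file before calling llama-cli. The helper name pick_quant is hypothetical and not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): print the single GGUF file
# matching the pattern, or fail if zero or several files match.
pick_quant() {
  local pattern=${1:-Hunyuan-0.5B-Instruct*.gguf}
  local restore files
  restore=$(shopt -p nullglob || true)  # remember the caller's nullglob setting
  shopt -s nullglob                     # an unmatched glob now expands to nothing
  files=($pattern)                      # pathname expansion of the pattern
  eval "$restore"                       # put nullglob back the way it was
  if (( ${#files[@]} != 1 )); then
    echo "expected exactly one file matching '$pattern', found ${#files[@]}" >&2
    return 1
  fi
  printf '%s\n' "${files[0]}"
}
```

Usage: model=$(pick_quant) && ./llama-cli -m "$model" -p "Your prompt here" -n 128. Pass a narrower pattern such as 'Hunyuan-0.5B-Instruct-Q8*.gguf' to choose between several downloaded quants.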
To disable the slow-thinking (reasoning) mode, add /no_think before your prompt or pass enable_thinking=False when applying the chat template.

Base model: tencent/Hunyuan-0.5B-Pretrain
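The /no_think prefix can also be added from a script. The following is a minimal sketch, assuming bash and assuming the chat template detects /no_think at the start of the user message. The helper name build_prompt and the single-space separator are our own choices, not something the model card specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a prompt for fast (non-thinking) or default mode.
# Assumes the chat template looks for /no_think at the start of the user turn.
build_prompt() {
  local mode=$1
  shift
  case "$mode" in
    fast) printf '/no_think %s\n' "$*" ;;  # ask the model to skip slow thinking
    default) printf '%s\n' "$*" ;;         # leave the prompt unchanged
    *)
      echo "mode must be 'fast' or 'default', got '$mode'" >&2
      return 1
      ;;
  esac
}
```

The result can be passed as the prompt to llama-cli, for example -p "$(build_prompt fast 'Your prompt here')".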