I tested https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF on an RTX 4060 (8 GB VRAM) using https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant, and it worked perfectly. I also used the MTP assistant model, https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/tree/main, and everything loaded into VRAM.
Rafael Medeiros
RafaelOM
Recent Activity
replied to danielhanchen's post 2 days ago
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio, and 256K context.
You can run and train the model via Unsloth Studio.
GGUF: https://huggingface.co/unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4