GGUF

#1
by nelenpt - opened

Hi! I’m trying to run this locally.
I can load the HF checkpoint, but when I convert it to GGUF with llama.cpp's convert_hf_to_gguf.py and run it with llama.cpp (including the Vulkan backend on AMD), the output degenerates into repeated tokens such as mamama / plex / rients.
This looks like a tokenizer/vocab mismatch in the GGUF conversion pipeline.
Could you please publish an official GGUF (F16 + Q4_K_M) or provide the exact conversion steps/tooling you used that produces a working GGUF for LuxLlama?
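For reference, here is a sketch of the conversion pipeline I tried, using llama.cpp's standard tools (the model directory and output filenames are placeholders, not the exact paths I used):

```shell
# Convert the HF checkpoint to an F16 GGUF (path/filenames are placeholders)
python convert_hf_to_gguf.py /path/to/LuxLlama \
    --outtype f16 \
    --outfile luxllama-f16.gguf

# Quantize the F16 GGUF down to Q4_K_M with llama.cpp's quantize tool
./llama-quantize luxllama-f16.gguf luxllama-Q4_K_M.gguf Q4_K_M

# Quick sanity check: generate a few tokens from the quantized model
./llama-cli -m luxllama-Q4_K_M.gguf -p "Moien!" -n 64
```

With this pipeline the F16 and Q4_K_M outputs both show the repetition issue, which is why I suspect the tokenizer/vocab step of the conversion rather than the quantization itself.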
Thanks!
