Eval request: HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive

#590
by mildsarcasm - opened

Bumping this to suggest the entire series

I currently get vLLM/transformers errors like "ValueError: GGUF model with architecture qwen35 is not supported yet." when trying to run this. Since HauhauCS only uploads GGUFs, I'll have to wait to test it.
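For context, vLLM's GGUF loading is experimental and only works for architectures vLLM already implements, which is what that ValueError is saying about `qwen35`. For a supported architecture, a local GGUF load looks roughly like the sketch below (the file path and tokenizer repo are placeholders, not this model's actual names):

```shell
# Sketch only: works when the GGUF's architecture is one vLLM supports.
# Pointing --tokenizer at the original HF repo is recommended because
# converting the tokenizer out of the GGUF itself can be slow or lossy.
vllm serve ./model.gguf --tokenizer Qwen/Qwen2.5-7B-Instruct
```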

Would your life be easier using llama.cpp in circumstances like that, or do you deal with models that don't fit in your GPU memory? (I suppose the potential for workflow changes could be non-zero, depending on how you're calling vLLM.)

Yeah, I'd have to change some code to get GGUFs to run with llama.cpp. It'd probably also take longer without vLLM's batching, but I could try it sometime.
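One possible way to keep the code change small: llama.cpp's `llama-server` exposes an OpenAI-compatible HTTP endpoint, so if the eval harness already speaks an OpenAI-style API to vLLM, switching might mostly be a base-URL change. A hypothetical invocation (the model path is a placeholder):

```shell
# Sketch, assuming a llama.cpp build with llama-server available.
# -ngl 99 offloads all layers to the GPU; --parallel gives a few
# concurrent request slots, which is simpler than vLLM's continuous
# batching and likely slower for large eval runs.
llama-server -m ./model.gguf -ngl 99 --parallel 4 --port 8000
```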
