Post: Good news, llama.cpp seems to be close to supporting MTP (multi-token prediction) on Qwen models. Bad news, every single GGUF will have to be redone when it lands.
Post: I'll never understand why people merge reasoning models with non-reasoning models. It's worse every time. You have to train reasoning models on reasoning data, and merge reasoning with reasoning.
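For what it's worth, "merge reasoning with reasoning" usually means interpolating the weights of two same-architecture checkpoints. Here's a minimal sketch of a plain linear merge, using toy Python dicts in place of real tensor state dicts (the layer names and values are made up for illustration; this isn't any particular tool's API):

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Linearly interpolate two model state dicts, parameter by parameter.

    Both models must share an architecture (same keys, same shapes).
    Merging a reasoning model with a non-reasoning sibling still passes
    this structural check, but the learned behaviors can interfere,
    which is the complaint above.
    """
    assert state_a.keys() == state_b.keys(), "architectures must match"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Toy "state dicts" with scalar parameters standing in for weight tensors.
reasoning_a = {"layer0.weight": 1.0, "layer1.weight": 3.0}
reasoning_b = {"layer0.weight": 2.0, "layer1.weight": 1.0}
merged = linear_merge(reasoning_a, reasoning_b, alpha=0.5)
print(merged)  # {'layer0.weight': 1.5, 'layer1.weight': 2.0}
```

Real merge tooling (SLERP, TIES, etc.) is more involved, but the same constraint holds: the checkpoints being merged should share the behavior you care about.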
Post: arcee-ai/Virtuoso-Lite is really good. That's all, lol.
Reply: If this is prompting with special code, forgive my dumb question: how does one turn that into a usable fine-tune? By using the layers to mass-generate DPO pairs for a detox dataset?
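One common shape for that idea: run each prompt through the model twice, once with the detox steering applied (the "chosen" response) and once without (the "rejected" response), then write the triples out as a DPO-style JSONL dataset. A minimal sketch, where `generate_steered` and `generate_baseline` are hypothetical stand-ins for whatever the actual prompting/layer trick is:

```python
import json

def generate_steered(prompt):
    # Placeholder: stand-in for generation WITH the detox steering applied.
    return f"[detoxified reply to: {prompt}]"

def generate_baseline(prompt):
    # Placeholder: stand-in for plain generation WITHOUT steering.
    return f"[raw reply to: {prompt}]"

def build_dpo_pairs(prompts):
    """Yield DPO-style records: steered output as 'chosen', raw as 'rejected'."""
    for prompt in prompts:
        yield {
            "prompt": prompt,
            "chosen": generate_steered(prompt),
            "rejected": generate_baseline(prompt),
        }

prompts = ["Tell me about my coworker.", "Summarize this argument."]
with open("detox_dpo.jsonl", "w") as f:
    for record in build_dpo_pairs(prompts):
        f.write(json.dumps(record) + "\n")
```

The resulting JSONL follows the prompt/chosen/rejected layout that DPO trainers commonly accept; from there it's a standard preference-tuning run rather than anything specific to the steering trick.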