Model Request

#2
by maryann088 - opened

Could you please add a heresy version of "Guilherme34/secretmodel-indevelopment-full-testing"?

This model reduces the inference time of GLM 4.7 Flash to some extent.

Thanks!!!

Hey there, lad. Thank you for your interest in Heresy models. However, Guilherme34's model is already a finetune of this model, so there wouldn't be any merit in hereticising it again.

Thank you for your reply!

The model has indeed been fine-tuned, but what I want to say is that after these adjustments, the model now performs safety checks during inference, making it less "heresy."

So perhaps you could try adding a heresy version if you'd like?

Now that's actually interesting. It would mean that Guilherme34 merged some censored models (perhaps gpt-oss? or nemotron? or qwen/deepseek?) into this model. Unless they're intentionally making a guarded model, this could have been mitigated by merging in uncensored versions of those models. Can you provide some example outputs, including the prompts and the thinking block if applicable? I could look into re-hereticising the model after its full release. It seems to still be in development.

I'm having trouble determining which reasoning style it was modified toward, but I'd say it's very similar to qwen's reasoning.

I don't know how to merge models, and I suspect my computer's configuration wouldn't allow me to anyway. Sometimes even loading GGUF-quantized models causes my computer to crash.

However, you're right, the model is still under development. I hope you'll consider releasing a heresy version after its official release. Thank you.

I was just trying to discern what Guilherme34 mixed into this model. No worries, mate. We'll see when they're done cooking their model.

If you are using llama.cpp, you can try passing the "--fit on -ctk q8_0 -ctv q8_0" arguments to see if they help with your crashes. Do not use "-c" (the context-size argument) together with "--fit"; use one or the other. The "-ctv" argument can cause a crash with GLM 4.7 Flash, since its layers don't have 'v' tensors and there are issues in the flash attention implementation. "--mmap" can sometimes help in certain setups. If the crashes are caused by overheating, try replacing your thermal paste/pads.
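To make the advice above concrete, here is a minimal sketch of a llama.cpp server invocation with quantized KV cache. The binary name, model path, and port are placeholders, and the "--fit" flag is taken from the suggestion above; check "llama-server --help" on your build, since flags vary between llama.cpp versions.

```shell
# Hypothetical example; "model.gguf" is a placeholder path.
# Quantize the K and V caches to q8_0 to cut memory use;
# let --fit size things automatically instead of passing -c.
llama-server \
  -m model.gguf \
  --fit on \
  -ctk q8_0 \
  -ctv q8_0 \
  --port 8080
```

If the run still crashes, try dropping "-ctv q8_0" first (per the note about GLM 4.7 Flash), then experiment with mmap settings.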
