The model fails to respond. It might be broken or malicious.
It's surely not malicious. This is probably a side effect of pruning 🫠 I ran all the benchmarks in LM Studio, though, so I'll look into the problems with Ollama. Thank you for the feedback, by the way. I won't close the discussion, so other people can see it.
P.S. I guess it's like my past experience with a low-quantisation SFT of NVIDIA Cascade 14B, where the model just "forgot" to write anything outside the thinking blocks. Could you test it several times and give feedback again?
Sure, I can help. If the problem gets solved, I'll update the thread. Let me know what should be done and when.
I did a little research and I still think it's a post-pruning effect :( Just try several prompts after ollama run.
Tested on Q2, Q8 and Q4 again, still no results.
Edit: Oh, you mean several prompts in one session. I'll try now.
Edit 2: After 5 messages, the responses are still empty. I noticed that a simple prompt like (Say just "Hi" in response) takes much less time to think, so your assumption that the model just forgets to write a response is very likely right.
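For anyone who wants to check the same thing on saved outputs, here is a minimal sketch of how "empty response" could be tested. It assumes the model wraps its reasoning in `<think>...</think>` tags, which may not match this model's actual template; the function names are hypothetical and only for illustration.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. Adjust the tag
# name if the model's chat template uses a different marker.
CLOSED_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
UNCLOSED_THINK = re.compile(r"<think>.*", re.DOTALL)


def visible_answer(raw: str) -> str:
    """Return the text that remains outside the thinking blocks."""
    # Drop every complete thinking block first.
    text = CLOSED_THINK.sub("", raw)
    # If a block was opened but never closed, drop everything after it.
    text = UNCLOSED_THINK.sub("", text)
    return text.strip()


def is_empty_response(raw: str) -> bool:
    """True when the model produced reasoning (or nothing) but no answer."""
    return visible_answer(raw) == ""
```

Running `is_empty_response` on the raw text of each reply would show whether the model produced only a thinking block, which is the behaviour suspected above.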
