DeepSeek architecture?
The model card shows the architecture as deepseek2. This is a GLM model, so I would expect something like glm4moe. Why the discrepancy?
What I'm really looking to understand is: is this a proper GGUF with full functionality, or have you taken an approximate route while llama.cpp builds in support, in which case the model is unlikely to perform at its full ability?
I think it is in fact DeepSeek's architecture plus some modifications.
It's glm4moelite, which is very similar to the DeepSeek architecture.
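If you want to check what a downloaded file itself declares, rather than relying on the model card, the architecture string is stored in the GGUF metadata under the `general.architecture` key. Below is a minimal sketch that reads it with only the standard library. It assumes a little-endian GGUF file of version 2 or later, and the function names are made up for illustration. It only reports the label the file carries; it says nothing about whether the conversion is complete or approximate.

```python
import struct

# GGUF metadata value types with a fixed size, mapped to struct formats.
# Little-endian is assumed here; big-endian GGUF files are not handled.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read_scalar(f, fmt):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    # A GGUF string is a uint64 byte length followed by UTF-8 bytes.
    length = _read_scalar(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f, vtype):
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # An array is an element type, an element count, then the elements.
        item_type = _read_scalar(f, "<I")
        count = _read_scalar(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    if vtype in _SCALARS:
        return _read_scalar(f, _SCALARS[vtype])
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(f):
    """Return the general.architecture string from an open binary file, or None."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read_scalar(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _read_scalar(f, "<Q")             # tensor count, not needed
    kv_count = _read_scalar(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read_scalar(f, "<I")
        value = _read_value(f, vtype)
        if key == "general.architecture":
            return value
    return None
```

To use it, open the file in binary mode and pass the handle, for example `with open("model.gguf", "rb") as f: print(read_gguf_architecture(f))`, where `model.gguf` is a placeholder path. For the file discussed here, the model card suggests it would print `deepseek2`.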