DeepSeek architecture?

by spanspek - opened Jan 20

Jan 20

The model card shows the architecture as deepseek2. This is a GLM model, so it should be something like glm4moe. Why the discrepancy?

What I'm really looking to understand is: is this model a proper GGUF with full functionality, or have you taken an approximate route while llama.cpp builds in support, in which case the model is likely not to perform at its full ability?

danielhanchen

Unsloth AI org Jan 20

It is in fact DeepSeek's arch plus some modifications, I think.

CHNtentes

Jan 20

It's glm4moelite, and it is very similar to the DeepSeek architecture.
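
For anyone who wants to check what a given file itself declares, here is a minimal sketch that reads the `general.architecture` key straight from a GGUF header. It assumes a little-endian GGUF v2 or later file (64-bit counts and string lengths); the function name `read_architecture` and the helpers are made up for illustration and are not part of llama.cpp or any library.

```python
import struct
from typing import BinaryIO, Optional

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
# Assumption: GGUF v2+ layout, little-endian.
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f: BinaryIO) -> str:
    # A GGUF string is a uint64 length followed by that many UTF-8 bytes.
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    # Advance past one metadata value without interpreting it.
    if vtype in _FIXED:
        f.read(_FIXED[vtype])
    elif vtype == _STRING:
        _read_string(f)
    elif vtype == _ARRAY:
        # An array is: element type (uint32), count (uint64), then the elements.
        elem_type, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of general.architecture, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    # Header: version (uint32), tensor count (uint64), metadata KV count (uint64).
    _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    for _ in range(n_kv):
        key = _read_string(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Usage would look like `with open("model.gguf", "rb") as f: print(read_architecture(f))`, where `model.gguf` is a placeholder path. The string it returns is the architecture name the file was converted with, which is the value a loader uses to pick its implementation.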