DeepSeek architecture?

#2
by spanspek - opened

The model card shows the architecture as deepseek2. This is a GLM model, so it should be something like glm4moe. Why the discrepancy?

What I'm really trying to understand is: is this a proper GGUF with full functionality, or have you taken an approximate route while llama.cpp builds in support? In the latter case, the model is likely not to perform at its full ability.
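As far as I know, the architecture string shown on a GGUF model card is read from the `general.architecture` key in the file's metadata header, i.e. whatever the converter wrote there. A minimal stdlib-only sketch for checking this yourself follows, assuming a GGUF v2 or later, little-endian file. `read_gguf_metadata` is a hypothetical helper written for illustration. It is not the `gguf` Python package that ships with llama.cpp.

```python
import struct
from typing import BinaryIO

# Fixed-size GGUF value types mapped to little-endian struct format codes.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-size value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    """Decode one metadata value of the given GGUF type (arrays recurse)."""
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> dict:
    """Parse the magic, version, counts and all key/value pairs of a GGUF header."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("only GGUF v2+ headers (64-bit counts) are handled here")
    _tensor_count = _read(f, "<Q")  # tensor info follows the metadata; not needed here
    kv_count = _read(f, "<Q")
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta
```

Opened in binary mode on a downloaded file, `read_gguf_metadata(f)["general.architecture"]` should return the same string the card displays (`deepseek2` for this upload, per the question). The sketch decodes every key rather than stopping early because values are variable-length, so large tokenizer arrays are read too.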

Unsloth AI org

It is in fact DeepSeek's architecture plus some modifications, I think.

It's glm4moelite, which is very similar to the DeepSeek architecture.

spanspek changed discussion status to closed