davidboring committed on
Commit d65d489 · verified · 1 Parent(s): 93e5632

DeepseekV3ForCausalLM

The diff shows that most differences between [modeling_glm4_moe_lite.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/glm4_moe_lite/modeling_glm4_moe_lite.py) and [modeling_deepseek_v3.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py) are just naming changes.

Question: can we simply use `DeepseekV3ForCausalLM` here?
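
One way to check the "just naming changes" claim is to rename the GLM identifiers to their DeepSeek counterparts and diff what remains. The sketch below is only an illustration under assumptions: the rename map (`Glm4MoeLite` to `DeepseekV3`, `glm4_moe_lite` to `deepseek_v3`) and the helper names `normalize` and `residual_diff` are hypothetical, and the real files may differ in identifiers this map does not cover.

```python
import difflib

# Hypothetical rename map: an assumption about which identifiers differ,
# not something confirmed by the commit itself.
RENAMES = {
    "Glm4MoeLite": "DeepseekV3",
    "glm4_moe_lite": "deepseek_v3",
}


def normalize(source, renames):
    """Apply every old -> new identifier substitution to the source text."""
    for old, new in renames.items():
        source = source.replace(old, new)
    return source


def residual_diff(glm_source, deepseek_source, renames=RENAMES):
    """Return unified-diff lines that remain after renaming GLM identifiers.

    An empty list means the two texts differ only by the mapped names.
    """
    renamed = normalize(glm_source, renames).splitlines()
    reference = deepseek_source.splitlines()
    return list(
        difflib.unified_diff(
            renamed,
            reference,
            fromfile="glm4_moe_lite (renamed)",
            tofile="deepseek_v3",
            lineterm="",
        )
    )
```

To use it, read the two modeling files into strings and pass them to `residual_diff`. Any lines it returns are differences that go beyond naming and would need a closer look before swapping the architecture.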

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -1,6 +1,6 @@
 {
   "architectures": [
-    "Glm4MoeLiteForCausalLM"
+    "DeepseekV3ForCausalLM"
   ],
   "attention_bias": false,
   "attention_dropout": 0.0,
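
The same one-line change can be scripted. This is a minimal sketch using only the standard `json` module; the helper name `set_architecture` is hypothetical. It rewrites the `architectures` field and nothing else, and whether a given loader honors that field (as opposed to other keys in the config) is not something this diff shows.

```python
import json


def set_architecture(config_text, new_arch):
    """Return config JSON text with `architectures` replaced by [new_arch].

    All other keys are left untouched.
    """
    config = json.loads(config_text)
    config["architectures"] = [new_arch]
    return json.dumps(config, indent=2)


# Example input mirroring the first keys shown in the diff.
original = """{
  "architectures": ["Glm4MoeLiteForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0
}"""

updated = set_architecture(original, "DeepseekV3ForCausalLM")
```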