Differences between OrionForCausalLM and LlamaForCausalLM
#5 · opened by J22
As far as I can tell, the only differences are that input_layernorm, post_attention_layernorm, and the final norm are changed from LlamaRMSNorm to nn.LayerNorm.
The attention and embedding implementations in the remote code (loaded with trust_remote_code=True) also differ.
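To make the norm difference concrete, here is a minimal sketch comparing nn.LayerNorm with a re-implementation of Llama-style RMSNorm (the class below is written for illustration, not imported from transformers): LayerNorm subtracts the per-token mean and has a learnable bias, while RMSNorm only rescales by the root-mean-square.

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    """Minimal RMSNorm for comparison; follows the Llama formulation."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        # Rescale by root-mean-square only: no mean subtraction, no bias.
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.variance_epsilon)

torch.manual_seed(0)
hidden = 8
x = torch.randn(2, hidden) + 5.0  # nonzero mean to expose the difference

rms = LlamaRMSNorm(hidden)
ln = nn.LayerNorm(hidden)  # what Orion swaps in: mean-centering + bias

with torch.no_grad():
    y_rms = rms(x)
    y_ln = ln(x)

# LayerNorm output has ~zero mean per token; RMSNorm keeps the mean offset.
print(y_ln.mean(-1))
print(y_rms.mean(-1))
```

So beyond the module swap itself, the two models normalize activations differently: Orion's LayerNorm centers each token's activations, which RMSNorm deliberately skips for speed.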