Difference between SparseLLM/relu and SparseLLM/reglu - lack of modeling file?

by xunkai55 - opened Jun 14, 2024

Jun 14, 2024

Hi there,

I'm trying to understand the difference between SparseLLM/relu and SparseLLM/reglu, but their config files look nearly identical. Only intermediate_size differs; hidden_act is set to relu for both models.

Also, relu-5b does not seem to work properly. My guess is that you changed the modeling_llama.py file so that the MLP is a plain ReLU, i.e. ReLU(W_in * X), rather than ReGLU. Is my understanding correct? If so, it would be helpful if you also open-sourced that modeling file. The difference would probably be worth clarifying in the paper as well.

And thanks for the great work on the relu^2-wins paper!

jeremyii

SparseLLMs org Jun 14, 2024

For the relu2/relu models, we do not have separate up and gate projections. We have only a gate projection and a down projection.
For the reglu model, we follow the typical gate, up, and down projections.
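
To make the distinction concrete, here is a minimal plain-Python sketch of the two feed-forward variants as described in this reply. It is not the authors' modeling code: the function names (relu_mlp, reglu_mlp), the weight names (W_gate, W_up, W_down), and the toy matrices are all assumptions chosen for illustration, and biases are omitted.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu_mlp(x, W_gate, W_down):
    """Non-gated variant: down(ReLU(gate(x))). There is no up projection."""
    return matvec(W_down, relu(matvec(W_gate, x)))


def reglu_mlp(x, W_gate, W_up, W_down):
    """Gated variant (ReGLU): down(ReLU(gate(x)) * up(x))."""
    gated = relu(matvec(W_gate, x))
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gated, up)]
    return matvec(W_down, hidden)


# Toy weights (hypothetical): hidden size 2, intermediate size 3.
x = [1.0, -2.0]
W_gate = [[1, 0], [0, 1], [1, 1]]
W_up = [[2, 0], [0, 2], [1, 1]]
W_down = [[1, 1, 1], [0, 1, 0]]

out_relu = relu_mlp(x, W_gate, W_down)
out_reglu = reglu_mlp(x, W_gate, W_up, W_down)
```

Under this reading, a stock LLaMA-style MLP that always multiplies by an up projection would not match the relu/relu2 checkpoints, which would be consistent with relu-5b misbehaving when loaded with unmodified modeling code. A relu2 variant would presumably swap relu for a squared ReLU in relu_mlp; that is likewise an assumption here.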
