syx committed 637f9e4
Parent: 5bf22c9
minor
README.md (changed)

```diff
@@ -10,8 +10,6 @@ Qwen2-7B-ReLU is a variant of Qwen2-7B that replaces the SiLU/Swish activation f
 - Replaces SiLU/Swish activation function with dReLU
 - Maintains comparable or even better performance with the original Qwen2-7B
 - Significantly increases activation sparsity, enabling further optimization and compression
-I'll add this implementation detail to the README under a new "Technical Details" section, as this is an important architectural change that researchers should be aware of:
-
 
 ## Technical Details
 
```
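The README bullets say dReLU replaces SiLU/Swish and raises activation sparsity, but this excerpt does not define dReLU. The sketch below assumes the formulation from the Turbo Sparse paper, where dReLU applies ReLU to both the gate and up projections of the gated FFN, giving (ReLU(xW_gate) ⊙ ReLU(xW_up))W_down instead of SwiGLU's (SiLU(xW_gate) ⊙ xW_up)W_down. It uses pure Python, toy random weights, and hypothetical helper names (`swiglu_hidden`, `drelu_hidden`, `sparsity`), and only counts how many intermediate activations are exactly zero. It is an illustration, not the model's implementation.

```python
import math
import random

# Assumption: dReLU = ReLU(x @ W_gate) * ReLU(x @ W_up), as in Turbo Sparse.
# The README excerpt itself does not define dReLU.


def silu(v: float) -> float:
    """SiLU/Swish, v * sigmoid(v), computed stably for negative inputs."""
    if v >= 0.0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def matvec(x, w):
    """Row vector x (length d) times matrix w (d x h), returning length h."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu_hidden(x, w_gate, w_up):
    """Intermediate FFN activations for SwiGLU: SiLU(gate) * up."""
    g, u = matvec(x, w_gate), matvec(x, w_up)
    return [silu(a) * b for a, b in zip(g, u)]


def drelu_hidden(x, w_gate, w_up):
    """Intermediate FFN activations for dReLU (assumed): ReLU(gate) * ReLU(up)."""
    g, u = matvec(x, w_gate), matvec(x, w_up)
    return [relu(a) * relu(b) for a, b in zip(g, u)]


def sparsity(h):
    """Fraction of intermediate activations that are exactly zero."""
    return sum(1 for v in h if v == 0.0) / len(h)


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, hidden = 16, 256  # toy sizes, not Qwen2-7B's real dimensions
    w_gate = random_matrix(rng, d, hidden)
    w_up = random_matrix(rng, d, hidden)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    print(f"SwiGLU sparsity: {sparsity(swiglu_hidden(x, w_gate, w_up)):.2f}")
    print(f"dReLU  sparsity: {sparsity(drelu_hidden(x, w_gate, w_up)):.2f}")
```

With independent random weights each ReLU zeroes roughly half of its inputs, so about three quarters of the dReLU products should be exactly zero, while SiLU outputs are almost never exactly zero. A zero in position j of the intermediate vector means row j of W_down contributes nothing, which is the kind of structure a sparsity-aware inference engine can exploit by skipping that row.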