Update README.md
#1
by ivas-tri - opened
README.md
CHANGED
@@ -75,7 +75,7 @@ model-index:
 ---

 # Mistral-SUPRA
-This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and
+This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and up-trained into a linear RNN.

 This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
 Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
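
The README text says the linearized model can run as both a transformer and a recurrent model at inference time. The sketch below illustrates why that is possible for causal linear attention in general. It is not the code from the linked repository: the feature map `phi`, the function names, and the toy inputs are all assumptions made for illustration, and normalization is omitted for brevity.

```python
# Illustrative only: generic causal linear attention, NOT the SUPRA implementation.

def phi(x):
    # Hypothetical feature map (assumption): elementwise ReLU plus a small offset.
    return [max(v, 0.0) + 1e-3 for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def parallel_form(qs, ks, vs):
    """Transformer-style view: each position sums over all earlier positions."""
    out = []
    for t, q in enumerate(qs):
        fq = phi(q)
        acc = [0.0] * len(vs[0])
        for s in range(t + 1):  # causal mask: only s <= t contributes
            w = dot(fq, phi(ks[s]))
            acc = [a + w * v for a, v in zip(acc, vs[s])]
        out.append(acc)
    return out

def recurrent_form(qs, ks, vs):
    """RNN-style view: carry a fixed-size state S = sum_s phi(k_s) v_s^T."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        # Update the state with the outer product phi(k) v^T.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = phi(q)
        # Read out phi(q)^T S for the current position.
        out.append([sum(fq[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out

# Toy inputs (made up): 3 positions, key/query dim 2, value dim 3.
qs = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
ks = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
vs = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 3.0, 1.0]]

a = parallel_form(qs, ks, vs)
b = recurrent_form(qs, ks, vs)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff < 1e-9)
```

Both functions compute the same outputs. The recurrent form keeps a state whose size does not grow with sequence length, which is what makes RNN-style decoding possible once softmax is replaced by a kernel feature map.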