TRI-ML/mistral-supra

Update README.md

#1

by ivas-tri - opened May 1, 2024

base: refs/heads/main ← from: refs/pr/1

Files changed (1)

README.md +1 -1

@@ -75,7 +75,7 @@ model-index:
 ---
 # Mistral-SUPRA
-This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and uprained to become a linear RNN.
+This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and up-trained into a linear RNN.
 This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
 Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
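
The README's claim that a linear transformer "can function as both a transformer and a recurrent model" can be illustrated with a small, generic sketch. This is not taken from the linear_open_lm code. The feature map (`elu(x) + 1`), the sum normalization, and all function names below are assumptions chosen for illustration, and the model's actual kernel and normalization may differ. The sketch only shows that causal linear attention computed over the whole sequence gives the same outputs as a token-by-token recurrence with a fixed-size state.

```python
import math
import random


def feature_map(x):
    # Hypothetical positive feature map, elu(x) + 1 (an assumption, not the model's kernel).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_parallel(qs, ks, vs):
    # "Transformer" view: each position attends over all earlier positions at once.
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(ks[s])) for s in range(t + 1)]
        norm = sum(weights)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / norm for j in range(d_v)]
        )
    return outputs


def attention_recurrent(qs, ks, vs):
    # "RNN" view: keep a running d_k x d_v state and a d_k normalizer, one token at a time.
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = feature_map(q), feature_map(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        norm = dot(fq, z)
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / norm for j in range(d_v)]
        )
    return outputs


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
T, d_k, d_v = 6, 4, 3  # toy sizes, chosen arbitrarily
qs = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
ks = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
vs = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]

par = attention_parallel(qs, ks, vs)
rec = attention_recurrent(qs, ks, vs)
diff = max_abs_diff(par, rec)
print("max abs difference between the two forms:", diff)
```

In this sketch the recurrent form needs memory that depends only on `d_k` and `d_v`, not on the sequence length, which is the usual motivation for running a linearized model as an RNN at inference time.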