Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ inference: false
 # Monarch Mixer-BERT
 
-
+An 80M checkpoint of M2-BERT, pretrained at sequence length 8192 and fine-tuned for long-context retrieval.
 
 Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.