---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---

## Description
Models trained as described in the [RAT paper](https://arxiv.org/abs/2507.04416) ("RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling"), pretrained on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
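## Usage
A minimal usage sketch is shown below. It assumes the checkpoints ship custom modeling code loadable through `transformers` with `trust_remote_code=True`; the repository id `rat-example-checkpoint` is a placeholder and should be replaced with the actual checkpoint name.

```python
# Minimal sketch, assuming the checkpoint provides custom modeling code
# compatible with transformers' Auto classes. "rat-example-checkpoint" is
# a placeholder repo id, not an actual released checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rat-example-checkpoint", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("rat-example-checkpoint", trust_remote_code=True)

# Generate a short continuation from a prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```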

## Citation
If you find this work useful, please consider citing the paper:
```bibtex
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```