---
library_name: RAT
language:
- en
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
tags:
- efficient architecture
- recurrence
- attention
- pretraining
metrics:
- perplexity
- accuracy
---
## Description
Models trained as described in the [RAT paper](https://arxiv.org/abs/2507.04416), "RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling".
## Citation
If you find this work useful, please consider citing the paper:
```
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}
```