---
license: apache-2.0
---

## About

RNA-FM (RNA Foundation Model) is a state-of-the-art **pretrained language model for RNA sequences**, serving as the foundation for an integrated RNA research ecosystem.

Trained on **more than 23 million non-coding RNA (ncRNA) sequences** via self-supervised learning, RNA-FM extracts comprehensive structural and functional information from RNA sequences *without* relying on experimental labels.

As a result, RNA-FM generates **general-purpose RNA embeddings** suitable for a broad range of downstream tasks, including but not limited to secondary and tertiary structure prediction, RNA family clustering, and functional RNA analysis.

**[mRNA-FM](https://arxiv.org/abs/2204.00300)** is a direct extension of RNA-FM, trained exclusively on 45 million mRNA coding sequences (CDS). It is specifically designed to capture information unique to mRNA and has demonstrated excellent performance on mRNA-related tasks.

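
To illustrate how the general-purpose embeddings described above might feed a downstream task such as RNA family clustering, here is a minimal sketch. It does not call RNA-FM itself: the `toy_embeddings` values are made-up 3-dimensional stand-ins for per-nucleotide embeddings, and the helper names `mean_pool` and `cosine_similarity` are hypothetical, not part of the RNA-FM code base. Mean pooling is one common way to obtain a sequence-level vector; it is assumed here and is not necessarily the method used by the authors.

```python
import math


def mean_pool(token_embeddings):
    """Average per-nucleotide vectors (L x D) into one sequence-level vector (D)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    length = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / length for d in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up stand-ins for per-nucleotide embeddings (assumption: real
# embeddings would come from the model and have a much larger dimension).
toy_embeddings = {
    "rna_a": [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]],
    "rna_b": [[0.9, 0.1, 0.1], [0.9, 0.1, 0.1], [0.9, 0.1, 0.1]],
    "rna_c": [[0.0, 1.0, 0.9], [0.2, 0.8, 1.1]],
}

# Sequences of different lengths become fixed-size vectors after pooling.
pooled = {name: mean_pool(tokens) for name, tokens in toy_embeddings.items()}

sim_ab = cosine_similarity(pooled["rna_a"], pooled["rna_b"])
sim_ac = cosine_similarity(pooled["rna_a"], pooled["rna_c"])
print(f"rna_a vs rna_b: {sim_ab:.3f}")
print(f"rna_a vs rna_c: {sim_ac:.3f}")
```

In this toy setup, `rna_a` and `rna_b` pool to nearly the same vector and so score higher than `rna_a` and `rna_c`. With real embeddings, a pairwise similarity matrix of this kind could be passed to any standard clustering routine.
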

The full code is available on GitHub: https://github.com/ml4bio/RNA-FM.

## Citation

If you use the model in your research, please cite our papers using the following BibTeX entries.

```
@article{chen2022interpretable,
  title={Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions},
  author={Chen, Jiayang and Hu, Zhihang and Sun, Siqi and Tan, Qingxiong and Wang, Yixuan and Yu, Qinze and Zong, Licheng and Hong, Liang and Xiao, Jin and Shen, Tao and others},
  journal={arXiv preprint arXiv:2204.00300},
  year={2022}
}

@article{shen2024accurate,
  title={Accurate RNA 3D structure prediction using a language model-based deep learning approach},
  author={Shen, Tao and Hu, Zhihang and Sun, Siqi and Liu, Di and Wong, Felix and Wang, Jiuming and Chen, Jiayang and Wang, Yixuan and Hong, Liang and Xiao, Jin and others},
  journal={Nature Methods},
  pages={1--12},
  year={2024},
  publisher={Nature Publishing Group US New York}
}

@article{chen2020rna,
  title={RNA secondary structure prediction by learning unrolled algorithms},
  author={Chen, Xinshi and Li, Yu and Umarov, Ramzan and Gao, Xin and Song, Le},
  journal={arXiv preprint arXiv:2002.05810},
  year={2020}
}
```