Update README.md
README.md
CHANGED
@@ -1,3 +1,43 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+---
+
+## About
+
+RNA-FM (RNA Foundation Model) is a state-of-the-art **pretrained language model for RNA sequences**, serving as the foundation for an integrated RNA research ecosystem.
+Trained on **23+ million non-coding RNA (ncRNA) sequences** via self-supervised learning, RNA-FM extracts comprehensive structural and functional information from RNA sequences *without* relying on experimental labels.
+Consequently, RNA-FM generates **general-purpose RNA embeddings** suitable for a broad range of downstream tasks, including but not limited to secondary and tertiary structure prediction, RNA family clustering, and functional RNA analysis.
+**[mRNA-FM](https://arxiv.org/abs/2204.00300)** is a direct extension of RNA-FM, trained exclusively on 45 million mRNA coding sequences (CDS).
+It is specifically designed to capture information unique to mRNA and has demonstrated excellent performance on mRNA-related tasks.
+
+
+The full code is available on GitHub: https://github.com/ml4bio/RNA-FM.
+
+## Citation
+
+If you use the model in your research, please cite the following papers.
+
+```
+@article{chen2022interpretable,
+  title={Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions},
+  author={Chen, Jiayang and Hu, Zhihang and Sun, Siqi and Tan, Qingxiong and Wang, Yixuan and Yu, Qinze and Zong, Licheng and Hong, Liang and Xiao, Jin and Shen, Tao and others},
+  journal={arXiv preprint arXiv:2204.00300},
+  year={2022}
+}
+
+@article{shen2024accurate,
+  title={Accurate RNA 3D structure prediction using a language model-based deep learning approach},
+  author={Shen, Tao and Hu, Zhihang and Sun, Siqi and Liu, Di and Wong, Felix and Wang, Jiuming and Chen, Jiayang and Wang, Yixuan and Hong, Liang and Xiao, Jin and others},
+  journal={Nature Methods},
+  pages={1--12},
+  year={2024},
+  publisher={Nature Publishing Group US New York}
+}
+
+@article{chen2020rna,
+  title={RNA secondary structure prediction by learning unrolled algorithms},
+  author={Chen, Xinshi and Li, Yu and Umarov, Ramzan and Gao, Xin and Song, Le},
+  journal={arXiv preprint arXiv:2002.05810},
+  year={2020}
+}
+```