Add Citation section with EVA paper and OpenRNA-v1 dataset references
README.md
CHANGED
@@ -72,3 +72,25 @@ de novo IRES redesign via GLM masked infilling. — no fine-tuning required
 Fine-tuning-ready for RNA aptamer optimization, CRISPR guide RNA (omegaRNA) generation,
 and any custom RNA type of interest.
 
+## Citation
+
+If you find EVA or OpenRNA-v1 useful in your research, please cite:
+
+```bibtex
+@article{huang2026eva,
+  title = {A Long-Context Generative Foundation Model Deciphers RNA Design Principles},
+  author = {Huang, Yanjie and Lv, Guangye and Cheng, Anyue and Xie, Wei and Chen, Mengyan and Ma, Xinyi and Huang, Yijun and Tang, Yueyang and Shi, Qingya and Wang, Zining and Wang, Junxi and Yunpeng, Xia and Zhao, Lu and Cai, Yifang and Chen, Jack Xiaoyu and Zheng, Shuangjia},
+  year = {2026},
+  journal = {bioRxiv},
+  doi = {10.64898/2026.03.17.712398},
+  url = {https://www.biorxiv.org/content/10.64898/2026.03.17.712398v1}
+}
+```
+
+The training data (OpenRNA-v1) is available at [GENTEL-Lab/OpenRNA-v1-114M](https://huggingface.co/datasets/GENTEL-Lab/OpenRNA-v1-114M).
+Please also cite the original data sources as appropriate. Key references:
+
+- **RNAcentral:** RNAcentral Consortium. RNAcentral in 2026: genes and literature integration. *Nucleic Acids Research*, 54(D1):D303–D313, 2026.
+- **Rfam:** Kalvari I, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. *Nucleic Acids Research*, 49(D1):D192–D200, 2021.
+- **MMseqs2:** Steinegger M & Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. *Nature Biotechnology*, 35:1026–1028, 2017.
+