---
license: openrail
language:
- en
tags:
- RNA
- biology
- generative
- foundation-model
- language-model
- sequence-design
pipeline_tag: text-generation
datasets:
- GENTEL-Lab/OpenRNA-v1-114M
---
# EVA: Evolutionary Versatile Architect
EVA is a generative foundation model for universal RNA modeling and design,
trained on **OpenRNA v1**, a curated atlas of **114 million full-length RNA
sequences** spanning all domains of life.

## Model Description
| Property | Details |
|----------|---------|
| Architecture | Decoder-only Transformer + Mixture-of-Experts (MoE) |
| Parameters | 1.4B (also available: 21M, 145M, 437M) |
| Context Window | 8,192 tokens |
| Training Data | 114M full-length RNA sequences (OpenRNA v1) |
| Training Objectives | Causal LM (CLM) + Generalized LM (GLM) |
| Conditioning | RNA type tags + taxonomic lineage tags |
> **Model Variants**
> - `EVA_21M`, `EVA_145M`, `EVA_437M`: trained with both CLM and GLM objectives, supporting both generation modes.
> - `EVA_1.4B_GLM`: the primary 1.4B model, trained with both CLM and GLM objectives.
> - `EVA_1.4B_CLM`: an additional 1.4B checkpoint trained exclusively with the CLM objective.
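
The variant list above can be captured in a small lookup, which is handy when a script needs to decide which checkpoints are candidates for a given objective. This is only an illustrative sketch: the checkpoint names come from this card, while the `CHECKPOINT_OBJECTIVES` table and the `checkpoints_trained_with` helper are hypothetical and not part of the EVA codebase.

```python
# Checkpoint names are taken from the model card; the table layout and the
# helper below are illustrative assumptions, not an official EVA API.
CHECKPOINT_OBJECTIVES = {
    "EVA_21M": {"clm", "glm"},
    "EVA_145M": {"clm", "glm"},
    "EVA_437M": {"clm", "glm"},
    "EVA_1.4B_GLM": {"clm", "glm"},  # primary 1.4B model
    "EVA_1.4B_CLM": {"clm"},         # CLM-only 1.4B checkpoint
}


def checkpoints_trained_with(objective: str) -> list[str]:
    """Return the checkpoints whose training included the given objective."""
    objective = objective.lower()
    if objective not in {"clm", "glm"}:
        raise ValueError(f"unknown objective: {objective!r}")
    return [
        name
        for name, objectives in CHECKPOINT_OBJECTIVES.items()
        if objective in objectives
    ]


print(checkpoints_trained_with("glm"))
```

For GLM-style infilling this excludes `EVA_1.4B_CLM`, which was trained with the CLM objective only.
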

For usage instructions, further details, and examples, see the
[technical report](https://www.biorxiv.org/content/10.64898/2026.03.17.712398v1) and
[GitHub repository](https://github.com/GENTEL-lab/EVA).
## Open-Source Resources
| Resource | Link | Description |
|----------|------|-------------|
| Paper | [bioRxiv](https://www.biorxiv.org/content/10.64898/2026.03.17.712398v1) | Technical report |
| GitHub | [GENTEL-lab/EVA](https://github.com/GENTEL-lab/EVA) | Full codebase |
| Training Data | [OpenRNA-v1-114M](https://huggingface.co/datasets/GENTEL-Lab/OpenRNA-v1-114M) | 114M full-length RNA sequences |
| Training Code | [training/](https://github.com/GENTEL-lab/EVA/tree/main/training) | Pretraining, midtraining & evaluation for MoE and dense models |
| Finetuning Code | [finetune/](https://github.com/GENTEL-lab/EVA/tree/main/finetune) | Finetuning pipelines |
| Interpretability | [notebooks/interpretability_analysis/](https://github.com/GENTEL-lab/EVA/tree/main/notebooks/interpretability_analysis) | SAE interpretability analysis |
| Inference & Design | [tools/](https://github.com/GENTEL-lab/EVA/tree/main/tools) | Fitness prediction, CLM/GLM design, directed evolution |
## Capabilities
**Zero-shot Fitness Prediction**
Predicts mutational effects across RNA, DNA gene regions, and proteins
using evolutionary likelihood, with no fine-tuning required.
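
As a rough illustration of likelihood-based scoring, one common formulation compares the log-likelihood of a mutant sequence with that of the wild type under a causal language model. The sketch below assumes that formulation; whether the EVA tools compute exactly this score is an assumption, and `toy_log_probs` is a hypothetical stand-in for a real model's next-token distribution.

```python
import math
from typing import Callable, Dict

LogProbFn = Callable[[str], Dict[str, float]]


def sequence_log_likelihood(seq: str, next_token_log_probs: LogProbFn) -> float:
    """Sum log P(token_i | tokens before i) over the whole sequence."""
    total = 0.0
    for i, token in enumerate(seq):
        total += next_token_log_probs(seq[:i])[token]
    return total


def fitness_score(wild_type: str, mutant: str, next_token_log_probs: LogProbFn) -> float:
    """Log-likelihood ratio: positive means the mutant is more likely than wild type."""
    return sequence_log_likelihood(mutant, next_token_log_probs) - sequence_log_likelihood(
        wild_type, next_token_log_probs
    )


def toy_log_probs(prefix: str) -> Dict[str, float]:
    """Hypothetical placeholder model: uniform at the start, then it slightly
    prefers repeating the previous nucleotide. A real model would go here."""
    if not prefix:
        return {base: math.log(0.25) for base in "ACGU"}
    previous = prefix[-1]
    return {base: math.log(0.4 if base == previous else 0.2) for base in "ACGU"}


score = fitness_score("GGGA", "GGCA", toy_log_probs)
print(f"log-likelihood ratio: {score:.3f}")
```

Under the toy model the mutant scores below the wild type; with a real model, the same ratio would be read as a predicted mutational effect.
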

**Controllable RNA Generation**
Supports de novo generation and targeted region redesign across 11 RNA
classes: mRNA, lncRNA, circRNA, tRNA, rRNA, miRNA, piRNA, sRNA, snRNA,
snoRNA, and viral RNA, with no fine-tuning required.

**Vaccine Design**
Species-aware codon optimization for mRNA and circRNA vaccines, and
de novo IRES redesign via GLM masked infilling, with no fine-tuning required.
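
Codon optimization changes the nucleotide sequence while leaving the encoded protein unchanged, so a simple sanity check on any redesigned coding sequence is to confirm that it translates to the same protein as the original. The sketch below uses the standard genetic code; the helper names `translate` and `is_synonymous` are hypothetical and this is not the EVA pipeline itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an RNA (or DNA) coding sequence; '*' marks a stop codon."""
    cds = cds.upper().replace("T", "U")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_synonymous(original: str, redesigned: str) -> bool:
    """True if both coding sequences encode the same protein."""
    return translate(original) == translate(redesigned)


# GCU and GCC both encode alanine, so this redesign is synonymous.
print(is_synonymous("AUGGCUUAA", "AUGGCCUAA"))
```
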

**Functional RNA Engineering**
Ready for fine-tuning on RNA aptamer optimization, CRISPR guide RNA (omegaRNA) generation,
and any custom RNA type of interest.
## Citation
If you find EVA or OpenRNA-v1 useful in your research, please cite:
```bibtex
@article{huang2026eva,
title = {A Long-Context Generative Foundation Model Deciphers RNA Design Principles},
author = {Huang, Yanjie and Lv, Guangye and Cheng, Anyue and Xie, Wei and Chen, Mengyan and Ma, Xinyi and Huang, Yijun and Tang, Yueyang and Shi, Qingya and Wang, Zining and Wang, Junxi and Yunpeng, Xia and Zhao, Lu and Cai, Yifang and Chen, Jack Xiaoyu and Zheng, Shuangjia},
year = {2026},
journal = {bioRxiv},
doi = {10.64898/2026.03.17.712398},
url = {https://www.biorxiv.org/content/10.64898/2026.03.17.712398v1}
}
```
The training data (OpenRNA-v1) is available at [GENTEL-Lab/OpenRNA-v1-114M](https://huggingface.co/datasets/GENTEL-Lab/OpenRNA-v1-114M).
Please also cite the original data sources as appropriate. Key references:
- **RNAcentral:** RNAcentral Consortium. RNAcentral in 2026: genes and literature integration. *Nucleic Acids Research*, 54(D1):D303–D313, 2026.
- **Rfam:** Kalvari I, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. *Nucleic Acids Research*, 49(D1):D192–D200, 2021.
- **MMseqs2:** Steinegger M & Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. *Nature Biotechnology*, 35:1026–1028, 2017.