EVA: Evolutionary Versatile Architect

EVA is a generative foundation model for universal RNA modeling and design, trained on OpenRNA v1, a curated atlas of 114 million full-length RNA sequences spanning all domains of life.


Model Description

| Property | Details |
| --- | --- |
| Architecture | Decoder-only Transformer + Mixture-of-Experts (MoE) |
| Parameters | 1.4B (also available: 21M, 145M, 437M) |
| Context Window | 8,192 tokens |
| Training Data | 114M full-length RNA sequences (OpenRNA v1) |
| Training Objectives | Causal LM (CLM) + Generalized LM (GLM) |
| Conditioning | RNA type tags + taxonomic lineage tags |
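
The conditioning interface can be pictured as control tags prepended to the input sequence. Below is a minimal sketch assuming hypothetical `<TYPE=...>` and `<TAX=...>` tag formats; the actual special tokens are defined by EVA's tokenizer and may differ:

```python
def build_prompt(sequence: str, rna_type: str, lineage: list[str]) -> str:
    """Prepend hypothetical control tags to an RNA sequence.

    The tag syntax (<TYPE=...>, <TAX=...>) is illustrative only;
    the real EVA vocabulary defines its own special tokens.
    """
    type_tag = f"<TYPE={rna_type.upper()}>"
    tax_tag = "<TAX=" + ";".join(lineage) + ">"
    return type_tag + tax_tag + sequence

# Condition generation on RNA type and taxonomic lineage.
prompt = build_prompt("AUGGCC", "tRNA", ["Eukaryota", "Metazoa"])
print(prompt)  # <TYPE=TRNA><TAX=Eukaryota;Metazoa>AUGGCC
```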

Model Variants

  • EVA_21M, EVA_145M, EVA_437M: trained with both CLM and GLM objectives, supporting both generation modes.
  • EVA_1.4B_GLM: the primary 1.4B model, trained with both CLM and GLM objectives.
  • EVA_1.4B_CLM: an additional 1.4B checkpoint trained exclusively with the CLM objective.

For instructions, details, and examples, please refer to our technical report and GitHub repository.

Open-Source Resources

| Resource | Link | Description |
| --- | --- | --- |
| 📄 Paper | bioRxiv | Technical report |
| 💻 GitHub | GENTEL-lab/EVA | Full codebase |
| 🗄️ Training Data | OpenRNA-v1-114M | 114M full-length RNA sequences |
| 🏋️ Training Code | training/ | Pretraining, midtraining & evaluation for MoE and dense models |
| 🔧 Finetuning Code | finetune/ | Finetuning pipelines |
| 🔬 Interpretability | notebooks/interpretability_analysis/ | SAE interpretability analysis |
| 🧪 Inference & Design | tools/ | Fitness prediction, CLM/GLM design, directed evolution |

Capabilities

🔬 Zero-shot Fitness Prediction
Predicts mutational effects across RNA, DNA gene regions, and proteins using evolutionary likelihood; no fine-tuning required.
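
The likelihood-based scoring behind this capability can be sketched as a log-likelihood ratio between mutant and wild type. The `toy_probs` oracle below is a stand-in for EVA's per-token conditional probabilities, not the real model:

```python
import math

def sequence_log_likelihood(seq, cond_probs):
    """Sum log P(x_i | x_<i) under an autoregressive model.

    `cond_probs` is a stand-in scoring function; with EVA it would
    be the model's per-token conditional probabilities.
    """
    return sum(math.log(cond_probs(seq, i)) for i in range(len(seq)))

def fitness_score(wild_type, mutant, cond_probs):
    """Mutational effect = log P(mutant) - log P(wild type)."""
    return (sequence_log_likelihood(mutant, cond_probs)
            - sequence_log_likelihood(wild_type, cond_probs))

# Toy oracle: 'G' after 'C' is four times more likely than any other base.
def toy_probs(seq, i):
    if i > 0 and seq[i - 1] == "C" and seq[i] == "G":
        return 0.4
    return 0.1

# Mutating G->A at position 2 breaks the favored CG context,
# so the score is negative (deleterious): -log 4.
print(fitness_score("ACGU", "ACAU", toy_probs))
```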

🧬 Controllable RNA Generation
Supports de novo generation and targeted region redesign across 11 RNA classes (mRNA, lncRNA, circRNA, tRNA, rRNA, miRNA, piRNA, sRNA, snRNA, snoRNA, and viral RNA); no fine-tuning required.
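
Targeted region redesign via GLM infilling can be illustrated with a mask-and-splice helper. The `<MASK>` token here is a placeholder; the real mask token is defined by the model's vocabulary:

```python
MASK = "<MASK>"  # placeholder; the actual GLM mask token is model-defined

def mask_region(seq, start, end):
    """Replace seq[start:end] with a single mask token for infilling."""
    return seq[:start] + MASK + seq[end:]

def splice_fill(masked, fill):
    """Insert a generated fill back into the masked template."""
    return masked.replace(MASK, fill, 1)

# Redesign the middle codon while keeping the flanks fixed.
template = mask_region("AUGGCCUAA", 3, 6)
print(template)                       # AUG<MASK>UAA
print(splice_fill(template, "GGG"))   # AUGGGGUAA
```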

💉 Vaccine Design
Species-aware codon optimization for mRNA and circRNA vaccines, plus de novo IRES redesign via GLM masked infilling; no fine-tuning required.
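
The objective of species-aware codon optimization can be illustrated with a greedy frequency-table baseline. The codon-usage numbers below are invented for illustration, and EVA's approach is generative rather than table-driven:

```python
# Toy codon-usage table (illustrative frequencies, not real data).
CODON_USAGE = {
    "M": {"AUG": 1.0},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "F": {"UUU": 0.46, "UUC": 0.54},
}

def optimize_codons(protein, usage=CODON_USAGE):
    """Greedy baseline: pick each amino acid's most frequent codon.

    A species-aware model would instead generate codons jointly,
    conditioned on the host organism; this lookup only shows the goal.
    """
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)

print(optimize_codons("MKF"))  # AUGAAGUUC
```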

🦠 Functional RNA Engineering
Fine-tuning-ready for RNA aptamer optimization, CRISPR guide RNA (omegaRNA) generation, and any custom RNA type of interest.
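
A fine-tuned scoring model plugs naturally into a directed-evolution loop of the kind referenced under tools/. Below is a minimal greedy hill-climb sketch, using GC content as a stand-in fitness oracle in place of a model likelihood:

```python
BASES = "ACGU"

def single_mutants(seq):
    """Yield every single-point mutant of seq."""
    for i, old in enumerate(seq):
        for b in BASES:
            if b != old:
                yield seq[:i] + b + seq[i + 1:]

def hill_climb(seed, score, rounds=10):
    """Greedy directed evolution: each round, keep the best single
    mutant if it strictly improves the fitness oracle.

    With EVA, `score` would be a model-derived fitness; here it is
    any callable mapping a sequence to a number.
    """
    best = seed
    for _ in range(rounds):
        challenger = max(single_mutants(best), key=score)
        if score(challenger) > score(best):
            best = challenger
    return best

# Stand-in objective: maximize GC content.
gc = lambda s: (s.count("G") + s.count("C")) / len(s)
print(hill_climb("AUAUAUAU", gc))  # CCCCCCCC
```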

Citation

If you find EVA or OpenRNA-v1 useful in your research, please cite:

@article{huang2026eva,
  title = {A Long-Context Generative Foundation Model Deciphers RNA Design Principles},
  author = {Huang, Yanjie and Lv, Guangye and Cheng, Anyue and Xie, Wei and Chen, Mengyan and Ma, Xinyi and Huang, Yijun and Tang, Yueyang and Shi, Qingya and Wang, Zining and Wang, Junxi and Xia, Yunpeng and Zhao, Lu and Cai, Yifang and Chen, Jack Xiaoyu and Zheng, Shuangjia},
  year = {2026},
  journal = {bioRxiv},
  doi = {10.64898/2026.03.17.712398},
  url = {https://www.biorxiv.org/content/10.64898/2026.03.17.712398v1}
}

The training data (OpenRNA-v1) is available at GENTEL-Lab/OpenRNA-v1-114M. Please also cite the original data sources as appropriate. Key references:

  • RNAcentral: RNAcentral Consortium. RNAcentral in 2026: genes and literature integration. Nucleic Acids Research, 54(D1):D303–D313, 2026.
  • Rfam: Kalvari I, et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Research, 49(D1):D192–D200, 2021.
  • MMseqs2: Steinegger M & Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35:1026–1028, 2017.