---
library_name: transformers
pipeline_tag: text-generation
tags:
- biology
- rna-design
---
# Designing RNAs with Language Models
RNA-Design-LM is a research codebase and model for designing RNA sequences using autoregressive language models. Instead of solving each RNA inverse-folding instance from scratch with combinatorial search, this approach reframes RNA design as conditional sequence generation.
## Description
The model is instantiated as a decoder-only Transformer (based on the Qwen2 architecture) that maps target secondary structures (represented as dot–bracket strings) directly to RNA sequences. It was trained in a supervised setting on structure–sequence pairs and further optimized using reinforcement learning (RL) to improve thermodynamic folding metrics such as Boltzmann probability, ensemble defect, and MFE uniqueness.
- Paper: Designing RNAs with Language Models
- Repository: KuNyaa/RNA-Design-LM
## Task and Training
The model acts as a reusable neural approximator for RNA inverse folding. Key features include:
- Amortized Design: Generates sequences for target structures in a single forward pass.
- RL Optimization: End-to-end optimization for biological and thermodynamic metrics.
- Constrained Decoding: Supports enforcing Watson–Crick–wobble pairing rules during generation to ensure structural validity.
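To make the pairing constraint concrete, here is a minimal sketch of the kind of validity check constrained decoding enforces: every base pair implied by the target dot–bracket structure must be a Watson–Crick pair (A–U, G–C) or a G–U wobble pair. The function name and implementation are illustrative, not part of the repository's API.

```python
# Allowed base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}

def pairs_are_valid(structure: str, sequence: str) -> bool:
    """Check that `sequence` satisfies every pair in a dot-bracket `structure`."""
    if len(structure) != len(sequence):
        return False
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                return False  # unbalanced structure
            j = stack.pop()
            if (sequence[j], sequence[i]) not in VALID_PAIRS:
                return False  # paired positions form a disallowed pair
    return not stack  # all opening brackets must be closed

print(pairs_are_valid("((..))", "GGAACC"))  # True: both pairs are G-C
print(pairs_are_valid("((..))", "GGAAAA"))  # False: G-A is not a valid pair
```

During constrained generation, the same rule can be applied token by token: once one side of a pair is emitted, the vocabulary for its partner position is restricted to valid complements.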
## Usage
The model can be used for batched inference. For detailed implementation and evaluation, please refer to the official GitHub repository. Below is an example command provided by the authors for running inference with constrained decoding:
```shell
python ./scripts/constrained_decoding.py \
    --test_path ./test/eterna100.jsonl \
    --model_flavor slrl \
    --n_repeats 100 \
    --batch_size 1024 \
    --do_sample \
    --temp 2 \
    --constrained_decode
```
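The `--n_repeats 100` and `--do_sample` flags suggest a best-of-N strategy: sample many candidates per target structure and keep the highest-scoring one. The sketch below illustrates that pattern with stand-in sampling and scoring functions; the actual pipeline samples from the language model and scores with thermodynamic metrics such as Boltzmann probability, so every name here is a placeholder.

```python
import random

VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}

def sample_candidate(structure: str, rng: random.Random) -> str:
    """Stand-in for model sampling: random nucleotides of matching length."""
    return "".join(rng.choice("ACGU") for _ in structure)

def score(structure: str, sequence: str) -> float:
    """Stand-in score: fraction of required pairs that are valid."""
    stack, ok, total = [], 0, 0
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            total += 1
            ok += (sequence[j], sequence[i]) in VALID_PAIRS
    return ok / total if total else 1.0

def best_of_n(structure: str, n: int = 100, seed: int = 0) -> str:
    """Sample n candidates and return the one with the highest score."""
    rng = random.Random(seed)
    candidates = [sample_candidate(structure, rng) for _ in range(n)]
    return max(candidates, key=lambda s: score(structure, s))

best = best_of_n("((((....))))", n=100)
print(len(best))  # 12, one nucleotide per structure position
```

In the real workflow, the batched sampling step would run on GPU (hence `--batch_size 1024`), and a higher temperature (`--temp 2`) increases candidate diversity before selection.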
## Citation
If you use this model in your research, please cite the following paper:
```bibtex
@article{rna_design_lm_2025,
  title={Designing RNAs with Language Models},
  journal={arXiv preprint arXiv:2602.12470},
  year={2025}
}
```