Milanmg / LLM-RNA-Design-2025


Add model card and metadata #1

by nielsr (HF Staff), opened Feb 16

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md +47 -0


	@@ -0,0 +1,47 @@

+---
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- biology
+- rna-design
+---
+# Designing RNAs with Language Models
+RNA-Design-LM is a research codebase and model for designing RNA sequences using autoregressive language models. Instead of solving each RNA inverse-folding instance from scratch with combinatorial search, this approach reframes RNA design as conditional sequence generation.
+## Description
+The model is a decoder-only Transformer (based on the Qwen2 architecture) that maps a target secondary structure, written as a dot-bracket string, directly to an RNA sequence. It was first trained with supervised learning on structure–sequence pairs and then further optimized with reinforcement learning (RL) to improve thermodynamic folding metrics such as the Boltzmann probability of the target structure, the ensemble defect, and minimum free energy (MFE) uniqueness.
+- **Paper:** [Designing RNAs with Language Models](https://huggingface.co/papers/2602.12470)
+- **Repository:** [KuNyaa/RNA-Design-LM](https://github.com/KuNyaa/RNA-Design-LM)
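The folding metrics named in the description have standard definitions in the RNA design literature. They are restated below under the assumption that the authors follow these conventions; the exact normalization and how each metric enters the RL reward are not specified in this card. Here $x$ is a sequence of length $n$, $y^\star$ the target structure, $\Delta G(x, y)$ the free energy of $x$ folded into structure $y$, $R$ the gas constant, $T$ the temperature, and $P_{ij}(x)$ the equilibrium probability that positions $i$ and $j$ pair.

$$
p(y^\star \mid x) = \frac{e^{-\Delta G(x, y^\star)/RT}}{Z(x)}, \qquad Z(x) = \sum_{y} e^{-\Delta G(x, y)/RT}
$$

$$
\mathrm{ED}(x, y^\star) = \sum_{i=1}^{n} \bigl(1 - q_i\bigr), \qquad
q_i =
\begin{cases}
P_{ij}(x) & \text{if } i \text{ pairs with } j \text{ in } y^\star,\\
1 - \sum_{k} P_{ik}(x) & \text{if } i \text{ is unpaired in } y^\star,
\end{cases}
$$

$$
\text{MFE uniqueness:}\quad \Delta G(x, y^\star) < \Delta G(x, y) \ \text{ for every structure } y \neq y^\star.
$$

The sum in $Z(x)$ runs over all secondary structures of $x$. A higher Boltzmann probability is better, while a lower ensemble defect is better: $\mathrm{ED}$ is the expected number of nucleotides whose pairing state differs from the target, often reported normalized as $\mathrm{ED}/n$.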
+## Task and Training
+The model acts as a reusable neural approximator for RNA inverse folding. Key features include:
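+- **Amortized Design:** Generates a sequence for a new target structure in a single autoregressive decoding pass, with no per-instance combinatorial search.
+- **RL Optimization:** Optimizes the model end to end for the thermodynamic folding metrics described above.
+- **Constrained Decoding:** Can enforce Watson–Crick and G-U wobble base-pairing rules (A-U, G-C, G-U) during generation, so that every base pair in the target structure receives a compatible pair of nucleotides.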
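The constrained-decoding idea in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `constrained_decoding.py`: it chooses one nucleotide at a time rather than working on the model's actual tokens, supports only plain nested dot-bracket (no pseudoknot brackets), and replaces the language model with a hypothetical `score` callable that returns a weight per base. `pair_table`, `constrained_sample`, `uniform_scores`, and `ALLOWED_PAIRS` are illustrative names. When generation reaches a closing `)`, its opening partner has already been sampled, so the sampler masks out every base that cannot pair with it.

```python
import random
from typing import Callable, Dict, List, Optional

# Watson-Crick pairs (A-U, G-C) plus the G-U wobble pair, in both orientations.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NUCLEOTIDES = ["A", "C", "G", "U"]


def pair_table(structure: str) -> Dict[int, int]:
    """Map each ')' position to the position of its matching '('."""
    stack: List[int] = []
    closes: Dict[int, int] = {}
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            closes[j] = stack.pop()
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return closes


def constrained_sample(
    structure: str,
    score: Callable[[str, int], Dict[str, float]],
    rng: Optional[random.Random] = None,
) -> str:
    """Sample left to right, masking bases that cannot pair with their partner."""
    rng = rng or random.Random()
    closes = pair_table(structure)
    seq: List[str] = []
    for j in range(len(structure)):
        weights = score("".join(seq), j)  # hypothetical model: base -> weight
        if j in closes:
            # The opening partner was sampled earlier; keep only legal partners.
            partner = seq[closes[j]]
            allowed = [b for b in NUCLEOTIDES if (partner, b) in ALLOWED_PAIRS]
        else:
            allowed = NUCLEOTIDES
        masked = [weights.get(b, 0.0) for b in allowed]
        if sum(masked) <= 0.0:
            masked = [1.0] * len(allowed)  # model gave no mass to any legal base
        seq.append(rng.choices(allowed, weights=masked, k=1)[0])
    return "".join(seq)


def uniform_scores(prefix: str, position: int) -> Dict[str, float]:
    """Hypothetical stand-in for the language model's next-nucleotide weights."""
    return {b: 1.0 for b in NUCLEOTIDES}


if __name__ == "__main__":
    target = "((((...))))"
    print(target)
    print(constrained_sample(target, uniform_scores, random.Random(0)))
```

Because every nucleotide has at least one legal partner (A-U, C-G, G-C or G-U, U-A or U-G), unpaired and opening positions never need masking; only closing positions are restricted. The uniform fallback is a choice made for this sketch, covering the case where the model assigns zero weight to every legal base. Note that pairing compatibility does not by itself guarantee that the sequence folds into the target.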
+## Usage
+The model supports batched inference. For implementation details and evaluation code, see the [official GitHub repository](https://github.com/KuNyaa/RNA-Design-LM). The authors give the following example command for running inference with constrained decoding:
+```bash
+python ./scripts/constrained_decoding.py \
+  --test_path ./test/eterna100.jsonl \
+  --model_flavor slrl \
+  --n_repeats 100 \
+  --batch_size 1024 \
+  --do_sample \
+  --temp 2 \
+  --constrained_decode
+```
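The example command enables sampling (`--do_sample`) with `--temp 2` and presumably draws many samples per target (`--n_repeats 100`). Assuming `--temp` is a standard softmax sampling temperature (the script itself is not shown here, so this is an assumption), the sketch below shows how a temperature above 1 flattens the next-nucleotide distribution, which tends to make repeated samples more diverse. `softmax_with_temperature`, `sample_token`, and `entropy` are hypothetical helpers, and the logits are made-up numbers for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temp: float) -> List[float]:
    """Turn raw logits into probabilities after dividing them by `temp`."""
    if temp <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temp for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(
    logits: Sequence[float], temp: float, rng: Optional[random.Random] = None
) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical next-nucleotide logits for A, C, G, U.
    logits = [2.0, 1.0, 0.5, 0.1]
    for t in (1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Dividing by a temperature keeps the ranking of the logits (the most likely base stays the most likely) but shrinks the gaps between them, so lower-ranked bases are sampled more often at `temp=2` than at `temp=1`.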
+## Citation
+If you use this model in your research, please cite the following paper:
+```bibtex
+@article{rna_design_lm_2025,
+  title={Designing RNAs with Language Models},
+  journal={arXiv preprint arXiv:2602.12470},
+  year={2026}
+}
+```