---
license: mit
language:
  - en
tags:
  - rna
  - biology
  - diffusion
  - generative
  - sequence-design
---

yakRNA Design: A semantic multimodal RNA composer

yakRNA Design is a 110M-parameter discrete diffusion language model for conditional RNA sequence design. It generates RNA sequences conditioned on any combination of secondary structure, consensus sequence, and Gene Ontology (GO) terms, or unconditionally from a target length alone.


Model Details

Architecture: ModernBERT-based discrete diffusion
Parameters: 110M
Training data: Full Rfam database
Max sequence length: 636 nt
Supported GO terms: 280

Quickstart

The easiest way to use yakRNA Design is the Google Colab notebook, which requires no setup.

For local use, see the GitHub repository: https://github.com/YousufAKhan/yakRNA


Download the Weights

CLI:

pip install huggingface_hub
huggingface-cli download MasterYster/yakRNA-Design yakRNA_110M.pt --local-dir checkpoints/

Python:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="MasterYster/yakRNA-Design", filename="yakRNA_110M.pt", local_dir="checkpoints/")

Generation Modes

Mode: Description
Unconditional: Generate from a target length alone
Structure-conditioned: Condition on a dot-bracket secondary structure
Consensus-conditioned: Condition on a mixed-case IUPAC consensus sequence
GO term-conditioned: Condition on up to 12 GO terms
Sequence infilling: Keep fixed positions and fill the positions masked with *
Multimodal: Any combination of the above
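
The inputs for these modes can be sanity-checked before running the model. The following is a minimal sketch, not code from the yakRNA repository: the function names are hypothetical, and the repository's own parser may accept or reject different characters. It assumes that any character other than a bracket marks an unpaired position, that GO terms use the standard `GO:` plus seven digits accession format, and that an infilling template is the sequence with masked positions written as `*`. The limits of 636 nt and 12 GO terms come from this README.

```python
import re

MAX_LENGTH = 636    # "Max sequence length" listed under Model Details
MAX_GO_TERMS = 12   # "Up to 12 GO terms" listed under Generation Modes
GO_ID = re.compile(r"GO:\d{7}")  # standard Gene Ontology accession format

# Each bracket family gets its own stack, so crossing pairs (pseudoknots)
# such as "<<[[>>]]" are accepted. Hypothetical helper, not the repo's parser.
CLOSE_TO_OPEN = {")": "(", ">": "<", "]": "[", "}": "{"}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted (open, close) index pairs; raise ValueError if unbalanced."""
    if len(structure) > MAX_LENGTH:
        raise ValueError(f"structure is longer than {MAX_LENGTH} nt")
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        # Assumption: every other character denotes an unpaired position.
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


def check_go_terms(terms: list[str]) -> None:
    """Raise ValueError if there are too many terms or one is malformed."""
    if len(terms) > MAX_GO_TERMS:
        raise ValueError(f"at most {MAX_GO_TERMS} GO terms are supported")
    for term in terms:
        if not GO_ID.fullmatch(term):
            raise ValueError(f"not a GO accession: {term!r}")


def infill_template(sequence: str, start: int, end: int) -> str:
    """Replace sequence[start:end] with '*' masks and keep the rest fixed."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("mask range is outside the sequence")
    return sequence[:start] + "*" * (end - start) + sequence[end:]
```

A well-formed GO accession is not necessarily one of the 280 terms the model supports, so `check_go_terms` only catches formatting mistakes.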

Example Usage

# Install
git clone https://github.com/YousufAKhan/yakRNA.git
cd yakRNA
pip install -r requirements.txt

# Generate 5 sequences conditioned on a hairpin structure
python inference/rna_sequence_generator.py \
    --config configs/inference.yaml \
    --checkpoint checkpoints/yakRNA_110M.pt \
    --secondary_structure "((((....))))" \
    --num_sequences 5

# Generate with all three modalities
python inference/rna_sequence_generator.py \
    --config configs/inference.yaml \
    --checkpoint checkpoints/yakRNA_110M.pt \
    --secondary_structure ":::::::<<<<<<<<<-:::--[[[[[-->>>>>>>>><<<<<<<<<<_________>>>->>>>>>>::::]]]]]::::" \
    --consensus "GAGUaaGGGGuuCuAGU...gcaGCcCgcCUaGaaCCCUGcgacacuGGuucuaaaaCagAugucgUuuuaAGgGCuUUUG" \
    --go_terms "GO:0075523" \
    --num_sequences 5
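
When scripting many runs, the same command line can be assembled from Python. This is a rough sketch that uses only the flags shown in the examples above; `build_command` is a hypothetical helper, not part of the repository. It passes `go_terms` through as a single string because this README only shows one term, and it omits unconditional and infilling modes because their flags are not listed here.

```python
import shlex
from typing import Optional


def build_command(
    checkpoint: str,
    config: str = "configs/inference.yaml",
    num_sequences: int = 5,
    secondary_structure: Optional[str] = None,
    consensus: Optional[str] = None,
    go_terms: Optional[str] = None,
) -> list[str]:
    """Assemble the argument list for inference/rna_sequence_generator.py."""
    cmd = [
        "python", "inference/rna_sequence_generator.py",
        "--config", config,
        "--checkpoint", checkpoint,
    ]
    # Only the conditioning inputs that were provided are added.
    optional = {
        "--secondary_structure": secondary_structure,
        "--consensus": consensus,
        "--go_terms": go_terms,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    cmd += ["--num_sequences", str(num_sequences)]
    return cmd


cmd = build_command(
    checkpoint="checkpoints/yakRNA_110M.pt",
    secondary_structure="((((....))))",
)
# Print a shell-quoted form for inspection before running it.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the root of the cloned repository. Passing a list rather than a shell string avoids quoting problems with the bracket characters in structure strings.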

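Links

GitHub repository: https://github.com/YousufAKhan/yakRNA
Hugging Face model repository: https://huggingface.co/MasterYster/yakRNA-Design
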
Citation

@software{yakrna2026,
  author = {Khan, Yousuf},
  title  = {yakRNA Design: A semantic multimodal RNA composer},
  year   = {2026},
  url    = {https://github.com/YousufAKhan/yakRNA}
}