RNAIX

A deep learning model for RNA 3D structure prediction using diffusion and multi-modal embeddings. Developed for the Stanford RNA 3D Folding Kaggle competition.

RNAIX is heavily inspired by, and builds upon, the RibonanzaNet and Protenix models.

Description

RNAIX is a deep learning model designed to predict RNA 3D structures by integrating multiple sources of information, including sequence data, MSA-derived embeddings, frequency profiles, and structural priors from external predictions. It is built around a Pairformer-based encoder and uses a diffusion process to generate 3D coordinates.

Features

Sequence and MSA-based token embeddings
Alignment-derived frequency profile embeddings
Structural priors from Protenix predictions
Pairformer backbone with recycling
Diffusion-based coordinate generation head

Architecture

The model consists of the following core modules:

RNASequenceEmbedder – token-level embedding of the input RNA sequence
MSAEmbedder – embedding of multiple sequence alignment (MSA) inputs
MSAProfileEmbedder – embedding of alignment-derived frequency profiles
ProtenixStructuralEncoder – encodes template structure predictions from Protenix
RNAIX – top-level model that fuses the embedded features and predicts 3D coordinates, using a Pairformer backbone with recycling and a loss on the 3D coordinates
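As an illustration of the kind of input MSAProfileEmbedder works from, the sketch below computes a per-column frequency profile from an aligned set of sequences. It is a minimal sketch, not the RNAIX implementation: the vocabulary `VOCAB`, the function name `msa_frequency_profile`, and the choice to count unknown symbols as gaps are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical token vocabulary; the real RNAIX tokenizer may differ.
VOCAB = ["A", "C", "G", "U", "-"]


def msa_frequency_profile(msa, vocab=VOCAB):
    """Return one frequency row per alignment column.

    msa: list of equal-length aligned sequences (strings).
    Output: list of length L; each entry holds len(vocab) frequencies
    that sum to 1. Symbols outside `vocab` are counted as gaps
    (an assumption made for this sketch).
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")

    gap = vocab[-1]
    profile = []
    for col in range(length):
        # Count each symbol in this column, mapping unknown symbols to the gap.
        counts = Counter(row[col] if row[col] in vocab else gap for row in msa)
        profile.append([counts[sym] / len(msa) for sym in vocab])
    return profile


msa = ["GGAU", "GCAU", "G-AU", "GGAC"]
profile = msa_frequency_profile(msa)
```

Each row of `profile` describes how conserved a position is across the alignment; a fully conserved column puts all of its mass on a single symbol.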

Inputs

RNAIX takes a single RNA target with the following inputs:

RNA sequence: Tokenized primary RNA sequence
MSA matrix: Tokenized multiple sequence alignment
Protenix coordinates: Predicted 3D structure used as a structural prior

RNAIX assumes that the MSAs are precomputed.
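Because the three inputs describe the same target, a quick consistency check before calling the model can catch mismatched lengths early. The helper below is a hypothetical sketch, not part of the RNAIX package: it assumes every MSA row has the same length as the sequence and that the structural prior has one (x, y, z) point per residue, which may not match the actual RNAIX input layout.

```python
def check_rnaix_inputs(sequence_tokens, msa_tokens, prior_coords):
    """Check that the per-target inputs agree on sequence length.

    sequence_tokens: list of token ids, length L.
    msa_tokens: list of rows, each a list of L token ids.
    prior_coords: list of L (x, y, z) tuples (assumed: one point per residue).
    Returns L if the inputs are consistent, otherwise raises ValueError.
    """
    length = len(sequence_tokens)
    if length == 0:
        raise ValueError("sequence is empty")
    for i, row in enumerate(msa_tokens):
        if len(row) != length:
            raise ValueError(f"MSA row {i} has length {len(row)}, expected {length}")
    if len(prior_coords) != length:
        raise ValueError(
            f"prior has {len(prior_coords)} points, expected {length}"
        )
    for i, xyz in enumerate(prior_coords):
        if len(xyz) != 3:
            raise ValueError(f"prior point {i} does not have 3 coordinates")
    return length


n_residues = check_rnaix_inputs(
    [0, 1, 2],
    [[0, 1, 2], [0, 4, 2]],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
)
```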

Usage

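import torch
from rnaix.model.model import RNAIX

path_checkpoint = "../sample_model/model_v01.pt"
device = "cuda" if torch.cuda.is_available() else "cpu"

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
checkpoint = torch.load(path_checkpoint, map_location=device, weights_only=False)

# Rebuild the model from the config stored in the checkpoint, then restore its weights.
model = RNAIX(checkpoint["config"])
model.load_state_dict(checkpoint["model_state_dict"])
model.to(device)
model.eval()  # inference mode: disables dropout and similar training-only behavior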

Training

RNAIX was trained on the Stanford RNA 3D Folding dataset, using only sequences with complete 3D coordinate annotations. Sequences with missing coordinates were excluded during training.
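The exclusion step described above can be sketched as a simple filter over the training targets. This is a minimal sketch under stated assumptions, not the authors' preprocessing code: the function names are hypothetical, coordinates are assumed to be stored as one (x, y, z) tuple per residue, and a missing value is assumed to appear as `None` or NaN.

```python
import math


def has_complete_coordinates(coords):
    """Return True if every residue has three finite coordinates.

    A missing residue is assumed to be None, and a missing value is
    assumed to be None or NaN (assumptions made for this sketch).
    """
    return len(coords) > 0 and all(
        xyz is not None
        and len(xyz) == 3
        and all(v is not None and math.isfinite(v) for v in xyz)
        for xyz in coords
    )


def filter_complete(targets):
    """Keep only the targets whose coordinates are complete."""
    return {tid: c for tid, c in targets.items() if has_complete_coordinates(c)}


targets = {
    "target_a": [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)],
    "target_b": [(0.0, 1.0, 2.0), (float("nan"), float("nan"), float("nan"))],
    "target_c": [(0.0, 1.0, 2.0), None],
}
kept = filter_complete(targets)
```

Here only `target_a` survives the filter, since the other two targets each contain a residue without usable coordinates.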

Limitations

Requires precomputed MSAs and Protenix structure predictions
Model does not support training on sequences with partially missing coordinates

Datasets

Code

RNAIX

References
