
RNAIX

A deep learning model for RNA 3D structure prediction using diffusion and multi-modal embeddings. Developed for the Stanford RNA 3D Folding Kaggle competition.

RNAIX is heavily inspired by, and builds upon, the RibonanzaNet and Protenix models.

Description

RNAIX is a deep learning model designed to predict RNA 3D structures by integrating multiple sources of information, including sequence data, MSA-derived embeddings, frequency profiles, and structural priors from external predictions. It is built around a Pairformer-based encoder and uses a diffusion process to generate 3D coordinates.
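To make the idea of generating coordinates by diffusion concrete, here is a minimal sketch of a denoising sampler in NumPy. It is not the RNAIX implementation: the `denoiser` callable, the noise schedule, and the Euler update are assumptions chosen for illustration, and the toy denoiser simply returns a fixed target in place of a trained network.

```python
import numpy as np


def sample_coordinates(denoiser, n_tokens, sigmas, rng):
    """Euler sampler that integrates from high noise down to sigma = 0.

    `denoiser(x, sigma)` is a hypothetical callable returning an estimate of
    the clean coordinates; in a real model it would be the trained network.
    """
    # Start from pure Gaussian noise scaled to the largest noise level.
    x = sigmas[0] * rng.standard_normal((n_tokens, 3))
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, sigma)               # predicted clean coordinates
        direction = (x - x0_hat) / sigma          # points from the estimate towards the noise
        x = x + (sigma_next - sigma) * direction  # step to the next (lower) noise level
    return x


# Toy check with an oracle denoiser that always returns a fixed target.
rng = np.random.default_rng(0)
target = rng.standard_normal((8, 3))
sigmas = np.append(np.geomspace(10.0, 0.01, num=20), 0.0)  # decreasing, ends at 0
coords = sample_coordinates(lambda x, sigma: target, 8, sigmas, rng)
```

With an oracle denoiser, the sampler lands on the target once the noise level reaches zero. A learned denoiser would instead be conditioned on the encoder's features.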

Features

  • Sequence and MSA-based token embeddings
  • Alignment-derived frequency profile embeddings
  • Structural priors from Protenix predictions
  • Pairformer backbone with recycling
  • Diffusion-based coordinate generation head
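One common way to turn a predicted structure into a structural prior is to convert its coordinates into a binned pairwise distance matrix. The sketch below illustrates that idea only; whether RNAIX encodes Protenix predictions this way is an assumption, and the function names and bin edges are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (L, L) Euclidean distance matrix for (L, 3) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def bin_distances(dist, edges):
    """Map each distance to the index of its bin, usable as a pair feature."""
    return np.digitize(dist, edges)


# Three points with easy-to-check distances (5, 12, and 13).
coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]])
dist = pairwise_distances(coords)
bins = bin_distances(dist, edges=np.array([2.0, 6.0, 10.0]))
```

Distance-based features are invariant to rotation and translation of the predicted structure, which is one reason they are often used for templates.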

Architecture

The model consists of the following core modules:

  • RNASequenceEmbedder – token-level embedding of the input RNA sequence
  • MSAEmbedder – embedding of multiple sequence alignment (MSA) inputs
  • MSAProfileEmbedder – embedding of alignment-derived frequency profiles
  • ProtenixStructuralEncoder – encodes template structure predictions from Protenix
  • RNAIX – feature fusion and prediction model using a Pairformer backbone with recycling and loss on 3D coordinates
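Recycling, as used in the backbone above, means running the same block several times and feeding each output back in as an input. The following is a minimal sketch of that control flow, assuming the previous output is simply added to the initial features; `block` is a hypothetical stand-in for the Pairformer stack, and the real model may normalize or project the recycled features first.

```python
import numpy as np


def run_with_recycling(pair_init, block, n_cycles):
    """Apply `block` n_cycles times, feeding each output back as an input.

    `block` is a hypothetical stand-in for the Pairformer stack.
    """
    prev = np.zeros_like(pair_init)  # no previous output on the first pass
    for _ in range(n_cycles):
        # The initial features are re-injected on every cycle.
        prev = block(pair_init + prev)
    return prev


# Toy block: halving, so the output moves towards pair_init with each cycle.
pair_init = np.ones((4, 4, 2))
out = run_with_recycling(pair_init, lambda z: 0.5 * z, n_cycles=3)
```

In this toy case the output goes 0.5, 0.75, 0.875 over three cycles, which shows how each pass refines the previous estimate.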

Inputs

RNAIX takes a single RNA target with the following inputs:

  • RNA sequence: Tokenized primary RNA sequence
  • MSA matrix: Tokenized multiple sequence alignment
  • Protenix coordinates: Predicted 3D structure used as a structural prior

The model assumes that the MSAs are precomputed.
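As an illustration of what tokenized inputs and an alignment-derived frequency profile can look like, here is a small NumPy sketch. The vocabulary and token ids are hypothetical and may not match the ones RNAIX uses.

```python
import numpy as np

# Hypothetical vocabulary; the actual token ids used by RNAIX may differ.
VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3, "-": 4}


def tokenize(seq):
    """Convert an RNA string to an integer array using VOCAB."""
    return np.array([VOCAB[ch] for ch in seq], dtype=np.int64)


def frequency_profile(msa_tokens, vocab_size=len(VOCAB)):
    """Per-column token frequencies for an (n_seqs, L) tokenized MSA."""
    one_hot = np.eye(vocab_size)[msa_tokens]  # (n_seqs, L, vocab_size)
    return one_hot.mean(axis=0)               # (L, vocab_size), each row sums to 1


# Four aligned sequences of length 4; "-" marks a gap.
msa = np.stack([tokenize(s) for s in ["GCAU", "GCGU", "G-AU", "GCAU"]])
profile = frequency_profile(msa)
```

Each row of `profile` describes one alignment column, so a fully conserved position yields a one-hot row and a variable position spreads its mass across several tokens.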

Usage

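import torch
from rnaix.model.model import RNAIX

path_checkpoint = "../sample_model/model_v01.pt"
device = "cuda" if torch.cuda.is_available() else "cpu"

# weights_only=False unpickles arbitrary Python objects, so load only trusted checkpoints.
checkpoint = torch.load(path_checkpoint, map_location=device, weights_only=False)

# Rebuild the model from the stored config, then restore the trained weights.
model = RNAIX(checkpoint["config"])
model.load_state_dict(checkpoint["model_state_dict"])
model.to(device)
model.eval()  # inference mode: disables dropout and similar training-only behavior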
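The loading code above reads two keys from the checkpoint, "config" and "model_state_dict". A small guard like the one below can give a clearer error when a file has a different layout. The helper name `check_checkpoint` is hypothetical and not part of the rnaix package.

```python
# Keys that the loading code reads from the checkpoint dictionary.
REQUIRED_KEYS = ("config", "model_state_dict")


def check_checkpoint(checkpoint):
    """Raise a descriptive error if the checkpoint lacks an expected key."""
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict, got {type(checkpoint).__name__}")
    missing = [key for key in REQUIRED_KEYS if key not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return checkpoint
```

It could be called on the result of `torch.load` before the model is constructed.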

Training

RNAIX was trained on the Stanford RNA 3D Folding dataset, using only sequences with complete 3D coordinate annotations. Sequences with missing coordinates were excluded during training.
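The filtering step described above can be sketched as follows. This assumes missing coordinates are stored as NaN. The dataset may use a different sentinel, in which case the check would need to change. The function names are hypothetical.

```python
import numpy as np


def has_complete_coordinates(coords):
    """True if every residue in an (L, 3) array has finite x, y, z values."""
    return bool(np.isfinite(coords).all())


def filter_complete(targets):
    """Keep only the targets whose coordinate arrays have no missing entries."""
    return {name: xyz for name, xyz in targets.items() if has_complete_coordinates(xyz)}


# Toy data: one fully resolved target and one with a missing residue (NaN assumed).
targets = {
    "complete": np.zeros((5, 3)),
    "partial": np.array([[0.0, 1.0, 2.0], [np.nan, np.nan, np.nan]]),
}
kept = filter_complete(targets)
```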

Limitations

  • Requires precomputed MSAs and Protenix structure predictions
  • Does not support training on sequences with partially missing coordinates

Datasets

Code

References
