mRNA Scoring Models
This directory contains built-in mRNA scoring models for the mRNA Design Studio.
Available Models
1. RNAstructure MFE Scorer (rna_structure_scorer.py)
Purpose: Predicts the minimum free energy (MFE) of mRNA secondary structure.
Method: Uses the ViennaRNA RNAfold algorithm to compute the thermodynamic stability of RNA secondary structures. More negative MFE values indicate stronger secondary structure formation.
Score Range: 0-100
- 0-40: Weak/unstable secondary structure (may be too unstructured)
- 40-70: Moderate secondary structure (optimal range for translation)
- 70-100: Strong secondary structure (may inhibit translation)
Dependencies:
- ViennaRNA Python package (optional)
- If ViennaRNA is not available, the scorer falls back to a GC-content-based proxy score
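The optional-dependency behavior described above can be sketched roughly as follows. This is a minimal illustration, not the scorer's actual code: `gc_fraction`, `gc_proxy_score`, and `fold_mfe` are hypothetical helper names, and the linear mapping from GC fraction to a 0-100 score is an assumption made only for the example. The one library call used, `RNA.fold`, is the ViennaRNA Python binding that returns a (structure, MFE) pair.

```python
# Optional dependency: the ViennaRNA bindings are imported as the `RNA` module.
try:
    import RNA
except ImportError:
    RNA = None


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_proxy_score(sequence: str) -> float:
    """Hypothetical proxy: report the GC fraction on a 0-100 scale."""
    return 100.0 * gc_fraction(sequence)


def fold_mfe(sequence: str):
    """Return the MFE in kcal/mol via RNA.fold, or None if ViennaRNA is missing."""
    if RNA is None:
        return None
    _structure, mfe = RNA.fold(sequence)
    return mfe
```

A caller could try `fold_mfe` first and use `gc_proxy_score` only when it returns `None`, which keeps the fallback decision in one place.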
Usage:
from models import RNAStructureMFEScorer
scorer = RNAStructureMFEScorer()
score = scorer.score(sequence)
Interpretation:
- Target moderate scores (40-70) for optimal translation efficiency
- Very low scores suggest the mRNA may be prone to degradation
- Very high scores suggest strong secondary structures that may block ribosome access
2. mRNA Stability Scorer (mrna_stability_scorer.py)
Purpose: Composite stability prediction based on multiple sequence features.
Method: Combines five established mRNA design principles:
- GC Content (30% weight) - Optimal: 50-60%
- Codon Adaptation Index (CAI) (25% weight) - Codon optimization for host organism
- Homopolymer Detection (20% weight) - Penalizes long runs of identical nucleotides
- 5' UTR Structure (15% weight) - Moderate stability preferred
- Kozak Consensus (10% weight) - Translation initiation efficiency
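As a rough sketch of how the five weights listed above could be combined, the example below computes a weighted average of component scores. The weights come from the list; the dictionary keys, the function name `composite_score`, and the clamping step are assumptions for illustration and may differ from the actual implementation.

```python
import math

# Weights taken from the list above; the key names are illustrative.
WEIGHTS = {
    "gc_content": 0.30,
    "cai": 0.25,
    "homopolymer": 0.20,
    "utr_structure": 0.15,
    "kozak": 0.10,
}

# Sanity check: the weights should sum to 1 so the result stays on a 0-100 scale.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def composite_score(components: dict) -> float:
    """Weighted average of 0-100 component scores, clamped to 0-100."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return max(0.0, min(100.0, total))


example = {
    "gc_content": 80,
    "cai": 60,
    "homopolymer": 100,
    "utr_structure": 50,
    "kozak": 40,
}
print(round(composite_score(example), 1))  # 70.5
```

With these example inputs the weighted terms are 24 + 15 + 20 + 7.5 + 4 = 70.5.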
Score Range: 0-100
- 0-40: Poor stability/translation efficiency
- 40-70: Acceptable design
- 70-100: Excellent design
Dependencies:
- BioPython (optional, for advanced CAI calculation)
- ViennaRNA (optional, for UTR structure analysis)
Parameters:
- organism (default: "human") - Target organism for codon optimization
Usage:
from models import mRNAStabilityScorer
scorer = mRNAStabilityScorer(organism="human")
score = scorer.score(sequence)
Individual Component Scores:
You can access individual component scores for detailed analysis:
scorer = mRNAStabilityScorer()
# Individual component scores
gc_score = scorer._score_gc_content(sequence) # 0-100
cai_score = scorer._score_cai(sequence) # 0-100
homopoly_score = scorer._score_homopolymers(sequence) # 0-100
utr_score = scorer._score_utr_structure(sequence) # 0-100
kozak_score = scorer._score_kozak(sequence) # 0-100
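The internals of these private component methods are not shown here. To make two of them more concrete, here is a hedged sketch of what a homopolymer penalty and a GC-content score might look like. The function names, the run-length threshold of 5, and the penalty slopes are all assumptions chosen for illustration; only the 50-60% optimal GC window comes from the method description above.

```python
import re


def longest_homopolymer(sequence: str) -> int:
    """Length of the longest run of a single repeated nucleotide."""
    runs = re.finditer(r"(.)\1*", sequence.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def homopolymer_score(sequence: str, max_allowed: int = 5) -> float:
    """Hypothetical penalty: 100 up to max_allowed, minus 20 per extra base."""
    excess = max(0, longest_homopolymer(sequence) - max_allowed)
    return max(0.0, 100.0 - 20.0 * excess)


def gc_content_score(sequence: str, low: float = 50.0, high: float = 60.0) -> float:
    """Hypothetical GC score: 100 inside [low, high] percent GC,
    minus 2 points per percentage point outside that window."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    distance = max(low - gc_percent, gc_percent - high, 0.0)
    return max(0.0, 100.0 - 2.0 * distance)
```

Both functions return values on the same 0-100 scale as the component scores, so they could feed a weighted composite directly.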
Interpretation:
- 70+: Well-designed mRNA suitable for production
- 40-70: Moderate quality, may benefit from optimization
- <40: Significant design issues, optimization strongly recommended
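The interpretation bands above translate into a small helper like the one below. This is a sketch for downstream reporting, assuming a hypothetical function name `interpret_stability_score`; the thresholds follow the list above, with exactly 70 treated as "70+" and exactly 40 treated as part of the 40-70 band.

```python
def interpret_stability_score(score: float) -> str:
    """Map a 0-100 stability score to the interpretation bands listed above."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 70:
        return "Well-designed mRNA suitable for production"
    if score >= 40:
        return "Moderate quality, may benefit from optimization"
    return "Significant design issues, optimization strongly recommended"
```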
Model Registry Integration
Both models implement the ScoringModel interface and can be loaded into the ModelRegistry:
from models import ModelRegistry, RNAStructureMFEScorer, mRNAStabilityScorer
registry = ModelRegistry()
# Register built-in models
registry._register(RNAStructureMFEScorer(), "scoring", "builtin", "models/rna_structure_scorer.py")
registry._register(mRNAStabilityScorer(), "scoring", "builtin", "models/mrna_stability_scorer.py")
# Run scoring on sequences
import pandas as pd
results = registry.run_scoring("RNAstructure MFE", sequences)
Testing
Run tests for both models:
pytest tests/test_models.py::TestRNAStructureMFEScorer -v
pytest tests/test_models.py::TestmRNAStabilityScorer -v
Adding Custom Models
To add your own scoring model:
- Create a new Python file in this directory
- Import and subclass ScoringModel:
from models.base import ScoringModel
from core.models.sequence import mRNASequence

class MyCustomScorer(ScoringModel):
    @property
    def name(self) -> str:
        return "My Custom Scorer"

    @property
    def description(self) -> str:
        return "Description of what this model does"

    def score(self, sequence: mRNASequence, metadata=None) -> float:
        # Your scoring logic here
        return 0.0  # Return score 0-100
- Load it via the ModelRegistry:
models = registry.load_local("path/to/your_model.py")
References
RNAstructure MFE Scorer
- Lorenz et al. (2011). "ViennaRNA Package 2.0." Algorithms for Molecular Biology, 6:26.
- Turner & Mathews (2010). "NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure." Nucleic Acids Research, 38:D280-282.
mRNA Stability Scorer
- Mauro & Edelman (2002). "The ribosome filter hypothesis." PNAS, 99(19):12031-12036. (Kozak sequence)
- Sharp & Li (1987). "The codon adaptation index—a measure of directional synonymous codon usage bias." Nucleic Acids Research, 15(3):1281-1295.
- Kudla et al. (2009). "Coding-sequence determinants of gene expression in Escherichia coli." Science, 324(5924):255-258.
- Presnyak et al. (2015). "Codon optimality is a major determinant of mRNA stability." Cell, 160(6):1111-1124.