# mRNA Scoring Models

This directory contains built-in mRNA scoring models for the mRNA Design Studio.

## Available Models

### 1. RNAstructure MFE Scorer (`rna_structure_scorer.py`)

**Purpose**: Predicts the minimum free energy (MFE) of mRNA secondary structure.

**Method**: Uses the ViennaRNA RNAfold algorithm to compute the thermodynamic stability of RNA secondary structures. More negative MFE values indicate stronger secondary structure formation.

**Score Range**: 0-100
- **0-40**: Weak/unstable secondary structure (may be too unstructured)
- **40-70**: Moderate secondary structure (**optimal range** for translation)
- **70-100**: Strong secondary structure (may inhibit translation)

**Dependencies**:
- ViennaRNA Python package (optional)
- If ViennaRNA is not available, the scorer falls back to a GC-content-based proxy score

**Usage**:
```python
from models import RNAStructureMFEScorer

scorer = RNAStructureMFEScorer()
score = scorer.score(sequence)
```

**Interpretation**:
- Target moderate scores (40-70) for optimal translation efficiency
- Very low scores suggest the mRNA may be prone to degradation
- Very high scores suggest strong secondary structures that may block ribosome access

---

### 2. mRNA Stability Scorer (`mrna_stability_scorer.py`)

**Purpose**: Composite stability prediction based on multiple sequence features.

**Method**: Combines five established mRNA design principles:
1. **GC Content** (30% weight) - Optimal: 50-60%
2. **Codon Adaptation Index (CAI)** (25% weight) - Codon optimization for host organism
3. **Homopolymer Detection** (20% weight) - Penalizes long runs of identical nucleotides
4. **5' UTR Structure** (15% weight) - Moderate stability preferred
5. **Kozak Consensus** (10% weight) - Translation initiation efficiency

**Score Range**: 0-100
- **0-40**: Poor stability/translation efficiency
- **40-70**: Acceptable design
- **70-100**: Excellent design

**Dependencies**:
- BioPython (optional, for advanced CAI calculation)
- ViennaRNA (optional, for UTR structure analysis)

**Parameters**:
- `organism` (default: "human") - Target organism for codon optimization

**Usage**:
```python
from models import mRNAStabilityScorer

scorer = mRNAStabilityScorer(organism="human")
score = scorer.score(sequence)
```

**Individual Component Scores**:

You can access individual component scores for detailed analysis. These are private helper methods (note the leading underscore), so their names and signatures may change:
```python
scorer = mRNAStabilityScorer()

# Individual component scores
gc_score = scorer._score_gc_content(sequence)          # 0-100
cai_score = scorer._score_cai(sequence)                # 0-100
homopoly_score = scorer._score_homopolymers(sequence)  # 0-100
utr_score = scorer._score_utr_structure(sequence)      # 0-100
kozak_score = scorer._score_kozak(sequence)            # 0-100
```

**Interpretation**:
- **70+**: Well-designed mRNA suitable for production
- **40-70**: Moderate quality, may benefit from optimization
- **<40**: Significant design issues, optimization strongly recommended

---

## Model Registry Integration

Both models implement the `ScoringModel` interface and can be loaded into the ModelRegistry:

```python
import pandas as pd

from models import ModelRegistry, RNAStructureMFEScorer, mRNAStabilityScorer

registry = ModelRegistry()

# Register built-in models
registry._register(RNAStructureMFEScorer(), "scoring", "builtin", "models/rna_structure_scorer.py")
registry._register(mRNAStabilityScorer(), "scoring", "builtin", "models/mrna_stability_scorer.py")

# Run scoring on sequences (`sequences` must already be defined by the caller)
results = registry.run_scoring("RNAstructure MFE", sequences)
```

---

## Testing

Run tests for both models:

```bash
pytest tests/test_models.py::TestRNAStructureMFEScorer -v
pytest tests/test_models.py::TestmRNAStabilityScorer -v
```

---

## Adding Custom Models

To add your own scoring model:

1. Create a new Python file in this directory
2. Import and subclass `ScoringModel`:

```python
from models.base import ScoringModel
from core.models.sequence import mRNASequence


class MyCustomScorer(ScoringModel):
    @property
    def name(self) -> str:
        return "My Custom Scorer"

    @property
    def description(self) -> str:
        return "Description of what this model does"

    def score(self, sequence: mRNASequence, metadata=None) -> float:
        # Your scoring logic here
        return 0.0  # Return score 0-100
```

3. Load it via the ModelRegistry:

```python
models = registry.load_local("path/to/your_model.py")
```

---

## References

### RNAstructure MFE Scorer
- Lorenz et al. (2011). "ViennaRNA Package 2.0." *Algorithms for Molecular Biology*, 6:26.
- Turner & Mathews (2010). "NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure." *Nucleic Acids Research*, 38:D280-D282.

### mRNA Stability Scorer
- Mauro & Edelman (2002). "The ribosome filter hypothesis." *PNAS*, 99(19):12031-12036. (Kozak sequence)
- Sharp & Li (1987). "The codon adaptation index—a measure of directional synonymous codon usage bias." *Nucleic Acids Research*, 15(3):1281-1295.
- Kudla et al. (2009). "Coding-sequence determinants of gene expression in Escherichia coli." *Science*, 324(5924):255-258.
- Presnyak et al. (2015). "Codon optimality is a major determinant of mRNA stability." *Cell*, 160(6):1111-1124.
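
---

## Worked Example: Composite Score Weights

The five component weights listed for the mRNA Stability Scorer (30%, 25%, 20%, 15%, 10%) sum to 100%, which suggests a weighted average of the 0-100 component scores. The sketch below illustrates only that arithmetic. It assumes a simple linear combination, which this README does not confirm, and `COMPONENT_WEIGHTS` and `combine_component_scores` are hypothetical names that are not part of the `models` package.

```python
from typing import Mapping

# Weights as listed for the mRNA Stability Scorer in this README.
# The dictionary keys are illustrative labels, not identifiers from the package.
COMPONENT_WEIGHTS = {
    "gc_content": 0.30,
    "cai": 0.25,
    "homopolymers": 0.20,
    "utr_structure": 0.15,
    "kozak": 0.10,
}


def combine_component_scores(components: Mapping[str, float]) -> float:
    """Hypothetical helper: weighted sum of 0-100 component scores.

    Assumes the composite is a plain linear combination; the real scorer
    may combine its components differently.
    """
    missing = COMPONENT_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")

    total = sum(
        weight * components[name] for name, weight in COMPONENT_WEIGHTS.items()
    )
    # Clamp to the documented 0-100 score range.
    return max(0.0, min(100.0, total))


# Made-up component scores, purely for illustration.
example_components = {
    "gc_content": 80.0,
    "cai": 60.0,
    "homopolymers": 100.0,
    "utr_structure": 50.0,
    "kozak": 40.0,
}
composite = combine_component_scores(example_components)
print(f"Composite score: {composite:.1f}")
```

With these made-up inputs the sum is `0.30*80 + 0.25*60 + 0.20*100 + 0.15*50 + 0.10*40 = 70.5`, which falls just inside the 70-100 band. Under this assumed scheme, GC content carries the largest weight, so a 10-point change in the GC score moves the composite by 3 points, while the same change in the Kozak score moves it by only 1.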