# Multi-output DNA Structure Regressor (PyTorch)

## Description

This model is a **multi-output DNA structure regressor** built and trained from scratch in **PyTorch**. It predicts six structural stability metrics directly from engineered DNA sequence features: Minimum Free Energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops. Trained on the [aedupuga/2025-scaffold-structures] dataset, the model is intended as a fast, lightweight alternative to slower simulation tools such as **NUPACK**, giving near-instant predictions for plasmid stability analysis.

## Model

- **Architecture:** 3-layer MLP (512→256→128, dropout 0.3)
- **Inputs:** 109658 features
- **Outputs:** 6 targets → `mfe_energy`, `num_pairs`, `stem_len_mean`, `num_stems`, `num_hairpins`, `num_internal_loops`
- **Loss:** MSE
- **Optimizer:** Adam (lr=0.0001)
- **Epochs:** 15

## Metrics (test)

- Overall MSE: `15022.6787`
- Overall R²: `-34.0313`
- Training time (s): `131.85`
- Prediction time (s): `0.2694`

### MAE per target

```json
{
  "mfe_energy": 139.4054718017578,
  "num_pairs": 116.53337097167969,
  "stem_len_mean": 2.4054114818573,
  "num_stems": 69.17422485351562,
  "num_hairpins": 14.115099906921387,
  "num_internal_loops": 94.97564697265625
}
```

## Usage

```bash
pip install torch numpy
python inference.py
```

Before running inference, apply the same preprocessing that was used during training (e.g., scaling, SVD).

## Limitations

- Performance is less reliable for shorter DNA strands, because the training data consists primarily of longer plasmid sequences.
- The model is intended for **educational and exploratory research use**, not for experimental or clinical validation.
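
## Architecture sketch

The configuration listed under Model can be sketched in PyTorch as follows. This is a minimal illustration, not the training code behind the reported metrics. The layer widths, dropout rate, input size, target names, loss, and learning rate come from this card. The ReLU activations, the placement of dropout after each hidden layer, the reading of "3-layer" as three hidden layers plus a separate linear output layer, and the names `StructureRegressor` and `train_step` are assumptions made for the example.

```python
import torch
from torch import nn

# Values taken from the model card.
N_FEATURES = 109658
TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


class StructureRegressor(nn.Module):
    """MLP with hidden widths 512 -> 256 -> 128 and a linear 6-value head."""

    def __init__(self, in_features=N_FEATURES, hidden=(512, 256, 128),
                 n_targets=len(TARGETS), dropout=0.3):
        super().__init__()
        layers = []
        width = in_features
        for h in hidden:
            # ASSUMPTION: ReLU + dropout after every hidden layer;
            # the card does not state the activation or dropout placement.
            layers += [nn.Linear(width, h), nn.ReLU(), nn.Dropout(dropout)]
            width = h
        # Linear output layer: one regression value per target.
        layers.append(nn.Linear(width, n_targets))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


def train_step(model, optimizer, features, targets):
    """Run one MSE optimisation step and return the batch loss as a float."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(features), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Reduced input width keeps the demo light; the card's model uses N_FEATURES.
    model = StructureRegressor(in_features=64)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    x, y = torch.randn(8, 64), torch.randn(8, len(TARGETS))
    print(train_step(model, optimizer, x, y))
    model.eval()  # disables dropout for inference
    with torch.no_grad():
        print(model(x).shape)  # torch.Size([8, 6])
```

With the full 109658-feature input, the first linear layer alone holds about 56 million weights (109658 × 512), so nearly all of the model's parameters sit in that layer.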