| # Multi-output DNA Structure Regressor (PyTorch) | |
| ## Description | |
| This model is a **multi-output DNA structure regressor** built and trained from scratch in **PyTorch**. | |
| It predicts six structural stability metrics directly from engineered DNA sequence features: minimum free energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops. | |
| Trained on the [aedupuga/2025-scaffold-structures] dataset, the model provides a fast, lightweight alternative to more complex and time-consuming simulation tools like **NUPACK**, enabling near-instant predictions for plasmid stability analysis. | |
| ## Model | |
| - **Architecture:** 3-layer MLP (512→256→128, dropout 0.3) | |
| - **Inputs:** 109658 features | |
| - **Outputs:** 6 targets: mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, num_internal_loops | |
| - **Loss:** MSE | |
| - **Optimizer:** Adam (lr=0.0001) | |
| - **Epochs:** 15 | |
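The configuration above could be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released training code: the ReLU activations, the placement of dropout after each hidden layer, the separate linear output head, the batch size, and the helper names `build_mlp` and `train` are all assumptions that the card does not specify.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_mlp(in_features: int = 109658, out_features: int = 6,
              hidden=(512, 256, 128), dropout: float = 0.3) -> nn.Sequential:
    """Stack Linear -> ReLU -> Dropout per hidden layer, then a linear head."""
    layers = []
    prev = in_features
    for width in hidden:
        # Assumption: ReLU activation and dropout after every hidden layer.
        layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)]
        prev = width
    # One unbounded regression output per target.
    layers.append(nn.Linear(prev, out_features))
    return nn.Sequential(*layers)


def train(model, features, targets, epochs: int = 15, lr: float = 1e-4,
          batch_size: int = 64):
    """Mini-batch training with MSE loss and Adam; returns mean loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    dataset = TensorDataset(features, targets)
    # Assumption: batch size 64 with shuffling; the card does not state either.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    history = []
    for _ in range(epochs):
        model.train()
        running = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * xb.size(0)
        history.append(running / len(dataset))
    return history
```

With the default `in_features`, the first linear layer alone holds about 56 million weights (109658 × 512), so most of the model's size sits in that layer.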
| ## Metrics (test) | |
| - Overall MSE: `15022.6787` | |
| - Overall R²: `-34.0313` | |
| - Training time (s): `131.85` | |
| - Prediction time (s): `0.2694` | |
| ### MAE per target | |
| ```json | |
| { | |
| "mfe_energy": 139.4054718017578, | |
| "num_pairs": 116.53337097167969, | |
| "stem_len_mean": 2.4054114818573, | |
| "num_stems": 69.17422485351562, | |
| "num_hairpins": 14.115099906921387, | |
| "num_internal_loops": 94.97564697265625 | |
| } | |
| ``` | |
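For readers who want to recompute metrics of this kind, the NumPy sketch below shows one common way to do it. It assumes that the "overall" values are uniform averages across the six targets, which the card does not confirm, and the function name `regression_report` is hypothetical.

```python
import numpy as np


def regression_report(y_true, y_pred, target_names):
    """Return overall MSE, overall R^2 and per-target MAE for 2-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true

    # Mean squared error over every sample and every target.
    mse = float(np.mean(errors ** 2))

    # R^2 per target is 1 - SS_res / SS_tot; average the targets uniformly.
    # Note: a target that is constant in y_true gives SS_tot = 0 (undefined R^2).
    ss_res = np.sum(errors ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = float(np.mean(1.0 - ss_res / ss_tot))

    # Mean absolute error, reported separately for each target column.
    mae = dict(zip(target_names, np.mean(np.abs(errors), axis=0).tolist()))
    return {"mse": mse, "r2": r2, "mae": mae}
```

Under this definition, an R² of 0 corresponds to always predicting each target's test-set mean, so a negative R² indicates predictions that fit worse than that baseline. Because the targets are on very different scales, the per-target MAE values are easier to interpret than the pooled MSE.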
| ## Usage | |
| ```bash | |
| pip install torch numpy | |
| python inference.py | |
| ``` | |
| Before running inference, apply the same preprocessing (e.g., scaling, SVD) that was used during training, fitted on the training data. | |
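The sketch below illustrates what reusing training-time preprocessing means in practice: the statistics are fitted once on the training features and then applied unchanged to new inputs. It assumes standardization followed by an SVD projection purely for illustration; the actual pipeline, its parameters, and the class name `ScaleThenSVD` are not taken from this repository.

```python
import numpy as np


class ScaleThenSVD:
    """Standardize features, then project onto the top right singular vectors."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Constant columns have zero spread; divide by 1 to leave them at 0.
        self.std_ = np.where(std == 0, 1.0, std)
        Z = (X - self.mean_) / self.std_
        # Rows of vt are the right singular vectors of the scaled training data.
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        # Reuse the training mean, scale and components; never refit here.
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return Z @ self.components_.T
```

At inference time only `transform` would be called, using an object fitted (and ideally saved) during training, so that a new sequence is mapped into the same feature space the model was trained on.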
| ## Limitations | |
| - Performance is less reliable for shorter DNA strands, as the training data primarily consists of longer plasmid sequences. | |
| - The model is intended for **educational and exploratory research use**, not for experimental or clinical validation. |