aedupuga
/

multioutput-regression-models

Joblib


aedupuga committed on Oct 9, 2025

Commit

cdbd034

verified ·

1 Parent(s): a9a4f3e

Create README.md


README.md +39 -0


	@@ -0,0 +1,39 @@

+# Model Card for aedupuga/multioutput-regression-models
+### Model Description
+This model card describes multi-output regression models trained on the aedupuga/2025-scaffold-strucutres dataset. The models predict structural properties of DNA sequences (for example mfe_energy, num_pairs, and num_hairpins) from the sequence and derived features such as length and GC content.
+- **Model developed by:** Anuhya Edupuganti
+- **Model type:** Multi-output regression models (Ridge, Lasso, Elastic Net, Gradient Boosting, Hist Gradient Boosting, LGBM, and MLP regressors)
+### Model Sources
+- **Dataset:** https://huggingface.co/datasets/aedupuga/2025-scaffold-strucutres
+### Direct Use
+- These models can be used to predict structural properties of new DNA sequences. Inputs must be the one-hot encoded sequence, length_bp, GC_content, and AT_content, in the same format as the training data.
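The input layout described under Direct Use could be assembled along the following lines. This is a minimal, dependency-free sketch and not the author's preprocessing: the channel order `ACGT`, zero-padding to a fixed `max_len`, expressing GC_content and AT_content as fractions, and the helper names `one_hot` and `build_features` are all assumptions made for illustration. The actual column order and scaling must match the training data.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order; the model card does not specify one


def one_hot(sequence: str) -> List[List[int]]:
    """Return one 4-element indicator row per base."""
    rows = []
    for base in sequence.upper():
        if base not in ALPHABET:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def build_features(sequence: str, max_len: int) -> List[float]:
    """Flatten the one-hot rows, zero-pad to max_len, then append scalar features."""
    seq = sequence.upper()
    if not seq or len(seq) > max_len:
        raise ValueError("sequence must be non-empty and at most max_len bases")
    encoded = one_hot(seq)
    # Assumed padding scheme: all-zero rows up to a fixed length.
    encoded += [[0] * len(ALPHABET) for _ in range(max_len - len(seq))]
    flat = [float(v) for row in encoded for v in row]
    length_bp = float(len(seq))
    # Assumed to be fractions in [0, 1]; the dataset may store percentages instead.
    gc_content = (seq.count("G") + seq.count("C")) / length_bp
    at_content = (seq.count("A") + seq.count("T")) / length_bp
    return flat + [length_bp, gc_content, at_content]
```

For instance, `build_features("ACGT", 5)` yields a vector of 5 × 4 one-hot values followed by the three scalar features.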
+## Bias, Risks, and Limitations
+- The models are trained on a specific dataset and may not generalize well to sequences with significantly different characteristics.
+## Training Data
+The models were trained on the original split of the aedupuga/2025-scaffold-strucutres dataset. Its features include sequence, length_bp, and GC_content; its target variables are mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, and num_internal_loops.
+## Evaluation Data
+The models were evaluated on a test set using Mean Absolute Error (MAE) per target variable, overall Mean Squared Error (MSE), and overall R2 score. The results are below:
+|Model|MAE mfe_energy|MAE num_pairs|MAE stem_len_mean|MAE num_stems|MAE num_hairpins|MAE num_internal_loops|Overall MSE|Overall R2|Training Time (s)|Prediction Time (s)|
+|---|---|---|---|---|---|---|---|---|---|---|
+|Elastic Net Regression|52.246|26.310|0.125|11.825|6.363|10.423|1106.224|0.827|37.895|0.134|
+|Gradient Boosting Regressor|93.860|62.129|0.120|19.522|8.171|13.709|8056.466|0.635|1064.145|0.144|
+|Hist Gradient Boosting Regressor|92.795|119.051|0.095|38.938|14.539|17.869|22401.159|0.835|2276.772|0.056|
+|LGBM Regressor|101.993|118.431|0.098|40.144|14.649|17.487|23866.947|0.826|110.615|2.587|
+|Ridge Regression|53.307|25.654|0.084|11.394|5.680|9.261|1260.762|0.916|7.064|0.123|
+|Lasso Regression|67.277|31.487|0.125|13.159|6.855|11.139|1823.627|0.825|51.869|0.127|
+|MLP Regressor|113.600|76.111|1.784|19.920|9.226|13.795|5507.495|-34.392|68.656|0.136|
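The three metrics reported in the table can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the author's evaluation code: it assumes the "overall" MSE and R2 are unweighted averages of the per-target scores (the default `multioutput='uniform_average'` behaviour of scikit-learn's `mean_squared_error` and `r2_score`), and the function names are hypothetical. Each target is assumed to vary in the test set, since R2 is undefined for a constant target.

```python
from typing import Dict, List, Sequence

Rows = Sequence[Sequence[float]]  # one row per sample, one column per target


def mae_per_target(y_true: Rows, y_pred: Rows, names: List[str]) -> Dict[str, float]:
    """Mean absolute error computed separately for each target column."""
    n = len(y_true)
    return {
        name: sum(abs(t[j] - p[j]) for t, p in zip(y_true, y_pred)) / n
        for j, name in enumerate(names)
    }


def overall_mse(y_true: Rows, y_pred: Rows) -> float:
    """Unweighted mean of the per-target mean squared errors."""
    n, k = len(y_true), len(y_true[0])
    per_target = [
        sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n
        for j in range(k)
    ]
    return sum(per_target) / k


def overall_r2(y_true: Rows, y_pred: Rows) -> float:
    """Unweighted mean of the per-target R2 scores (assumes no constant target)."""
    k = len(y_true[0])
    scores = []
    for j in range(k):
        col = [t[j] for t in y_true]
        mean = sum(col) / len(col)
        ss_tot = sum((v - mean) ** 2 for v in col)
        ss_res = sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred))
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / k


# Toy data with two targets on different scales (values are made up).
y_true = [[0.0, 1.0], [10.0, 2.0], [20.0, 3.0]]
y_pred = [[1.0, 1.0], [10.0, 2.0], [19.0, 3.0]]
mae = mae_per_target(y_true, y_pred, ["a", "b"])
mse = overall_mse(y_true, y_pred)
r2 = overall_r2(y_true, y_pred)
```

If the reported scores were computed this way, the two overall metrics weight targets differently: MSE is dominated by targets with large numeric ranges, while the averaged R2 gives every target equal weight. That would be one possible reading of why a model can have a moderate overall MSE and a strongly negative overall R2, as in the MLP Regressor row, where the stem_len_mean MAE is far larger than for any other model.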
+## Model Card Contact
+Anuhya Edupuganti (Carnegie Mellon University): aedupuga@andrew.cmu.edu