title: Sequence Feature Predictor
emoji: 🧬
sdk: gradio
app_file: app.py
Sequence Feature Predictor
Project Description
This project provides a Gradio interface for predicting structural features of DNA/RNA sequences with a trained Elastic Net regression model. Users enter a DNA/RNA sequence, and the application outputs predicted values for features such as sequence length, GC content, Minimum Free Energy (MFE), number of base pairs, mean stem length, number of stems, number of hairpins, and number of internal loops. The app runs on Hugging Face Spaces, which makes it easy to access and share; the model itself is loaded from the Hugging Face Hub.
How it Works
The application takes a DNA/RNA sequence as input. It performs the following steps:
- Sequence Preparation: Converts the input sequence to uppercase. Handles empty sequences by returning default values.
- Feature Calculation: Calculates basic features like sequence length and GC content.
- One-Hot Encoding: One-hot encodes the sequence and pads it to a target size expected by the model.
- Feature Combination: Combines the numerical features (length, GC content) with the one-hot encoded sequence features.
- Model Prediction: Uses a pre-trained Elastic Net regression model (loaded from Hugging Face Hub) to predict the structural features based on the combined features.
- Output Formatting: Formats the predicted values and displays them in the Gradio interface.
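The preparation and encoding steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the alphabet order (`ALPHABET`), the padded length (`TARGET_LEN`), the truncation of over-long sequences, and the all-zero encoding of unknown characters are all assumptions made for the example, and plain Python lists stand in for whatever array type the app actually uses.

```python
from typing import List

ALPHABET = "ACGTU"  # assumed column order; the trained model may use a different one
TARGET_LEN = 8      # hypothetical padded length, kept small for readability


def gc_content(seq: str) -> float:
    """Percentage of G and C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def one_hot(seq: str, target_len: int = TARGET_LEN) -> List[float]:
    """Flattened one-hot encoding, zero-padded to target_len positions."""
    encoded: List[float] = []
    for base in seq[:target_len]:  # assumption: longer inputs are truncated
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # assumption: unknown characters stay all-zero
            row[idx] = 1.0
        encoded.extend(row)
    # Pad with zeros so every input yields a vector of the same size.
    encoded.extend([0.0] * (target_len * len(ALPHABET) - len(encoded)))
    return encoded


def build_features(raw: str, target_len: int = TARGET_LEN) -> List[float]:
    """Uppercase the input, then combine length, GC content, and one-hot features."""
    seq = raw.strip().upper()
    return [float(len(seq)), gc_content(seq)] + one_hot(seq, target_len)
```

For example, `build_features("gcau")` yields a vector of 2 + 8 × 5 = 42 values that starts with the length 4.0 and the GC content 50.0. In this sketch an empty input simply produces an all-zero vector; the app instead returns default values for empty sequences.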
Predicted Features
The application predicts and displays the following features:
- Sequence Length (base pairs): The total number of nucleotides in the sequence.
- GC Content (%): The percentage of Guanine (G) and Cytosine (C) bases in the sequence.
- Minimum Free Energy (MFE) (kcal/mol): A measure of the stability of the predicted secondary structure of the sequence. More negative values indicate a more stable structure.
- Number of Base Pairs: The number of paired nucleotides in the predicted secondary structure.
- Mean Stem Length: The average length of the helical regions (stems) in the predicted secondary structure.
- Number of Stems: The total count of helical regions (stems) in the predicted secondary structure.
- Number of Hairpins: The total count of hairpin loops in the predicted secondary structure.
- Number of Internal Loops: The total count of internal loops in the predicted secondary structure.
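To make some of these definitions concrete, the sketch below counts base pairs, stems, and hairpins from a secondary structure written in dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. Using dot-bracket input is an assumption made for illustration; the text above does not say how the training targets were derived, and the function names are hypothetical. Internal loops are left out to keep the example short.

```python
from typing import Dict, List, Tuple


def pair_table(structure: str) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs for matching brackets in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def structure_stats(structure: str) -> Dict[str, float]:
    """Count base pairs, stems, and hairpins, and compute the mean stem length."""
    pairs = set(pair_table(structure))
    # A stem starts at a pair that is not stacked directly inside another pair.
    stems = sum((i - 1, j + 1) not in pairs for i, j in pairs)
    # A hairpin loop is closed by a pair whose interior is entirely unpaired.
    hairpins = sum(set(structure[i + 1 : j]) <= {"."} for i, j in pairs)
    mean_stem = len(pairs) / stems if stems else 0.0
    return {
        "base_pairs": len(pairs),
        "stems": stems,
        "hairpins": hairpins,
        "mean_stem_length": mean_stem,
    }
```

For the structure `((..((...))..))`, this gives 4 base pairs in 2 stems (mean stem length 2.0) and 1 hairpin; the two stems are separated by an internal loop.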
Model
The application uses an Elastic Net regression model trained for multi-output regression. This model is hosted on the Hugging Face Hub at aedupuga/multioutput-regression-models and is loaded using the huggingface_hub library.
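Loading the model and labeling its outputs might look like the sketch below. The repository ID comes from the text above; everything else is an assumption: the file name `MODEL_FILENAME` is hypothetical, the model is assumed to be serialized with joblib, and the order of `TARGET_NAMES` is a guess at how the multi-output predictions are laid out.

```python
from typing import Any, Dict, Sequence

REPO_ID = "aedupuga/multioutput-regression-models"
MODEL_FILENAME = "elastic_net_model.joblib"  # hypothetical; check the repo's file list

# Assumed output order; the trained model defines the real one.
TARGET_NAMES = [
    "Minimum Free Energy (MFE) (kcal/mol)",
    "Number of Base Pairs",
    "Mean Stem Length",
    "Number of Stems",
    "Number of Hairpins",
    "Number of Internal Loops",
]


def load_model(repo_id: str = REPO_ID, filename: str = MODEL_FILENAME) -> Any:
    """Download the model file from the Hub and deserialize it."""
    # Imported lazily so the rest of the sketch runs without these packages.
    from huggingface_hub import hf_hub_download
    import joblib  # assumption: the model was saved with joblib

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return joblib.load(path)


def predict_features(model: Any, features: Sequence[float]) -> Dict[str, float]:
    """Run one feature vector through the model and label each output."""
    # scikit-learn style estimators expect a 2D input: one row per sample.
    row = model.predict([list(features)])[0]
    return {name: round(float(value), 2) for name, value in zip(TARGET_NAMES, row)}
```

With the real file name filled in, `predict_features(load_model(), features)` would return a dictionary that a Gradio output component can display, provided `features` matches the input size the model was trained on.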