HackAdamHealth committed
Commit 4205633 · verified · 1 Parent(s): 143fc10

Upload 4 files

Files changed (4)
  1. README.md +109 -12
  2. app.py +134 -0
  3. requirements.txt +6 -0
  4. sample_data.csv +11 -0
README.md CHANGED
@@ -1,12 +1,109 @@
- ---
- title: Demo Cardio Safe
- emoji: 🔥
- colorFrom: gray
- colorTo: gray
- sdk: gradio
- sdk_version: 6.0.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🧬 Bioinformatics AI Agent - Heart Failure Risk Prediction
+
+ A Gradio-based web interface for predicting heart failure risk from gene expression data.
+
+ ## 🚀 Quick Start
+
+ ### Local Development
+
+ 1. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 2. **Run the application:**
+ ```bash
+ python app.py
+ ```
+
+ 3. **Open your browser:**
+ The app will automatically open at `http://localhost:7860`
+
+ ## 📁 Input File Format
+
+ Your input file should be structured as follows:
+
+ | Sample_ID (or Unnamed: 0) | Gene_1 | Gene_2 | Gene_3 | ... |
+ |---------------------------|--------|--------|--------|-----|
+ | Sample_001 | 0.234 | 1.567 | 0.891 | ... |
+ | Sample_002 | 0.456 | 1.234 | 0.678 | ... |
+ | Sample_003 | 0.789 | 1.890 | 0.345 | ... |
+
+ - **First column:** Sample identifiers (can be named or unnamed)
+ - **Remaining columns:** Numeric gene expression values
+
+ Supported formats: `.csv`, `.xlsx`
+
+ ## 📊 Output
+
+ The application returns a DataFrame with:
+ - **Sample_ID:** Original sample identifier
+ - **Age:** Predicted age (20-90 years)
+ - **Heart_Failure_Risk:** Risk score (0-1, where 1 indicates highest risk)
+
+ ## 🔧 Customization
+
+ ### Adding Your Model
+
+ Replace the placeholder prediction logic in `app.py`:
+
+ ```python
+ # Current placeholder (lines ~35-40):
+ Age = np.random.randint(20, 91, size=num_samples)
+ Heart_Failure_Risk = np.random.uniform(0, 1, size=num_samples)
+
+ # Replace with your model:
+ from transformers import AutoModel, AutoTokenizer
+ # or
+ import joblib
+ model = joblib.load('your_model.pkl')
+
+ # Then use:
+ predictions = model.predict(Model_Features)
+ ```
+
+ ## 🌐 Deploy to Hugging Face Spaces
+
+ 1. **Create a new Space:**
+ - Go to https://huggingface.co/spaces
+ - Click "Create new Space"
+ - Choose "Gradio" as the SDK
+ - Name your Space
+
+ 2. **Upload files:**
+ - Upload `app.py`
+ - Upload `requirements.txt`
+ - Upload your model files (if any)
+
+ 3. **Your Space will automatically build and deploy!**
+
+ ## 📦 Project Structure
+
+ ```
+ bioinformatics-space/
+ ├── app.py # Main Gradio application
+ ├── requirements.txt # Python dependencies
+ └── README.md # This file
+ ```
+
+ ## 🛠️ Technologies Used
+
+ - **Gradio:** Web interface framework
+ - **Pandas:** Data manipulation
+ - **NumPy:** Numerical operations
+ - **OpenPyXL:** Excel file support
+
+ ## 📝 Notes
+
+ - Current predictions are **placeholder values** for demonstration
+ - Replace the prediction logic with your trained model
+ - Ensure your model accepts the same feature format as your input data
+ - Consider adding data preprocessing steps if needed
+
+ ## 🤝 Contributing
+
+ Feel free to customize this application for your specific bioinformatics use case!
+
+ ## 📄 License
+
+ MIT License - Feel free to use and modify as needed.
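
The README lists `.csv` and `.xlsx` as the supported formats, and `app.py` checks them with a case-sensitive `endswith`, so a file named `DATA.CSV` would be rejected. A minimal sketch of a case-insensitive check, assuming the upload arrives as a path string; `detect_format` and `SUPPORTED_FORMATS` are hypothetical names and are not part of this commit:

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a reader label.
SUPPORTED_FORMATS = {".csv": "csv", ".xlsx": "excel"}


def detect_format(path: str) -> str:
    """Return 'csv' or 'excel' for a supported path, ignoring suffix case."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported file format {suffix!r}. Please upload .csv or .xlsx"
        ) from None


print(detect_format("DATA.CSV"))
print(detect_format("/tmp/upload/sample_data.xlsx"))
```

The returned label could then select between `pd.read_csv` and `pd.read_excel`, which are the two readers the app already uses.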
app.py ADDED
@@ -0,0 +1,134 @@
+ import gradio as gr
+ import pandas as pd
+ import numpy as np
+
+ def predict_risk(file):
+     """
+     Process uploaded gene expression data and predict heart failure risk.
+
+     Args:
+         file: Uploaded CSV or XLSX file
+
+     Returns:
+         DataFrame with Sample IDs, Age, and Heart Failure Risk predictions
+     """
+     try:
+         # Read the uploaded file
+         if file.name.endswith('.csv'):
+             df = pd.read_csv(file.name)
+         elif file.name.endswith('.xlsx'):
+             df = pd.read_excel(file.name)
+         else:
+             return pd.DataFrame({"Error": ["Unsupported file format. Please upload .csv or .xlsx"]})
+
+         # Step A: Extract the first column as Sample_IDs
+         # Handle both named and unnamed first columns
+         first_col_name = df.columns[0]
+         Sample_IDs = df.iloc[:, 0].values
+
+         # Step B: Extract all other columns as Model_Features (the floats)
+         Model_Features = df.iloc[:, 1:].values
+
+         # ---------------------------------------------------------
+         # REAL MODEL LOADING LOGIC (Add this part)
+         # ---------------------------------------------------------
+         import joblib
+         import os
+
+         # Load your model (ensure 'my_model.pkl' is in your Space's files)
+         # If your model is named differently, change this filename!
+         model_path = "my_model.pkl"
+
+         if os.path.exists(model_path):
+             model = joblib.load(model_path)
+
+             # Run the prediction on the extracted features
+             # This assumes your model outputs a list of lists like [[Age, Risk], [Age, Risk]]
+             predictions = model.predict(Model_Features)
+
+             # Split the results
+             # If your model outputs a different shape, you might need to adjust index [:, 0] or [:, 1]
+             Age = predictions[:, 0]
+             Heart_Failure_Risk = predictions[:, 1]
+
+         else:
+             # Fallback if model file is missing (prevents crashing during setup)
+             return pd.DataFrame({"Error": ["Model file not found. Please upload 'my_model.pkl'."]})
+
+         # ---------------------------------------------------------
+
+         # Step 4: Combine results into a new DataFrame
+         results_df = pd.DataFrame({
+             'Sample_ID': Sample_IDs,
+             'Age': Age,
+             'Heart_Failure_Risk': np.round(Heart_Failure_Risk, 4)
+         })
+
+         return results_df
+
+     except Exception as e:
+         # Return error message as DataFrame
+         return pd.DataFrame({"Error": [f"An error occurred: {str(e)}"]})
+
+
+ # Create Gradio Interface
+ with gr.Blocks(title="Bioinformatics AI Agent - Heart Failure Risk Prediction") as demo:
+     gr.Markdown(
+         """
+         # 🧬 Bioinformatics AI Agent
+         ## Heart Failure Risk Prediction from Gene Expression Data
+
+         Upload your gene expression data file (.csv or .xlsx) to predict heart failure risk.
+
+         **Expected Format:**
+         - First column: Sample IDs (can be named or unnamed)
+         - Remaining columns: Gene expression values (numeric features)
+         """
+     )
+
+     with gr.Row():
+         with gr.Column():
+             file_input = gr.File(
+                 label="Upload Gene Expression Data",
+                 file_types=[".csv", ".xlsx"],
+                 type="filepath"
+             )
+             predict_btn = gr.Button("Predict Risk", variant="primary")
+
+         with gr.Column():
+             output_dataframe = gr.Dataframe(
+                 label="Prediction Results",
+                 headers=["Sample_ID", "Age", "Heart_Failure_Risk"],
+                 datatype=["str", "number", "number"],
+                 row_count=10
+             )
+
+     gr.Markdown(
+         """
+         ### 📊 Output Columns:
+         - **Sample_ID**: Identifier from your input file
+         - **Age**: Predicted age (20-90 years)
+         - **Heart_Failure_Risk**: Risk score (0-1, where 1 is highest risk)
+
+         ---
+         *Note: Current predictions are placeholder values. Replace the prediction logic in `app.py` with your trained model.*
+         """
+     )
+
+     # Connect the button to the prediction function
+     predict_btn.click(
+         fn=predict_risk,
+         inputs=file_input,
+         outputs=output_dataframe
+     )
+
+     # Also allow prediction on file upload
+     file_input.change(
+         fn=predict_risk,
+         inputs=file_input,
+         outputs=output_dataframe
+     )
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch()
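
`predict_risk` assumes the model returns one `[Age, Risk]` pair per sample and indexes the result with `[:, 0]` and `[:, 1]`. If the model returns a 1-D output instead, that indexing fails with a message that does not point at the cause. The sketch below shows one way to check the shape row by row before building the results table. `split_predictions` is a hypothetical helper, not part of this commit; it uses only the standard library and should also accept a 2-D NumPy array, since iterating one yields its rows.

```python
from typing import Iterable, List, Sequence, Tuple


def split_predictions(
    predictions: Iterable[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Split rows of [age, risk] into two parallel lists, validating each row."""
    ages: List[float] = []
    risks: List[float] = []
    for i, row in enumerate(predictions):
        # A 1-D model output yields scalars here, which have no len().
        try:
            width = len(row)
        except TypeError:
            raise ValueError(f"row {i}: expected [age, risk], got a scalar") from None
        if width != 2:
            raise ValueError(f"row {i}: expected 2 values, got {width}")
        ages.append(float(row[0]))
        # Round the risk to 4 decimals, as app.py does with np.round.
        risks.append(round(float(row[1]), 4))
    return ages, risks


ages, risks = split_predictions([[54.0, 0.123456], [61.0, 0.9]])
print(ages, risks)
```

Raising a `ValueError` with the row index fits the existing `except Exception` handler in `predict_risk`, which would surface the message in the error DataFrame.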
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio==4.44.0
+ pandas==2.2.0
+ openpyxl==3.1.2
+ numpy==1.26.4
+ scikit-learn==1.4.0
+ joblib==1.3.2
sample_data.csv ADDED
@@ -0,0 +1,11 @@
+ Unnamed: 0,Gene_BRCA1,Gene_TP53,Gene_EGFR,Gene_KRAS,Gene_MYC,Gene_PTEN,Gene_RB1,Gene_APC,Gene_VHL,Gene_CDH1
+ Sample_001,0.234,1.567,0.891,2.345,0.678,1.234,0.456,1.890,0.345,1.123
+ Sample_002,0.456,1.234,0.678,2.123,0.890,1.456,0.234,1.678,0.567,1.345
+ Sample_003,0.789,1.890,0.345,2.567,0.123,1.678,0.890,1.456,0.789,1.567
+ Sample_004,0.123,1.456,0.567,2.890,0.345,1.890,0.123,1.234,0.901,1.789
+ Sample_005,0.567,1.678,0.789,2.234,0.567,1.123,0.567,1.890,0.234,1.901
+ Sample_006,0.890,1.123,0.901,2.456,0.789,1.345,0.789,1.567,0.456,1.234
+ Sample_007,0.234,1.345,0.234,2.678,0.901,1.567,0.901,1.345,0.678,1.456
+ Sample_008,0.678,1.567,0.456,2.901,0.234,1.789,0.234,1.123,0.890,1.678
+ Sample_009,0.901,1.789,0.678,2.345,0.456,1.901,0.456,1.901,0.123,1.890
+ Sample_010,0.345,1.901,0.890,2.567,0.678,1.234,0.678,1.678,0.345,1.123
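
The sample file follows the layout the README describes: an ID column (here headed `Unnamed: 0`) followed by numeric gene columns. `app.py` passes whatever it reads straight to the model, so a malformed row only shows up as a model error. A minimal sketch of an up-front check using only the standard library; `parse_expression_csv` is a hypothetical helper, not part of this commit, and the embedded text is an abbreviated excerpt of the sample data (first two gene columns, first two samples):

```python
import csv
import io
from typing import List, Tuple


def parse_expression_csv(text: str) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (gene_names, sample_ids, features) parsed from CSV text.

    Raises ValueError if a row has the wrong width or a non-numeric value.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None or len(header) < 2:
        raise ValueError("expected an ID column plus at least one gene column")
    gene_names = header[1:]  # the first header cell (e.g. 'Unnamed: 0') is ignored
    sample_ids: List[str] = []
    features: List[List[float]] = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} cells, got {len(row)}"
            )
        sample_ids.append(row[0])
        try:
            features.append([float(cell) for cell in row[1:]])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric expression value") from None
    return gene_names, sample_ids, features


# Abbreviated excerpt of sample_data.csv.
sample = """Unnamed: 0,Gene_BRCA1,Gene_TP53
Sample_001,0.234,1.567
Sample_002,0.456,1.234
"""
genes, ids, X = parse_expression_csv(sample)
print(genes, ids, X)
```

The three return values correspond to the column names, `Sample_IDs`, and `Model_Features` that `predict_risk` extracts with `df.iloc`.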