Sumitkumar098 committed b0d5052 (verified) · 1 parent: 7a8bc6d

Update README.md

Files changed (1): README.md (+223 -3)
---
license: mit
---
# Drug Prediction and Polypharmacy System
Developed by Sumit Kumar, 2025

[![Project License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Python 3.x](https://img.shields.io/badge/Python-3.x-blue.svg)](https://www.python.org/downloads/)
[![Colab Notebooks](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1uiNrRBHW6t-8p32Xq7FVf1cfL-TChG7_?authuser=4#scrollTo=q3aiZXxHkVUb)

**Predict medications, assess polypharmacy risk, and predict diseases with AI.**

This repository contains the code for a Drug Prediction and Polypharmacy System built on the biomedical language model `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`. The system is designed to support healthcare decision-making by providing:

* 💊 **Medication Recommendations:** Suggestions for appropriate medications based on the patient's symptoms and medical context.
* ⚠️ **Polypharmacy Risk Assessment:** Identification of potential risks from taking multiple medications concurrently.
* 🩺 **Disease Prediction:** Prediction of the disease the patient most likely has, based on the presented symptoms.

## Table of Contents

* [Project Overview](#project-overview)
* [Key Features](#key-features)
* [Interactive Demo](#interactive-demo)
* [Model Architecture](#model-architecture)
* [Dataset](#dataset)
* [Performance](#performance)
* [Deployment](#deployment)
* [Quick Start](#quick-start)
* [Requirements](#requirements)
* [License](#license)
* [Contributions](#contributions)

## Project Overview

This project applies Natural Language Processing (NLP), specifically a biomedical language model, to give healthcare professionals and patients insight into drug prescriptions and potential polypharmacy risks.

The system takes patient information, including:

* Age, Gender, Blood Group, Weight
* Symptoms (and their severity)
* Medical History and Allergies (optional)

It then uses a fine-tuned `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract` model to predict:

* **Top 3 Recommended Medications** with dosage, frequency, instructions, duration, and confidence scores.
* **Polypharmacy Risk Level** (Low to Medium, Medium to High, Unknown).
* **Predicted Disease**

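The input fields above must be flattened into a single text before the tokenizer sees them. The sketch below shows one way to do that. It assumes the `Symptoms` and `Severity_Scores` string formats from the Quick Start example. The field order, the `key: value | ...` layout, and the optional `Medical_History` and `Allergies` keys are illustrative assumptions, not the notebook's actual template.

```python
def parse_severity_scores(raw: str) -> dict[str, int]:
    """Parse 'Headache:3; Dizziness:2' into {'Headache': 3, 'Dizziness': 2}."""
    scores = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing separators
        name, _, value = item.rpartition(":")
        scores[name.strip()] = int(value)
    return scores


def build_input_text(patient: dict) -> str:
    """Flatten one patient record into a single string for the tokenizer.

    Hypothetical template: the field order and separators are assumptions.
    """
    severity = parse_severity_scores(patient.get("Severity_Scores", ""))
    symptoms = [s.strip() for s in patient["Symptoms"].split(";") if s.strip()]
    symptom_text = ", ".join(
        f"{s} (severity {severity[s]})" if s in severity else s for s in symptoms
    )
    parts = [
        f"age: {patient['Age']}",
        f"gender: {patient['Gender']}",
        f"blood group: {patient['Blood_Group']}",
        f"weight: {patient['Weight_kg']} kg",
        f"symptoms: {symptom_text}",
    ]
    # Optional context fields (hypothetical key names) are appended only when present.
    for key in ("Medical_History", "Allergies"):
        if patient.get(key):
            parts.append(f"{key.lower().replace('_', ' ')}: {patient[key]}")
    return " | ".join(parts)


patient = {
    "Age": 65, "Gender": "Female", "Blood_Group": "A+", "Weight_kg": 70.5,
    "Symptoms": "Headache; Dizziness; Chest pain",
    "Severity_Scores": "Headache:3; Dizziness:2; Chest pain:4",
}
print(build_input_text(patient))
# age: 65 | gender: Female | blood group: A+ | weight: 70.5 kg | symptoms: Headache (severity 3), Dizziness (severity 2), Chest pain (severity 4)
```

Keeping the severity next to each symptom in the same phrase lets the encoder attend to both together. A real template should match whatever format the model was fine-tuned on.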
## Key Features

* **Biomedical NLP Model:** Uses `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`, which was pre-trained on PubMed abstracts and therefore understands medical text better.
* **Multi-Task Learning:** Predicts medications, polypharmacy risk, and disease simultaneously from one shared encoder. This saves computation and lets the tasks share knowledge.
* **Enhanced Text Input:** Combines patient demographics, symptoms, and medical context into a single structured text input.
* **Class Imbalance Handling:** Uses weighted loss functions to counter class imbalance, particularly in medication prediction.
* **Interactive Prediction Interface:** Includes a widget-based interface in the Jupyter Notebook for easy experimentation and demonstration.
* **Comprehensive Output:** Returns medication recommendations with usage instructions, a polypharmacy risk assessment, and disease insights.

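A common weighting for the class-imbalance handling in a multi-label head is a per-label positive weight equal to negatives divided by positives. The sketch below computes it in plain Python. The README does not specify the exact scheme the notebook uses, and the `max_weight` cap is a hypothetical safeguard.

```python
def multilabel_pos_weights(label_matrix: list[list[int]], max_weight: float = 50.0) -> list[float]:
    """Per-label positive weights (negatives / positives) for a multi-hot label matrix.

    Rare medications get larger weights, so missing them costs more in the loss.
    Labels that never occur get max_weight to avoid division by zero; the cap
    value itself is an illustrative choice.
    """
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        if positives == 0:
            weights.append(max_weight)
        else:
            weights.append(min((n_samples - positives) / positives, max_weight))
    return weights


labels = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
print(multilabel_pos_weights(labels))  # [1.0, 3.0, 1.0]
```

In PyTorch, the resulting list can be passed as `torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(weights))`. For the multi-class heads, `torch.nn.CrossEntropyLoss` accepts per-class weights through its `weight` argument.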
## Interactive Demo

**Option 1: Run the notebook in Google Colab (recommended)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1uiNrRBHW6t-8p32Xq7FVf1cfL-TChG7_?authuser=4#scrollTo=q3aiZXxHkVUb)

> Click the "Open in Colab" badge above to launch the Jupyter Notebook in Google Colab. Run the notebook, then use the interactive widget at the end to test the Drug Prediction and Polypharmacy System directly in your browser.

**Option 2: Use the interactive widget in the notebook**

![Interactive prediction widget](https://github.com/user-attachments/assets/0f624018-82cb-4cb2-9a7f-d26b1cc4d7a3)

> To try the interactive prediction interface, run the Jupyter Notebook (`Drug_Prediction_and_Polypharmacy_System5.ipynb`). Cell 20 contains a widget-based form where you can enter patient information and get real-time predictions from the model.

## Model Architecture

The system is built on the pre-trained `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract` encoder, extended with:

* **Multi-task heads:** Task-specific output layers for medication prediction (multi-label), polypharmacy risk (multi-class), and disease prediction (multi-class).
* **Common Representation Layer:** A shared dense layer that feeds all three heads, so knowledge transfers between tasks.
* **Dropout and Weight Initialization:** Regularization and explicit weight initialization for better generalization and more stable training.

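To make the data flow concrete, here is a framework-free NumPy sketch of the forward pass through the task heads. It assumes:

* a 768-dimensional pooled encoder output (the hidden size of BERT-base models);
* a 256-unit shared layer with ReLU;
* random weights standing in for trained ones.

The real model is implemented in PyTorch, and its layer sizes and activations may differ. The hypothetical `top_k` helper shows how the top 3 medications could be read off the sigmoid outputs.

```python
import numpy as np

HIDDEN = 768  # hidden size of BERT-base encoders such as PubMedBERT-base

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Random (weight, bias) pair standing in for trained parameters."""
    return rng.normal(0.0, 0.02, size=(in_dim, out_dim)), np.zeros(out_dim)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MultiTaskHeads:
    """Shared dense layer followed by three task-specific output layers."""

    def __init__(self, n_medications, n_risk_levels, n_diseases, shared_dim=256):
        self.shared = dense(HIDDEN, shared_dim)
        self.med = dense(shared_dim, n_medications)
        self.risk = dense(shared_dim, n_risk_levels)
        self.disease = dense(shared_dim, n_diseases)

    def forward(self, pooled):
        """pooled: (batch, HIDDEN) encoder outputs, e.g. the [CLS] vectors."""
        w, b = self.shared
        h = np.maximum(pooled @ w + b, 0.0)  # ReLU; dropout is a no-op at inference
        med_probs = sigmoid(h @ self.med[0] + self.med[1])              # multi-label
        risk_probs = softmax(h @ self.risk[0] + self.risk[1])           # multi-class
        disease_probs = softmax(h @ self.disease[0] + self.disease[1])  # multi-class
        return med_probs, risk_probs, disease_probs


def top_k(probs_row, names, k=3):
    """Return the k most probable medication names with their scores."""
    idx = np.argsort(probs_row)[::-1][:k]
    return [(names[i], float(probs_row[i])) for i in idx]


heads = MultiTaskHeads(n_medications=10, n_risk_levels=3, n_diseases=5)
pooled = rng.normal(size=(2, HIDDEN))  # stand-in for two encoded patient texts
med, risk, disease = heads.forward(pooled)
print(med.shape, risk.shape, disease.shape)  # (2, 10) (2, 3) (2, 5)
print(top_k(med[0], [f"med_{i}" for i in range(10)]))
```

The medication head uses independent sigmoids because a patient can receive several drugs at once. The risk and disease heads use softmax because exactly one class applies.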
## Dataset

The model was trained and evaluated on a dataset of **15,000 patient records**. The features include:

* Patient demographics (Age, Gender, Blood Group, Weight)
* Symptoms and severity scores
* Medical history and allergies
* Prescribed medications (`Medicine_1`, `Medicine_2`, `Medicine_3`, each with dosage, frequency, instructions, and duration)
* Polypharmacy risk level
* Predicted disease and related information (causes, prevention, health tips)

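The `Medicine_1` to `Medicine_3` columns become a multi-label target. The plain-Python sketch below builds the same kind of 0/1 matrix that scikit-learn's `MultiLabelBinarizer` produces: classes sorted, one column per medication. It assumes each cell holds only a medication name, with dosage and the other details in separate fields. The drug names in the example are made up.

```python
def multi_hot(records, columns=("Medicine_1", "Medicine_2", "Medicine_3")):
    """Turn per-record medicine columns into (sorted class list, 0/1 rows).

    Empty or missing cells are skipped, so records may have fewer than three drugs.
    """
    label_sets = [
        {r[c].strip() for c in columns if r.get(c) and r[c].strip()}
        for r in records
    ]
    classes = sorted(set().union(*label_sets))
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for name in labels:
            row[index[name]] = 1
        rows.append(row)
    return classes, rows


records = [
    {"Medicine_1": "Paracetamol", "Medicine_2": "Ibuprofen", "Medicine_3": ""},
    {"Medicine_1": "Amlodipine", "Medicine_2": "Paracetamol"},
]
classes, rows = multi_hot(records)
print(classes)  # ['Amlodipine', 'Ibuprofen', 'Paracetamol']
print(rows)     # [[0, 1, 1], [1, 0, 1]]
```

At prediction time, the same class order maps each sigmoid output back to a medication name.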
## Performance

The model's performance on the test dataset is summarized below:

| Task | Accuracy | F1-score |
| --- | --- | --- |
| Medication prediction | 0.9377 | 0.9480 |
| Polypharmacy risk prediction | 0.9960 | 0.9960 |
| Disease prediction | 0.9377 | 0.9187 |

**Note:** Medication prediction is a multi-label task. Its accuracy counts a record as correct only when the predicted set of medications matches the actual set. F1-scores are weighted averages. Disease and polypharmacy risk prediction are multi-class classification tasks.

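The sketch below reproduces the note's two metrics in plain Python. It follows the definitions in the note rather than the notebook's code:

* accuracy is an exact set match;
* F1 is computed per label and averaged with each label's support (true count) as its weight, which is the definition scikit-learn uses for `average='weighted'`.

With single-label sets, the same functions give ordinary accuracy and weighted F1 for the multi-class tasks.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    return sum(set(t) == set(p) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged with weights equal to each label's support."""
    labels = sorted(set().union(*map(set, y_true), *map(set, y_pred)))
    weighted_sum, total_support = 0.0, 0
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
        fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
        fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of samples that truly carry this label
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


y_true = [{"A", "B"}, {"A"}, {"C"}]
y_pred = [{"A", "B"}, {"A", "C"}, {"C"}]
print(round(subset_accuracy(y_true, y_pred), 3))  # 0.667
print(round(weighted_f1(y_true, y_pred), 3))      # 0.917
```

In the example, the second record gets an extra label. That makes it count as fully wrong for set-match accuracy, but it only lowers F1 for label `C`. This is why the two numbers can differ noticeably for multi-label tasks.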
## Deployment

The repository includes the artifacts needed to deploy the model:

* **`BiomedNLP_drug_prediction_model_full.pt`**: The full trained model (weights, configuration, and class mappings).
* **`label_encoders.pkl`**: The saved `MultiLabelBinarizer` and `LabelEncoder` instances used for label transformations.
* **`requirements.txt`**: The Python package dependencies.

Possible deployment options include:

* **Web Application:** Expose predictions through a REST API built with a framework such as Flask or FastAPI.
* **Cloud-based Service:** Host the model on a platform such as AWS, Google Cloud, or Azure for scalable access.
* **Local Application:** Integrate the model into a desktop or mobile application.

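Whichever option you choose, a web endpoint should reject malformed input before it reaches the model. The sketch below is framework-agnostic, so it can sit behind Flask, FastAPI, or a cloud function. The required field names follow the Quick Start `patient_data` example. The HTTP status codes, the age range check, and the `handle_predict_request` name are illustrative assumptions.

```python
import json

# Field names follow the Quick Start patient_data example; the type rules are assumptions.
REQUIRED_FIELDS = {
    "Age": (int, float),
    "Gender": (str,),
    "Blood_Group": (str,),
    "Weight_kg": (int, float),
    "Symptoms": (str,),
    "Severity_Scores": (str,),
}


def validate_patient_payload(payload) -> list[str]:
    """Return a list of problems; an empty list means the payload can go to the model."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif isinstance(payload[field], bool) or not isinstance(payload[field], types):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for field: {field}")
    age = payload.get("Age")
    # Illustrative sanity range; adjust to your clinical context.
    if isinstance(age, (int, float)) and not isinstance(age, bool) and not 0 <= age <= 120:
        errors.append("Age must be between 0 and 120")
    return errors


def handle_predict_request(body: bytes) -> tuple[int, str]:
    """Framework-agnostic handler: returns (HTTP status, JSON response text)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, json.dumps({"errors": ["request body is not valid JSON"]})
    errors = validate_patient_payload(payload)
    if errors:
        return 422, json.dumps({"errors": errors})
    # A real service would call predict_full_health_profile(payload, model, ...) here.
    return 200, json.dumps({"status": "valid", "patient": payload})
```

With Flask, for example, a route could pass `request.get_data()` to `handle_predict_request` and return the text with the given status. This keeps the validation logic independent of the web framework.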
## Quick Start

1. **Clone the repository:**
   ```bash
   git clone https://github.com/[Your GitHub Username]/[Repository Name].git
   cd [Repository Name]
   ```

2. **Create and activate a virtual environment (recommended):**
   ```bash
   python -m venv venv
   source venv/bin/activate    # Linux/macOS
   # venv\Scripts\activate     # Windows: run this instead of the line above
   ```

3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

4. **Download the model artifacts** (`BiomedNLP_drug_prediction_model_full.pt` and `label_encoders.pkl`) if they are not already in the repository:
   ```bash
   # [Instructions on how to download model files if you are not including them in the repo directly]
   # For example:
   # gdown <link_to_your_model_files_on_Google_Drive_or_other_hosting>
   ```

5. **Run the Jupyter Notebook (`Drug_Prediction_and_Polypharmacy_System.ipynb`):**
   ```bash
   jupyter notebook Drug_Prediction_and_Polypharmacy_System.ipynb
   ```

6. **Experiment with the interactive prediction widget** at the end of the notebook.

7. **To use the prediction function in your own Python code:**

   ```python
   import pickle

   import torch
   from transformers import AutoTokenizer

   # Hypothetical modules: a .ipynb file cannot be imported directly, so copy the
   # EnhancedMedicationModel class into model.py and predict_full_health_profile
   # (notebook Cell 14) into predict.py next to this script.
   from model import EnhancedMedicationModel
   from predict import predict_full_health_profile

   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

   # Load model artifacts and encoders (adjust the paths to your checkout).
   # weights_only=False allows non-tensor objects such as class mappings in the
   # checkpoint; only use it with files you trust.
   model_artifacts = torch.load(
       'BiomedNLP_drug_prediction_model_full.pt',
       map_location=device,
       weights_only=False,
   )
   with open('label_encoders.pkl', 'rb') as f:
       label_encoders = pickle.load(f)

   model_config = model_artifacts['model_config']
   model = EnhancedMedicationModel(**model_config)
   model.load_state_dict(model_artifacts['model_state_dict'])
   model.to(device)
   model.eval()  # disable dropout for inference

   tokenizer = AutoTokenizer.from_pretrained(model_config['model_name'])
   mlb = label_encoders['mlb']
   le_risk = label_encoders['le_risk']
   le_disease = label_encoders['le_disease']

   # Example patient data (replace with your own)
   patient_data = {
       'Age': 65,
       'Gender': "Female",
       'Blood_Group': "A+",
       'Weight_kg': 70.5,
       'Symptoms': "Headache; Dizziness; Chest pain",
       'Severity_Scores': "Headache:3; Dizziness:2; Chest pain:4",
   }

   prediction = predict_full_health_profile(
       patient_data, model, tokenizer, mlb, le_risk, le_disease, device
   )
   print(prediction)  # Explore the prediction output
   ```

## Requirements

* Python 3.x
* Install the required Python packages with `pip install -r requirements.txt`:

```text
pandas
numpy
torch
transformers
scikit-learn
tqdm
matplotlib
seaborn
ipywidgets
```

`pickle` is part of the Python standard library, so it does not need to be installed and should not be listed in `requirements.txt`.