# **Cloud-Based Medical QA System**
### *AWS-Integrated Machine Learning Project*

---

## **Project Overview**
This project demonstrates a **cloud-deployed machine learning (ML)** application that answers **medical questions** using a fine-tuned question-answering (QA) model.
The system uses **AWS services** for **scalability**, **accessibility**, and **secure management** of data and infrastructure.

The model was trained and deployed on an **Amazon EC2** instance, with the dataset stored securely in an **Amazon S3** bucket.
Access permissions and security were managed using **AWS IAM (Identity and Access Management)**.

---

## **AWS Services Used**

### **1. AWS Identity and Access Management (IAM)**
**Purpose:**
- Created and managed access permissions for the project's AWS resources.
- Configured a custom IAM user/role with limited access to S3 and EC2.
- Followed the principle of *least privilege* to minimize security risk.
- Used IAM to manage credentials safely during local access testing.

**Alternative (Not Implemented):**
- *AWS Secrets Manager* for automatic credential rotation.

**Reason:** Not necessary for a small-scale academic deployment; it would add complexity and cost.
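
To make the least-privilege idea above concrete, here is a minimal sketch of a read-only policy document for the dataset. The README does not give the actual policy or the bucket name, so `example-medquad-bucket` and the two statements below are assumptions for illustration only. The EC2 permissions mentioned above are not covered here.

```python
import json

# Hypothetical names: the README does not state the real bucket name.
BUCKET = "example-medquad-bucket"
DATASET_KEY = "cleaned_medquad.csv"


def build_read_only_policy(bucket: str, key: str) -> dict:
    """Return an IAM policy document that only allows reading one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListDatasetBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is granted on the single dataset object only.
                "Sid": "ReadDatasetObject",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
            },
        ],
    }


policy = build_read_only_policy(BUCKET, DATASET_KEY)
print(json.dumps(policy, indent=2))
```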

---

### **2. Amazon S3 (Simple Storage Service)**
**Purpose:**
- Stored the `cleaned_medquad.csv` dataset, providing a reliable, cloud-based data source for the ML model.
- The dataset was uploaded manually to an S3 bucket.
- Served as a centralized, secure data storage solution.

**Alternative (Not Implemented):**
- Direct programmatic access using the **Boto3 SDK** to read data from S3 within the EC2 app.

**Reason:** For demonstration purposes, a manual upload was sufficient; programmatic integration was skipped to keep the focus on the AWS setup.
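
The Boto3 alternative mentioned above was not implemented in this project. As a rough sketch of what it could look like, the function below reads the CSV from S3 through any client that exposes `get_object`, such as the one returned by `boto3.client("s3")`. The bucket name is hypothetical, and the standard-library `csv` module stands in for Pandas to keep the example small. `main()` is defined but not called, since it needs Boto3 and valid AWS credentials.

```python
import csv
import io


def load_dataset_rows(s3_client, bucket: str, key: str) -> list[dict]:
    """Download a CSV object from S3 and parse it into a list of row dicts."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a file-like stream; read() returns the raw bytes.
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def main() -> None:
    import boto3  # third-party: pip install boto3

    # Hypothetical bucket name. Credentials are resolved by Boto3 from the
    # instance's IAM role or the local AWS configuration.
    rows = load_dataset_rows(
        boto3.client("s3"), "example-medquad-bucket", "cleaned_medquad.csv"
    )
    print(f"Loaded {len(rows)} rows")
```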

---

### **3. Amazon EC2 (Elastic Compute Cloud)**
**Purpose:**
- Used to host and run the ML model and **Gradio interface**.
- Configured a `t2.medium` Ubuntu instance for deployment.
- Ran the Flask/Gradio app and tested it successfully via its public URL.
- Verified full model functionality and response generation.

**Alternative (Not Implemented):**
- *AWS Lambda* or *Amazon SageMaker* for serverless or managed ML hosting, respectively.

**Reason:** EC2 suited this scale and gave more control over dependencies and environment setup.
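
The README does not include the app code, so the following is only a sketch of how a Gradio interface might be served from the instance. `answer_question` is a placeholder for the real model call, and port 7860 is an assumed choice that would also have to be opened in the instance's security group. Passing `share=True` to `launch()` instead would request a temporary Gradio public link. `main()` is defined but not called, since it needs Gradio installed.

```python
def answer_question(question: str) -> str:
    """Placeholder for the fine-tuned QA model's inference call."""
    question = question.strip()
    if not question:
        return "Please enter a medical question."
    # Assumption: the real app would run the model here and return its answer.
    return f"(model answer for: {question})"


def build_demo():
    """Wrap the answer function in a simple text-in, text-out Gradio UI."""
    import gradio as gr  # third-party: pip install gradio

    return gr.Interface(
        fn=answer_question, inputs="text", outputs="text", title="Medical QA"
    )


def main() -> None:
    # Bind to all interfaces so the app is reachable on the instance's
    # public address rather than only on localhost.
    build_demo().launch(server_name="0.0.0.0", server_port=7860)
```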

---

## **Deployment Flow**
1. Dataset uploaded to **S3 bucket**
2. IAM role created for **secure access management**
3. **EC2 instance** launched and configured
4. ML application (Flask + Gradio) deployed and tested
5. Logs and results verified in the **terminal and via the Gradio public link**
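
In this project the steps above were carried out manually. As a hedged sketch, steps 1 and 3 could also be scripted with Boto3 along these lines. The bucket name, AMI ID, and instance profile name are placeholders, not values from the project, and `main()` is not called because it would create billable resources.

```python
def upload_dataset(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Step 1: upload the local CSV to S3 and return its s3:// URI."""
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def launch_instance(ec2_client, ami_id: str, instance_profile: str) -> str:
    """Step 3: start one t2.medium instance with the IAM role attached."""
    response = ec2_client.run_instances(
        ImageId=ami_id,
        InstanceType="t2.medium",
        MinCount=1,
        MaxCount=1,
        IamInstanceProfile={"Name": instance_profile},
    )
    return response["Instances"][0]["InstanceId"]


def main() -> None:
    import boto3  # third-party: pip install boto3

    # All identifiers below are placeholders for illustration.
    uri = upload_dataset(
        boto3.client("s3"),
        "cleaned_medquad.csv",
        "example-medquad-bucket",
        "cleaned_medquad.csv",
    )
    instance_id = launch_instance(
        boto3.client("ec2"), "ami-EXAMPLE", "example-medquad-profile"
    )
    print(uri, instance_id)
```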

---

## **Tech Stack**
| Layer | Tools & Technologies |
|-------|----------------------|
| **Backend** | Python (Flask, Gradio) |
| **Cloud** | AWS EC2, S3, IAM |
| **Libraries** | Pandas, Transformers, Scikit-learn |
| **Dataset** | Medical QA Dataset (`cleaned_medquad.csv`) |

---

## **Screenshots**
Available in the `/screenshots` folder:
1. IAM Roles and Permissions
2. S3 Bucket with Uploaded Dataset
3. EC2 Instance Configuration
4. Terminal Log (Model Running Successfully)
5. Gradio Interface Screenshot

---

## **Future Integration**
Planned upgrades for the next version:
1. Automate dataset retrieval using **Boto3**
2. Integrate **IAM role-based S3 access** into code
3. Deploy the model using **Amazon SageMaker** for managed ML
4. Store model responses in **Amazon RDS** for persistence

---

## **Resource Management**
All AWS resources (**EC2**, **IAM**, and **S3**) were **terminated or deleted** after testing to prevent unnecessary billing.
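
The README does not say how the cleanup was done. One way to script it with Boto3 is sketched below, assuming an unversioned bucket and a single instance. The function names are illustrative, and deleting the IAM user or role is not shown.

```python
def empty_and_delete_bucket(s3_client, bucket: str) -> int:
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects removed. Assumes versioning is disabled.
    """
    removed = 0
    while True:
        page = s3_client.list_objects_v2(Bucket=bucket)
        contents = page.get("Contents", [])
        if not contents:
            break
        objects = [{"Key": obj["Key"]} for obj in contents]
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": objects})
        removed += len(objects)
    # A bucket can only be deleted once it is empty.
    s3_client.delete_bucket(Bucket=bucket)
    return removed


def terminate_instance(ec2_client, instance_id: str) -> None:
    """Terminate one EC2 instance so it stops accruing compute charges."""
    ec2_client.terminate_instances(InstanceIds=[instance_id])
```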

---

## **Author**
**Aaditya Arvind Ramrame**
*Cloud and Machine Learning Enthusiast*
[aadityaramrame@gmail.com](mailto:aadityaramrame@gmail.com)
[GitHub Profile](https://github.com/Aadityaramrame)

---
*"Bringing Machine Learning to the Cloud - One Service at a Time."*