---
title: "JanArogya Chatbot"
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---
# 🧠 **Cloud-Based Medical QA System**
### ☁️ *AWS-Integrated Machine Learning Project*

---

## 📋 **Project Overview**
This project demonstrates a **cloud-deployed Machine Learning (ML)** application built to answer **medical-related questions** using a fine-tuned QA model.  
The system leverages **AWS services** to ensure **scalability**, **accessibility**, and **secure management** of data and infrastructure.

The model was trained and deployed on an **Amazon EC2** instance, with data securely stored in an **Amazon S3** bucket.  
Access permissions and security were managed using **AWS IAM (Identity and Access Management)**.

---

## ☁️ **AWS Services Used**

### 🔐 **1. AWS Identity and Access Management (IAM)**
**Purpose:**
- 🧾 Created and managed secure access permissions for different AWS resources.  
- 👤 Configured a custom IAM user/role with limited access to S3 and EC2.  
- 🛡️ Followed the principle of *least privilege* to ensure minimal security risks.  
- 🔑 Used IAM for safe credential management during local access testing.  

**Alternative (Not Implemented):**
- 🧰 *AWS Secrets Manager* for automatic credential rotation.  

**Reason:** Not necessary for small-scale academic deployment and would increase complexity and cost.
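
As an illustration of the least-privilege idea described in this section, the sketch below builds a policy document that allows read-only access to a single dataset object. The bucket name `example-medquad-bucket`, the statement ID, and the helper `build_read_only_policy` are hypothetical placeholders. The policy actually attached in this project is not shown in this README, and this sketch covers only the S3 side.

```python
import json


def build_read_only_policy(bucket: str, key: str) -> dict:
    """Return an IAM policy document allowing GetObject on one S3 object only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadDatasetOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Scope the permission to a single object, not the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
policy = build_read_only_policy("example-medquad-bucket", "cleaned_medquad.csv")
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string could be pasted into the IAM console's JSON policy editor. Granting a single action on a single resource is what keeps the role within least privilege.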

---

### 🗃️ **2. Amazon S3 (Simple Storage Service)**
**Purpose:**
- ☁️ Used for storing the `cleaned_medquad.csv` dataset, providing a reliable, cloud-based data source for the ML model.  
- 📤 The dataset was uploaded manually to an S3 bucket.  
- 🗂️ Served as a centralized, secure data storage solution.  

**Alternative (Not Implemented):**
- 🔗 Direct programmatic access using the **Boto3 SDK** to read data from S3 within the EC2 app.  

**Reason:** For demonstration purposes, manual upload was sufficient, and integration was skipped to focus on showcasing AWS setup.
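
For reference, the Boto3 alternative mentioned above might look roughly like the following. This is a minimal sketch and not part of the deployed app: the function name `load_dataset_rows` and the bucket name in the usage note are hypothetical, and the standard-library `csv` module is used here in place of Pandas to keep the example dependency-light.

```python
import csv
import io


def load_dataset_rows(bucket: str, key: str, s3_client=None) -> list:
    """Download a CSV object from S3 and return its rows as dictionaries."""
    if s3_client is None:
        # Imported lazily; requires `pip install boto3` and AWS credentials
        # (for example, an IAM role attached to the EC2 instance).
        import boto3

        s3_client = boto3.client("s3")

    # get_object returns a dict whose "Body" is a readable byte stream.
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")

    # Parse the CSV text; each row becomes a dict keyed by the header row.
    return list(csv.DictReader(io.StringIO(text)))
```

On an EC2 instance with a suitable role attached, a call such as `load_dataset_rows("example-medquad-bucket", "cleaned_medquad.csv")` would fetch the dataset without storing credentials in the code. Accepting an optional `s3_client` also makes the function easy to test with a stub.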

---

### 💻 **3. Amazon EC2 (Elastic Compute Cloud)**
**Purpose:**
- 🚀 Used to host and run the ML model and **Gradio interface**.  
- ⚙️ Configured a `t2.medium` Ubuntu instance for deployment.  
- 🧩 Ran the Flask/Gradio app and tested it successfully via its public URL.  
- 🔍 Verified full model functionality and response generation.  

**Alternative (Not Implemented):**
- 🪶 *AWS Lambda* or *Amazon SageMaker* for serverless or managed ML hosting.  

**Reason:** EC2 was ideal for this scale and provided better control over dependencies and environment setup.
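
To make the hosting setup more concrete, here is a minimal sketch of how a Gradio interface could be wired up on the instance. The `answer_question` function is a hypothetical placeholder standing in for the fine-tuned QA model, which is not reproduced here. The port number and the use of `share=True` are assumptions, since the README only states that the app was reachable via a public link.

```python
def answer_question(question: str) -> str:
    """Placeholder for the fine-tuned QA model (hypothetical stand-in)."""
    question = question.strip()
    if not question:
        return "Please enter a medical question."
    return f"(placeholder answer for: {question})"


def build_demo():
    """Create a text-in, text-out Gradio interface around answer_question."""
    # Imported lazily so this module can be loaded without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=answer_question,
        inputs="text",
        outputs="text",
        title="JanArogya Chatbot",
    )


def main():
    # Bind to all network interfaces so the app is reachable from outside
    # the instance; share=True additionally requests a temporary public link.
    build_demo().launch(server_name="0.0.0.0", server_port=7860, share=True)
```

Calling `main()` on the instance would start the app. When binding to `0.0.0.0`, the EC2 security group must also allow inbound traffic on the chosen port.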

---

## 🚀 **Deployment Flow**
1. 🧺 Dataset uploaded to **S3 bucket**  
2. 🔐 IAM role created for **secure access management**  
3. 💻 **EC2 instance** launched and configured  
4. 🤖 ML application (Flask + Gradio) deployed and tested  
5. 📈 Logs and results verified on **terminal and Gradio public link**

---

## 🧩 **Tech Stack**
| Layer | Tools & Technologies |
|-------|----------------------|
| 🧠 **Backend** | Python (Flask, Gradio) |
| ☁️ **Cloud** | AWS EC2, S3, IAM |
| 🧰 **Libraries** | Pandas, Transformers, Scikit-learn |
| 📊 **Dataset** | Medical QA Dataset (`cleaned_medquad.csv`) |

---

## 📸 **Screenshots**
📁 Available in the `/screenshots` folder:  
1. 🧑‍💻 IAM Roles and Permissions  
2. 🪣 S3 Bucket with Uploaded Dataset  
3. 💻 EC2 Instance Configuration  
4. 🧾 Terminal Log (Model Running Successfully)  
5. 🌐 Gradio Interface Screenshot  

---

## ⚙️ **Future Integration**
Planned upgrades for the next version:
1. 🤖 Automate dataset retrieval using **Boto3**  
2. 🔐 Integrate **IAM role-based S3 access** into code  
3. 🧱 Deploy the model using **Amazon SageMaker** for managed ML  
4. 💾 Store model responses in **Amazon RDS** for persistence  

---

## 🧹 **Resource Management**
After testing, all AWS resources were **safely cleaned up** to prevent unnecessary billing: the **EC2** instance was terminated, and the **S3** and **IAM** resources were removed.  
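
A cleanup like this can also be scripted. The sketch below is one possible approach using Boto3, not the procedure actually followed for this project. The function name `cleanup_resources` and its arguments are hypothetical, and it assumes an unversioned bucket that contains only the single dataset object.

```python
def cleanup_resources(instance_id, bucket, key, ec2_client=None, s3_client=None):
    """Terminate one EC2 instance and delete one S3 object and its bucket."""
    if ec2_client is None or s3_client is None:
        # Imported lazily; requires `pip install boto3` and AWS credentials.
        import boto3

        ec2_client = ec2_client or boto3.client("ec2")
        s3_client = s3_client or boto3.client("s3")

    # Stop compute billing by terminating the instance.
    ec2_client.terminate_instances(InstanceIds=[instance_id])

    # A bucket must be empty before it can be deleted, so remove the
    # dataset object first (assumes it is the only object in the bucket).
    s3_client.delete_object(Bucket=bucket, Key=key)
    s3_client.delete_bucket(Bucket=bucket)
```

IAM users and roles do not incur charges on their own, but removing unused ones is still good security hygiene.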

---

## 👨‍💻 **Author**
**Aaditya Arvind Ramrame**  
🌩️ *Cloud and Machine Learning Enthusiast*  
📧 [aadityaramrame@gmail.com](mailto:aadityaramrame@gmail.com)  
🔗 [GitHub Profile](https://github.com/Aadityaramrame)

---
⭐ *“Bringing Machine Learning to the Cloud, One Service at a Time.”* ☁️