---
title: "JanArogya Chatbot"
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---

# **Cloud-Based Medical QA System**

### *AWS-Integrated Machine Learning Project*

---

## **Project Overview**

This project demonstrates a **cloud-deployed Machine Learning (ML)** application built to answer **medical questions** using a fine-tuned QA model.
The system uses **AWS services** to provide **scalability**, **accessibility**, and **secure management** of data and infrastructure.

The model was trained and deployed on an **Amazon EC2** instance, with the data stored securely in an **Amazon S3** bucket.
Access permissions and security were managed using **AWS IAM (Identity and Access Management)**.

---

## **AWS Services Used**

### **1. AWS Identity and Access Management (IAM)**

**Purpose:**

- Created and managed secure access permissions for the AWS resources used in the project.
- Configured a custom IAM user/role with limited access to S3 and EC2.
- Followed the principle of *least privilege* to minimize security risks.
- Used IAM for safe credential management during local access testing.

**Alternative (Not Implemented):**

- *AWS Secrets Manager* for automatic credential rotation.

**Reason:** Not necessary for a small-scale academic deployment; it would have added complexity and cost.

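To make the least-privilege idea concrete, here is a minimal sketch of a policy document that grants read access to a single S3 object and nothing else. It is an illustration, not the policy used in this project: the function name `build_s3_read_policy` and the bucket name `example-medquad-bucket` are hypothetical, and the sketch covers only the S3 side, not the EC2 permissions mentioned above.

```python
import json


def build_s3_read_policy(bucket_name: str, object_key: str) -> dict:
    """Return an IAM policy document that allows reading one S3 object only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadDatasetObject",
                "Effect": "Allow",
                # Only GetObject is granted: no listing, writing, or deleting.
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/{object_key}",
            }
        ],
    }


if __name__ == "__main__":
    # "example-medquad-bucket" is a hypothetical bucket name (assumption).
    policy = build_s3_read_policy("example-medquad-bucket", "cleaned_medquad.csv")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as a customer-managed policy and attached to the user or role that needs the dataset.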
---

### **2. Amazon S3 (Simple Storage Service)**

**Purpose:**

- Stored the `cleaned_medquad.csv` dataset, providing a reliable, cloud-based data source for the ML model.
- The dataset was uploaded manually to an S3 bucket.
- Served as a centralized, secure data storage solution.

**Alternative (Not Implemented):**

- Direct programmatic access using the **Boto3 SDK** to read data from S3 within the EC2 app.

**Reason:** For demonstration purposes, a manual upload was sufficient; programmatic integration was skipped to keep the focus on the AWS setup.

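The Boto3 alternative could look roughly like the sketch below. It was not part of this deployment; the function name `download_dataset` and the bucket name are assumptions for illustration. When no client is passed in, Boto3 resolves credentials through its default chain, which on EC2 includes an attached IAM instance role, so no keys need to appear in the code.

```python
from pathlib import Path

DATASET_KEY = "cleaned_medquad.csv"  # dataset file name from this README


def download_dataset(bucket_name, local_dir=".", s3_client=None):
    """Download the dataset from S3 and return the local file path.

    `s3_client` can be injected for testing; by default a Boto3 S3 client
    is created with whatever credentials the environment provides.
    """
    if s3_client is None:
        import boto3  # imported lazily so a stub client works without boto3

        s3_client = boto3.client("s3")

    target = Path(local_dir) / DATASET_KEY
    target.parent.mkdir(parents=True, exist_ok=True)
    # download_file(Bucket, Key, Filename) writes the object to a local file.
    s3_client.download_file(bucket_name, DATASET_KEY, str(target))
    return target


if __name__ == "__main__":
    # "example-medquad-bucket" is a hypothetical bucket name (assumption).
    print(download_dataset("example-medquad-bucket", local_dir="data"))
```

The returned path can then be handed to `pandas.read_csv` in the app.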
---

### **3. Amazon EC2 (Elastic Compute Cloud)**

**Purpose:**

- Hosted and ran the ML model and the **Gradio interface**.
- Configured a `t2.medium` Ubuntu instance for deployment.
- Ran the Flask/Gradio app and tested it successfully via the public URL.
- Verified full model functionality and response generation.

**Alternative (Not Implemented):**

- *AWS Lambda* or *Amazon SageMaker* for serverless or managed ML hosting.

**Reason:** EC2 was well suited to this scale and gave better control over dependencies and environment setup.

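As a rough illustration of how a Gradio app can be exposed from an EC2 instance, the sketch below wires a question-answering function into `gr.Interface` and binds it to all network interfaces. It is not the project's `app.py`: `answer_question` is a placeholder that stands in for the fine-tuned model, and the port `7860` is an assumption (the instance's security group would have to allow inbound traffic on it). Passing `share=True` to `launch()` is another way to obtain a temporary Gradio public link.

```python
def answer_question(question: str) -> str:
    """Placeholder for the fine-tuned QA model; swap in real inference here."""
    question = question.strip()
    if not question:
        return "Please enter a medical question."
    return f"(placeholder) No model loaded for: {question}"


def build_demo():
    """Build the Gradio interface around answer_question."""
    import gradio as gr  # imported lazily so answer_question is usable without Gradio

    return gr.Interface(
        fn=answer_question,
        inputs="text",
        outputs="text",
        title="JanArogya Chatbot",
    )


if __name__ == "__main__":
    # 0.0.0.0 makes the app reachable from outside the instance;
    # port 7860 is an assumed value, not taken from the project.
    build_demo().launch(server_name="0.0.0.0", server_port=7860)
```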
---

## **Deployment Flow**

1. Dataset uploaded to the **S3 bucket**
2. IAM role created for **secure access management**
3. **EC2 instance** launched and configured
4. ML application (Flask + Gradio) deployed and tested
5. Logs and results verified in the **terminal** and through the **Gradio public link**

---

## **Tech Stack**

| Layer | Tools & Technologies |
|-------|----------------------|
| **Backend** | Python (Flask, Gradio) |
| **Cloud** | AWS EC2, S3, IAM |
| **Libraries** | Pandas, Transformers, Scikit-learn |
| **Dataset** | Medical QA Dataset (`cleaned_medquad.csv`) |

---

## **Screenshots**

Available in the `/screenshots` folder:

1. IAM Roles and Permissions
2. S3 Bucket with Uploaded Dataset
3. EC2 Instance Configuration
4. Terminal Log (Model Running Successfully)
5. Gradio Interface Screenshot

---

## **Future Integration**

Planned upgrades for the next version:

1. Automate dataset retrieval using **Boto3**
2. Integrate **IAM role-based S3 access** into the code
3. Deploy the model using **Amazon SageMaker** for managed ML hosting
4. Store model responses in **Amazon RDS** for persistence

---

## **Resource Management**

All AWS resources (**EC2**, **IAM**, and **S3**) were **terminated or deleted** after testing to prevent unnecessary billing.

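For reference, a cleanup like this can also be scripted. The sketch below is an assumption about how it might be done with Boto3, not a record of how the resources in this project were removed: the function name `cleanup_resources`, the instance ID, and the bucket name are hypothetical. It assumes a non-versioned bucket and leaves IAM users and roles untouched.

```python
def cleanup_resources(instance_ids, bucket_name, ec2_client=None, s3_resource=None):
    """Terminate the given EC2 instances, then empty and delete the S3 bucket."""
    if ec2_client is None or s3_resource is None:
        import boto3  # imported lazily so stub clients work without boto3

        ec2_client = ec2_client or boto3.client("ec2")
        s3_resource = s3_resource or boto3.resource("s3")

    if instance_ids:
        ec2_client.terminate_instances(InstanceIds=list(instance_ids))

    bucket = s3_resource.Bucket(bucket_name)
    # A bucket must be empty before it can be deleted.
    bucket.objects.all().delete()
    bucket.delete()


if __name__ == "__main__":
    # Both identifiers below are hypothetical placeholders (assumption).
    cleanup_resources(["i-0123456789abcdef0"], "example-medquad-bucket")
```

Both steps are irreversible, so it is worth confirming the identifiers in the AWS console before running anything like this.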
---

## **Author**

**Aaditya Arvind Ramrame**
*Cloud and Machine Learning Enthusiast*

- Email: [aadityaramrame@gmail.com](mailto:aadityaramrame@gmail.com)
- GitHub: [GitHub Profile](https://github.com/Aadityaramrame)

---

*"Bringing Machine Learning to the Cloud, One Service at a Time."*