---
title: MedChatBot
emoji: 💊
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# MedChatBot
A medical chatbot application that uses a Retrieval-Augmented Generation (RAG) architecture to answer medical questions from medical literature. The system combines **Google Gemini 2.5 Pro** as the language model with a **Pinecone** vector database for document retrieval.
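To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free sketch. It is not the project's code: the bag-of-words `embed` function stands in for the sentence-transformers model, the in-memory list stands in for Pinecone, and the prompt is only printed rather than sent to Gemini. All function names and the sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for a vector DB query)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine retrieved context and the question into one prompt for a language model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Illustrative sample chunks, not taken from the project's data.
chunks = [
    "Acne is a skin disease caused by clogged hair follicles.",
    "Asthma is a chronic inflammatory disease of the airways.",
    "Anemia is a condition with too few healthy red blood cells.",
]
question = "What is asthma?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
print(prompt)
```

In the real application, the retrieval step would query the Pinecone index and the prompt would be passed to the language model; the overall shape (embed, retrieve, assemble prompt, generate) is the same.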
## Technology Stack
- **Backend**: Flask
- **Language Model**: Google Gemini 2.5 Pro
- **Vector Database**: Pinecone
- **Embeddings**: HuggingFace sentence-transformers (all-MiniLM-L6-v2)
- **Document Processing**: LangChain, PyPDF
- **Frontend**: HTML/CSS/JavaScript
## Installation & Setup
### Step 1: Clone the Repository
```bash
git clone https://github.com/TMTien31/MedChatBot.git
cd MedChatBot
```
### Step 2: Create Virtual Environment
```bash
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
### Step 3: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 4: Get API Keys
#### Google Gemini API Key:
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the generated key
#### Pinecone API Key:
1. Sign up at [Pinecone](https://www.pinecone.io/)
2. Go to your dashboard
3. Copy your API key from the "API Keys" section
### Step 5: Create Environment File
Create a `.env` file in the project root directory:
```bash
# Create .env file
touch .env # On macOS/Linux
# or create manually on Windows
```
Add your API keys to the `.env` file:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```
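Before moving on, it can help to confirm that both keys are present and no longer hold the placeholder text. The sketch below is an optional helper, not part of the project: it parses `KEY=VALUE` text with the standard library only, and the names `parse_env_text` and `problem_keys` are hypothetical. How the application itself loads `.env` is not covered by this README.

```python
REQUIRED_KEYS = ("PINECONE_API_KEY", "GEMINI_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def problem_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are missing, empty, or still placeholder text."""
    return [k for k in required if not values.get(k) or values[k].endswith("_here")]


# Sample text only; these are not real keys.
sample = """
# example .env contents
PINECONE_API_KEY=your_pinecone_api_key_here
GEMINI_API_KEY=abc123
"""
print(problem_keys(parse_env_text(sample)))  # prints ['PINECONE_API_KEY']
```

To check the real file, pass its contents instead of `sample`, for example `parse_env_text(open(".env").read())`. An empty list means both keys look filled in.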
### Step 6: Prepare Medical Documents
- Place your PDF medical documents in the `Data/` folder
- The project includes "Gale Encyclopedia of Medicine Vol. 1 (A-B).pdf" by default
- You can add more medical PDFs to expand the knowledge base
### Step 7: Create Vector Index (Run Once)
**Important**: Run this step once during initial setup, and again whenever you add new documents to the `Data/` folder.
```bash
python store_index.py
```
This script will:
- Read all PDF files from the `Data/` directory
- Split text into 500-character chunks with 20-character overlap
- Generate embeddings using sentence-transformers
- Create and populate a Pinecone index named "medchatbot"
**Note**: This process may take several minutes depending on the size of your documents.
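To illustrate what "500-character chunks with 20-character overlap" means, here is a simplified sliding-window sketch in plain Python. It is an assumption-laden illustration rather than the code in `store_index.py`: the project lists LangChain for document processing, and LangChain's text splitters generally try to break at separators such as paragraphs or spaces, so real chunks may be shorter than 500 characters. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, overlap=20):
    """Split text into fixed-size windows where each window repeats
    the last `overlap` characters of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Synthetic 1200-character text for demonstration.
text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text)
print([len(c) for c in chunks])  # prints [500, 500, 240]
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both neighbouring chunks, which reduces the chance that retrieval misses it.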
## Running the Application
### Start the Flask Server
```bash
python app.py
```
### Access the Application
1. Open your web browser
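2. Navigate to `http://localhost:8080` (`http://0.0.0.0:8080` also works on some systems, but `0.0.0.0` is a bind address rather than a destination, so `localhost` is the more portable choice)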
3. You should see the medical chatbot interface
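If the page does not load, a quick way to tell whether anything is listening on the port is a plain TCP connection test. The sketch below uses only the standard library; the helper name `is_listening` is hypothetical, and the default port simply mirrors the 8080 mentioned above. It checks that the port accepts connections, not that the Flask app is responding correctly.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


print("server reachable:", is_listening())
```

A result of `False` usually means `python app.py` is not running, exited with an error, or is bound to a different port.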