---
title: MedChatBot
emoji: 💊 
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# MedChatBot

MedChatBot is a medical chatbot that uses a Retrieval-Augmented Generation (RAG) architecture to answer medical questions from medical literature. It pairs **Google Gemini 2.5 Pro** as the language model with a **Pinecone** vector database for document retrieval.
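To make the RAG flow concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt pattern. It is an illustration only, not the project's code: the word-count "embedding", the function names (`embed`, `retrieve`, `build_prompt`), and the prompt wording are all assumptions standing in for the sentence-transformers embeddings, the Pinecone query, and the Gemini call.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (hypothetical helper)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real application the vector store does the similarity search over precomputed embeddings, so only the question is embedded at query time.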

## Technology Stack

- **Backend**: Flask
- **Language Model**: Google Gemini 2.5 Pro
- **Vector Database**: Pinecone
- **Embeddings**: HuggingFace sentence-transformers (all-MiniLM-L6-v2)
- **Document Processing**: LangChain, PyPDF
- **Frontend**: HTML/CSS/JavaScript

## Installation & Setup

### Step 1: Clone the Repository
```bash
git clone https://github.com/TMTien31/MedChatBot.git
cd MedChatBot
```

### Step 2: Create Virtual Environment
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate
```

### Step 3: Install Dependencies
```bash
pip install -r requirements.txt
```

### Step 4: Get API Keys

#### Google Gemini API Key:
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the generated key

#### Pinecone API Key:
1. Sign up at [Pinecone](https://www.pinecone.io/)
2. Go to your dashboard
3. Copy your API key from the "API Keys" section

### Step 5: Create Environment File
Create a `.env` file in the project root directory:
```bash
# On macOS/Linux:
touch .env

# On Windows (Command Prompt):
type nul > .env
```

Add your API keys to the `.env` file:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```
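Before starting the app, it can help to confirm that both keys are actually set. The following is a small sketch using only the standard library; the helper names `parse_env` and `missing_keys` are hypothetical, and the project itself may load the file differently (for example with a dotenv library).

```python
REQUIRED_KEYS = ("PINECONE_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

To check a real file, pass its contents in, for example `missing_keys(parse_env(open(".env").read()))`; an empty list means both keys are present.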

### Step 6: Prepare Medical Documents
- Place your PDF medical documents in the `Data/` folder
- The project includes "Gale Encyclopedia of Medicine Vol. 1 (A-B).pdf" by default
- You can add more medical PDFs to expand the knowledge base

### Step 7: Create Vector Index (Run Once)
**Important**: Run this step once during initial setup, and again whenever you add new documents to the `Data/` folder.

```bash
python store_index.py
```

This script will:
- Read all PDF files from the `Data/` directory
- Split text into 500-character chunks with 20-character overlap
- Generate embeddings using sentence-transformers
- Create and populate a Pinecone index named "medchatbot"
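The chunking step can be pictured as a sliding window over the text. The sketch below is a simplified, plain-Python illustration of 500-character chunks with a 20-character overlap; the function name `split_text` is an assumption, and the project's LangChain splitter may also break on separators such as paragraphs, so its exact chunk boundaries can differ.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each new chunk starts this far after the last
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.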

**Note**: This process may take several minutes depending on the size of your documents.

## Running the Application

### Start the Flask Server
```bash
python app.py
```

### Access the Application
1. Open your web browser
2. Navigate to `http://localhost:8080` (`http://0.0.0.0:8080` also works on some systems)
3. You should see the medical chatbot interface