---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- medical
- chatbot
- rag
- gemini
- streamlit
---
# Medical Chatbot 🏥
An intelligent medical question-answering chatbot that uses retrieval-augmented generation (RAG) with Gemini 1.5 Flash, Sentence Transformers, and Pinecone DB.
## Features
- 🤖 Powered by Gemini 1.5 Flash for natural language understanding
- 📊 Uses Sentence Transformers for semantic search
- 🔍 Retrieves relevant medical information from a vector database
- 📚 Provides citations with source attribution
- 🎯 Confidence scoring for each response
- 🌐 Beautiful Streamlit interface
- ⚠️ Important disclaimers for medical advice
## Prerequisites
1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)
## Installation
**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md)**
1. Clone or download this repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file in the root directory:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here
```
4. Set up the database:
```bash
python setup_database.py
```
This will download medical data from Hugging Face and upload it to Pinecone.
## Usage
Run the Streamlit application:
```bash
streamlit run app.py
```
Open your browser to the URL shown (typically http://localhost:8501).
**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)
## How It Works
1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search
4. **Query Processing**: User queries are embedded and searched against the database
5. **Response Generation**: Gemini 1.5 Flash generates responses based on retrieved context
6. **Citation**: Sources are tracked and displayed with confidence scores
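The steps above can be sketched in a few lines of plain Python. This is only an illustration of the retrieve-then-generate flow, not the project's code: `embed` is a hypothetical bag-of-words stand-in for a Sentence Transformers model, `INDEX` is an in-memory stand-in for the Pinecone index, the three records are toy data, and `build_prompt` only shows the kind of text that might be sent to Gemini.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a Sentence Transformers model.

    Returns a sparse word-count vector; a real model returns dense floats.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy records standing in for the medical Q&A data (illustrative only).
DOCS = [
    {"id": "doc-1", "text": "Ibuprofen is used to reduce fever and relieve pain."},
    {"id": "doc-2", "text": "Insulin helps regulate blood sugar levels."},
    {"id": "doc-3", "text": "Regular exercise supports heart health."},
]

# In-memory stand-in for the vector database: (document, embedding) pairs.
INDEX = [(doc, embed(doc["text"])) for doc in DOCS]


def retrieve(query, top_k=2, threshold=0.1):
    """Embed the query, score every document, keep the best matches."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in INDEX]
    scored = [(score, doc) for score, doc in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, matches):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for _, doc in matches)
    return (
        "Answer using only the context below and cite source ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


question = "What is used to reduce fever?"
matches = retrieve(question)
prompt = build_prompt(question, matches)
print(prompt)
```

In the real application the similarity search would run inside Pinecone and the prompt would go to Gemini 1.5 Flash; the default `top_k` and `threshold` values here are arbitrary.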
## Important Disclaimers
- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**
## Configuration
Edit `config.py` to customize:
- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection
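As a rough sketch of what such settings might look like, the snippet below defines the four options and a small sanity check. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names taken from this README; `EMBEDDING_MODEL`, `DATASET_NAME`, `config_problems`, and every value shown are assumptions for illustration, and the real `config.py` may differ.

```python
# Illustrative settings only; check the project's config.py for the real names.
EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # assumed example of a Sentence Transformers model
TOP_K = 5                             # number of retrieved documents
SIMILARITY_THRESHOLD = 0.7            # minimum score for a match to be kept
DATASET_NAME = "your-dataset-id"      # hypothetical placeholder, not a real dataset


def config_problems(top_k, threshold):
    """Return a list of human-readable problems with the given settings.

    Assumes similarity scores fall between 0 and 1, with 1 an exact match.
    """
    problems = []
    if not isinstance(top_k, int) or top_k < 1:
        problems.append("TOP_K must be a positive integer")
    if not 0.0 <= threshold <= 1.0:
        problems.append("SIMILARITY_THRESHOLD must be between 0 and 1")
    return problems


for problem in config_problems(TOP_K, SIMILARITY_THRESHOLD):
    print(f"Config warning: {problem}")
```

A higher `TOP_K` gives the model more context at the cost of longer prompts, while a higher threshold trades recall for more closely matching sources.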
## Troubleshooting
### "API Key not found"
- Ensure your `.env` file exists and contains valid API keys
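To see which key is missing, a small check like the one below may help. It is a sketch that uses only the standard library and inspects the process environment, so it assumes the `.env` values have already been loaded (for example by a loader such as python-dotenv, which this README does not confirm the project uses). The helper name `missing_keys` is hypothetical; the variable names come from the `.env` example in the Installation section.

```python
import os

# Variable names taken from the .env example in the Installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```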
### "Index not found"
- Run `python setup_database.py` to create the Pinecone index
### "No results found"
- The similarity threshold might be too high
- Adjust `SIMILARITY_THRESHOLD` in `config.py`
## License
This project is licensed under Apache-2.0 and is intended for educational purposes only. Verify any medical information with healthcare professionals.