# Medical Chatbot 🏥

An intelligent medical question-answering chatbot that uses retrieval-augmented generation (RAG) with Gemini 1.5 Flash, Sentence Transformers, and Pinecone DB.

## Features

- 🤖 Powered by Gemini 1.5 Flash for natural language understanding
- 📊 Uses Sentence Transformers for semantic search
- 🔍 Retrieves relevant medical information from vector database
- 📚 Provides citations with source attribution
- 🎯 Confidence scoring for each response
- 🌐 Beautiful Streamlit interface
- ⚠️ Important disclaimers for medical advice

## Prerequisites

1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)

## Installation

**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md)**

1. Clone or download this repository

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file in the root directory:

   ```env
   PINECONE_API_KEY=your_pinecone_api_key_here
   PINECONE_ENVIRONMENT=us-east1
   GOOGLE_API_KEY=your_google_api_key_here
   ```

4. Set up the database:

   ```bash
   python setup_database.py
   ```

   This will download medical data from Hugging Face and upload it to Pinecone.

## Usage

Run the Streamlit application:

```bash
streamlit run app.py
```

Open your browser to the URL shown (typically http://localhost:8501).

**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)

## How It Works

1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search
4. **Query Processing**: User queries are embedded and searched against the database
5. **Response Generation**: Gemini 1.5 Flash generates responses based on retrieved context
6. **Citation**: Sources are tracked and displayed with confidence scores

## Important Disclaimers

- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**

## Configuration

Edit `config.py` to customize:

- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection

## Troubleshooting

### "API Key not found"

- Ensure your `.env` file exists and contains valid API keys

### "Index not found"

- Run `python setup_database.py` to create the Pinecone index

### "No results found"

- The similarity threshold might be too high
- Adjust `SIMILARITY_THRESHOLD` in `config.py`

## License

This project is for educational purposes only. Medical information should be verified with healthcare professionals.
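
## Example: Retrieval Sketch

The query-processing step and the `TOP_K` / `SIMILARITY_THRESHOLD` settings described in this README can be illustrated with a minimal, dependency-free sketch. This is not the project's actual code: the `retrieve` and `cosine_similarity` functions, the constant values, and the three-dimensional toy vectors are all hypothetical stand-ins for what Sentence Transformers and Pinecone would do in the real pipeline. It only shows how a similarity threshold and a top-k cutoff might interact.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical values for illustration; the real settings live in config.py.
TOP_K = 2
SIMILARITY_THRESHOLD = 0.5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    documents: List[Tuple[str, Sequence[float]]],
    top_k: int = TOP_K,
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Score every document, drop those below the threshold, keep the best top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in documents]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy "embeddings" standing in for real sentence embeddings (assumption).
DOCUMENTS = [
    ("doc-fever", [1.0, 0.0, 0.0]),
    ("doc-cough", [0.8, 0.6, 0.0]),
    ("doc-diet", [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    for doc_id, score in retrieve([1.0, 0.0, 0.0], DOCUMENTS):
        print(f"{doc_id}: {score:.2f}")
```

In this sketch, a query that matches nothing above the threshold yields an empty list, which is the situation the "No results found" troubleshooting entry describes; lowering the threshold lets weaker matches through.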