---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- medical
- chatbot
- rag
- gemini
- streamlit
---

# Medical Chatbot 🏥

An intelligent medical question-answering chatbot built on retrieval-augmented generation (RAG). Questions are embedded with Sentence Transformers and matched against a Pinecone vector database, and Gemini 1.5 Flash generates answers from the retrieved passages.

## Features

- 🤖 Powered by Gemini 1.5 Flash for natural language understanding
- Uses Sentence Transformers for semantic search
- Retrieves relevant medical information from a vector database
- Provides citations with source attribution
- 🎯 Confidence score for each response
- Streamlit web interface
- ⚠️ Clear disclaimers that responses are not medical advice

## Prerequisites

1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)

## Installation

**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md).**

1. Clone or download this repository.
2. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Create a `.env` file in the root directory:
   ```env
   PINECONE_API_KEY=your_pinecone_api_key_here
   PINECONE_ENVIRONMENT=us-east1
   GOOGLE_API_KEY=your_google_api_key_here
   ```
4. Set up the database:
   ```bash
   python setup_database.py
   ```
   This downloads the medical data from Hugging Face, embeds it, and uploads the vectors to Pinecone.
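
`setup_database.py` is not reproduced in this README. As a rough, hypothetical sketch of its upload step, assuming one vector per question/answer pair with the text and its source kept as metadata (which is what later lets responses cite their sources), records could be assembled and batched before being passed to the Pinecone client's `index.upsert(vectors=...)`. The helper names `build_records` and `chunked`, the ID scheme, and the batch size of 100 are assumptions; the `{"id", "values", "metadata"}` layout is one that recent Pinecone Python clients accept.

```python
from typing import Dict, Iterator, List, Sequence


def build_records(
    questions: Sequence[str],
    answers: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    source: str,
) -> List[Dict]:
    """Pair each Q/A with its embedding as an {"id", "values", "metadata"} record."""
    if not len(questions) == len(answers) == len(embeddings):
        raise ValueError("questions, answers and embeddings must have the same length")
    records = []
    for i, (question, answer, vector) in enumerate(zip(questions, answers, embeddings)):
        records.append(
            {
                "id": f"doc-{i}",  # hypothetical ID scheme
                "values": [float(x) for x in vector],
                # Keep the text and its origin so answers can cite their sources.
                "metadata": {"question": question, "answer": answer, "source": source},
            }
        )
    return records


def chunked(records: List[Dict], batch_size: int = 100) -> Iterator[List[Dict]]:
    """Yield fixed-size slices so each upsert request stays small."""
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]


# Hypothetical usage with an existing Pinecone index object (not run here):
# for batch in chunked(build_records(questions, answers, embeddings, source="my-dataset")):
#     index.upsert(vectors=batch)
```

Batching keeps individual requests small; the right batch size depends on vector dimension and metadata size.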

## Usage

Run the Streamlit application:

```bash
streamlit run app.py
```

Then open the URL shown in the terminal (typically http://localhost:8501) in your browser.

**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)

## How It Works

1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets.
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers.
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search.
4. **Query Processing**: User queries are embedded and searched against the database.
5. **Response Generation**: Gemini 1.5 Flash generates responses based on the retrieved context.
6. **Citation**: Sources are tracked and displayed with confidence scores.
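
The pipeline above can be illustrated end to end with a toy, dependency-free sketch. It is not the project's code: a bag-of-words vector stands in for the Sentence Transformers embedding, an in-memory list stands in for Pinecone, and the sketch stops at building the prompt that would be sent to Gemini 1.5 Flash (for example with `GenerativeModel.generate_content` from the `google.generativeai` package). The names `embed`, `cosine`, `retrieve`, and `build_prompt` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a Sentence Transformers embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[Dict], top_k: int = 3) -> List[Tuple[Dict, float]]:
    """Score every document against the query and return the top_k best matches.
    (Pinecone performs this nearest-neighbour search server-side.)"""
    q = embed(query)
    scored = [(doc, cosine(q, embed(doc["text"]))) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, hits: List[Tuple[Dict, float]]) -> str:
    """Number each retrieved passage so the model's answer can cite [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({doc['source']}, score={score:.2f}) {doc['text']}"
        for i, (doc, score) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    {"text": "Hypertension is persistently high blood pressure.", "source": "doc-A"},
    {"text": "Type 2 diabetes affects how the body uses insulin.", "source": "doc-B"},
    {"text": "Regular exercise can help lower blood pressure.", "source": "doc-C"},
]
hits = retrieve("What is high blood pressure?", docs, top_k=2)
print(build_prompt("What is high blood pressure?", hits))
```

Numbering the passages in the prompt is one simple way to let the generated answer point back to its sources, which is what step 6 displays.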

## Important Disclaimers

- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**

## Configuration

Edit `config.py` to customize:

- Embedding model
- Number of retrieved documents (`TOP_K`)
- Similarity threshold (`SIMILARITY_THRESHOLD`)
- Dataset selection
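
`config.py` itself is not reproduced here. A hypothetical version might look like the following; only `TOP_K` and `SIMILARITY_THRESHOLD` are named in this README, while the other setting names and every value are illustrative assumptions.

```python
"""Hypothetical config.py: names other than TOP_K and SIMILARITY_THRESHOLD are assumptions."""

# Sentence Transformers model used to embed both documents and queries (assumed value).
# The Pinecone index dimension must match the model's output size.
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
EMBEDDING_DIMENSION = 384  # output size of all-MiniLM-L6-v2

# Number of passages retrieved from Pinecone per query.
TOP_K = 5

# Matches scoring below this similarity are discarded before generation.
SIMILARITY_THRESHOLD = 0.5

# Hugging Face dataset loaded by setup_database.py (placeholder name).
DATASET_NAME = "your-org/your-medical-qa-dataset"

# Pinecone index name (placeholder).
PINECONE_INDEX_NAME = "medical-chatbot"


def validate() -> None:
    """Fail fast on obviously bad settings (the 0..1 range is this sketch's choice)."""
    if TOP_K < 1:
        raise ValueError("TOP_K must be at least 1")
    if not 0.0 <= SIMILARITY_THRESHOLD <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD is expected to lie between 0 and 1")


validate()
```

Changing the embedding model generally means re-running `python setup_database.py`, since stored vectors and query vectors must come from the same model.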

## Troubleshooting

### "API Key not found"

- Ensure your `.env` file exists and contains valid API keys.
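
A missing or placeholder key is easiest to catch at startup. The following is a stdlib-only sketch, assuming the three variable names from the `.env` example in the Installation section; the app may instead load the file with `python-dotenv`'s `load_dotenv()`. The helpers `parse_env`, `load_env_file`, `is_set`, and `missing_keys` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Sequence

# Variable names taken from the .env example in the Installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Return the parsed .env file, or an empty dict if it does not exist."""
    env_path = Path(path)
    return parse_env(env_path.read_text(encoding="utf-8")) if env_path.is_file() else {}


def is_set(value: Optional[str]) -> bool:
    """Treat empty values and the 'your_..._here' placeholders as unset."""
    return bool(value) and not value.startswith("your_")


def missing_keys(
    file_values: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """List required keys with no real value; the process environment wins over the file."""
    return [key for key in required if not is_set(environ.get(key) or file_values.get(key))]


# Hypothetical startup check:
# missing = missing_keys(load_env_file())
# if missing:
#     raise SystemExit(f"API key(s) not found: {', '.join(missing)}")
```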

### "Index not found"

- Run `python setup_database.py` to create the Pinecone index.

### "No results found"

- The similarity threshold might be too high for your query.
- Lower `SIMILARITY_THRESHOLD` in `config.py`.
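
To illustrate why a threshold that is too high can leave nothing to answer from, the sketch below filters a few made-up similarity scores at different thresholds. `filter_by_threshold`, `confidence_label`, and the score bands are hypothetical and not taken from the app; as the disclaimers note, such labels describe retrieval similarity, not medical accuracy.

```python
from typing import List, Tuple

Match = Tuple[str, float]  # (source id, similarity score)


def filter_by_threshold(matches: List[Match], threshold: float) -> List[Match]:
    """Keep matches whose score reaches the threshold, best first."""
    kept = [m for m in matches if m[1] >= threshold]
    return sorted(kept, key=lambda m: m[1], reverse=True)


def confidence_label(score: float) -> str:
    """Hypothetical display bands for a similarity score."""
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"


matches = [("doc-A", 0.82), ("doc-B", 0.64), ("doc-C", 0.41)]  # illustrative scores

for threshold in (0.4, 0.7, 0.9):
    kept = filter_by_threshold(matches, threshold)
    print(threshold, [(source, confidence_label(score)) for source, score in kept])
# 0.4 keeps all three matches, 0.7 keeps only doc-A, 0.9 keeps none ("No results found").
```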

## License

Released under the Apache 2.0 license, as declared in the Space metadata. This project is for educational purposes only. Medical information should be verified with healthcare professionals.