	---
	title: Medical Chatbot
	emoji: 🏥
	colorFrom: blue
	colorTo: green
	sdk: streamlit
	sdk_version: 1.28.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	tags:
	- medical
	- chatbot
	- rag
	- gemini
	- streamlit
	---

	# Medical Chatbot 🏥

	A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers embeds each query, Pinecone retrieves the most similar medical passages, and Gemini 1.5 Flash generates an answer from the retrieved context.

	## Features

	- 🤖 Uses Gemini 1.5 Flash to generate responses from retrieved context
	- 📊 Uses Sentence Transformers for semantic search
	- 🔍 Retrieves relevant medical information from a Pinecone vector database
	- 📚 Provides citations with source attribution
	- 🎯 Confidence scoring for each response
	- 🌐 Streamlit web interface
	- ⚠️ Important disclaimers for medical advice

	## Prerequisites

	1. Python 3.8 or higher
	2. Pinecone account (https://www.pinecone.io/)
	3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
	4. Hugging Face account (optional, for accessing datasets)

	## Installation

	For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md).

	1. Clone or download this repository

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Create a `.env` file in the root directory:
	```env
	PINECONE_API_KEY=your_pinecone_api_key_here
	PINECONE_ENVIRONMENT=us-east1
	GOOGLE_API_KEY=your_google_api_key_here
	```

	4. Set up the database:
	```bash
	python setup_database.py
	```

	This downloads the medical dataset from Hugging Face, embeds it, and uploads the resulting vectors to Pinecone.
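
The setup script itself is not shown in this README. As a rough sketch of the upload step only, vector stores are usually populated in fixed-size batches rather than one request per record. The record shape, the `iter_batches` and `upload_in_batches` helpers, the `upsert` callable, and the batch size of 100 are all illustrative assumptions, not taken from `setup_database.py`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (id, embedding vector, metadata).
Record = Tuple[str, List[float], dict]


def iter_batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def upload_in_batches(
    records: Iterable[Record],
    upsert: Callable[[List[Record]], None],
    batch_size: int = 100,  # assumed value; tune it for your index
) -> int:
    """Send records to the vector store batch by batch and return the total sent."""
    total = 0
    for batch in iter_batches(records, batch_size):
        upsert(batch)  # stand-in for the real vector store client call
        total += len(batch)
    return total
```

Passing the upload call in as a function keeps the batching logic testable without network access; in a real script it would wrap whatever upsert method the Pinecone client in use provides.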

	## Usage

	Run the Streamlit application:
	```bash
	streamlit run app.py
	```

	Open the URL printed in the terminal (typically http://localhost:8501) in your browser.

	Quick Start Guide: [QUICK_START.md](QUICK_START.md)

	## How It Works

	1. Data Loading: Medical questions and answers are loaded from Hugging Face datasets
	2. Embedding: Texts are converted to embeddings using Sentence Transformers
	3. Vector Storage: Embeddings are stored in Pinecone for fast similarity search
	4. Query Processing: User queries are embedded and searched against the database
	5. Response Generation: Gemini 1.5 Flash generates responses based on retrieved context
	6. Citation: Sources are tracked and displayed with confidence scores
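
The steps above can be sketched end to end without any external services. This is a toy illustration under stated assumptions, not the project's code: `embed` is a bag-of-words stand-in for a Sentence Transformers model, `InMemoryIndex` is a brute-force stand-in for the Pinecone index, and `build_prompt` only assembles the text that would be sent to Gemini 1.5 Flash. All names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized word counts (stand-in for a real sentence encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {token: c / norm for token, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity; both vectors are unit length, so this is a dot product."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)


class InMemoryIndex:
    """Brute-force stand-in for a vector database index."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Dict[str, float], str]] = []

    def upsert(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, embed(text), text))

    def query(self, question: str, top_k: int) -> List[Tuple[float, str, str]]:
        """Return up to top_k (score, doc_id, text) tuples, best score first."""
        q = embed(question)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, vec, text in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


def build_prompt(question: str, matches: List[Tuple[float, str, str]]) -> str:
    """Assemble the context and question; source ids are kept so answers can cite them."""
    context = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in matches)
    return (
        "Answer using only the context below and cite source ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


index = InMemoryIndex()
index.upsert("doc-diabetes", "Common symptoms of diabetes include increased thirst and frequent urination.")
index.upsert("doc-hypertension", "Hypertension is treated with lifestyle changes and medication.")
index.upsert("doc-flu", "Influenza spreads through respiratory droplets.")

question = "What are the symptoms of diabetes?"
matches = index.query(question, top_k=2)
prompt = build_prompt(question, matches)
```

The similarity score attached to each match is the kind of value a confidence display could be derived from, though how this project computes its confidence scores is not stated here.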

	## Important Disclaimers

	- ⚠️ This is not medical advice
	- ⚠️ Not a substitute for professional healthcare
	- ⚠️ Always consult healthcare professionals for medical decisions
	- ⚠️ Confidence scores indicate data quality, not medical accuracy

	## Configuration

	Edit `config.py` to customize:
	- Embedding model
	- Number of retrieved documents (TOP_K)
	- Similarity threshold
	- Dataset selection
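
For orientation, the four settings could be grouped and validated as shown below. This is a hedged sketch, not the contents of `config.py`: the real file may simply define module-level constants such as `TOP_K` and `SIMILARITY_THRESHOLD`, and every default value here, including the model and dataset names, is a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative container for the tunable values; defaults are assumptions."""

    embedding_model: str = "all-MiniLM-L6-v2"  # assumed Sentence Transformers model
    top_k: int = 5  # number of retrieved documents
    similarity_threshold: float = 0.5  # minimum score for a match to be used
    dataset_name: str = "your-org/your-medical-qa-dataset"  # placeholder dataset id

    def __post_init__(self) -> None:
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        # Assumes scores are reported on a 0 to 1 scale.
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be between 0 and 1")


settings = Settings()
stricter = Settings(top_k=3, similarity_threshold=0.7)
```

Validating at construction time surfaces a mistyped value when the app starts rather than as an empty result set later.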

	## Troubleshooting

	### "API Key not found"
	- Ensure your `.env` file exists and contains valid API keys
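
A quick way to narrow this down is to check which variables are actually missing. The sketch below assumes the values from `.env` have already been loaded into the process environment (for example with the `python-dotenv` package, if the project uses it). `missing_keys` is a hypothetical helper, and treating values that start with `your_` as unset is an assumption based on the placeholders in the sample `.env` above.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the sample .env file in this README.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required names that are unset, blank, or still a placeholder."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


# Typical use: inspect the real environment before starting the app.
problems = missing_keys(os.environ)
```

An empty `problems` list means all three variables are present; it does not confirm that the keys are accepted by Pinecone or Google.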

	### "Index not found"
	- Run `python setup_database.py` to create the Pinecone index

	### "No results found"
	- The similarity threshold may be too high for your query
	- Lower `SIMILARITY_THRESHOLD` in `config.py` and try again
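
To see why a high threshold produces an empty result, consider this minimal sketch. It assumes matches arrive as `(source id, score)` pairs and that a match is kept when its score is at or above the threshold; the actual comparison in the app may differ, and both function names are hypothetical.

```python
from typing import List, Tuple

Match = Tuple[str, float]  # (source id, similarity score)


def filter_matches(matches: List[Match], threshold: float) -> List[Match]:
    """Keep only the matches whose score is at or above the threshold."""
    return [m for m in matches if m[1] >= threshold]


def explain_empty(matches: List[Match], threshold: float) -> str:
    """Describe why filtering would (or would not) leave nothing to show."""
    if not matches:
        return "The index returned no matches; check that it was populated."
    best = max(score for _, score in matches)
    if best < threshold:
        return (
            f"Best score {best:.2f} is below threshold {threshold:.2f}; "
            "consider lowering SIMILARITY_THRESHOLD."
        )
    return "At least one match passes the threshold."


sample = [("source-a", 0.62), ("source-b", 0.41)]
kept_strict = filter_matches(sample, 0.8)   # nothing survives
kept_relaxed = filter_matches(sample, 0.5)  # only source-a survives
```

Logging the best score alongside the threshold, as `explain_empty` does, makes it easier to pick a new value than lowering the threshold blindly.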

	## License

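	This project is released under the Apache 2.0 license, as declared in the metadata at the top of this file, and is intended for educational purposes only. Medical information should be verified with healthcare professionals.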