title: MediVox - AI Doctor with Vision and Voice
emoji: 👨‍⚕️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: false

AI Doctor with Vision and Voice

This is an AI-powered medical assistant that can:

  • Accept voice input from patients
  • Analyze medical images
  • Provide medical insights using RAG (Retrieval-Augmented Generation)
  • Respond with natural voice output

Features

  • Speech-to-Text using Google Gemini 2.0 Flash
  • Image Analysis using Google Gemini 2.0 Flash
  • RAG using FAISS and a medical knowledge base
  • Text-to-Speech using ElevenLabs
  • Context-aware responses using medical domain knowledge
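
The retrieval step of RAG can be illustrated without FAISS. The sketch below is a plain-Python stand-in, not the Space's code: it ranks knowledge-base passages by cosine similarity to a query vector, which is the kind of nearest-neighbour lookup a FAISS index accelerates. The passages and 3-dimensional vectors are made-up placeholders (real embeddings would come from the embedding model), and `top_k_passages` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_passages(query_vec, passages, k=2):
    """Return the k passage texts most similar to query_vec, best first.

    `passages` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        passages,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Placeholder knowledge base: toy vectors, not real embeddings.
knowledge_base = [
    ("Passage about skin rashes", [0.9, 0.1, 0.0]),
    ("Passage about headaches", [0.1, 0.9, 0.0]),
    ("Passage about sprains", [0.0, 0.1, 0.9]),
]

# Retrieve context for a toy query vector and prepend it to the question.
context = top_k_passages([0.8, 0.2, 0.1], knowledge_base, k=2)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: <patient question>"
```

The retrieved passages are placed ahead of the question so the text-generation model can ground its answer in them; how the Space actually formats its prompt is not documented here.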

Environment Variables Required

GOOGLE_AI_STUDIO_API_KEY=your_google_ai_studio_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
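
Before launching `app.py`, it can help to confirm that both keys are present. A minimal sketch, assuming the app reads them from the process environment; `missing_env_vars` is a hypothetical helper and not part of the Space's code:

```python
import os

# Variable names taken from the list above.
REQUIRED_ENV_VARS = ("GOOGLE_AI_STUDIO_API_KEY", "ELEVENLABS_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and stopping with a clear message when the list is non-empty avoids a less obvious authentication failure on the first request.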

Usage

  1. Click the microphone button to record your question (optional)
  2. Upload or take a picture of the medical condition (optional)
  3. Use either input on its own, or both together
  4. Wait for the AI doctor to analyze and respond
  5. Listen to the voice response or read the text output
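
One way to handle the optional inputs in the steps above is to build the request from whichever inputs are present. This is a sketch only: `build_request` and its prompt wording are hypothetical, and rejecting the case where both inputs are missing is an assumption about the desired behavior, not something this README documents.

```python
from typing import Optional


def build_request(transcript: Optional[str], image_path: Optional[str]) -> dict:
    """Describe what to send to the model, given optional voice and image inputs."""
    has_text = bool(transcript and transcript.strip())
    has_image = bool(image_path)
    if not has_text and not has_image:
        # Assumption: with neither input there is nothing to analyze.
        raise ValueError("Provide a voice question, an image, or both.")

    parts = []
    if has_text:
        parts.append("Patient question: " + transcript.strip())
    if has_image:
        parts.append("An image of the condition is attached.")
    return {
        "prompt": "\n".join(parts),
        "image_path": image_path if has_image else None,
    }
```

For example, `build_request("My arm itches", None)` yields a text-only request, while `build_request(None, "rash.png")` yields an image-only one.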

Model Details

  • Vision Model: Google Gemini 2.0 Flash
  • Speech-to-Text: Google Gemini 2.0 Flash
  • Text Generation: Google Gemini 2.0 Flash
  • Voice Generation: ElevenLabs
  • Embeddings: sentence-transformers/all-mpnet-base-v2

Citation

If you use this Space, please cite:

@misc{medivoicebot2024,
  author = {Gaurav Gulati},
  title = {AI Doctor with Vision and Voice},
  year = {2024},
  publisher = {Hugging Face Spaces},
}