Sayeem26s commited on
Commit
619fb4d
·
verified ·
1 Parent(s): ef9cee1

Upload 8 files

Browse files
Files changed (8) hide show
  1. .env +2 -0
  2. .gitignore +87 -0
  3. README.md +146 -13
  4. app.py +369 -0
  5. doctor_brain.py +70 -0
  6. doctor_voice.py +66 -0
  7. patient_voice.py +49 -0
  8. requirements.txt +60 -0
.env ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ GROQ_API_KEY = your_groq_api_key_here
2
+ ELEVEN_API_KEY = your_elevenlabs_api_key_here
.gitignore ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *.pyo
5
+ *.pyd
6
+
7
+ # C extensions
8
+ *.so
9
+
10
+ # Virtual environment
11
+ env/
12
+ venv/
13
+ .venv/
14
+
15
+ # Gradio cache and temporary files
16
+ gradio/
17
+ gradio_cache/
18
+ *.gradio
19
+
20
+ # Audio files
21
+ *.mp3
22
+ *.wav
23
+
24
+ # Environment variables
25
+ .env
26
+
27
+ # Logs and debug files
28
+ *.log
29
+ *.out
30
+ *.err
31
+
32
+ # OS-specific files
33
+ .DS_Store
34
+ Thumbs.db
35
+
36
+ # IDE-specific files
37
+ .vscode/
38
+ .idea/
39
+ *.swp
40
+
41
+ # Python egg files
42
+ *.egg
43
+ *.egg-info/
44
+ dist/
45
+ build/
46
+ *.manifest
47
+ *.spec
48
+
49
+ # Test and coverage reports
50
+ htmlcov/
51
+ .tox/
52
+ .nox/
53
+ .coverage
54
+ *.cover
55
+ *.py,cover
56
+ .cache
57
+ pytest_cache/
58
+ nosetests.xml
59
+ coverage.xml
60
+ *.coveragerc
61
+
62
+ # Jupyter Notebook checkpoints
63
+ .ipynb_checkpoints/
64
+
65
+ # Temporary files
66
+ *.tmp
67
+ *.temp
68
+ *.bak
69
+ *.old
70
+ *.orig
71
+ *.save
72
+
73
+ # Backup files
74
+ *.~*
75
+ *.sublime-workspace
76
+ *.sublime-project
77
+
78
+ # Ignore any other audio or media files
79
+ *.ogg
80
+ *.flac
81
+ *.mp4
82
+ *.avi
83
+ *.mov
84
+ *.mkv
85
+
86
+ # History files
87
+ .history/
README.md CHANGED
@@ -1,13 +1,146 @@
1
- ---
2
- title: Multimodal AI Doctor
3
- emoji: 🏢
4
- colorFrom: indigo
5
- colorTo: yellow
6
- sdk: gradio
7
- sdk_version: 5.46.0
8
- app_file: app.py
9
- pinned: false
10
- license: apache-2.0
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # AI Doctor
2
+
3
+ **AI Doctor** is a multimodal assistant built with **Gradio**, **Groq APIs**, and **ElevenLabs**.
4
+ It allows users to record patient voice, upload medical-related images, and receive a concise **doctor-style spoken response**.
5
+
6
+ ---
7
+
8
+ ## Features
9
+
10
+ * Record patient voice from microphone (Speech-to-Text using **Whisper Large v3** on Groq)
11
+ * Upload an image (diagnosis/medical-related) for analysis (Vision-Language reasoning using **Llama 4 Scout** on Groq)
12
+ * Generate a concise medical-style response (2 sentences maximum, human-like tone)
13
+ * Convert response to voice (Text-to-Speech using **ElevenLabs** with WAV output, fallback to **gTTS** if needed)
14
+ * Gradio-based interactive UI
15
+
16
+ ---
17
+
18
+ ## Project Structure
19
+
20
+ ```
21
+ .
22
+ ├── app.py # Gradio UI + main workflow
23
+ ├── doctor_brain.py # Image encoding + Groq multimodal analysis
24
+ ├── patient_voice.py # Audio recording + Groq Whisper transcription
25
+ ├── doctor_voice.py # ElevenLabs + gTTS text-to-speech
26
+ ├── requirements.txt # Python dependencies
27
+ ├── .env # Environment variables (API keys)
28
+ ├── .gitignore # Ignore venv, __pycache__, .env, etc.
29
+ ├── images/ # Folder for saving test/sample images
30
+ └── README.md # Documentation
31
+ ```
32
+
33
+ ---
34
+
35
+ ## Requirements
36
+
37
+ * Python 3.10 or higher
38
+ * FFmpeg installed and available in PATH (required by pydub)
39
+ * A Groq API key (obtain from [https://console.groq.com](https://console.groq.com))
40
+ * An ElevenLabs API key (obtain from [https://elevenlabs.io](https://elevenlabs.io))
41
+
42
+ ---
43
+
44
+ ## Installation
45
+
46
+ 1. Clone the repository:
47
+
48
+ ```bash
49
+ git clone https://github.com/your-username/ai-doctor-2.0-voice-and-vision.git
50
+ cd ai-doctor-2.0-voice-and-vision
51
+ ```
52
+
53
+ 2. Create and activate a virtual environment:
54
+
55
+ ```bash
56
+ python -m venv venv
57
+ source venv/bin/activate # Linux/Mac
58
+ venv\Scripts\activate # Windows
59
+ ```
60
+
61
+ 3. Install dependencies:
62
+
63
+ ```bash
64
+ pip install -r requirements.txt
65
+ ```
66
+
67
+ 4. Install FFmpeg (if not already installed):
68
+
69
+ * Windows: Download from [https://www.gyan.dev/ffmpeg/builds/](https://www.gyan.dev/ffmpeg/builds/) and add `bin/` to PATH
70
+ * Linux (Debian/Ubuntu): `sudo apt install ffmpeg`
71
+ * macOS (Homebrew): `brew install ffmpeg`
72
+
73
+ 5. Create a `.env` file in the project root with your API keys:
74
+
75
+ ```
76
+ GROQ_API_KEY=your_groq_api_key_here
77
+ ELEVEN_API_KEY=your_elevenlabs_api_key_here
78
+ ```
79
+
80
+ ---
81
+
82
+ ## Running the Application
83
+
84
+ Start the Gradio app:
85
+
86
+ ```bash
87
+ python app.py
88
+ ```
89
+
90
+ The app will launch locally at:
91
+
92
+ ```
93
+ http://127.0.0.1:7860
94
+ ```
95
+
96
+ ---
97
+
98
+ ## Usage
99
+
100
+ 1. Allow microphone access to record your voice.
101
+ 2. Upload a medical image for analysis.
102
+ 3. The system will:
103
+
104
+ * Transcribe your voice (Whisper Large v3 via Groq)
105
+ * Analyze the image + text (Llama 4 Scout via Groq)
106
+ * Generate a concise medical-style response
107
+ * Play back the response as voice (ElevenLabs or gTTS fallback)
108
+
109
+ ---
110
+
111
+ ## Models Used
112
+
113
+ 1. **Whisper Large v3** (Groq) – Speech-to-Text
114
+
115
+ * [Groq API Docs](https://console.groq.com/docs)
116
+
117
+ 2. **Llama 4 Scout 17B (Mixture-of-Experts)** (Groq) – Vision-Language reasoning
118
+
119
+ * [Groq API Docs](https://console.groq.com/docs)
120
+
121
+ 3. **ElevenLabs `eleven_turbo_v2`** – Text-to-Speech (WAV, with MP3 fallback)
122
+
123
+ * [ElevenLabs Docs](https://elevenlabs.io/docs)
124
+
125
+ 4. **gTTS (Google Text-to-Speech)** – Backup Text-to-Speech
126
+
127
+ * [PyPI gTTS](https://pypi.org/project/gTTS/)
128
+
129
+ ---
130
+
131
+ ## Notes
132
+
133
+ * ElevenLabs free-tier accounts may not allow **WAV output** or certain custom voices. In that case, the code automatically falls back to **MP3** output with a safe built-in voice.
134
+ * Ensure FFmpeg is correctly installed; otherwise, audio export with pydub will fail.
135
+ * Gradio will automatically handle playback of both WAV and MP3 outputs.
136
+
137
+ ---
138
+
139
+ ## Support
140
+
141
+ For questions, issues, or collaboration, please contact:
142
+
143
+ **[sayeem26s@gmail.com](mailto:sayeem26s@gmail.com)**
144
+ **LinkedIn:** [https://www.linkedin.com/in/s-m-shahriar-26s/](https://www.linkedin.com/in/s-m-shahriar-26s/)
145
+
146
+ ---
app.py ADDED
@@ -0,0 +1,369 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # gradio_app.py
2
+ from dotenv import load_dotenv
3
+ load_dotenv()
4
+
5
+ import os
6
+ import gradio as gr
7
+
8
+ from doctor_brain import encode_image, analyze_image_with_query
9
+ from patient_voice import transcribe_with_groq
10
+ from doctor_voice import text_to_speech_with_elevenlabs, text_to_speech_with_gtts
11
+
12
+
13
+ # ---------- Styling (CSS) ----------
14
+ CUSTOM_CSS = """
15
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;800&display=swap');
16
+
17
+ :root {
18
+ --bg: #0b1220;
19
+ --grad1: #0b1220;
20
+ --grad2: #101c3a;
21
+ --grad3: #1e2a55;
22
+ --card: rgba(255,255,255,0.06);
23
+ --card-brd: rgba(255,255,255,0.14);
24
+ --accent: #7aa2ff;
25
+ --accent-2: #9b87f5;
26
+ --accent-3: #3be4ff;
27
+ --text: #e9eefc;
28
+ --muted: #b8c2e0;
29
+ --success: #22d3a3;
30
+ }
31
+
32
+ * { font-family: 'Inter', system-ui, -apple-system, Segoe UI, Roboto, Arial, sans-serif; }
33
+
34
+ body, .gradio-container, #root, .app {
35
+ min-height: 100vh;
36
+ background:
37
+ radial-gradient(1200px 700px at 10% -10%, var(--grad3), transparent 40%),
38
+ radial-gradient(1000px 700px at 110% 10%, var(--grad2), transparent 35%),
39
+ linear-gradient(180deg, var(--grad1), #0a0f1c 70%);
40
+ color: var(--text);
41
+ overflow-x: hidden;
42
+ position: relative;
43
+ }
44
+
45
+ /* ----- Floating Orbs (pure CSS) ----- */
46
+ .bg-orb, .bg-orb-2, .bg-orb-3 {
47
+ position: fixed;
48
+ pointer-events: none;
49
+ z-index: 0;
50
+ filter: blur(32px);
51
+ opacity: 0.35;
52
+ mix-blend-mode: screen;
53
+ will-change: transform;
54
+ }
55
+ .bg-orb {
56
+ width: 520px; height: 520px; top: 8%; left: -120px;
57
+ background: radial-gradient(circle at 30% 30%, var(--accent-2), transparent 60%);
58
+ animation: float1 14s ease-in-out infinite;
59
+ }
60
+ .bg-orb-2 {
61
+ width: 420px; height: 420px; bottom: 6%; right: -100px;
62
+ background: radial-gradient(circle at 70% 70%, var(--accent), transparent 60%);
63
+ animation: float2 18s ease-in-out infinite;
64
+ }
65
+ .bg-orb-3 {
66
+ width: 360px; height: 360px; top: 50%; left: 60%;
67
+ background: radial-gradient(circle at 50% 50%, var(--accent-3), transparent 60%);
68
+ animation: float3 16s ease-in-out infinite;
69
+ }
70
+ @keyframes float1 { 0%,100%{ transform: translateY(-10px)} 50%{ transform: translateY(18px)} }
71
+ @keyframes float2 { 0%,100%{ transform: translateY(12px)} 50%{ transform: translateY(-16px)} }
72
+ @keyframes float3 { 0%,100%{ transform: translateX(-10px)} 50%{ transform: translateX(16px)} }
73
+
74
+ /* ----- Header Title ----- */
75
+ #app-title {
76
+ position: relative;
77
+ font-weight: 800;
78
+ letter-spacing: 0.2px;
79
+ background: linear-gradient(90deg, var(--accent), var(--accent-2), var(--accent-3));
80
+ -webkit-background-clip: text;
81
+ background-clip: text;
82
+ color: transparent;
83
+ font-size: 2.6rem;
84
+ margin: 6px 0 6px 0;
85
+ text-shadow: 0 8px 24px rgba(123,162,255,0.15);
86
+ animation: glowPulse 3.6s ease-in-out infinite;
87
+ }
88
+ @keyframes glowPulse {
89
+ 0%,100% { filter: drop-shadow(0 0 0px rgba(123,162,255,0.35)); }
90
+ 50% { filter: drop-shadow(0 0 14px rgba(123,162,255,0.55)); }
91
+ }
92
+ #app-title:after {
93
+ content: "";
94
+ display: block;
95
+ width: 150px; height: 3px; margin: 10px auto 0;
96
+ background: linear-gradient(90deg, transparent, var(--accent-2), transparent);
97
+ border-radius: 3px;
98
+ animation: shimmer 2.8s linear infinite;
99
+ }
100
+ @keyframes shimmer {
101
+ 0% { transform: translateX(-30px); opacity: 0.4; }
102
+ 50% { transform: translateX(30px); opacity: 1; }
103
+ 100% { transform: translateX(-30px); opacity: 0.4; }
104
+ }
105
+
106
+ #app-subtitle {
107
+ color: var(--muted);
108
+ font-weight: 400;
109
+ font-size: 1rem;
110
+ margin-bottom: 18px;
111
+ }
112
+
113
+ /* ----- Glass Cards ----- */
114
+ .glass {
115
+ position: relative;
116
+ background: var(--card);
117
+ border: 1px solid var(--card-brd);
118
+ backdrop-filter: blur(12px);
119
+ -webkit-backdrop-filter: blur(12px);
120
+ border-radius: 18px;
121
+ box-shadow:
122
+ 0 10px 30px rgba(0,0,0,0.20),
123
+ inset 0 1px 0 rgba(255,255,255,0.04);
124
+ transition: transform 240ms ease, box-shadow 240ms ease, border-color 240ms ease;
125
+ z-index: 1;
126
+ }
127
+ .glass:hover {
128
+ transform: translateY(-4px);
129
+ border-color: rgba(155,135,245,0.55);
130
+ box-shadow:
131
+ 0 16px 42px rgba(0,0,0,0.35),
132
+ 0 0 32px rgba(155,135,245,0.25);
133
+ }
134
+
135
+ .section-title {
136
+ font-weight: 700;
137
+ letter-spacing: 0.2px;
138
+ margin-bottom: 8px;
139
+ color: var(--text);
140
+ }
141
+
142
+ .hint {
143
+ color: var(--muted);
144
+ font-size: 0.9rem;
145
+ margin-top: -4px;
146
+ margin-bottom: 12px;
147
+ }
148
+
149
+ /* ----- Buttons (Magnetic + Shine) ----- */
150
+ .gradio-container .btn-primary, .gr-button.primary {
151
+ position: relative;
152
+ background: linear-gradient(135deg, var(--accent), var(--accent-2)) !important;
153
+ border: none !important;
154
+ color: white !important;
155
+ box-shadow: 0 10px 26px rgba(123, 162, 255, 0.38);
156
+ transform: translateZ(0);
157
+ transition: transform 180ms ease, box-shadow 180ms ease, filter 180ms ease;
158
+ overflow: hidden;
159
+ border-radius: 12px !important;
160
+ }
161
+ .gradio-container .btn-primary:hover, .gr-button.primary:hover {
162
+ filter: brightness(1.05);
163
+ transform: translateY(-2px) scale(1.01);
164
+ box-shadow: 0 16px 34px rgba(123, 162, 255, 0.45);
165
+ }
166
+ .gradio-container .btn-primary::after, .gr-button.primary::after {
167
+ content: "";
168
+ position: absolute;
169
+ top: -100%; left: -30%;
170
+ width: 60%; height: 300%;
171
+ transform: rotate(25deg);
172
+ background: linear-gradient( to right, rgba(255,255,255,0.0), rgba(255,255,255,0.35), rgba(255,255,255,0.0) );
173
+ transition: left 500ms ease;
174
+ }
175
+ .gradio-container .btn-primary:hover::after, .gr-button.primary:hover::after {
176
+ left: 110%;
177
+ }
178
+
179
+ /* Secondary buttons, if any */
180
+ button, .gr-button {
181
+ border-radius: 12px !important;
182
+ }
183
+
184
+ /* Inputs focus */
185
+ textarea, input, .gr-textbox, .gr-text, .gradio-container .input-text, .gradio-container .wrap input[type="file"] {
186
+ color: var(--text) !important;
187
+ }
188
+ .gradio-container .wrap input:focus, .gr-textbox:focus, textarea:focus {
189
+ outline: none !important;
190
+ box-shadow: 0 0 0 3px rgba(123,162,255,0.35) !important;
191
+ border-color: rgba(123,162,255,0.6) !important;
192
+ }
193
+
194
+ /* Component labels */
195
+ label, .wrap .label, .label-wrap .label, .component .label {
196
+ color: var(--muted) !important;
197
+ }
198
+
199
+ /* Layout */
200
+ .card-pad { padding: 18px; }
201
+ .grid {
202
+ display: grid;
203
+ gap: 16px;
204
+ grid-template-columns: 1fr;
205
+ }
206
+ @media (min-width: 980px) {
207
+ .grid { grid-template-columns: 1fr 1fr; }
208
+ }
209
+
210
+ .footer-note {
211
+ color: var(--muted);
212
+ text-align: center;
213
+ font-size: 0.85rem;
214
+ margin-top: 12px;
215
+ }
216
+
217
+ /* ----- Floating Badge ----- */
218
+ .fab {
219
+ position: fixed;
220
+ right: 20px; bottom: 20px;
221
+ z-index: 3;
222
+ }
223
+ .fab .pill {
224
+ display: inline-flex; align-items: center; gap: 10px;
225
+ padding: 10px 14px;
226
+ border-radius: 999px;
227
+ background: linear-gradient(135deg, rgba(123,162,255,0.18), rgba(155,135,245,0.18));
228
+ border: 1px solid rgba(255,255,255,0.18);
229
+ backdrop-filter: blur(10px);
230
+ color: white;
231
+ box-shadow: 0 8px 24px rgba(0,0,0,0.25), 0 0 24px rgba(123,162,255,0.25);
232
+ animation: bob 3.2s ease-in-out infinite;
233
+ }
234
+ .fab .dot {
235
+ width: 8px; height: 8px; border-radius: 999px;
236
+ background: var(--success);
237
+ box-shadow: 0 0 10px var(--success);
238
+ }
239
+ @keyframes bob { 0%,100%{ transform: translateY(0)} 50%{ transform: translateY(-6px)} }
240
+ """
241
+
242
+
243
+
244
+ # ---------- Prompt ----------
245
# System prompt that steers the multimodal model to answer like a clinician.
# Fix: the heading read "Tone and formatting ruless" (typo) — now "rules".
system_prompt = """
You are a highly skilled, compassionate doctor. Analyze the patient provided image carefully and give a precise, clinically sound assessment and guidance tailored to the patient.

Opening voice
Begin your first sentence exactly with: With what I see, I think you have ...
State the single most likely condition in clear patient friendly terms.

Explain why
Describe the key visible findings that support your impression and what they mean for the patient.

Differential
Name other plausible conditions and briefly note how they differ.

Care plan now
Offer practical steps the patient can take at home and safe over the counter options when appropriate. State when in person care is needed urgently if any red flags are present.

Definitive care after confirmation
Suggest sensible next tests or evaluations to confirm the diagnosis. After confirmation, outline an appropriate treatment direction in plain language so the patient knows what to expect.

If uncertain or image is not suitable
If the image quality or content prevents a safe conclusion, say so clearly, explain what is missing, and guide safer next steps rather than guessing.

Tone and formatting rules
Do not use digits or special symbols anywhere in your response.
Do not use markdown.
Do not say you are an AI model.
Do not begin with the phrase In the image I see.
Write in short paragraphs rather than lists, using warm professional bedside language.
Keep the message concise, precise, and focused on patient safety.
"""
275
+
276
+
277
+ # ---------- Core logic ----------
278
def process_inputs(audio_filepath, image_filepath):
    """Run the full patient -> doctor pipeline for one Gradio submission.

    Steps:
      1. Transcribe the recorded patient audio with Groq Whisper (only if
         audio was recorded and GROQ_API_KEY is set).
      2. Send the uploaded image plus the transcript to the Groq vision
         model for a doctor-style assessment.
      3. Voice the response with ElevenLabs, falling back to gTTS.

    Args:
        audio_filepath: Path to the recorded audio file, or None.
        image_filepath: Path to the uploaded image file, or None.

    Returns:
        tuple: (transcript text, doctor response text, path to the
        generated speech file or None if both TTS backends failed).
    """
    stt_text = ""
    groq_key = os.environ.get("GROQ_API_KEY")
    if audio_filepath and groq_key:
        try:
            stt_text = transcribe_with_groq(
                stt_model="whisper-large-v3",
                audio_filepath=audio_filepath,
                GROQ_API_KEY=groq_key,
            )
        except Exception as e:
            # Surface the STT failure in the transcript box instead of crashing.
            stt_text = f"[STT error: {e}]"

    if image_filepath:
        try:
            encoded = encode_image(image_filepath)
            # BUG FIX: the original used "\\n\\n", which put the literal
            # characters backslash-n into the prompt instead of blank lines.
            query = system_prompt + "\n\n" + (stt_text or "")
            doctor_response = analyze_image_with_query(
                query=query,
                model="meta-llama/llama-4-scout-17b-16e-instruct",
                encoded_image=encoded,
            )
        except Exception as e:
            doctor_response = f"Image analysis failed: {e}"
    else:
        doctor_response = "No image provided for me to analyze"

    audio_path = None
    try:
        audio_path = text_to_speech_with_elevenlabs(
            input_text=doctor_response,
            output_filepath="final.wav",
        )
    except Exception as e:
        # ElevenLabs failed (quota, key, network, ...): fall back to gTTS.
        try:
            audio_path = text_to_speech_with_gtts(
                input_text=doctor_response,
                output_filepath="final_gtts.mp3",
            )
        except Exception as e2:
            doctor_response += f" [TTS error: {e} | gTTS fallback error: {e2}]"

    return stt_text, doctor_response, audio_path
321
+
322
+
323
+ # ---------- UI ----------
324
+ with gr.Blocks(css=CUSTOM_CSS, title="AI Doctor", theme=gr.themes.Soft()) as demo:
325
+ # floating orbs
326
+ gr.HTML('<div class="bg-orb"></div><div class="bg-orb-2"></div><div class="bg-orb-3"></div>')
327
+
328
+ with gr.Column():
329
+ gr.HTML("""
330
+ <div style="text-align:center; margin-top:10px; position:relative; z-index:2;">
331
+ <div id="app-title">AI Doctor</div>
332
+ <div id="app-subtitle">AI Doctor</div>
333
+ </div>
334
+ """)
335
+
336
+ with gr.Row(elem_classes=["grid"]):
337
+ with gr.Column(elem_classes=["glass", "card-pad"]):
338
+ gr.Markdown("### Inputs", elem_classes=["section-title"])
339
+ gr.Markdown("Upload an image and Tell about your symptoms.", elem_classes=["hint"])
340
+ audio_in = gr.Audio(sources=["microphone"], type="filepath", label="Patient's Voice (optional)")
341
+ image_in = gr.Image(type="filepath", label="Image for diagnosis")
342
+ submit_btn = gr.Button("Analyze", variant="primary")
343
+ with gr.Column(elem_classes=["glass", "card-pad"]):
344
+ gr.Markdown("### Results", elem_classes=["section-title"])
345
+ stt_out = gr.Textbox(label="Speech to Text", interactive=False, lines=3)
346
+ doc_out = gr.Textbox(label="Doctor's Response", interactive=False, lines=5)
347
+ audio_out = gr.Audio(label="Doctor's Voice", type="filepath")
348
+
349
+ gr.HTML('<div class="footer-note">Made By S.M. Shahriar &amp; Adiba Sabreen</div>')
350
+
351
+ # floating badge
352
+ gr.HTML("""
353
+ <div class="fab">
354
+ <div class="pill">
355
+ <span class="dot"></span>
356
+ <strong>AI Doctor</strong>
357
+ <span style="opacity:.8;">is listening</span>
358
+ </div>
359
+ </div>
360
+ """)
361
+
362
+ submit_btn.click(
363
+ fn=process_inputs,
364
+ inputs=[audio_in, image_in],
365
+ outputs=[stt_out, doc_out, audio_out]
366
+ )
367
+
368
+ if __name__ == "__main__":
369
+ demo.queue().launch(debug=True)
doctor_brain.py ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from dotenv import load_dotenv
2
+ load_dotenv()
3
+
4
+ import os
5
+ import base64
6
+ from groq import Groq
7
+
8
+
9
def encode_image(image_path: str) -> str:
    """Load an image from disk and return its contents base64-encoded.

    The Groq multimodal endpoint accepts images only as base64 data URIs,
    so the raw bytes are encoded before being sent over the API.

    Args:
        image_path (str): Path to the image file on disk.

    Returns:
        str: Base64-encoded file contents as an ASCII string.
    """
    with open(image_path, "rb") as image_file:
        raw_bytes = image_file.read()
    return base64.b64encode(raw_bytes).decode("utf-8")
23
+
24
+
25
def analyze_image_with_query(query: str, model: str, encoded_image: str) -> str:
    """Ask a Groq multimodal model a question about a base64-encoded image.

    Args:
        query (str): Text prompt (e.g. the doctor system prompt plus the
            patient's transcribed question).
        model (str): Groq multimodal model identifier.
        encoded_image (str): Base64-encoded image bytes.

    Returns:
        str: The model's generated answer text.

    Raises:
        RuntimeError: If GROQ_API_KEY is missing from the environment.
    """
    api_key = os.environ.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set in environment")

    # Single user turn carrying both modalities: the text query plus the
    # image wrapped as a base64 data URL.
    payload = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                },
            ],
        }
    ]

    # Low temperature keeps the clinical answer near-deterministic; the
    # token cap bounds response length.
    response = Groq(api_key=api_key).chat.completions.create(
        messages=payload,
        model=model,
        temperature=0.1,
        max_tokens=1000,
    )

    return response.choices[0].message.content
doctor_voice.py ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from dotenv import load_dotenv
2
+ load_dotenv()
3
+
4
+ import os
5
+ import os.path as op
6
+
7
+ # gTTS library – produces MP3 audio (works fine in Gradio)
8
+ from gtts import gTTS
9
+
10
+ # ElevenLabs SDK
11
+ from elevenlabs.client import ElevenLabs
12
+ import elevenlabs
13
+
14
+
15
def text_to_speech_with_gtts(input_text: str, output_filepath: str) -> str:
    """Synthesize speech from text with Google gTTS and save it as MP3.

    MP3 output plays directly in Gradio's audio component, which makes
    this a convenient fallback when ElevenLabs is unavailable.

    Args:
        input_text (str): Text to convert into speech.
        output_filepath (str): Destination path for the MP3 file.

    Returns:
        str: The path the MP3 file was written to.
    """
    speech = gTTS(text=input_text, lang="en", slow=False)
    speech.save(output_filepath)
    return output_filepath
29
+
30
+
31
def text_to_speech_with_elevenlabs(input_text: str, output_filepath: str) -> str:
    """Synthesize speech from text with ElevenLabs and save it as WAV.

    The extension of *output_filepath* is forced to ``.wav`` because the
    Windows SoundPlayer only handles WAV/PCM audio.

    Args:
        input_text (str): Text to convert into speech.
        output_filepath (str): Desired output path; its extension is
            replaced with ``.wav``.

    Returns:
        str: Path of the saved WAV file.

    Raises:
        RuntimeError: If ELEVEN_API_KEY is missing from the environment.
    """
    api_key = os.environ.get("ELEVEN_API_KEY")
    if not api_key:
        raise RuntimeError("ELEVEN_API_KEY is not set in environment")

    # Normalize the target path so it always ends in .wav.
    root, _ext = op.splitext(output_filepath)
    wav_path = root + ".wav"

    client = ElevenLabs(api_key=api_key)

    # NOTE(review): "wav" may not be an accepted output_format on every
    # ElevenLabs plan/SDK version — confirm against the API docs. The
    # caller already falls back to gTTS when this raises.
    audio_stream = client.generate(
        text=input_text,
        voice="Aria",  # A voice expected to be available on the account
        output_format="wav",
        model="eleven_turbo_v2",
    )

    # Persist the generated audio to disk.
    elevenlabs.save(audio_stream, wav_path)
    return wav_path
patient_voice.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from dotenv import load_dotenv
2
+ load_dotenv()
3
+
4
+ import logging
5
+ from io import BytesIO
6
+ import os
7
+ import speech_recognition as sr
8
+ from pydub import AudioSegment
9
+ from groq import Groq
10
+
11
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
12
+
13
def record_audio(file_path: str, timeout: int = 20, phrase_time_limit: int | None = None) -> str:
    """Capture microphone audio and save it as MP3 (requires ffmpeg on PATH).

    Args:
        file_path (str): Destination MP3 path.
        timeout (int): Seconds to wait for speech to begin.
        phrase_time_limit (int | None): Maximum seconds to record a single
            phrase, or None for no limit.

    Returns:
        str: The path the MP3 file was saved to.

    Raises:
        Exception: Any microphone/conversion failure is logged and re-raised.
    """
    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as mic:
            logging.info("Adjusting for ambient noise...")
            recognizer.adjust_for_ambient_noise(mic, duration=1)
            logging.info("Start speaking now...")
            captured = recognizer.listen(mic, timeout=timeout, phrase_time_limit=phrase_time_limit)
            logging.info("Recording complete.")

            # Convert the in-memory WAV capture to MP3 via pydub/ffmpeg.
            segment = AudioSegment.from_wav(BytesIO(captured.get_wav_data()))
            segment.export(file_path, format="mp3", bitrate="128k")
            logging.info(f"Audio saved to {file_path}")
            return file_path
    except Exception as e:
        logging.error(f"Recording error: {e}")
        raise
35
+
36
def transcribe_with_groq(stt_model: str, audio_filepath: str, GROQ_API_KEY: str) -> str:
    """Transcribe an audio file using Groq's Whisper endpoint.

    Args:
        stt_model (str): Whisper model name, e.g. "whisper-large-v3".
        audio_filepath (str): Path to the audio file to transcribe.
        GROQ_API_KEY (str): Groq API key used to authenticate the client.

    Returns:
        str: The transcription text (English).
    """
    groq_client = Groq(api_key=GROQ_API_KEY)
    with open(audio_filepath, "rb") as audio_file:
        result = groq_client.audio.transcriptions.create(
            model=stt_model,
            file=audio_file,
            language="en",
        )
    return result.text
+
49
+
requirements.txt ADDED
@@ -0,0 +1,60 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ -i https://pypi.org/simple
2
+ aiofiles==23.2.1; python_version >= '3.7'
3
+ annotated-types==0.7.0; python_version >= '3.8'
4
+ anyio==4.8.0; python_version >= '3.9'
5
+ certifi==2024.12.14; python_version >= '3.6'
6
+ charset-normalizer==3.4.1; python_version >= '3.7'
7
+ click==8.1.8; python_version >= '3.7'
8
+ distro==1.9.0; python_version >= '3.6'
9
+ elevenlabs==1.50.3; python_version >= '3.8' and python_version < '4.0'
10
+ fastapi==0.115.6; python_version >= '3.8'
11
+ ffmpy==0.5.0; python_version >= '3.8' and python_version < '4.0'
12
+ filelock==3.16.1; python_version >= '3.8'
13
+ fsspec==2024.12.0; python_version >= '3.8'
14
+ gradio==5.12.0; python_version >= '3.10'
15
+ gradio-client==1.5.4; python_version >= '3.10'
16
+ groq==0.15.0; python_version >= '3.8'
17
+ gtts==2.5.4; python_version >= '3.7'
18
+ h11==0.14.0; python_version >= '3.7'
19
+ httpcore==1.0.7; python_version >= '3.8'
20
+ httpx==0.28.1; python_version >= '3.8'
21
+ huggingface-hub==0.27.1; python_full_version >= '3.8.0'
22
+ idna==3.10; python_version >= '3.6'
23
+ jinja2==3.1.5; python_version >= '3.7'
24
+ markdown-it-py==3.0.0; python_version >= '3.8'
25
+ markupsafe==2.1.5; python_version >= '3.7'
26
+ mdurl==0.1.2; python_version >= '3.7'
27
+ numpy==2.2.1; python_version >= '3.10'
28
+ orjson==3.10.14; python_version >= '3.8'
29
+ packaging==24.2; python_version >= '3.8'
30
+ pandas==2.2.3; python_version >= '3.9'
31
+ pillow==11.1.0; python_version >= '3.9'
32
+ pyaudio==0.2.14
33
+ pydantic==2.10.5; python_version >= '3.8'
34
+ pydantic-core==2.27.2; python_version >= '3.8'
35
+ pydub==0.25.1
36
+ pygments==2.19.1; python_version >= '3.8'
37
+ python-dateutil==2.9.0.post0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
38
+ python-multipart==0.0.20; python_version >= '3.8'
39
+ pytz==2024.2
40
+ pyyaml==6.0.2; python_version >= '3.8'
41
+ requests==2.32.3; python_version >= '3.8'
42
+ rich==13.9.4; python_full_version >= '3.8.0'
43
+ ruff==0.9.1; sys_platform != 'emscripten'
44
+ safehttpx==0.1.6; python_version >= '3.10'
45
+ semantic-version==2.10.0; python_version >= '2.7'
46
+ shellingham==1.5.4; python_version >= '3.7'
47
+ six==1.17.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
48
+ sniffio==1.3.1; python_version >= '3.7'
49
+ speechrecognition==3.13.0; python_version >= '3.9'
50
+ starlette==0.41.3; sys_platform != 'emscripten'
51
+ tomlkit==0.13.2; python_version >= '3.8'
52
+ tqdm==4.67.1; python_version >= '3.7'
53
+ typer==0.15.1; sys_platform != 'emscripten'
54
+ typing-extensions==4.12.2; python_version >= '3.8'
55
+ tzdata==2024.2; python_version >= '2'
56
+ urllib3==2.3.0; python_version >= '3.9'
57
+ uvicorn==0.34.0; sys_platform != 'emscripten'
58
+ websockets==14.1; python_version >= '3.9'
59
+ python-dotenv
60
+