DeepCritical / AUDIO_INPUT_FIX.md
Joseph Pollack

Audio Input Display Fix

Issue

The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.

Root Cause

Setting multimodal=True on gr.ChatInterface should show the image and audio buttons automatically. They can still be missing for several reasons:

  1. The buttons might be hidden in a dropdown menu
  2. Browser permissions might be blocking microphone access
  3. The file_types parameter might not have been explicitly set

Fix Applied

1. Added file_types Parameter

Explicitly specified which file types are accepted to ensure audio is enabled:

gr.ChatInterface(
    fn=research_agent,
    multimodal=True,
    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
    ...
)

File: src/app.py (line 929)
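
If passing file_types directly to gr.ChatInterface has no visible effect in the installed Gradio version, one alternative is to configure a gr.MultimodalTextbox yourself and hand it over through the textbox argument. The sketch below is not the code in src/app.py. The helper names multimodal_textbox_options and build_demo are hypothetical, and it assumes the installed Gradio version supports the sources parameter on gr.MultimodalTextbox.

```python
from typing import Any, Callable


def multimodal_textbox_options() -> dict[str, Any]:
    """Options for gr.MultimodalTextbox (hypothetical helper, not from src/app.py)."""
    return {
        # Same file types as the fix described above.
        "file_types": ["image", "audio", "video"],
        # Assumption: the installed Gradio version accepts `sources`;
        # "microphone" requests the recording button next to upload.
        "sources": ["upload", "microphone"],
    }


def build_demo(fn: Callable[..., Any]):
    """Build a ChatInterface around an explicitly configured textbox."""
    import gradio as gr  # imported lazily so the options can be inspected without Gradio

    textbox = gr.MultimodalTextbox(**multimodal_textbox_options())
    return gr.ChatInterface(fn=fn, multimodal=True, textbox=textbox)
```

Calling build_demo(research_agent).launch() would then start the app with this textbox. Whether the microphone button appears still depends on the Gradio version and on browser permissions.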

2. Enhanced UI Description

Updated the description to make it clearer where to find the audio input:

  • Added explicit instructions about clicking the 📷 and 🎤 icons
  • Added a tip about looking for icons in the text input box
  • Clarified drag & drop functionality

File: src/app.py (lines 942-948)

How It Works Now

  1. Audio Recording Button: The 🎤 microphone icon should appear in the textbox toolbar when multimodal=True is set
  2. File Upload: Users can drag & drop audio files or click to upload
  3. Browser Permissions: The browser prompts for microphone access when the user clicks the audio button
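
On the Python side, a recorded or uploaded clip arrives as part of the message passed to the chat function. The sketch below assumes the message is a dict with a "text" string and a "files" list of file paths, which is how multimodal ChatInterface messages are commonly shaped. The helper name split_message is hypothetical, and the real research_agent may handle files differently.

```python
import mimetypes


def split_message(message: dict) -> tuple[str, list[str], list[str]]:
    """Separate audio attachments from other files in a multimodal message.

    Assumes message looks like {"text": "...", "files": ["path/a.wav", ...]}.
    """
    text = message.get("text") or ""
    audio_files: list[str] = []
    other_files: list[str] = []
    for path in message.get("files") or []:
        path = str(path)
        # Guess the MIME type from the file extension only.
        mime, _ = mimetypes.guess_type(path)
        if mime is not None and mime.startswith("audio/"):
            audio_files.append(path)
        else:
            other_files.append(path)
    return text, audio_files, other_files
```

Logging the result of split_message at the top of the chat function is a quick way to confirm that recordings actually reach the backend.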

Testing

To verify the fix:

  1. Look for the 🎤 microphone icon in the text input box
  2. Click it to start recording (the browser will ask for microphone permission)
  3. Alternatively, drag & drop an audio file into the textbox
  4. Check the browser console for any permission errors

Browser Requirements

  • Chrome/Edge: Should work with microphone permissions
  • Firefox: Should work with microphone permissions
  • Safari: May require additional configuration
  • HTTPS Required: Browsers expose microphone access only in a secure context, which means HTTPS or localhost
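
The HTTPS-or-localhost rule can be checked up front for the URL the app is served from. This is a rough sketch of how browsers decide, not an exact reimplementation of their secure-context logic, and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_host(host: str) -> bool:
    """True for localhost names and loopback IP addresses."""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary hostname.
        return False


def microphone_allowed(url: str) -> bool:
    """Approximate whether a browser would permit microphone access on this URL."""
    parts = urlparse(url)
    if parts.scheme == "https":
        return True
    return is_loopback_host((parts.hostname or "").lower())
```

For example, http://127.0.0.1:7860 passes the check, while the same app reached over a LAN address such as http://192.168.1.20:7860 does not.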

Troubleshooting

If audio input still doesn't appear:

  1. Check Browser Permissions:

    • Open browser settings
    • Check microphone permissions for the site
    • Ensure microphone is not blocked
  2. Check Browser Console:

    • Open Developer Tools (F12)
    • Look for permission errors or warnings
    • Check for any JavaScript errors
  3. Try Different Browser:

    • Some browsers have stricter permission policies
    • Try Chrome or Firefox if Safari doesn't work
  4. Check Gradio Version:

    • Ensure gradio>=6.0.0 is installed
    • Update if needed: pip install --upgrade gradio
  5. HTTPS Requirement:

    • Microphone access requires HTTPS (or localhost)
    • If deploying, ensure SSL is configured
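
The version check in step 4 can also be scripted. The sketch below reads the installed version with importlib.metadata and compares it against the 6.0.0 minimum named above. The helper names are hypothetical, and the parser deliberately ignores pre-release suffixes, so 6.0.0b1 counts as 6.0.0.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse the leading numeric parts of a version string, padded to three."""
    parts: list[int] = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def meets_minimum(installed: str, minimum: str = "6.0.0") -> bool:
    """Compare two version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def installed_gradio_version() -> Optional[str]:
    """Return the installed gradio version, or None if gradio is missing."""
    try:
        return version("gradio")
    except PackageNotFoundError:
        return None
```

If installed_gradio_version() returns None or a value that fails meets_minimum, run the pip upgrade command from step 4.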

Additional Notes

  • The audio button is part of the MultimodalTextbox component
  • It should appear as an icon in the textbox toolbar
  • If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
  • The file_types parameter ensures audio files are accepted for upload