Joseph Pollack committed
Commit 7b7ce7e · 1 parent: 43839ca
implements final interface fixes
- AUDIO_INPUT_FIX.md +90 -0
- README.md +39 -12
- src/app.py +6 -3
AUDIO_INPUT_FIX.md
ADDED
@@ -0,0 +1,90 @@
# Audio Input Display Fix

## Issue
The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.

## Root Cause
When `multimodal=True` is set on `gr.ChatInterface`, it should automatically show image and audio buttons. However:
1. The buttons might be hidden in a dropdown menu
2. Browser permissions might be blocking microphone access
3. The `file_types` parameter might not have been explicitly set

## Fix Applied

### 1. Added `file_types` Parameter
Explicitly specified which file types are accepted to ensure audio is enabled:

```python
gr.ChatInterface(
    fn=research_agent,
    multimodal=True,
    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
    ...
)
```

**File:** `src/app.py` (line 929)
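
As a point of comparison, the sketch below configures the textbox component directly instead of passing `file_types` to `gr.ChatInterface`. It assumes a Gradio release in which `gr.MultimodalTextbox` accepts both `file_types` and `sources`, and in which `gr.ChatInterface` accepts a pre-built component through `textbox=`; check the installed version's documentation before relying on it. The helper names `multimodal_textbox_kwargs` and `build_chat` are hypothetical and not part of `src/app.py`.

```python
from typing import Any, Callable


def multimodal_textbox_kwargs(allow_microphone: bool = True) -> dict[str, Any]:
    """Keyword arguments for gr.MultimodalTextbox (hypothetical helper)."""
    # "upload" keeps the file picker and drag & drop; "microphone" is assumed
    # to be the source name that adds the record button.
    sources = ["upload"]
    if allow_microphone:
        sources.append("microphone")
    return {
        "file_types": ["image", "audio", "video"],
        "sources": sources,
    }


def build_chat(fn: Callable[..., Any]):
    """Build a multimodal ChatInterface around an explicitly configured textbox."""
    import gradio as gr  # imported lazily so the helper above works without Gradio

    textbox = gr.MultimodalTextbox(**multimodal_textbox_kwargs())
    return gr.ChatInterface(fn=fn, multimodal=True, textbox=textbox)
```

Keeping the keyword arguments in a plain function makes the configuration easy to inspect or unit-test without launching the UI.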

### 2. Enhanced UI Description
Updated the description to make it clearer where to find the audio input:

- Added explicit instructions about clicking the 📷 and 🎤 icons
- Added a tip about looking for icons in the text input box
- Clarified drag & drop functionality

**File:** `src/app.py` (lines 942-948)

## How It Works Now

1. **Audio Recording Button**: The 🎤 microphone icon should appear in the textbox toolbar when `multimodal=True` is set
2. **File Upload**: Users can drag & drop audio files or click to upload
3. **Browser Permissions**: Browser will prompt for microphone access when user clicks the audio button
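
Once a recording or upload reaches the chat function, it has to be separated from the typed text. The sketch below assumes the multimodal message is a dict with `text` and `files` keys, as Gradio's documentation describes for `multimodal=True`; whether each file arrives as a path string or as a dict with a `path` key has varied between releases, so both are accepted. `split_message` is a hypothetical helper, not a function from `src/app.py`.

```python
import mimetypes
from typing import Any


def split_message(message: dict[str, Any]) -> dict[str, Any]:
    """Separate the typed text from attached audio, image, and other files."""
    result: dict[str, Any] = {
        "text": message.get("text") or "",
        "audio": [],
        "image": [],
        "other": [],
    }
    for item in message.get("files") or []:
        # Accept either a plain path or a dict carrying a "path" key.
        path = item["path"] if isinstance(item, dict) else str(item)
        mime, _ = mimetypes.guess_type(path)
        kind = (mime or "").split("/")[0]
        bucket = kind if kind in ("audio", "image") else "other"
        result[bucket].append(path)
    return result
```

A handler could then send `result["audio"]` to speech-to-text and `result["image"]` to OCR before running the research query.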

## Testing

To verify the fix:
1. Look for the 🎤 microphone icon in the text input box
2. Click it to start recording (browser will ask for microphone permission)
3. Alternatively, drag & drop an audio file into the textbox
4. Check browser console for any permission errors

## Browser Requirements

- **Chrome/Edge**: Should work with microphone permissions
- **Firefox**: Should work with microphone permissions
- **Safari**: May require additional configuration
- **HTTPS Required**: Microphone access typically requires HTTPS (or localhost)
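
The HTTPS-or-localhost rule can be checked against a deployment URL before debugging anything else. The function below is only an approximation of the browser's secure-context rule (HTTPS, `localhost`, or a loopback address); the name `allows_microphone` and the example URLs are illustrative assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def allows_microphone(url: str) -> bool:
    """Rough check of whether a browser would treat this URL as a secure context."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Loopback addresses such as 127.0.0.1 and ::1 count as local.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so an ordinary hostname served over plain HTTP.
        return False
```

For example, `allows_microphone("http://192.168.1.5:7860")` returns `False`, which matches the common case of the microphone button failing when a local server is opened from another machine on the network.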

## Troubleshooting

If audio input still doesn't appear:

1. **Check Browser Permissions**:
   - Open browser settings
   - Check microphone permissions for the site
   - Ensure microphone is not blocked

2. **Check Browser Console**:
   - Open Developer Tools (F12)
   - Look for permission errors or warnings
   - Check for any JavaScript errors

3. **Try Different Browser**:
   - Some browsers have stricter permission policies
   - Try Chrome or Firefox if Safari doesn't work

4. **Check Gradio Version**:
   - Ensure `gradio>=6.0.0` is installed
   - Update if needed: `pip install --upgrade gradio`

5. **HTTPS Requirement**:
   - Microphone access requires HTTPS (or localhost)
   - If deploying, ensure SSL is configured
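
The version check in step 4 can be scripted. The sketch below reads the installed version with the standard library's `importlib.metadata` and compares it to the `6.0.0` minimum named above. The parser is deliberately simple: it keeps only the leading digits of each dot-separated piece, so a pre-release such as `6.0.0rc1` is treated as `6.0.0`. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple[int, ...]:
    """Turn '6.0.0' or '6.0.0rc1' into (6, 0, 0), ignoring any suffix."""
    parts: list[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "6.0.0") -> bool:
    """Compare two release strings after padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def gradio_status(minimum: str = "6.0.0") -> str:
    """Describe whether the installed gradio satisfies the minimum version."""
    try:
        installed = version("gradio")
    except PackageNotFoundError:
        return "gradio is not installed"
    verdict = "OK" if meets_minimum(installed, minimum) else f"older than {minimum}"
    return f"gradio {installed}: {verdict}"
```

Running `print(gradio_status())` inside the deployment environment shows whether an upgrade is needed.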

## Additional Notes

- The audio button is part of the MultimodalTextbox component
- It should appear as an icon in the textbox toolbar
- If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
- The `file_types` parameter ensures audio files are accepted for upload

README.md
CHANGED
@@ -21,6 +21,14 @@ tags:
 - pydantic-ai
 - llamaindex
 - modal
+- building-mcp-track-enterprise
+- building-mcp-track-consumer
+- mcp-in-action-track-enterprise
+- mcp-in-action-track-consumer
+- building-mcp-track-modal
+- building-mcp-track-blaxel
+- building-mcp-track-llama-index
+- building-mcp-track-HUGGINGFACE
 ---
 
 > [!IMPORTANT]
@@ -58,11 +66,21 @@ The DETERMINATOR is a powerful generalist deep research agent system that stops
 
 For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it using general purpose websearch and special purpose retrievers for technical retrievers.
 
+
+> [!IMPORTANT]
+> **IF YOU ARE A JUDGE**
+>
+> This project was produced with passion by a group of volunteers please check out or documentation and readmes and please do keep reading below for our story
+>
+> - **Documentation**: See our [technical documentation](deepcritical.github.io/GradioDemo/) for detailed information
+> - **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
+> - **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
+
 ## Deep Critical In the Medial
 
 - Social Medial Posts about Deep Critical :
--
--
+- []
+- []
 -
 -
 -
@@ -100,24 +118,33 @@ For this hackathon we're proposing a simple yet powerful Deep Research Agent tha
 - [x] **Specialized Research Teams of Agents**:
 
 ### Team
+- **ZJ**
+- 🤗 [HuggingFace](https://huggingface.co/Tonic)
+- 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
+- [X](https://x.com/josephpollack)
+- **Mario Aderman**
+- 🤗 [HuggingFace](https://huggingface.co/SeasonalFall84)
+- 💼 [LinkedIn](https://www.linkedin.com/in/mario-aderman/)
+- [X](https://x.com/marioaderman)
+- **Joseph Pollack
+- 🤗 [HuggingFace](https://huggingface.co/Tonic)
+- 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
+- [X](https://x.com/josephpollack)
 
-- ZJ
-- MarioAderman
-- Josephrp
 
 ## Acknowledgements
 
--
-- Magentic
-- Huggingface
-- Gradio
-- DeepCritical
--
+- [DeepBoner](https://hf.co/spaces/mcp-1st-birthday/deepboner)
+- Magentic Paper
+- [Huggingface](https://hf.co)
+- [Gradio](https://gradio.app)
+- [DeepCritical](https://github.com/DeepCritical)
+- [Modal](https://modal.com)
 - Microsoft
 - Pydantic
 - Llama-index
 - Anthhropic/MCP
--
+- All our Tool Providers
 
 
 ## Links
src/app.py
CHANGED
@@ -925,6 +925,7 @@ def create_demo() -> gr.Blocks:
 gr.ChatInterface(
     fn=research_agent,
     multimodal=True,  # Enable multimodal input (text + images + audio)
+    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video file types
     title="🔬 The DETERMINATOR",
     description=(
         "*Generalist Deep Research Agent — stops at nothing until finding precise answers to complex questions*\n\n"
@@ -939,9 +940,11 @@ def create_demo() -> gr.Blocks:
         "- Evidence synthesis with citations\n\n"
         "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
         "**📷🎤 Multimodal Input Support**:\n"
-        "- **Images**:
-        "- **Audio**:
-        "- **
+        "- **Images**: Click the 📷 image icon in the textbox to upload images (OCR)\n"
+        "- **Audio**: Click the 🎤 microphone icon in the textbox to record audio (STT)\n"
+        "- **Files**: Drag & drop or click to upload image/audio files\n"
+        "- **Text**: Type your research questions directly\n\n"
+        "💡 **Tip**: Look for the 📷 and 🎤 icons in the text input box below!\n\n"
         "Configure multimodal inputs in the sidebar settings.\n\n"
         "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
     ),