Space: cyberspyde/whisper

cyberspyde committed on Dec 16, 2025

Commit

3ef0477

1 Parent(s): c00bf70

update

Files changed (6)

README.md +43 -0
api_example.py +58 -0
api_examples.md +243 -0
app.py +67 -2
requirements.txt +3 -0
test_api.py +119 -0

README.md CHANGED

@@ -31,11 +31,28 @@ This Hugging Face Space provides automatic speech recognition (ASR) for Uzbek la
 ## 📝 Usage
+### Web Interface
 1. **Record Audio**: Click the microphone icon to record directly in your browser
 2. **Upload Audio**: Or upload an existing audio file
 3. **Transcribe**: Click the "Transcribe" button to convert speech to text
 4. **View Results**: The transcribed text will appear in the output box
+### API Access
+This Space provides a REST API for programmatic access. You can submit audio files and receive transcriptions programmatically.
+**Quick Example:**
+```python
+from gradio_client import Client
+client = Client("YOUR_USERNAME/whisper-uzbek-stt")
+result = client.predict("path/to/audio.mp3", api_name="/predict")
+print(result)
+```
+For detailed API documentation and examples, see [api_examples.md](api_examples.md)
 ## 🔧 Local Development
 To run this application locally:
@@ -63,6 +80,13 @@ The application will be available at `http://localhost:7860`
 - torchaudio>=2.0.0
 - accelerate>=0.20.0
 - huggingface_hub>=0.16.0
+- scipy>=1.10.0
+- numpy>=1.24.0
+### For API Client Usage
+```bash
+pip install gradio-client
+```
 ## 📊 Logging
@@ -85,8 +109,27 @@ Contributions are welcome! Feel free to:
 This project is licensed under the Apache 2.0 License.
+## 🔌 API Features
+- **REST API**: Full Gradio API support
+- **Multiple Formats**: MP3, WAV, M4A, FLAC, etc.
+- **Auto-Resampling**: Handles any sample rate (auto-converts to 16kHz)
+- **Stereo to Mono**: Automatic conversion
+- **Error Handling**: Comprehensive error messages
+- **Progress Tracking**: Real-time processing updates
+## 📁 Project Files
+- `app.py` - Main application with Gradio interface and API
+- `requirements.txt` - Python dependencies
+- `api_example.py` - Python client example
+- `api_examples.md` - Comprehensive API documentation
+- `.gitignore` - Git ignore rules
 ## 🔗 Resources
+- [API Examples Documentation](api_examples.md)
 - [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces-config-reference)
+- [Gradio API Documentation](https://gradio.app/docs/client)
 - [Gradio Documentation](https://gradio.app/docs)
 - [Whisper Model Card](https://huggingface.co/jmshd/whisper-uz)
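
The README lists the accepted formats, and api_examples.md suggests keeping files under 25MB. A client could check both before uploading. The sketch below is one possible pre-flight check, not part of this commit: `check_audio_file`, `SUPPORTED_EXTENSIONS`, and `MAX_BYTES` are hypothetical names, and the 25 MB ceiling is treated as a guideline rather than a limit the server is known to enforce.

```python
from pathlib import Path

# Assumption: the extension set mirrors the formats named in the README
# ("MP3, WAV, M4A, FLAC, etc."); the Space may accept more.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
# Assumption: 25 MB is the guideline from api_examples.md, not a hard limit.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks fine."""
    p = Path(path)
    if not p.is_file():
        return [f"not a file: {p}"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unexpected extension: {p.suffix or '(none)'}")
    if p.stat().st_size > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems
```

Returning a list instead of raising lets a caller report every problem at once before deciding whether to call `client.predict`.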

api_example.py ADDED

	@@ -0,0 +1,58 @@

+"""
+Example Python client for using the Whisper Uzbek STT API
+"""
+from gradio_client import Client
+import sys
+def transcribe_audio(audio_file_path, space_url):
+    """
+    Transcribe audio file using the Whisper API
+    Args:
+        audio_file_path: Path to the audio file
+        space_url: URL of the Hugging Face Space (e.g., "username/space-name")
+    Returns:
+        str: Transcribed text
+    """
+    try:
+        print(f"Connecting to {space_url}...")
+        client = Client(space_url)
+        print(f"Uploading and transcribing {audio_file_path}...")
+        result = client.predict(
+            audio_file_path,
+            api_name="/predict"
+        )
+        return result
+    except Exception as e:
+        print(f"Error: {str(e)}")
+        return None
+def main():
+    # Example usage
+    SPACE_URL = "YOUR_USERNAME/whisper-uzbek-stt"  # Replace with your Space URL
+    if len(sys.argv) < 2:
+        print("Usage: python api_example.py <audio_file_path>")
+        print("Example: python api_example.py sample.mp3")
+        sys.exit(1)
+    audio_file = sys.argv[1]
+    result = transcribe_audio(audio_file, SPACE_URL)
+    if result:
+        print("\n" + "="*50)
+        print("TRANSCRIPTION:")
+        print("="*50)
+        print(result)
+        print("="*50)
+if __name__ == "__main__":
+    main()
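
According to api_examples.md, the Space reports failures as ordinary strings prefixed with ⚠️ or ❌, so the `if result:` check in `main()` would print such a message as if it were a transcription. The sketch below shows one way a client might tell the two apart. `is_error_message` and `exit_code_for` are hypothetical helpers, and the prefix list is an assumption based only on the two error strings documented in this commit.

```python
# Assumption: these prefixes cover the two documented error strings
# ("⚠️ No audio provided..." and "❌ Error during transcription: ...").
# The escapes are U+26A0 (warning sign) and U+274C (cross mark).
ERROR_PREFIXES = ("\u26a0", "\u274c")


def is_error_message(result):
    """True when the result is an error string rather than a transcript."""
    return isinstance(result, str) and result.lstrip().startswith(ERROR_PREFIXES)


def exit_code_for(result):
    """Map a transcribe_audio() return value to a process exit code."""
    if result is None or is_error_message(result):
        return 1
    return 0
```

A script could then end with `sys.exit(exit_code_for(result))` so that shell pipelines notice a failed transcription.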

api_examples.md ADDED

	@@ -0,0 +1,243 @@

+# API Usage Examples
+This document provides examples of how to use the Whisper Uzbek STT API programmatically.
+## Prerequisites
+Install the Gradio client:
+```bash
+pip install gradio-client
+```
+---
+## Python Examples
+### Basic Usage
+```python
+from gradio_client import Client
+# Connect to your Space
+client = Client("YOUR_USERNAME/whisper-uzbek-stt")
+# Transcribe an audio file
+result = client.predict(
+    "path/to/audio.mp3",
+    api_name="/predict"
+)
+print(result)
+```
+### Advanced Usage with Error Handling
+```python
+from gradio_client import Client
+import os
+def transcribe_audio(audio_path, space_url):
+    """Transcribe audio with error handling"""
+    if not os.path.exists(audio_path):
+        raise FileNotFoundError(f"Audio file not found: {audio_path}")
+    try:
+        client = Client(space_url)
+        result = client.predict(audio_path, api_name="/predict")
+        return result
+    except Exception as e:
+        print(f"Transcription error: {e}")
+        return None
+# Usage
+space_url = "YOUR_USERNAME/whisper-uzbek-stt"
+transcription = transcribe_audio("uzbek_speech.wav", space_url)
+if transcription:
+    print(f"Transcription: {transcription}")
+```
+### Batch Processing
+```python
+from gradio_client import Client
+import os
+from pathlib import Path
+def batch_transcribe(audio_files, space_url):
+    """Transcribe multiple audio files"""
+    client = Client(space_url)
+    results = {}
+    for audio_file in audio_files:
+        try:
+            print(f"Processing: {audio_file}")
+            result = client.predict(audio_file, api_name="/predict")
+            results[audio_file] = result
+            print(f"✓ Done: {audio_file}")
+        except Exception as e:
+            print(f"✗ Failed: {audio_file} - {e}")
+            results[audio_file] = None
+    return results
+# Usage
+audio_files = [
+    "audio1.mp3",
+    "audio2.wav",
+    "audio3.m4a"
+]
+space_url = "YOUR_USERNAME/whisper-uzbek-stt"
+results = batch_transcribe(audio_files, space_url)
+# Print results
+for file, transcription in results.items():
+    print(f"\n{file}:")
+    print(f"  {transcription}")
+```
+---
+## JavaScript/Node.js Example
+```javascript
+const fs = require('fs');
+const axios = require('axios');
+const FormData = require('form-data');
+async function transcribeAudio(audioPath, spaceUrl) {
+    const form = new FormData();
+    form.append('data', JSON.stringify([audioPath]));
+    try {
+        const response = await axios.post(
+            `${spaceUrl}/api/predict`,
+            form,
+            {
+                headers: form.getHeaders()
+            }
+        );
+        return response.data.data[0];
+    } catch (error) {
+        console.error('Error:', error.message);
+        return null;
+    }
+}
+// Usage
+const spaceUrl = 'https://YOUR_USERNAME-whisper-uzbek-stt.hf.space';
+const audioPath = './audio.mp3';
+transcribeAudio(audioPath, spaceUrl)
+    .then(result => console.log('Transcription:', result));
+```
+---
+## cURL Example
+### Upload and Transcribe
+```bash
+curl -X POST "https://YOUR_USERNAME-whisper-uzbek-stt.hf.space/api/predict" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "data": ["path/to/audio.mp3"]
+  }'
+```
+### Using a File Upload
+```bash
+# Save audio file first
+audio_file="sample.mp3"
+# Make API request
+curl -X POST "https://YOUR_USERNAME-whisper-uzbek-stt.hf.space/api/predict" \
+  -F "data=@${audio_file}"
+```
+---
+## Response Format
+The API returns JSON with the following structure:
+```json
+{
+  "data": ["Transcribed text in Uzbek"],
+  "duration": 2.5,
+  "is_generating": false
+}
+```
+---
+## Error Handling
+Possible error responses:
+### No Audio Provided
+```json
+{
+  "data": ["⚠️ No audio provided. Please upload or record audio."]
+}
+```
+### Processing Error
+```json
+{
+  "data": ["❌ Error during transcription: <error message>"]
+}
+```
+---
+## Rate Limiting
+Hugging Face Spaces may have rate limits. For production use:
+- Implement retry logic with exponential backoff
+- Consider caching results
+- Monitor your Space's usage metrics
+---
+## Best Practices
+1. **File Formats**: Supported formats include MP3, WAV, M4A, FLAC
+2. **File Size**: Keep files under 25MB for best performance
+3. **Sample Rate**: Any sample rate works (automatically resampled to 16kHz)
+4. **Audio Quality**: Higher quality audio = better transcription
+5. **Language**: Optimized for Uzbek language
+---
+## Troubleshooting
+### Connection Issues
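+```python
+# Set a request timeout (in seconds)
+# Assumption: your installed gradio_client accepts httpx_kwargs; Client has no timeout argument
+from gradio_client import Client
+client = Client("YOUR_SPACE_URL", httpx_kwargs={"timeout": 60})
+```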
+### Large Files
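+```python
+# Pass the file by path, not as an open file object
+# Assumption: your installed gradio_client exports handle_file (recent releases do)
+from gradio_client import Client, handle_file
+client = Client("YOUR_SPACE_URL")
+result = client.predict(handle_file("large_audio.mp3"), api_name="/predict")
+```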
+---
+## Support
+For issues or questions:
+- Check the Space logs on Hugging Face
+- Review the error messages in the response
+- Ensure your audio file is valid and accessible
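
The Rate Limiting section recommends retry logic with exponential backoff but does not show it. The following is a minimal sketch of that idea, not code from this commit: `call_with_backoff` and its parameters are hypothetical, and it retries on any exception, which a real client would probably narrow to network or rate-limit errors.

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on failure, doubling the wait after each attempt.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

With the client from the examples above, a call might look like `call_with_backoff(lambda: client.predict(audio_path, api_name="/predict"))`. The `sleep` parameter exists so the delay can be replaced in tests.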

app.py CHANGED

@@ -5,6 +5,8 @@ import logging
 import os
 from datetime import datetime
 from huggingface_hub import HfApi
+import numpy as np
+from scipy import signal
 # Setup logging
 logging.basicConfig(
@@ -32,6 +34,40 @@ except Exception as e:
     logger.error(f"Error loading model: {str(e)}")
     raise
+def resample_audio(audio_data, orig_sr, target_sr=16000):
+    """
+    Resample audio to target sample rate
+    Args:
+        audio_data: Audio array
+        orig_sr: Original sample rate
+        target_sr: Target sample rate (default 16000 for Whisper)
+    Returns:
+        Resampled audio array
+    """
+    if orig_sr == target_sr:
+        return audio_data
+    # Scale integer PCM to [-1.0, 1.0) based on the original dtype;
+    # this check must run before the float32 cast, which changes the dtype
+    if audio_data.dtype == np.int16:
+        audio_data = audio_data.astype(np.float32) / 32768.0
+    elif audio_data.dtype == np.int32:
+        audio_data = audio_data.astype(np.float32) / 2147483648.0
+    elif audio_data.dtype != np.float32:
+        # Convert any other dtype to float32 without scaling
+        audio_data = audio_data.astype(np.float32)
+    # Calculate the output length from the clip duration
+    duration = len(audio_data) / orig_sr
+    target_length = int(duration * target_sr)
+    # Resample using scipy
+    resampled = signal.resample(audio_data, target_length)
+    return resampled.astype(np.float32)
 def transcribe(audio, progress=gr.Progress()):
     """
     Transcribe audio to text using Whisper model
@@ -51,7 +87,20 @@ def transcribe(audio, progress=gr.Progress()):
         progress(0.1, desc="Processing audio...")
         sample_rate, audio_data = audio
-        logger.info(f"Processing audio - Sample rate: {sample_rate}, Shape: {audio_data.shape}")
+        logger.info(f"Processing audio - Sample rate: {sample_rate}, Shape: {audio_data.shape}, Dtype: {audio_data.dtype}")
+        # Handle stereo to mono conversion
+        if len(audio_data.shape) > 1:
+            logger.info("Converting stereo to mono")
+            audio_data = np.mean(audio_data, axis=1)
+        # Resample to 16000 Hz if needed
+        target_sr = 16000
+        if sample_rate != target_sr:
+            logger.info(f"Resampling from {sample_rate} Hz to {target_sr} Hz")
+            progress(0.2, desc=f"Resampling audio from {sample_rate} Hz to {target_sr} Hz...")
+            audio_data = resample_audio(audio_data, sample_rate, target_sr)
+            sample_rate = target_sr
         progress(0.3, desc="Preparing input features...")
         inputs = processor(
@@ -113,10 +162,23 @@ with gr.Blocks(theme=gr.themes.Soft()) as iface:
         2. Click the "Transcribe" button to convert speech to text
         3. The transcribed text will appear in the output box
+        ### 🔌 API Access:
+        This Space provides a REST API for programmatic access. Click the "Use via API" button below for details.
+        **Quick API Example (Python):**
+        ```python
+        from gradio_client import Client
+        client = Client("YOUR_SPACE_URL")
+        result = client.predict("path/to/audio.mp3", api_name="/predict")
+        print(result)
+        ```
         ### ℹ️ Information:
         - Supported language: Uzbek
         - Processing: CPU-only (may be slower than GPU)
         - Model size: Small
+        - API: Enabled via Gradio Client
         """
     )
@@ -129,9 +191,12 @@ with gr.Blocks(theme=gr.themes.Soft()) as iface:
 # Launch configuration for Hugging Face Spaces
 if __name__ == "__main__":
     logger.info("Launching Gradio interface...")
+    logger.info("API endpoints will be available at /api/predict")
+    iface.queue()  # Enable queue for better API performance
     iface.launch(
         share=False,
         show_error=True,
         server_name="0.0.0.0",
-        server_port=7860
+        server_port=7860,
+        show_api=True  # Enable API documentation
     )
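
The preprocessing added in this file downmixes stereo to mono, scales integer PCM, and resamples to 16 kHz. The sketch below walks through those three steps on plain Python lists so each can be checked by hand. It is an illustration under stated assumptions, not the app's implementation: the function names are hypothetical, and linear interpolation stands in for `scipy.signal.resample`, which uses an FFT-based method and will give different sample values.

```python
def downmix_to_mono(frames):
    """Average the channels of each frame, like np.mean(audio_data, axis=1)."""
    return [sum(frame) / len(frame) for frame in frames]


def normalize_int16(samples):
    """Scale 16-bit PCM integers into the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples, orig_sr, target_sr=16000):
    """Resample by linear interpolation (a stand-in, not scipy's FFT method)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    # Integer arithmetic avoids float rounding in the output length.
    target_length = len(samples) * target_sr // orig_sr
    if target_length <= 1 or len(samples) == 1:
        return [float(samples[0])] * target_length
    # Map output index i onto a fractional position in the input.
    step = (len(samples) - 1) / (target_length - 1)
    out = []
    for i in range(target_length):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, `resample_linear(normalize_int16(downmix_to_mono(frames)), 44100)` chains the three steps for 16-bit stereo input; note that the scaling here is applied to the averaged values, which only works because the averaging preserves the int16 value range.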

requirements.txt CHANGED

@@ -4,3 +4,6 @@ torch>=2.0.0
 torchaudio>=2.0.0
 accelerate>=0.20.0
 huggingface_hub>=0.16.0
+scipy>=1.10.0
+numpy>=1.24.0
+gradio-client>=0.7.0

test_api.py ADDED

	@@ -0,0 +1,119 @@

+"""
+Quick test script for the Whisper Uzbek STT API
+This script tests both local and remote API endpoints.
+"""
+import sys
+import os
+def test_local_api():
+    """Test the API when running locally"""
+    from gradio_client import Client
+    print("Testing local API (http://localhost:7860)...")
+    try:
+        client = Client("http://localhost:7860")
+        print("✓ Connected to local server")
+        # Test with a sample audio file if provided
+        if len(sys.argv) > 1:
+            audio_file = sys.argv[1]
+            if os.path.exists(audio_file):
+                print(f"✓ Testing with audio file: {audio_file}")
+                result = client.predict(audio_file, api_name="/predict")
+                print(f"\n{'='*60}")
+                print("TRANSCRIPTION RESULT:")
+                print(f"{'='*60}")
+                print(result)
+                print(f"{'='*60}\n")
+                return True
+            else:
+                print(f"✗ Audio file not found: {audio_file}")
+                return False
+        else:
+            print("ℹ No audio file provided for testing")
+            print("Usage: python test_api.py <audio_file_path>")
+            return True
+    except Exception as e:
+        print(f"✗ Error: {str(e)}")
+        return False
+def test_remote_api(space_url):
+    """Test the API on Hugging Face Spaces"""
+    from gradio_client import Client
+    print(f"\nTesting remote API ({space_url})...")
+    try:
+        client = Client(space_url)
+        print("✓ Connected to remote Space")
+        if len(sys.argv) > 1:
+            audio_file = sys.argv[1]
+            if os.path.exists(audio_file):
+                print(f"✓ Testing with audio file: {audio_file}")
+                result = client.predict(audio_file, api_name="/predict")
+                print(f"\n{'='*60}")
+                print("TRANSCRIPTION RESULT:")
+                print(f"{'='*60}")
+                print(result)
+                print(f"{'='*60}\n")
+                return True
+            else:
+                print(f"✗ Audio file not found: {audio_file}")
+                return False
+        else:
+            print("ℹ No audio file provided for testing")
+            return True
+    except Exception as e:
+        print(f"✗ Error: {str(e)}")
+        return False
+def main():
+    print("="*60)
+    print("Whisper Uzbek STT - API Test Script")
+    print("="*60)
+    print()
+    # Test local API
+    local_success = test_local_api()
+    # Optionally test remote API
+    print("\n" + "-"*60)
+    test_remote = input("Do you want to test the remote API? (y/n): ").lower().strip()
+    if test_remote == 'y':
+        space_url = input("Enter your Space URL (e.g., username/space-name): ").strip()
+        if space_url:
+            remote_success = test_remote_api(space_url)
+        else:
+            print("✗ No Space URL provided")
+            remote_success = False
+    else:
+        remote_success = None
+    # Summary
+    print("\n" + "="*60)
+    print("TEST SUMMARY")
+    print("="*60)
+    print(f"Local API:  {'✓ PASSED' if local_success else '✗ FAILED'}")
+    if remote_success is not None:
+        print(f"Remote API: {'✓ PASSED' if remote_success else '✗ FAILED'}")
+    print("="*60)
+if __name__ == "__main__":
+    try:
+        from gradio_client import Client
+    except ImportError:
+        print("✗ Error: gradio-client is not installed")
+        print("Install it with: pip install gradio-client")
+        sys.exit(1)
+    main()
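
`test_local_api` and `test_remote_api` differ only in the address they connect to, so the shared body could live in one helper that takes an already-constructed client. The sketch below is one possible shape for that refactor, not part of this commit: `run_api_check` is a hypothetical name, and `FakeClient` is a test double that only mimics the `predict(path, api_name=...)` call used throughout these files.

```python
import os


def run_api_check(client, audio_file=None, api_name="/predict"):
    """Return True if the check passed; client only needs a predict() method."""
    if audio_file is None:
        # Connection-only check, matching the original script's behaviour.
        return True
    if not os.path.exists(audio_file):
        print(f"Audio file not found: {audio_file}")
        return False
    try:
        result = client.predict(audio_file, api_name=api_name)
    except Exception as exc:
        print(f"Error: {exc}")
        return False
    print(result)
    return True


class FakeClient:
    """Hypothetical stand-in for gradio_client.Client, for offline testing."""

    def predict(self, path, api_name):
        return f"transcript of {os.path.basename(path)}"
```

Both original functions would then reduce to building a `Client` for the local URL or the Space ID and passing it to `run_api_check`.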