metadata
title: Faster Whisper API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false

🎀 Faster Whisper API - Fixed Version

🆕 Latest Fixes Applied:

✅ Critical Bug Fixes:

  • Fixed "name 'traceback' is not defined" error - Removed problematic traceback import
  • Improved error handling - Better error messages and logging
  • Enhanced CORS middleware - Better browser compatibility
  • Added detailed logging - For easier debugging on Hugging Face Spaces

🔧 Performance Improvements:

  • Better file validation - 25MB file size limit
  • Enhanced VAD support - Voice Activity Detection with fallback
  • Improved model loading - Better error handling during startup
  • Added health check endpoint - For monitoring service status

🚀 Quick Start:

Health Check:

curl https://alaaharoun-faster-whisper-api.hf.space/health

Transcribe Audio (without VAD):

curl -X POST \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "task=transcribe" \
  https://alaaharoun-faster-whisper-api.hf.space/transcribe

Transcribe Audio (with VAD):

curl -X POST \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "task=transcribe" \
  -F "vad_filter=true" \
  -F "vad_parameters=threshold=0.5" \
  https://alaaharoun-faster-whisper-api.hf.space/transcribe
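
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper name `build_transcribe_request` is hypothetical, and the field names are taken from the curl examples above. It builds the multipart/form-data body by hand so that no third-party package is needed.

```python
import io
import uuid
import urllib.request

# URL taken from the curl examples above.
API_URL = "https://alaaharoun-faster-whisper-api.hf.space/transcribe"


def build_transcribe_request(audio_bytes, filename, fields, url=API_URL):
    """Build a multipart/form-data POST equivalent to the curl -F calls."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain form fields such as language, task, vad_filter.
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # The audio itself goes in the "file" field.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (performs a real network call, so it is left commented out):
# with open("audio.wav", "rb") as f:
#     req = build_transcribe_request(f.read(), "audio.wav",
#                                    {"language": "en", "task": "transcribe"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

To enable VAD, add `"vad_filter": "true"` and `"vad_parameters": "threshold=0.5"` to the `fields` dictionary, mirroring the second curl call.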

📊 Supported Parameters:

  • file: Audio file (WAV, MP3, M4A, FLAC, OGG, WEBM)
  • language: Language code (optional, e.g., "en", "ar", "es")
  • task: "transcribe" or "translate" (default: "transcribe")
  • vad_filter: Enable Voice Activity Detection (default: false)
  • vad_parameters: VAD parameters (default: "threshold=0.5")
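
The `vad_parameters` value is passed as a `key=value` string. How the service parses it is not shown in this README; the sketch below is one plausible approach. The helper `parse_vad_parameters` is hypothetical, and the comma separator and float conversion are assumptions.

```python
def parse_vad_parameters(raw, default_threshold=0.5):
    """Parse a string such as "threshold=0.5" into a dict of floats.

    Assumption: several pairs may be separated by commas.
    """
    params = {"threshold": default_threshold}
    for pair in (raw or "").split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key.strip()] = float(value)  # raises ValueError on non-numeric input
    return params
```

An empty or missing string yields the documented default threshold of 0.5.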

🔧 Response Format:

Success Response:

{
  "success": true,
  "text": "Transcribed text here",
  "language": "en",
  "language_probability": 0.95,
  "vad_enabled": false,
  "vad_threshold": null
}

Error Response:

{
  "error": "Error message",
  "error_type": "ExceptionType",
  "success": false
}
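
Both response shapes carry a `success` flag, so a client can branch on it. A minimal sketch, assuming the fields shown above; `TranscriptionError` and `parse_transcription_response` are hypothetical names, not part of the service.

```python
import json


class TranscriptionError(Exception):
    """Raised when the API reports success == false."""


def parse_transcription_response(payload):
    """Return (text, language) on success, raise TranscriptionError otherwise."""
    data = json.loads(payload)
    if not data.get("success"):
        error_type = data.get("error_type", "Error")
        message = data.get("error", "unknown error")
        raise TranscriptionError(f"{error_type}: {message}")
    return data["text"], data.get("language")
```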

🛠️ Local Development:

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py

Or with uvicorn:

uvicorn app:app --host 0.0.0.0 --port 7860

📝 Important Notes:

  • Maximum file size: 25MB
  • Supported formats: WAV, MP3, M4A, FLAC, OGG, WEBM
  • VAD support: Configurable threshold with fallback mechanism
  • Language detection: Automatic if not specified
  • Error handling: Detailed error messages for debugging
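
The size and format limits above can be checked on the client before uploading. The following is a sketch only; `validate_upload` is a hypothetical helper, and treating 25MB as 25 × 1024 × 1024 bytes is an assumption, since the README does not say whether the limit is decimal or binary.

```python
from pathlib import Path

MAX_FILE_SIZE = 25 * 1024 * 1024  # 25MB, assumed to mean binary megabytes
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".webm"}


def validate_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_FILE_SIZE:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

This checks only the file extension; the server may inspect the content as well.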

🔍 Troubleshooting:

Common Issues:

  1. 500 Internal Server Error:

    • Check if the model is loaded properly
    • Verify file format and size
    • Check server logs for detailed error messages
  2. VAD Issues:

    • The service will automatically fall back to standard transcription
    • Check VAD parameters format
  3. File Upload Issues:

    • Ensure file size is under 25MB
    • Check file format compatibility
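
The VAD fallback mentioned under "VAD Issues" could look roughly like the sketch below. This is not the service's actual code: `transcribe_with_fallback` is a hypothetical helper, and `transcribe` stands for any callable that accepts `vad_filter` and `vad_parameters` keyword arguments in the style of faster-whisper's `WhisperModel.transcribe`.

```python
import logging

logger = logging.getLogger("whisper_api")


def transcribe_with_fallback(transcribe, audio, use_vad=False, threshold=0.5, **options):
    """Try VAD-filtered transcription first; fall back to a plain call on failure.

    Returns (result, vad_enabled), where vad_enabled tells the caller
    whether the VAD attempt actually succeeded.
    """
    if use_vad:
        try:
            result = transcribe(
                audio,
                vad_filter=True,
                vad_parameters={"threshold": threshold},
                **options,
            )
            return result, True
        except Exception as exc:  # broad on purpose: any VAD failure triggers the fallback
            logger.warning("VAD transcription failed (%s); retrying without VAD", exc)
    # Standard transcription: either VAD was not requested or it failed.
    return transcribe(audio, **options), False
```

Returning the flag alongside the result lets the caller report `vad_enabled` truthfully in the response.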

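🌐 Service URLs:

  • Health Check: https://alaaharoun-faster-whisper-api.hf.space/health
  • Transcribe: https://alaaharoun-faster-whisper-api.hf.space/transcribe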

📈 Performance:

  • Model: Whisper base model with int8 quantization
  • Processing: Optimized for real-time transcription
  • Memory: Efficient memory usage for Hugging Face Spaces
  • Concurrency: Supports multiple concurrent requests
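
For reference, loading the base model with int8 quantization through the faster-whisper library could look like the sketch below. The actual loading code in app.py is not shown in this README; `MODEL_SETTINGS` and `load_model` are hypothetical names, and `device="cpu"` is an assumption.

```python
# Settings taken from the Performance list; device="cpu" is an assumption.
MODEL_SETTINGS = {
    "model_size_or_path": "base",
    "device": "cpu",
    "compute_type": "int8",
}


def load_model(settings=MODEL_SETTINGS):
    """Create a faster-whisper model from the settings above."""
    # Imported lazily so this module can be imported without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(**settings)
```

Calling `load_model()` downloads the model weights on first use, so it is best done once at startup rather than per request.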

🔒 Security:

  • CORS: Configured for cross-origin requests
  • File Validation: Strict file type and size validation
  • Error Handling: No sensitive information in error messages
  • Authentication: Optional API token support (currently disabled)

📞 Support:

For issues or questions:

  1. Check the health endpoint first
  2. Review server logs for detailed error messages
  3. Test with a simple audio file
  4. Verify file format and size requirements