---
title: Polyglot Translation Backend
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

Polyglot Translation Backend - Quantized Models

Real-time speech transcription and translation API, using Socket.IO for WebSocket communication. This version uses INT8-quantized models for faster inference and a reduced memory footprint.

Features

  • Real-time Speech Recognition: Support for English, Swahili, Kikuyu, Kamba, Kimeru, Luo, and Somali
  • Translation: Multi-language translation using NLLB models
  • Text-to-Speech: Generate speech in multiple languages
  • WebSocket Support: Real-time communication via Socket.IO
  • Model Quantization: INT8 dynamic quantization for faster inference

API Endpoints

  • GET /health - Health check endpoint
  • WebSocket / - Socket.IO connection for real-time communication

Environment

This Space requires the following secrets to be configured:

  • HUGGING_FACE_HUB_TOKEN - HuggingFace token for model access
  • CODE_SPACE_ID - ID of the private code space (e.g., "mutisya/polyglot-backend-code")
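At startup the application presumably reads these from the environment; a minimal fail-fast sketch (the helper name `get_required_secret` is ours, not from the actual code):

```python
import os

def get_required_secret(name: str) -> str:
    """Return a required secret from the environment, failing fast if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value

# Hypothetical startup usage; both names match the secrets listed above.
# hf_token = get_required_secret("HUGGING_FACE_HUB_TOKEN")
# code_space = get_required_secret("CODE_SPACE_ID")
```

Failing fast on missing secrets surfaces misconfiguration in the Space's build/runtime logs instead of as an obscure model-download error later.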

Code Space Architecture

This Docker Space downloads the application code from a separate private Space at build time, which lets the Docker Space stay public while keeping the source code private.

  • Public Docker Space (this one): Contains only the Dockerfile and deployment configuration
  • Private Code Space: Contains the actual application code (app/) and data (data/)

During the build, the Dockerfile downloads the code from the private Space using the Hugging Face Hub API.
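A sketch of what that build step might look like. The secret-mount syntax follows Hugging Face's documented pattern for Docker Space build secrets, but the exact invocation and target path are assumptions; the real Dockerfile is in this Space:

```dockerfile
# Hypothetical build step: pull the app code from the private code Space.
# On HF Spaces, secrets are exposed to the build as BuildKit secrets.
RUN --mount=type=secret,id=HUGGING_FACE_HUB_TOKEN,mode=0444,required=true \
    HF_TOKEN=$(cat /run/secrets/HUGGING_FACE_HUB_TOKEN) \
    python -c "from huggingface_hub import snapshot_download; \
               snapshot_download('mutisya/polyglot-backend-code', \
                                 repo_type='space', local_dir='/app')"
```

Using a BuildKit secret mount (rather than an `ARG`) keeps the token out of the image layers, which matters precisely because this Space is public.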

Technical Details

  • Framework: FastAPI with Socket.IO
  • Models:
    • ASR: Whisper (English) and Wav2Vec2-BERT (African languages)
    • Translation: NLLB-600M fine-tuned model
    • TTS: VITS models for each language
  • Optimization: INT8 dynamic quantization via PyTorch
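Dynamic quantization of this kind is a one-liner in PyTorch; a minimal sketch on a stand-in model (`TinyEncoder` is illustrative, not one of the Space's actual models):

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a real ASR/translation model (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.proj(x)

model = TinyEncoder().eval()

# INT8 dynamic quantization: Linear weights are stored as int8 and
# dequantized on the fly; activations stay in floating point.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

Dynamic quantization needs no calibration data, which makes it a low-effort fit for Linear-heavy inference workloads like the Whisper and NLLB models listed above.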