---
title: Polyglot Translation Backend
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Polyglot Translation Backend - Quantized Models

A real-time speech transcription and translation API that uses Socket.IO for WebSocket communication. This version uses INT8-quantized models for faster inference and a smaller memory footprint.

## Features

- **Real-time Speech Recognition**: Support for English, Swahili, Kikuyu, Kamba, Kimeru, Luo, and Somali
- **Translation**: Multi-language translation using NLLB models
- **Text-to-Speech**: Generate speech in multiple languages
- **WebSocket Support**: Real-time communication via Socket.IO
- **Model Quantization**: INT8 dynamic quantization for faster inference

## API Endpoints

- `GET /health` - Health check endpoint
- `WebSocket /` - Socket.IO connection for real-time communication
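
The two endpoints above could be exercised from a client roughly as follows. This is a minimal sketch, not the project's own client: the base URL assumes a local container listening on the `app_port` of 7860, the helper names are hypothetical, and `python-socketio` is an assumed client library. The shape of the `/health` response body and the Socket.IO event names are not documented here, so the sketch checks only the HTTP status and only opens the connection.

```python
import urllib.request


def health_url(base_url):
    """Build the health-check URL from a base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url, timeout=5.0):
    """Return True if GET /health answers with HTTP 200."""
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return resp.status == 200


def connect_socketio(base_url):
    """Open a Socket.IO connection and return the connected client.

    Assumes the python-socketio package is installed. It is imported lazily
    so the health check above works without it.
    """
    import socketio

    client = socketio.Client()
    client.connect(base_url)
    return client
```

For a locally running container, `check_health("http://localhost:7860")` would be the first thing to try before opening a Socket.IO connection.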

## Environment

This Space requires the following secrets to be configured:

- `HUGGING_FACE_HUB_TOKEN` - Hugging Face token for model access
- `CODE_SPACE_ID` - ID of the private code Space (e.g., "mutisya/polyglot-backend-code")
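
Secrets configured on a Space are exposed to the container as environment variables, so a startup check along these lines can fail fast when one is missing. This is a sketch under that assumption; `load_secrets` and `REQUIRED_SECRETS` are hypothetical names, not part of the application code.

```python
import os

# The two secrets named in this README.
REQUIRED_SECRETS = ("HUGGING_FACE_HUB_TOKEN", "CODE_SPACE_ID")


def load_secrets(env=None):
    """Return the required secrets, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.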

### Code Space Architecture

This Docker Space downloads the application code from a separate private Space at build time, which lets the Docker Space stay public while the source code remains private.

- **Public Docker Space** (this one): Contains only the Dockerfile and deployment configuration
- **Private Code Space**: Contains the actual application code (`app/`) and data (`data/`)

During the build, the Dockerfile downloads the code from the private Space using the Hugging Face Hub API.
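
One way such a build-time download could look is sketched below, using `snapshot_download` from the `huggingface_hub` package. The actual Dockerfile is not shown in this README, so everything beyond the two secret names is an assumption: the target directory `/app/code`, the helper names, and the choice to fetch only `app/` and `data/`.

```python
import os


def snapshot_kwargs(env, target_dir="/app/code"):
    """Build the arguments for huggingface_hub.snapshot_download.

    `target_dir` is a hypothetical location, not taken from the Dockerfile.
    """
    return {
        "repo_id": env["CODE_SPACE_ID"],
        "repo_type": "space",  # the code lives in a Space, not a model repo
        "token": env["HUGGING_FACE_HUB_TOKEN"],
        "local_dir": target_dir,
        # Assumption: only the app/ and data/ directories are needed.
        "allow_patterns": ["app/*", "data/*"],
    }


def download_code(env=None):
    """Download the private code Space and return the local path."""
    # Imported lazily so the helper above can be used without the package.
    from huggingface_hub import snapshot_download

    env = os.environ if env is None else env
    return snapshot_download(**snapshot_kwargs(env))
```

Keeping the argument construction separate from the network call makes it possible to check the configuration without contacting the Hub.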

## Technical Details

- **Framework**: FastAPI with Socket.IO
- **Models**:
  - ASR: Whisper (English) and Wav2Vec2-BERT (African languages)
  - Translation: Fine-tuned NLLB-600M model
  - TTS: VITS models for each language
- **Optimization**: INT8 dynamic quantization via PyTorch
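
As an illustration of the optimization step, PyTorch's dynamic quantization can be applied to a loaded model in one call. The sketch below assumes only `nn.Linear` layers are quantized, which is the common choice for transformer models but is not confirmed by this README. Both function names are hypothetical, and the footprint helper is a rough estimate that treats every parameter as stored at the same width.

```python
def quantize_for_cpu(model):
    """Return a copy of `model` whose nn.Linear weights are stored as INT8.

    Assumes PyTorch is installed. It is imported lazily so the estimate
    helper below can be used without it.
    """
    import torch

    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )


def estimated_weight_megabytes(num_params, bytes_per_param):
    """Rough weight storage in MB (10**6 bytes) at a uniform parameter width."""
    return num_params * bytes_per_param / 1e6


# Upper-bound illustration for a nominal 600M-parameter model:
fp32_mb = estimated_weight_megabytes(600_000_000, 4)  # 32-bit floats
int8_mb = estimated_weight_megabytes(600_000_000, 1)  # 8-bit integers
```

In practice the saving is smaller than this fourfold bound, because layers that are not quantized (embeddings, for example) keep their original precision.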