
Voice-to-Voice Translator API Documentation


🎯 Overview

The Voice-to-Voice Translator API provides real-time audio translation capabilities through WebSocket connections. Users can join translation rooms and receive live translations of audio streams.

Key Features:

  • Real-time bidirectional audio translation
  • Multi-room support
  • Multiple language pairs
  • Low-latency streaming
  • JWT authentication (optional)
  • Rate limiting and connection management

🌐 Base URL

Development

ws://localhost:8000/ws

Production

wss://your-domain.com/ws

REST API Endpoints

The API provides several REST endpoints for management and information retrieval.

Base URL for REST API

Development: http://localhost:8000
Production: https://your-domain.com


1. Health Check

Get server health status.

Endpoint: GET /health

Authentication: None required

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 3600,
  "connections": 15,
  "rooms": 3,
  "timestamp": "2025-12-17T10:30:00Z"
}

Status Codes:

  • 200 OK - Server is healthy
  • 503 Service Unavailable - Server is unhealthy

cURL Example:

curl http://localhost:8000/health

2. Create Authentication Token

Generate a JWT token for WebSocket authentication.

Endpoint: POST /auth/token

Authentication: API Key (optional)

Headers:

Content-Type: application/json
X-API-Key: your-api-key (optional)

Request Body:

{
  "user_id": "user123",
  "name": "John Doe",
  "metadata": {
    "email": "john@example.com"
  }
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| user_id | string | Yes | Unique user identifier |
| name | string | Yes | User display name |
| metadata | object | No | Additional user metadata |

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "expires_in": 3600,
  "user_id": "user123"
}

Status Codes:

  • 200 OK - Token created successfully
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Invalid API key
  • 429 Too Many Requests - Rate limit exceeded

cURL Example:

curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "user_id": "user123",
    "name": "John Doe"
  }'
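
The same request can be sketched in Python with only the standard library. `build_token_request` is a hypothetical helper name, not part of the API; it only builds the request object, and sending it assumes a server is running at the base URL.

```python
import json
import urllib.request


def build_token_request(base_url, user_id, name, api_key=None, metadata=None):
    """Build (but do not send) a POST /auth/token request."""
    body = {"user_id": user_id, "name": name}
    if metadata is not None:
        body["metadata"] = metadata
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # X-API-Key is optional according to the endpoint description.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/auth/token",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_token_request("http://localhost:8000", "user123", "John Doe")
# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)["access_token"]
```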

3. Verify Token

Verify a JWT token's validity.

Endpoint: POST /auth/verify

Authentication: None required

Headers:

Content-Type: application/json

Request Body:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Response:

{
  "valid": true,
  "user_id": "user123",
  "expires_at": "2025-12-17T11:30:00Z"
}

Status Codes:

  • 200 OK - Token is valid
  • 401 Unauthorized - Token is invalid or expired

cURL Example:

curl -X POST http://localhost:8000/auth/verify \
  -H "Content-Type: application/json" \
  -d '{
    "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
  }'

4. Get Supported Languages

Retrieve list of supported languages.

Endpoint: GET /languages

Authentication: None required

Response:

{
  "languages": [
    {
      "code": "en",
      "name": "English",
      "stt_available": true,
      "translation_available": true,
      "tts_available": true
    },
    {
      "code": "es",
      "name": "Spanish",
      "stt_available": true,
      "translation_available": true,
      "tts_available": true
    },
    {
      "code": "fr",
      "name": "French",
      "stt_available": true,
      "translation_available": true,
      "tts_available": true
    }
  ],
  "total": 9
}

Status Codes:

  • 200 OK - Languages retrieved successfully

cURL Example:

curl http://localhost:8000/languages
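
A client may want only the languages for which all three stages (speech-to-text, translation, text-to-speech) are available. The sketch below filters a decoded response of the shape shown above; `fully_supported` is a hypothetical helper, and the `xx` entry is invented purely to show the filter.

```python
def fully_supported(languages_response):
    """Return codes of languages that have STT, translation, and TTS available."""
    return [
        lang["code"]
        for lang in languages_response["languages"]
        if lang["stt_available"]
        and lang["translation_available"]
        and lang["tts_available"]
    ]


# Sample shaped like the response above; "xx" is a made-up entry without TTS.
sample = {
    "languages": [
        {"code": "en", "name": "English", "stt_available": True,
         "translation_available": True, "tts_available": True},
        {"code": "xx", "name": "Example", "stt_available": True,
         "translation_available": True, "tts_available": False},
    ],
    "total": 2,
}
print(fully_supported(sample))  # ['en']
```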

5. Get Available Translation Pairs

Get list of available language translation pairs.

Endpoint: GET /languages/pairs

Authentication: None required

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| source | string | No | Filter by source language |
| target | string | No | Filter by target language |

Response:

{
  "pairs": [
    {
      "source": "en",
      "target": "es",
      "available": true
    },
    {
      "source": "en",
      "target": "fr",
      "available": true
    },
    {
      "source": "es",
      "target": "en",
      "available": true
    }
  ],
  "total": 72
}

Status Codes:

  • 200 OK - Pairs retrieved successfully

cURL Example:

curl "http://localhost:8000/languages/pairs?source=en"
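
The sample totals are consistent with each other if every language can be translated into every other one: 9 languages give 9 × 8 = 72 ordered pairs. That is an assumption about the deployment, not something the API guarantees. The sketch below enumerates such pairs and mimics the `source` and `target` filters; `ordered_pairs` is a hypothetical helper.

```python
from itertools import permutations


def ordered_pairs(codes, source=None, target=None):
    """List (source, target) pairs, optionally filtered like the query parameters."""
    return [
        (s, t)
        for s, t in permutations(codes, 2)
        if (source is None or s == source) and (target is None or t == target)
    ]


# Nine codes chosen only for illustration.
codes = ["en", "hi", "te", "ta", "kn", "ml", "gu", "mr", "bn"]
print(len(ordered_pairs(codes)))               # 72
print(len(ordered_pairs(codes, source="en")))  # 8
```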

6. Create Room

Create a new translation room.

Endpoint: POST /rooms

Authentication: JWT Token or API Key

Headers:

Content-Type: application/json
Authorization: Bearer <token>

Request Body:

{
  "room_id": "meeting-room-123",
  "name": "Team Meeting",
  "max_users": 10,
  "languages": ["en", "es", "fr"],
  "settings": {
    "auto_translate": true,
    "record_session": false
  }
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | No | Custom room ID (auto-generated if not provided) |
| name | string | Yes | Room display name |
| max_users | integer | No | Maximum users (default: 10) |
| languages | array | No | Allowed languages (all if not specified) |
| settings | object | No | Room configuration |

Response:

{
  "room_id": "meeting-room-123",
  "name": "Team Meeting",
  "created_at": "2025-12-17T10:30:00Z",
  "max_users": 10,
  "current_users": 0,
  "websocket_url": "ws://localhost:8000/ws"
}

Status Codes:

  • 201 Created - Room created successfully
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Authentication required
  • 409 Conflict - Room ID already exists

cURL Example:

curl -X POST http://localhost:8000/rooms \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "name": "Team Meeting",
    "max_users": 10,
    "languages": ["en", "es"]
  }'

7. Get Room Information

Get details about a specific room.

Endpoint: GET /rooms/{room_id}

Authentication: JWT Token or API Key

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier |

Response:

{
  "room_id": "meeting-room-123",
  "name": "Team Meeting",
  "created_at": "2025-12-17T10:30:00Z",
  "max_users": 10,
  "current_users": 3,
  "users": [
    {
      "user_id": "user_abc123",
      "name": "Alice",
      "language": "en",
      "connected_at": "2025-12-17T10:31:00Z"
    },
    {
      "user_id": "user_def456",
      "name": "Bob",
      "language": "es",
      "connected_at": "2025-12-17T10:32:00Z"
    }
  ],
  "active": true
}

Status Codes:

  • 200 OK - Room found
  • 401 Unauthorized - Authentication required
  • 404 Not Found - Room does not exist

cURL Example:

curl http://localhost:8000/rooms/meeting-room-123 \
  -H "Authorization: Bearer <token>"

8. List All Rooms

Get list of all active rooms.

Endpoint: GET /rooms

Authentication: JWT Token or API Key

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| page | integer | No | Page number (default: 1) |
| limit | integer | No | Items per page (default: 20, max: 100) |
| active | boolean | No | Filter by active status |

Response:

{
  "rooms": [
    {
      "room_id": "meeting-room-123",
      "name": "Team Meeting",
      "current_users": 3,
      "max_users": 10,
      "active": true,
      "created_at": "2025-12-17T10:30:00Z"
    },
    {
      "room_id": "conference-456",
      "name": "Conference Call",
      "current_users": 5,
      "max_users": 20,
      "active": true,
      "created_at": "2025-12-17T09:15:00Z"
    }
  ],
  "total": 15,
  "page": 1,
  "limit": 20,
  "pages": 1
}

Status Codes:

  • 200 OK - Rooms retrieved successfully
  • 401 Unauthorized - Authentication required

cURL Example:

curl "http://localhost:8000/rooms?page=1&limit=20" \
  -H "Authorization: Bearer <token>"
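
The `pages` field in the sample follows from `total` and `limit`: 15 rooms at 20 per page fit on one page. A minimal sketch of that arithmetic is below. The helper names are hypothetical, and clamping `limit` on the client side is an assumption, since the documentation does not say whether the server clamps or rejects values above 100.

```python
import math

MAX_LIMIT = 100  # documented maximum for the limit parameter


def page_count(total, limit=20):
    """Number of pages needed to list `total` rooms at `limit` per page."""
    limit = max(1, min(limit, MAX_LIMIT))
    return math.ceil(total / limit)


def page_params(total, limit=20):
    """Yield the query parameters for each page in turn."""
    limit = max(1, min(limit, MAX_LIMIT))
    for page in range(1, page_count(total, limit) + 1):
        yield {"page": page, "limit": limit}


print(page_count(15, 20))  # 1, matching the sample response
```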

9. Delete Room

Delete a room and disconnect all users.

Endpoint: DELETE /rooms/{room_id}

Authentication: JWT Token or API Key (Admin)

Path Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier |

Response:

{
  "success": true,
  "room_id": "meeting-room-123",
  "message": "Room deleted successfully",
  "disconnected_users": 3
}

Status Codes:

  • 200 OK - Room deleted successfully
  • 401 Unauthorized - Authentication required
  • 403 Forbidden - Insufficient permissions
  • 404 Not Found - Room does not exist

cURL Example:

curl -X DELETE http://localhost:8000/rooms/meeting-room-123 \
  -H "Authorization: Bearer <token>"

10. Get Server Statistics

Get server statistics and metrics.

Endpoint: GET /stats

Authentication: JWT Token or API Key

Response:

{
  "server": {
    "uptime": 86400,
    "version": "1.0.0",
    "environment": "production"
  },
  "connections": {
    "total": 150,
    "active": 142,
    "idle": 8
  },
  "rooms": {
    "total": 25,
    "active": 20,
    "empty": 5
  },
  "workers": {
    "translation": {
      "total": 4,
      "busy": 2,
      "queue_size": 5
    },
    "tts": {
      "total": 2,
      "busy": 1,
      "queue_size": 3
    }
  },
  "processing": {
    "total_translations": 5420,
    "total_audio_processed_mb": 2850,
    "avg_latency_ms": 245
  },
  "timestamp": "2025-12-17T10:30:00Z"
}

Status Codes:

  • 200 OK - Statistics retrieved successfully
  • 401 Unauthorized - Authentication required

cURL Example:

curl http://localhost:8000/stats \
  -H "Authorization: Bearer <token>"

11. Text-Only Translation

Translate text without audio processing.

Endpoint: POST /translate

Authentication: JWT Token or API Key

Headers:

Content-Type: application/json
Authorization: Bearer <token>

Request Body:

{
  "text": "Hello, how are you?",
  "source_language": "en",
  "target_language": "es"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | string | Yes | Text to translate |
| source_language | string | Yes | Source language code |
| target_language | string | Yes | Target language code |

Response:

{
  "original_text": "Hello, how are you?",
  "translated_text": "Hola, ¿cómo estás?",
  "source_language": "en",
  "target_language": "es",
  "processing_time_ms": 45
}

Status Codes:

  • 200 OK - Translation successful
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Authentication required
  • 422 Unprocessable Entity - Unsupported language pair

cURL Example:

curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "text": "Hello, how are you?",
    "source_language": "en",
    "target_language": "es"
  }'

12. Batch Translation

Translate multiple texts in one request.

Endpoint: POST /translate/batch

Authentication: JWT Token or API Key

Headers:

Content-Type: application/json
Authorization: Bearer <token>

Request Body:

{
  "texts": [
    "Hello, how are you?",
    "What time is it?",
    "Thank you very much"
  ],
  "source_language": "en",
  "target_language": "es"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| texts | array | Yes | Array of texts to translate (max 100) |
| source_language | string | Yes | Source language code |
| target_language | string | Yes | Target language code |

Response:

{
  "translations": [
    {
      "original": "Hello, how are you?",
      "translated": "Hola, ¿cómo estás?",
      "index": 0
    },
    {
      "original": "What time is it?",
      "translated": "¿Qué hora es?",
      "index": 1
    },
    {
      "original": "Thank you very much",
      "translated": "Muchas gracias",
      "index": 2
    }
  ],
  "total": 3,
  "source_language": "en",
  "target_language": "es",
  "processing_time_ms": 120
}

Status Codes:

  • 200 OK - Translations successful
  • 400 Bad Request - Invalid request body or too many texts
  • 401 Unauthorized - Authentication required
  • 422 Unprocessable Entity - Unsupported language pair

cURL Example:

curl -X POST http://localhost:8000/translate/batch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "texts": ["Hello", "Goodbye", "Thank you"],
    "source_language": "en",
    "target_language": "es"
  }'
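
Because a request accepts at most 100 texts, longer lists need to be split on the client. The sketch below yields one request body per batch; `batch_payloads` is a hypothetical helper name.

```python
MAX_BATCH = 100  # documented maximum number of texts per request


def batch_payloads(texts, source_language, target_language, size=MAX_BATCH):
    """Split `texts` into request bodies of at most `size` texts each."""
    for start in range(0, len(texts), size):
        yield {
            "texts": texts[start:start + size],
            "source_language": source_language,
            "target_language": target_language,
        }


texts = [f"sentence {i}" for i in range(250)]
payloads = list(batch_payloads(texts, "en", "es"))
print([len(p["texts"]) for p in payloads])  # [100, 100, 50]
```

Judging by the sample response, `index` appears to count from 0 within each request, so a caller merging results would likely need to add each batch's starting offset.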

13. Download TTS Audio

Generate and download TTS audio for text.

Endpoint: POST /tts/generate

Authentication: JWT Token or API Key

Headers:

Content-Type: application/json
Authorization: Bearer <token>

Request Body:

{
  "text": "Hello, this is a test message",
  "language": "en",
  "format": "wav"
}

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | string | Yes | Text to synthesize |
| language | string | Yes | Language code |
| format | string | No | Audio format: "wav", "mp3" (default: "wav") |

Response:

  • Content-Type: audio/wav or audio/mpeg
  • Body: Binary audio data

Status Codes:

  • 200 OK - Audio generated successfully
  • 400 Bad Request - Invalid request body
  • 401 Unauthorized - Authentication required
  • 422 Unprocessable Entity - Unsupported language

cURL Example:

curl -X POST http://localhost:8000/tts/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "text": "Hello world",
    "language": "en"
  }' \
  --output output.wav

14. System Configuration

Get current system configuration (Admin only).

Endpoint: GET /config

Authentication: JWT Token (Admin)

Response:

{
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "chunk_size": 4096,
    "format": "PCM16"
  },
  "limits": {
    "max_connections": 100,
    "max_connections_per_ip": 10,
    "max_users_per_room": 10,
    "max_message_size": 10485760
  },
  "rate_limits": {
    "messages_per_second": 10,
    "requests_per_minute": 100
  },
  "workers": {
    "translation_workers": 4,
    "tts_workers": 2
  },
  "features": {
    "authentication_enabled": false,
    "rate_limiting_enabled": true,
    "metrics_enabled": true
  }
}

Status Codes:

  • 200 OK - Configuration retrieved
  • 401 Unauthorized - Authentication required
  • 403 Forbidden - Admin access required

cURL Example:

curl http://localhost:8000/config \
  -H "Authorization: Bearer <admin-token>"

REST API Response Format

All REST API responses follow this format:

Success Response:

{
  // Response data
}

Error Response:

{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human readable error message",
    "details": {
      // Additional error details
    }
  }
}
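
One way a client might consume this envelope is to turn non-2xx responses into an exception. The sketch below is only an illustration: `ApiError` and `parse_response` are hypothetical names, and the `"UNKNOWN"` fallback code is invented for bodies that lack an `error` object.

```python
import json


class ApiError(Exception):
    """Hypothetical exception type wrapping the documented error envelope."""

    def __init__(self, status, code, message, details=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details or {}


def parse_response(status, body_text):
    """Return decoded JSON for 2xx responses, raise ApiError otherwise."""
    data = json.loads(body_text) if body_text else {}
    if 200 <= status < 300:
        return data
    error = data.get("error", {})
    raise ApiError(
        status,
        error.get("code", "UNKNOWN"),  # fallback value is an assumption
        error.get("message", ""),
        error.get("details"),
    )
```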

🔐 Authentication

Optional JWT Authentication

If authentication is enabled (ENABLE_AUTH=true), include the JWT token in the WebSocket connection URL:

ws://localhost:8000/ws?token=YOUR_JWT_TOKEN

Obtaining a Token

Endpoint: POST /auth/token

Request Body:

{
  "user_id": "user123",
  "name": "John Doe"
}

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "expires_in": 3600
}

API Key Authentication

Alternatively, use an API key in the query parameter:

ws://localhost:8000/ws?api_key=YOUR_API_KEY
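
Both connection URLs can be derived from the REST base URL. The sketch below swaps `http`/`https` for `ws`/`wss` and appends whichever credential is supplied; `websocket_url` is a hypothetical helper, and preferring the token when both are given is an arbitrary choice.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def websocket_url(base_url, token=None, api_key=None):
    """Derive the /ws URL from a REST base URL and attach one credential."""
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    query = {}
    if token is not None:
        query["token"] = token
    elif api_key is not None:
        query["api_key"] = api_key
    return urlunsplit((scheme, parts.netloc, "/ws", urlencode(query), ""))


print(websocket_url("http://localhost:8000", token="YOUR_JWT_TOKEN"))
# ws://localhost:8000/ws?token=YOUR_JWT_TOKEN
```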

🔌 WebSocket Connection

Connecting

JavaScript Example:

const ws = new WebSocket('ws://localhost:8000/ws');

ws.onopen = () => {
  console.log('Connected to translation server');
};

ws.onmessage = (event) => {
  if (typeof event.data === 'string') {
    // Text message (JSON)
    const message = JSON.parse(event.data);
    handleMessage(message);
  } else {
    // Binary message (audio data)
    handleAudioData(event.data);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected from server');
};

Python Example:

import asyncio
import websockets
import json

async def connect():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        # Send message
        message = {
            "type": "join_room",
            "payload": {
                "room_id": "room123",
                "user_name": "Alice",
                "language": "en"
            }
        }
        await websocket.send(json.dumps(message))
        
        # Receive messages
        async for message in websocket:
            if isinstance(message, str):
                data = json.loads(message)
                print(f"Received: {data}")
            else:
                print(f"Received audio: {len(message)} bytes")

asyncio.run(connect())

Connection Limits

  • Max connections per IP: 10 (configurable)
  • Max concurrent connections: 100 (configurable)
  • Connection timeout: 300 seconds (idle)

📨 Message Protocol

Message Structure

All text messages are JSON with the following structure:

{
  "type": "MESSAGE_TYPE",
  "payload": {
    // Type-specific data
  },
  "timestamp": "2025-12-17T10:30:00Z"
}
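
A small pair of helpers can keep messages in this shape. The sketch below is an illustration with hypothetical function names. The client examples in this document omit `timestamp` on outgoing messages, so including it here is an assumption rather than a documented requirement.

```python
import json
from datetime import datetime, timezone


def build_message(message_type, payload=None, now=None):
    """Serialize a protocol message with a UTC timestamp like 2025-12-17T10:30:00Z."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "type": message_type,
        "payload": payload or {},
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


def parse_message(text):
    """Decode a text frame and return (type, payload)."""
    message = json.loads(text)
    return message["type"], message.get("payload", {})
```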

Message Flow

Client → Server: Text Messages (JSON)
Server → Client: Text Messages (JSON)
Client → Server: Binary Messages (Audio Data)
Server → Client: Binary Messages (Audio Data)

📝 Message Types

1. JOIN_ROOM

Join a translation room.

Direction: Client → Server

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier |
| user_name | string | Yes | User display name |
| language | string | Yes | User's language code (e.g., "en", "es", "fr") |

Example:

{
  "type": "join_room",
  "payload": {
    "room_id": "room123",
    "user_name": "Alice",
    "language": "en"
  }
}

Response:

{
  "type": "room_joined",
  "payload": {
    "room_id": "room123",
    "user_id": "user_abc123",
    "users": [
      {
        "user_id": "user_abc123",
        "name": "Alice",
        "language": "en"
      },
      {
        "user_id": "user_def456",
        "name": "Bob",
        "language": "es"
      }
    ]
  },
  "timestamp": "2025-12-17T10:30:00Z"
}

2. LEAVE_ROOM

Leave the current room.

Direction: Client → Server

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier to leave |

Example:

{
  "type": "leave_room",
  "payload": {
    "room_id": "room123"
  }
}

Response:

{
  "type": "room_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user_abc123"
  },
  "timestamp": "2025-12-17T10:35:00Z"
}

3. AUDIO_START

Notify that audio streaming will begin.

Direction: Client → Server

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier |
| audio_config | object | No | Audio configuration |
| audio_config.sample_rate | integer | No | Sample rate in Hz (default: 16000) |
| audio_config.channels | integer | No | Number of channels (default: 1) |
| audio_config.format | string | No | Audio format (default: "PCM16") |

Example:

{
  "type": "audio_start",
  "payload": {
    "room_id": "room123",
    "audio_config": {
      "sample_rate": 16000,
      "channels": 1,
      "format": "PCM16"
    }
  }
}

Response:

{
  "type": "audio_started",
  "payload": {
    "room_id": "room123",
    "user_id": "user_abc123",
    "status": "ready"
  },
  "timestamp": "2025-12-17T10:31:00Z"
}

4. AUDIO_DATA (Binary)

Send audio data for translation.

Direction: Client → Server (Binary)

Format: Raw PCM16 audio bytes

Requirements:

  • Format: PCM16 (16-bit signed integer)
  • Sample Rate: 16000 Hz (configurable)
  • Channels: 1 (mono)
  • Chunk Size: 4096 bytes (recommended)
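
These requirements fix how much audio each chunk carries. The sketch below works through the arithmetic for raw PCM16; the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # PCM16 = 16 bits per sample


def chunk_duration_seconds(chunk_bytes=4096, sample_rate=16000, channels=1):
    """Playback time covered by one chunk of raw PCM16 audio."""
    bytes_per_second = sample_rate * channels * BYTES_PER_SAMPLE
    return chunk_bytes / bytes_per_second


def chunks_per_second(chunk_bytes=4096, sample_rate=16000, channels=1):
    """How many chunks a real-time stream produces each second."""
    return (sample_rate * channels * BYTES_PER_SAMPLE) / chunk_bytes


print(chunk_duration_seconds())  # 0.128 seconds per 4096-byte chunk
print(chunks_per_second())       # 7.8125 chunks per second
```

At roughly 7.8 chunks per second, a real-time stream stays below the default limit of 10 messages per second, assuming binary frames count toward that limit at all.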

JavaScript Example:

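// Capture microphone audio as raw PCM16. MediaRecorder emits compressed
// audio (e.g. WebM/Opus), so read samples through the Web Audio API instead.
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    // ScriptProcessorNode is deprecated but widely supported;
    // AudioWorklet is the modern alternative
    const processor = audioContext.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = (e) => {
      const samples = e.inputBuffer.getChannelData(0); // Float32 in [-1, 1]
      const pcm16 = new Int16Array(samples.length);
      for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
      }
      ws.send(pcm16.buffer); // Send as a binary frame
    };

    source.connect(processor);
    processor.connect(audioContext.destination);
  });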

Python Example:

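import asyncio
import pyaudio

# Audio configuration
CHUNK = 4096  # frames per read
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000

async def stream_audio(websocket):
    """Read microphone audio and send it over an open WebSocket connection."""
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=CHUNK
    )
    try:
        # Send audio chunks
        while True:
            # stream.read blocks, so run it in a worker thread (Python 3.9+)
            audio_data = await asyncio.to_thread(stream.read, CHUNK)
            await websocket.send(audio_data)
    finally:
        # Release the audio device when streaming ends or is cancelled
        stream.stop_stream()
        stream.close()
        audio.terminate()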

5. AUDIO_STOP

Notify that audio streaming has stopped.

Direction: Client → Server

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| room_id | string | Yes | Room identifier |

Example:

{
  "type": "audio_stop",
  "payload": {
    "room_id": "room123"
  }
}

Response:

{
  "type": "audio_stopped",
  "payload": {
    "room_id": "room123",
    "user_id": "user_abc123"
  },
  "timestamp": "2025-12-17T10:32:00Z"
}

6. TRANSLATION_RESULT

Receive translated text.

Direction: Server → Client

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| original_text | string | Original recognized text |
| translated_text | string | Translated text |
| source_language | string | Source language code |
| target_language | string | Target language code |
| source_user_id | string | User who spoke |

Example:

{
  "type": "translation_result",
  "payload": {
    "original_text": "Hello, how are you?",
    "translated_text": "Hola, ¿cómo estás?",
    "source_language": "en",
    "target_language": "es",
    "source_user_id": "user_abc123"
  },
  "timestamp": "2025-12-17T10:31:15Z"
}

7. TRANSLATED_AUDIO (Binary)

Receive translated audio.

Direction: Server → Client (Binary)

Format: Raw PCM16 audio bytes ready for playback

JavaScript Example:

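ws.binaryType = 'arraybuffer'; // Receive binary frames as ArrayBuffer
const audioContext = new AudioContext();

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary audio data
    playAudio(event.data);
  }
};

// Raw PCM16 has no container header, so decodeAudioData() cannot parse it;
// build the AudioBuffer by hand instead. The 16 kHz mono default assumes the
// server's output matches the input audio configuration.
function playAudio(arrayBuffer, sampleRate = 16000) {
  const pcm16 = new Int16Array(arrayBuffer);
  const buffer = audioContext.createBuffer(1, pcm16.length, sampleRate);
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < pcm16.length; i++) {
    channel[i] = pcm16[i] / 0x8000; // Scale to [-1, 1)
  }
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start();
}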
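
On the Python side, received audio can be saved for inspection by wrapping the raw bytes in a WAV container with the standard `wave` module. This is a minimal sketch: `pcm16_to_wav` is a hypothetical helper, and the 16 kHz mono defaults assume the server's output matches the input audio configuration.

```python
import io
import wave


def pcm16_to_wav(pcm_bytes, sample_rate=16000, channels=1):
    """Wrap raw PCM16 bytes in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = PCM16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


# One second of silence stands in for audio received from the server.
wav_bytes = pcm16_to_wav(b"\x00\x00" * 16000)
```

Writing `wav_bytes` to a file yields something any audio player can open, which helps when checking whether a playback problem lies in the stream or in the client.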

8. USER_JOINED

Notification when a user joins the room.

Direction: Server → Client

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| room_id | string | Room identifier |
| user_id | string | New user's ID |
| user_name | string | New user's name |
| language | string | New user's language |

Example:

{
  "type": "user_joined",
  "payload": {
    "room_id": "room123",
    "user_id": "user_def456",
    "user_name": "Bob",
    "language": "es"
  },
  "timestamp": "2025-12-17T10:30:30Z"
}

9. USER_LEFT

Notification when a user leaves the room.

Direction: Server → Client

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| room_id | string | Room identifier |
| user_id | string | User who left |

Example:

{
  "type": "user_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user_def456"
  },
  "timestamp": "2025-12-17T10:35:00Z"
}

10. PING / PONG

Heartbeat messages to keep connection alive.

Direction: Bidirectional

PING (Server → Client):

{
  "type": "ping",
  "payload": {},
  "timestamp": "2025-12-17T10:31:00Z"
}

PONG (Client → Server):

{
  "type": "pong",
  "payload": {},
  "timestamp": "2025-12-17T10:31:00Z"
}

Configuration:

  • Ping interval: 30 seconds (default)
  • Ping timeout: 10 seconds (default)
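
A client can use these defaults to notice a silent connection. The sketch below treats the connection as stale once no ping has arrived for the interval plus the timeout (40 seconds by default). That threshold is a client-side policy assumed for illustration, not documented server behaviour, and `HeartbeatMonitor` is a hypothetical class.

```python
PING_INTERVAL = 30.0  # seconds between server pings (documented default)
PING_TIMEOUT = 10.0   # seconds allowed for the reply (documented default)


class HeartbeatMonitor:
    """Track server pings and flag a connection that has gone quiet."""

    def __init__(self, started_at, interval=PING_INTERVAL, timeout=PING_TIMEOUT):
        self.last_ping = started_at
        self.deadline = interval + timeout

    def on_ping(self, now):
        """Record the arrival time of a ping message."""
        self.last_ping = now

    def is_stale(self, now):
        """True once more than interval + timeout has passed without a ping."""
        return (now - self.last_ping) > self.deadline
```

In a real client, `now` would typically come from `time.monotonic()`, with `on_ping` called from the `ping` branch of the message handler.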

11. ERROR

Error message from server.

Direction: Server → Client

Parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| error_code | string | Error code identifier |
| message | string | Human-readable error message |
| details | object | Additional error details (optional) |

Example:

{
  "type": "error",
  "payload": {
    "error_code": "ROOM_FULL",
    "message": "Room has reached maximum capacity",
    "details": {
      "room_id": "room123",
      "max_users": 10,
      "current_users": 10
    }
  },
  "timestamp": "2025-12-17T10:30:00Z"
}

Common Error Codes:

  • AUTH_FAILED: Authentication failed
  • ROOM_NOT_FOUND: Room does not exist
  • ROOM_FULL: Room at maximum capacity
  • INVALID_MESSAGE: Malformed message
  • RATE_LIMIT_EXCEEDED: Too many requests
  • UNSUPPORTED_LANGUAGE: Language not supported
  • AUDIO_PROCESSING_ERROR: Audio processing failed

⚠️ Error Handling

Client-Side Error Handling

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
  // A close event follows every error, so reconnection is handled in onclose
};

ws.onclose = (event) => {
  if (event.code === 1008) {
    console.error('Connection closed: Rate limit exceeded');
  } else if (event.code === 1000) {
    console.log('Connection closed normally');
  } else {
    console.log('Connection closed unexpectedly:', event.code);
    // Attempt reconnection
    setTimeout(() => reconnect(), 5000);
  }
};

Close Codes

| Code | Description |
|------|-------------|
| 1000 | Normal closure |
| 1001 | Going away |
| 1008 | Policy violation (rate limit) |
| 1011 | Internal server error |
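
The close code can drive the reconnection decision. The sketch below skips retries after a normal closure or a policy violation and spaces the other retries with exponential backoff. Note that backoff is a substitution for the fixed 5-second delay used in the JavaScript example, and both the retry policy and the helper names are assumptions.

```python
import random

NO_RETRY_CODES = {1000, 1008}  # normal closure, policy violation (rate limit)


def should_reconnect(close_code):
    """Retry only after closures that were neither intentional nor policy-driven."""
    return close_code not in NO_RETRY_CODES


def backoff_delay(attempt, base=5.0, cap=60.0, jitter=0.0):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    delay = min(cap, base * (2 ** attempt))
    # Optional random jitter spreads out clients that disconnected together.
    return delay + random.uniform(0, jitter)


print([backoff_delay(n) for n in range(5)])  # [5.0, 10.0, 20.0, 40.0, 60.0]
```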

🚦 Rate Limits

Connection Limits

| Limit Type | Default Value | Configurable |
|------------|---------------|--------------|
| Max connections per IP | 10 | Yes |
| Max total connections | 100 | Yes |
| Connection timeout | 300 seconds | Yes |

Message Limits

| Limit Type | Default Value | Configurable |
|------------|---------------|--------------|
| Messages per second | 10 per connection | Yes |
| Requests per minute | 100 per user | Yes |
| Audio chunk size | 10 MB | Yes |

Rate Limit Headers

When a limit is exceeded, the error message carries the limit, the remaining quota, and the reset time in its details field:

{
  "type": "error",
  "payload": {
    "error_code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests",
    "details": {
      "limit": 100,
      "remaining": 0,
      "reset_at": "2025-12-17T10:31:00Z"
    }
  }
}
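
A client can use `reset_at` to decide how long to pause before sending again. The sketch below computes that wait from a decoded error message; `seconds_until_reset` is a hypothetical helper, and returning 0 when the field is missing is an arbitrary choice.

```python
from datetime import datetime, timezone


def seconds_until_reset(error_message, now=None):
    """Seconds to wait before retrying, based on details.reset_at (0 if absent or past)."""
    details = error_message.get("payload", {}).get("details", {})
    reset_at = details.get("reset_at")
    if reset_at is None:
        return 0.0
    # fromisoformat() rejects a trailing "Z" before Python 3.11.
    reset_time = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_time - now).total_seconds())
```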

💻 Code Examples

Complete Client Example (JavaScript)

class VoiceTranslatorClient {
  constructor(url, options = {}) {
    this.url = url;
    this.ws = null;
    this.roomId = null;
    this.userId = null;
    this.options = {
      language: options.language || 'en',
      userName: options.userName || 'Anonymous',
      ...options
    };
  }

  connect() {
    return new Promise((resolve, reject) => {
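      this.ws = new WebSocket(this.url);
      // Deliver binary frames as ArrayBuffer so audioData.byteLength is defined
      this.ws.binaryType = 'arraybuffer';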
      
      this.ws.onopen = () => {
        console.log('Connected to translation server');
        resolve();
      };
      
      this.ws.onerror = (error) => {
        console.error('WebSocket error:', error);
        reject(error);
      };
      
      this.ws.onmessage = (event) => {
        this.handleMessage(event);
      };
      
      this.ws.onclose = () => {
        console.log('Disconnected from server');
        this.reconnect();
      };
    });
  }

  handleMessage(event) {
    if (typeof event.data === 'string') {
      const message = JSON.parse(event.data);
      
      switch (message.type) {
        case 'room_joined':
          this.userId = message.payload.user_id;
          this.onRoomJoined(message.payload);
          break;
        case 'translation_result':
          this.onTranslation(message.payload);
          break;
        case 'user_joined':
          this.onUserJoined(message.payload);
          break;
        case 'user_left':
          this.onUserLeft(message.payload);
          break;
        case 'error':
          this.onError(message.payload);
          break;
        case 'ping':
          this.sendPong();
          break;
      }
    } else {
      // Binary audio data
      this.onAudioReceived(event.data);
    }
  }

  async joinRoom(roomId) {
    this.roomId = roomId;
    
    const message = {
      type: 'join_room',
      payload: {
        room_id: roomId,
        user_name: this.options.userName,
        language: this.options.language
      }
    };
    
    this.send(message);
  }

  async startAudio() {
    const message = {
      type: 'audio_start',
      payload: {
        room_id: this.roomId,
        audio_config: {
          sample_rate: 16000,
          channels: 1,
          format: 'PCM16'
        }
      }
    };
    
    this.send(message);
    
    // Start capturing audio
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    this.startAudioCapture(stream);
  }

  startAudioCapture(stream) {
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    
    processor.onaudioprocess = (e) => {
      const inputData = e.inputBuffer.getChannelData(0);
      const pcm16 = this.convertToPCM16(inputData);
      this.ws.send(pcm16);
    };
    
    source.connect(processor);
    processor.connect(audioContext.destination);
  }

  convertToPCM16(float32Array) {
    const int16Array = new Int16Array(float32Array.length);
    for (let i = 0; i < float32Array.length; i++) {
      const s = Math.max(-1, Math.min(1, float32Array[i]));
      int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    return int16Array.buffer;
  }

  stopAudio() {
    const message = {
      type: 'audio_stop',
      payload: {
        room_id: this.roomId
      }
    };
    
    this.send(message);
  }

  leaveRoom() {
    const message = {
      type: 'leave_room',
      payload: {
        room_id: this.roomId
      }
    };
    
    this.send(message);
  }

  send(message) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(message));
    }
  }

  sendPong() {
    this.send({ type: 'pong', payload: {} });
  }

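  disconnect() {
    if (this.ws) {
      // Detach the close handler so an intentional disconnect does not trigger reconnect()
      this.ws.onclose = null;
      this.ws.close();
    }
  }

  reconnect() {
    setTimeout(() => {
      console.log('Attempting to reconnect...');
      // A failed attempt also fires onclose, which schedules the next retry
      this.connect().catch(() => {});
    }, 5000);
  }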

  // Event handlers (override these)
  onRoomJoined(data) {
    console.log('Joined room:', data);
  }

  onTranslation(data) {
    console.log('Translation:', data.translated_text);
  }

  onAudioReceived(audioData) {
    console.log('Received audio:', audioData.byteLength, 'bytes');
    // Play the audio
  }

  onUserJoined(data) {
    console.log('User joined:', data.user_name);
  }

  onUserLeft(data) {
    console.log('User left:', data.user_id);
  }

  onError(error) {
    console.error('Error:', error.message);
  }
}

// Usage
const client = new VoiceTranslatorClient('ws://localhost:8000/ws', {
  language: 'en',
  userName: 'Alice'
});

await client.connect();
await client.joinRoom('room123');
await client.startAudio();

Complete Client Example (Python)

import asyncio
import websockets
import json
import pyaudio

class VoiceTranslatorClient:
    def __init__(self, url, language='en', user_name='Anonymous'):
        self.url = url
        self.language = language
        self.user_name = user_name
        self.ws = None
        self.room_id = None
        self.user_id = None
        self.running = False

    async def connect(self):
        self.ws = await websockets.connect(self.url)
        print('Connected to translation server')
        
        # Start message handler
        asyncio.create_task(self.message_handler())

    async def message_handler(self):
        async for message in self.ws:
            if isinstance(message, str):
                data = json.loads(message)
                await self.handle_message(data)
            else:
                await self.handle_audio(message)

    async def handle_message(self, message):
        msg_type = message.get('type')
        payload = message.get('payload', {})
        
        if msg_type == 'room_joined':
            self.user_id = payload.get('user_id')
            print(f"Joined room: {payload.get('room_id')}")
        elif msg_type == 'translation_result':
            print(f"Translation: {payload.get('translated_text')}")
        elif msg_type == 'user_joined':
            print(f"User joined: {payload.get('user_name')}")
        elif msg_type == 'user_left':
            print(f"User left: {payload.get('user_id')}")
        elif msg_type == 'error':
            print(f"Error: {payload.get('message')}")
        elif msg_type == 'ping':
            await self.send_pong()

    async def handle_audio(self, audio_data):
        print(f"Received audio: {len(audio_data)} bytes")
        # Play audio here

    async def join_room(self, room_id):
        self.room_id = room_id
        
        message = {
            'type': 'join_room',
            'payload': {
                'room_id': room_id,
                'user_name': self.user_name,
                'language': self.language
            }
        }
        
        await self.send(message)

    async def start_audio(self):
        message = {
            'type': 'audio_start',
            'payload': {
                'room_id': self.room_id,
                'audio_config': {
                    'sample_rate': 16000,
                    'channels': 1,
                    'format': 'PCM16'
                }
            }
        }
        
        await self.send(message)
        
        # Start audio capture
        asyncio.create_task(self.capture_audio())

    async def capture_audio(self):
        CHUNK = 4096
        FORMAT = pyaudio.paInt16
        CHANNELS = 1
        RATE = 16000
        
        audio = pyaudio.PyAudio()
        stream = audio.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        
        self.running = True
        
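        while self.running:
            # stream.read blocks, so run it in a worker thread (Python 3.9+)
            # to keep the event loop free for incoming messages
            audio_data = await asyncio.to_thread(stream.read, CHUNK)
            await self.ws.send(audio_data)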
        
        stream.stop_stream()
        stream.close()
        audio.terminate()

    async def stop_audio(self):
        self.running = False
        
        message = {
            'type': 'audio_stop',
            'payload': {
                'room_id': self.room_id
            }
        }
        
        await self.send(message)

    async def leave_room(self):
        message = {
            'type': 'leave_room',
            'payload': {
                'room_id': self.room_id
            }
        }
        
        await self.send(message)

    async def send(self, message):
        await self.ws.send(json.dumps(message))

    async def send_pong(self):
        await self.send({'type': 'pong', 'payload': {}})

    async def disconnect(self):
        await self.ws.close()

# Usage
async def main():
    client = VoiceTranslatorClient(
        'ws://localhost:8000/ws',
        language='en',
        user_name='Alice'
    )
    
    await client.connect()
    await client.join_room('room123')
    await client.start_audio()
    
    # Keep running for 60 seconds
    await asyncio.sleep(60)
    
    await client.stop_audio()
    await client.leave_room()
    await client.disconnect()

asyncio.run(main())

🌍 Supported Languages

Language Code Language Name
en English
hi Hindi
te Telugu
ta Tamil
kn Kannada
ml Malayalam
gu Gujarati
mr Marathi
bn Bengali
es Spanish
fr French
de German
it Italian
pt Portuguese
ru Russian
zh Chinese
ja Japanese

Primary Focus: Indian languages (Hindi, Telugu, Tamil, Kannada, Malayalam, Gujarati, Marathi, Bengali)

Note: Language support depends on installed models. Check available languages with the /languages endpoint.


📊 Health Check

Endpoint: GET /health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 3600,
  "connections": 15,
  "rooms": 3
}

🔧 Configuration

Environment variables to customize API behavior:

# Server
HOST=0.0.0.0
PORT=8000

# Audio
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
AUDIO_CHUNK_SIZE=4096

# Security
ENABLE_AUTH=false
JWT_SECRET_KEY=your-secret-key
API_KEYS=key1,key2,key3

# Rate Limiting
MAX_CONNECTIONS_PER_IP=10
MAX_MESSAGES_PER_SECOND=10
MAX_REQUESTS_PER_MINUTE=100

# Workers
TRANSLATION_WORKERS=4
TTS_WORKERS=2

# Models
VOSK_MODEL_PATH_EN=models/vosk-en
ARGOS_MODEL_PATH=models/argos
COQUI_MODEL_PATH=models/coqui
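
As an illustration of how a few of these variables could be parsed, here is a minimal sketch. It is not the server's actual loader: `load_settings` is a hypothetical function, the set of strings accepted as true is an assumption, and an empty `API_KEYS` default is assumed.

```python
import os

DEFAULTS = {
    "PORT": 8000,
    "AUDIO_SAMPLE_RATE": 16000,
    "AUDIO_CHANNELS": 1,
    "AUDIO_CHUNK_SIZE": 4096,
    "ENABLE_AUTH": False,
    "MAX_CONNECTIONS_PER_IP": 10,
    "API_KEYS": (),  # assumed default: no keys configured
}


def load_settings(env=os.environ):
    """Read a subset of the variables above, falling back to the listed defaults."""
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        if raw is None:
            settings[name] = default
        elif isinstance(default, bool):  # checked before int: bool subclasses int
            settings[name] = raw.strip().lower() in {"1", "true", "yes"}
        elif isinstance(default, int):
            settings[name] = int(raw)
        elif isinstance(default, tuple):
            # Comma-separated list, e.g. API_KEYS=key1,key2,key3
            settings[name] = tuple(k.strip() for k in raw.split(",") if k.strip())
        else:
            settings[name] = raw
    return settings
```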

🐛 Troubleshooting

Connection Issues

Problem: Cannot connect to WebSocket

Solutions:

  • Verify the server is running
  • Check firewall settings
  • Ensure correct URL (ws:// for HTTP, wss:// for HTTPS)
  • Verify authentication token if required

Audio Issues

Problem: No audio being received

Solutions:

  • Check audio format (must be PCM16, 16kHz, mono)
  • Verify microphone permissions
  • Ensure audio chunks are the expected size (4096 bytes recommended)
  • Check that rate limits have not been exceeded

Translation Issues

Problem: Translations not working

Solutions:

  • Verify language models are installed
  • Check language codes are supported
  • Ensure room has users with different languages
  • Check server logs for errors

📞 Support

For issues and questions:


📄 License

This API documentation is part of the Voice-to-Voice Translator project.

Version: 1.0.0
Last Updated: December 17, 2025