WebSocket Protocol
Overview
The Voice-to-Voice Translator uses a WebSocket-based protocol for real-time bidirectional communication between clients and the server. Messages are exchanged as JSON in text frames and as raw audio data in binary frames.
Connection Endpoint
ws://host:port/ws
Connection Parameters
Query parameters (optional):
- token: JWT authentication token (if auth is enabled)
- client_id: Unique client identifier
Example:
ws://localhost:8000/ws?token=eyJhbGc...&client_id=client123
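The following sketch assembles the endpoint URL and adds the optional query parameters with the standard URL API, which handles escaping. The function name buildWsUrl and its secure flag, which switches to wss://, are illustrative assumptions and not part of the server.

```javascript
// Hypothetical helper: build the /ws endpoint URL with optional query params.
function buildWsUrl(host, port, { token, clientId } = {}, secure = false) {
  const url = new URL(`${secure ? 'wss' : 'ws'}://${host}:${port}/ws`);
  if (token) url.searchParams.set('token', token); // only when auth is enabled
  if (clientId) url.searchParams.set('client_id', clientId);
  return url.toString();
}

console.log(buildWsUrl('localhost', 8000, { clientId: 'client123' }));
// ws://localhost:8000/ws?client_id=client123
```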
Message Types
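All text messages follow this structure. The "type" value is the lowercase form of the message names used below (for example, JOIN_ROOM is sent as "join_room"). The examples below show only "type" and "payload":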
{
  "type": "message_type",
  "payload": { ... },
  "timestamp": "2025-12-17T10:30:00Z",
  "message_id": "uuid-v4"
}
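Below is a minimal sketch of producing and checking this envelope on the client. The helper names makeMessage and parseMessage are hypothetical. By default, message_id comes from crypto.randomUUID(), which returns a UUID v4 and is a global in browsers and recent Node.js releases. The timestamp is trimmed to whole seconds only so that it matches the example above.

```javascript
// Wrap a payload in the common envelope. `newId` is injectable for testing.
function makeMessage(type, payload = {}, newId = () => globalThis.crypto.randomUUID()) {
  return {
    type,
    payload,
    // ISO 8601 UTC, milliseconds dropped to match the example format
    timestamp: new Date().toISOString().replace(/\.\d{3}Z$/, 'Z'),
    message_id: newId(),
  };
}

// Parse a text frame and check the envelope shape before dispatching it.
function parseMessage(text) {
  const msg = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (typeof msg !== 'object' || msg === null || typeof msg.type !== 'string') {
    throw new Error('envelope must be an object with a string "type"');
  }
  if (typeof msg.payload !== 'object' || msg.payload === null || Array.isArray(msg.payload)) {
    throw new Error('envelope must carry an object "payload"');
  }
  return msg;
}

console.log(parseMessage(JSON.stringify(makeMessage('ping', {}, () => 'id-1'))).message_id); // id-1
```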
Client → Server Messages
1. JOIN_ROOM
Join or create a translation room.
{
  "type": "join_room",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "username": "John Doe",
    "source_lang": "en",
    "target_lang": "hi"
  }
}
Fields:
- room_id: Unique room identifier
- user_id: Unique user identifier
- username: Display name
- source_lang: User's speaking language (ISO 639-1 code)
- target_lang: Desired translation language
Response: ROOM_JOINED or ERROR
2. LEAVE_ROOM
Leave the current room.
{
  "type": "leave_room",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
Response: ROOM_LEFT or ERROR
3. AUDIO_START
Signal the start of an audio stream.
{
  "type": "audio_start",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "audio_config": {
      "sample_rate": 16000,
      "channels": 1,
      "format": "PCM16",
      "chunk_size": 4096
    }
  }
}
Response: AUDIO_START_ACK
4. AUDIO_STOP
Signal the end of an audio stream.
{
  "type": "audio_stop",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
Response: AUDIO_STOP_ACK
5. TEXT_MESSAGE
Send a text message (for chat or corrections).
{
  "type": "text_message",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "text": "Hello, how are you?",
    "lang": "en"
  }
}
Response: TEXT_MESSAGE (broadcast to room)
6. PING
Heartbeat message.
{
  "type": "ping",
  "payload": {}
}
Response: PONG
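The client guideline of sending a PING every 30 seconds can be implemented as a small heartbeat tracker. This document does not define a PONG timeout, so treating the connection as dead after two missed intervals is an assumption. createHeartbeat and its methods are hypothetical names. The clock is passed in, so an app can drive the tracker from a timer and a test can drive it with plain numbers.

```javascript
// Assumption: no PONG for two ping intervals means the connection is dead.
function createHeartbeat(send, startMs, intervalMs = 30000) {
  let lastPing = -Infinity; // forces a PING on the first tick
  let lastPong = startMs;   // an open connection counts as proof of life
  return {
    // Sends a PING when an interval has elapsed; returns false once the
    // connection looks dead so the caller can close and reconnect.
    tick(now) {
      if (now - lastPong > 2 * intervalMs) return false;
      if (now - lastPing >= intervalMs) {
        send(JSON.stringify({ type: 'ping', payload: {} }));
        lastPing = now;
      }
      return true;
    },
    // Call when a message with type "pong" arrives.
    onPong(now) {
      lastPong = now;
    },
  };
}

// Browser wiring (sketch):
// const hb = createHeartbeat((m) => ws.send(m), Date.now());
// setInterval(() => { if (!hb.tick(Date.now())) ws.close(); }, 1000);
// in onmessage: if (msg.type === 'pong') hb.onPong(Date.now());
```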
7. GET_ROOM_INFO
Request the current room state.
{
  "type": "get_room_info",
  "payload": {
    "room_id": "room123"
  }
}
Response: ROOM_INFO
Server → Client Messages
1. ROOM_JOINED
Confirmation of room join.
{
  "type": "room_joined",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "users": [
      {
        "user_id": "user1",
        "username": "John Doe",
        "source_lang": "en",
        "target_lang": "hi"
      },
      {
        "user_id": "user2",
        "username": "Jane Smith",
        "source_lang": "hi",
        "target_lang": "en"
      }
    ]
  }
}
2. ROOM_LEFT
Confirmation of room leave.
{
  "type": "room_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
3. USER_JOINED
Broadcast when another user joins.
{
  "type": "user_joined",
  "payload": {
    "room_id": "room123",
    "user": {
      "user_id": "user2",
      "username": "Jane Smith",
      "source_lang": "hi",
      "target_lang": "en"
    }
  }
}
4. USER_LEFT
Broadcast when another user leaves.
{
  "type": "user_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user2",
    "username": "Jane Smith"
  }
}
5. TRANSCRIPTION
Transcription result from speech-to-text (STT). "is_final": false marks an intermediate result; "is_final": true marks a final one.
{
  "type": "transcription",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "text": "Hello how are you",
    "lang": "en",
    "is_final": false,
    "confidence": 0.85
  }
}
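A client could display these results as shown below. This sketch assumes that a final result replaces the partial text that preceded it for the same speaker. The store is a hypothetical helper, not part of the protocol.

```javascript
// Per-speaker transcript: finals are committed, each partial replaces the last.
function createTranscriptStore() {
  const bySpeaker = new Map(); // user_id -> { finals: string[], partial: string }
  return {
    apply(payload) {
      const entry = bySpeaker.get(payload.user_id) ?? { finals: [], partial: '' };
      if (payload.is_final) {
        entry.finals.push(payload.text); // commit the segment
        entry.partial = '';              // assumption: the final supersedes the partial
      } else {
        entry.partial = payload.text;    // newer partial replaces the older one
      }
      bySpeaker.set(payload.user_id, entry);
    },
    // Text to show for one speaker: committed segments plus the live partial.
    display(userId) {
      const e = bySpeaker.get(userId);
      if (!e) return '';
      return [...e.finals, e.partial].filter(Boolean).join(' ');
    },
  };
}
```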
6. TRANSLATION
Translation result.
{
  "type": "translation",
  "payload": {
    "room_id": "room123",
    "source_user_id": "user1",
    "target_user_id": "user2",
    "original_text": "Hello, how are you?",
    "translated_text": "नमस्ते, आप कैसे हैं?",
    "source_lang": "en",
    "target_lang": "hi",
    "confidence": 0.92
  }
}
7. AUDIO_START_ACK
Acknowledgment of audio start.
{
  "type": "audio_start_ack",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "ready": true
  }
}
8. AUDIO_STOP_ACK
Acknowledgment of audio stop.
{
  "type": "audio_stop_ack",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
9. PONG
Heartbeat response.
{
  "type": "pong",
  "payload": {
    "timestamp": "2025-12-17T10:30:00Z"
  }
}
10. ROOM_INFO
Room state information.
{
  "type": "room_info",
  "payload": {
    "room_id": "room123",
    "created_at": "2025-12-17T10:00:00Z",
    "users": [ ... ],
    "active_speakers": ["user1"],
    "supported_languages": ["en", "hi"]
  }
}
11. ERROR
Error notification.
{
  "type": "error",
  "payload": {
    "code": "INVALID_ROOM",
    "message": "Room does not exist",
    "details": "Room 'room123' not found",
    "recoverable": true
  }
}
Error Codes:
- INVALID_ROOM: Room not found
- ROOM_FULL: Maximum users reached
- INVALID_MESSAGE: Malformed message
- AUTH_FAILED: Authentication failed
- RATE_LIMIT: Too many requests
- INTERNAL_ERROR: Server error
- UNSUPPORTED_LANGUAGE: Language not available
- AUDIO_ERROR: Audio processing error
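The sketch below turns an ERROR payload into a client action. The policy is an assumption and is not specified here. AUTH_FAILED is treated as fatal, and RATE_LIMIT is treated as a signal to wait before retrying. For every other code, the recoverable flag decides whether the connection stays open. errorAction is a hypothetical name.

```javascript
// Hypothetical client policy for ERROR payloads:
//   'close'   -> close the connection (do not retry with the same token)
//   'backoff' -> wait, then retry the request
//   'report'  -> keep the connection and surface the error to the user
function errorAction(payload) {
  if (payload.code === 'AUTH_FAILED') return 'close';
  if (payload.code === 'RATE_LIMIT') return 'backoff';
  return payload.recoverable ? 'report' : 'close';
}

console.log(errorAction({ code: 'INVALID_ROOM', recoverable: true })); // report
```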
Binary Audio Messages
Audio data is sent as binary WebSocket frames.
Client → Server (Audio Input)
Binary message structure:
[Header (16 bytes)][Audio Data (variable)]
Header Format:
- Bytes 0-7: User ID (UTF-8, padded)
- Bytes 8-11: Sequence number (uint32, big-endian)
- Bytes 12-15: Timestamp (uint32, milliseconds)
Audio Data:
- Format: PCM16 (16-bit signed integer)
- Sample Rate: 16000 Hz (configurable)
- Channels: 1 (mono)
- Byte Order: Little-endian
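The 16-byte header can be written with a DataView. The layout above only says the user ID is "padded" and gives no byte order for the timestamp. This sketch therefore assumes NUL (0x00) padding and a big-endian timestamp. Note that the header is big-endian while the samples are little-endian. encodeAudioFrame is a hypothetical name.

```javascript
// Client -> server frame: 16-byte header followed by PCM16 samples.
function encodeAudioFrame(userId, seq, timestampMs, samples /* Int16Array */) {
  const idBytes = new TextEncoder().encode(userId);
  if (idBytes.length > 8) throw new RangeError('user ID must fit in 8 UTF-8 bytes');
  const buf = new ArrayBuffer(16 + samples.length * 2);
  const bytes = new Uint8Array(buf);
  const view = new DataView(buf);
  bytes.set(idBytes, 0);                         // bytes 0-7; unused bytes stay 0x00 (assumed padding)
  view.setUint32(8, seq >>> 0, false);           // bytes 8-11: sequence number, big-endian
  view.setUint32(12, timestampMs >>> 0, false);  // bytes 12-15: ms, truncated mod 2^32 (assumed big-endian)
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(16 + 2 * i, samples[i], true); // PCM16 samples, little-endian
  }
  return buf;
}
```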
Server → Client (Translated Audio)
Binary message structure:
[Header (24 bytes)][Audio Data (variable)]
Header Format:
- Bytes 0-7: Source User ID (UTF-8, padded)
- Bytes 8-15: Target User ID (UTF-8, padded)
- Bytes 16-19: Sequence number (uint32, big-endian)
- Bytes 20-23: Timestamp (uint32, milliseconds)
Audio Data:
- Same format as input
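Decoding the 24-byte header mirrors encoding, under the same assumptions: NUL padding and a big-endian timestamp. The samples are read through DataView rather than by viewing the buffer directly as an Int16Array. This keeps the little-endian byte order correct regardless of the platform's native endianness. decodeTranslatedFrame is a hypothetical name.

```javascript
// Server -> client frame: 24-byte header followed by PCM16 samples.
function decodeTranslatedFrame(buf /* ArrayBuffer */) {
  if (buf.byteLength < 24 || (buf.byteLength - 24) % 2 !== 0) {
    throw new RangeError('frame must be a 24-byte header plus whole 16-bit samples');
  }
  const view = new DataView(buf);
  const decoder = new TextDecoder();
  // Decode an 8-byte ID field and strip trailing NUL padding (assumed)
  const id = (offset) => decoder.decode(new Uint8Array(buf, offset, 8)).replace(/\0+$/, '');
  const samples = new Int16Array((buf.byteLength - 24) / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(24 + 2 * i, true); // little-endian PCM16
  }
  return {
    sourceUserId: id(0),                   // bytes 0-7
    targetUserId: id(8),                   // bytes 8-15
    seq: view.getUint32(16, false),        // bytes 16-19, big-endian
    timestampMs: view.getUint32(20, false), // bytes 20-23 (assumed big-endian)
    samples,
  };
}
```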
Connection Lifecycle
1. Connection Establishment
Client                             Server
  │                                   │
  ├──────── WebSocket Connect ───────►│
  │                                   │
  │◄──────── Connection Open ─────────┤
  │                                   │
2. Room Join
Client                             Server
  │                                   │
  ├─────────── JOIN_ROOM ────────────►│
  │                                   │
  │◄────────── ROOM_JOINED ───────────┤
  │                                   │
  │◄────────── USER_JOINED ───────────┤ (broadcast to others)
  │                                   │
3. Audio Streaming
Client                             Server                           Other Client
  │                                   │                                   │
  ├───── AUDIO_START ────────────────►│                                   │
  │                                   │                                   │
  │◄───── AUDIO_START_ACK ────────────┤                                   │
  │                                   │                                   │
  ├─── Binary Audio Chunk 1 ─────────►│                                   │
  ├─── Binary Audio Chunk 2 ─────────►│                                   │
  │                                   │                                   │
  │◄──────── TRANSCRIPTION ───────────┤                                   │
  │                                   │                                   │
  │◄──────── TRANSLATION ─────────────┤                                   │
  │                                   │                                   │
  │                                   ├──── Binary Audio Chunk ──────────►│
  │                                   │                                   │
  ├───── AUDIO_STOP ─────────────────►│                                   │
  │                                   │                                   │
  │◄───── AUDIO_STOP_ACK ─────────────┤                                   │
  │                                   │                                   │
4. Disconnection
Client                             Server
  │                                   │
  ├─────────── LEAVE_ROOM ───────────►│
  │                                   │
  │◄─────────── ROOM_LEFT ────────────┤
  │                                   │
  ├──────── Close Connection ────────►│
  │                                   │
  │◄───────── Close Confirm ──────────┤
  │                                   │
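The ordering rules in these diagrams can be enforced on the client with a small state machine. For example, it can refuse to send binary audio outside the window between AUDIO_START_ACK and AUDIO_STOP_ACK. The state names and createLifecycle are hypothetical, since the protocol itself only defines the messages.

```javascript
// Inputs: connection events ("open", "close") and server message types.
// Feed "audio_start_ack" only when its payload has ready: true.
const TRANSITIONS = {
  connecting: { open: 'connected', close: 'closed' },
  connected: { room_joined: 'in_room', close: 'closed' },
  in_room: { audio_start_ack: 'streaming', room_left: 'connected', close: 'closed' },
  streaming: { audio_stop_ack: 'in_room', close: 'closed' },
  closed: {},
};

function createLifecycle() {
  let state = 'connecting';
  return {
    get state() {
      return state;
    },
    // Events without a transition (e.g. "translation") leave the state as-is.
    on(event) {
      const table = TRANSITIONS[state];
      if (Object.prototype.hasOwnProperty.call(table, event)) state = table[event];
      return state;
    },
    // Binary audio is allowed only between AUDIO_START_ACK and AUDIO_STOP_ACK.
    canSendAudio() {
      return state === 'streaming';
    },
  };
}
```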
Rate Limiting
Default limits:
- Join room: 10 requests per minute
- Audio streaming: Unlimited (quality-based throttling)
- Text messages: 30 per minute
- Room info requests: 60 per minute
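A client-side sliding-window limiter is a simple way to stay under these limits. This document does not say whether the server counts with a sliding or a fixed window. A sliding window never exceeds the limit in any 60-second span, so it also satisfies a fixed per-minute window. createRateLimiter is a hypothetical name.

```javascript
// Allow at most `limit` requests in any window of `windowMs` milliseconds.
function createRateLimiter(limit, windowMs = 60000) {
  const sent = []; // timestamps (ms) of accepted requests, oldest first
  return function tryAcquire(now) {
    // Drop timestamps that have left the window
    while (sent.length > 0 && now - sent[0] >= windowMs) sent.shift();
    if (sent.length >= limit) return false;
    sent.push(now);
    return true;
  };
}

// e.g. text messages: 30 per minute
const canSendText = createRateLimiter(30);
```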
Reconnection Strategy
- Exponential Backoff: 1s, 2s, 4s, 8s, 16s, 30s (max)
- Session Recovery: Send the previous room_id and user_id on reconnect
- State Sync: Server sends the current room state after reconnection
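The backoff schedule above starts at 1 s, doubles on each attempt, and caps at 30 s. The sketch below reproduces that schedule and wraps it in a reconnect loop. connect is a hypothetical factory that returns a new WebSocket. Re-sending join_room with the previous room_id and user_id is an assumption about how session recovery is performed.

```javascript
// Delay before reconnect attempt n (0-based): 1s, 2s, 4s, 8s, 16s, then 30s.
function reconnectDelayMs(attempt) {
  return Math.min(1000 * 2 ** attempt, 30000);
}

// `session` is the previous join_room payload (room_id, user_id, ...).
function keepConnected(connect, session) {
  let attempt = 0;
  const open = () => {
    const ws = connect();
    ws.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
      // Assumption: recovery re-sends join_room with the previous IDs
      ws.send(JSON.stringify({ type: 'join_room', payload: session }));
    };
    // Note: this sketch also reconnects after an intentional close
    ws.onclose = () => setTimeout(open, reconnectDelayMs(attempt++));
  };
  open();
}

console.log([0, 1, 2, 3, 4, 5, 6].map(reconnectDelayMs).join(', '));
// 1000, 2000, 4000, 8000, 16000, 30000, 30000
```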
Best Practices
Client Implementation
- Always send AUDIO_START before binary audio
- Buffer audio before sending (minimum 100ms chunks)
- Include sequence numbers for ordering
- Handle ERROR messages gracefully
- Implement heartbeat (PING every 30 seconds)
- Reconnect automatically on disconnect
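The 100 ms minimum corresponds to a concrete chunk size: sample rate × channels × duration. At the default 16000 Hz mono PCM16, a 100 ms chunk is 1600 samples, or 3200 bytes. The chunker below is a hypothetical helper. It accumulates captured samples and emits chunks of exactly that size, which could then be wrapped with the 16-byte header.

```javascript
// Samples per chunk: whole frames for `ms` milliseconds, times the channel count.
function samplesPerChunk(ms, sampleRate = 16000, channels = 1) {
  return Math.ceil((sampleRate * ms) / 1000) * channels;
}

// Buffer captured PCM16 samples and emit fixed-size chunks of `ms` each.
function createChunker(emit, ms = 100, sampleRate = 16000, channels = 1) {
  const size = samplesPerChunk(ms, sampleRate, channels);
  let pending = new Int16Array(0);
  return function push(samples /* Int16Array */) {
    const merged = new Int16Array(pending.length + samples.length);
    merged.set(pending, 0);
    merged.set(samples, pending.length);
    let offset = 0;
    while (merged.length - offset >= size) {
      emit(merged.slice(offset, offset + size));
      offset += size;
    }
    pending = merged.slice(offset); // keep the remainder for the next call
  };
}

console.log(samplesPerChunk(100) * 2); // 3200 (bytes per 100 ms chunk)
```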
Server Implementation
- Validate all messages before processing
- Broadcast state changes to all room members
- Clean up resources on disconnect
- Log all errors with context
- Rate limit per connection and IP
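To follow "Validate all messages before processing," a server could check each client message against the payload fields listed in this document before dispatching it. The sketch is written in JavaScript for consistency with the client example, since the server's implementation language is not stated here. It returns an ERROR payload with the INVALID_MESSAGE code. Both recoverable: true and the name validateClientMessage are assumptions.

```javascript
// Required payload fields per client -> server message type (from this document).
const REQUIRED_FIELDS = {
  join_room: ['room_id', 'user_id', 'username', 'source_lang', 'target_lang'],
  leave_room: ['room_id', 'user_id'],
  audio_start: ['room_id', 'user_id', 'audio_config'],
  audio_stop: ['room_id', 'user_id'],
  text_message: ['room_id', 'user_id', 'text', 'lang'],
  ping: [],
  get_room_info: ['room_id'],
};

// Returns null when the message is valid, otherwise an ERROR payload.
function validateClientMessage(msg) {
  const fail = (details) => ({
    code: 'INVALID_MESSAGE',
    message: 'Malformed message',
    details,
    recoverable: true, // assumption
  });
  if (!msg || typeof msg.type !== 'string') return fail('missing "type"');
  // own-property check so names like "toString" are not treated as known types
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, msg.type)) {
    return fail(`unknown type "${msg.type}"`);
  }
  const payload = msg.payload;
  if (typeof payload !== 'object' || payload === null) return fail('missing "payload"');
  const missing = REQUIRED_FIELDS[msg.type].filter((f) => payload[f] === undefined);
  return missing.length ? fail(`missing fields: ${missing.join(', ')}`) : null;
}
```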
Example Client Flow (JavaScript)
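const ws = new WebSocket('ws://localhost:8000/ws');
// Receive binary frames as ArrayBuffer rather than the default Blob
ws.binaryType = 'arraybuffer';

// Set once the server acknowledges AUDIO_START
let audioReady = false;

// Connected: join a room
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'join_room',
    payload: {
      room_id: 'room123',
      user_id: 'user1',
      username: 'John',
      source_lang: 'en',
      target_lang: 'hi'
    }
  }));
};

// Handle messages
ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary frame: translated audio (24-byte header + PCM16)
    handleAudio(event.data);
  } else {
    // Text frame: JSON message
    const msg = JSON.parse(event.data);
    if (msg.type === 'audio_start_ack') audioReady = msg.payload.ready === true;
    if (msg.type === 'audio_stop_ack') audioReady = false;
    handleMessage(msg);
  }
};

// Placeholder handlers (hypothetical); replace with application logic
function handleAudio(buffer) {
  // Parse the 24-byte header, then play the PCM16 samples
}
function handleMessage(msg) {
  console.log(msg.type, msg.payload);
}

// Send audio: `frame` must already include the 16-byte header, and
// AUDIO_START must have been acknowledged first
function sendAudio(frame) {
  if (!audioReady || ws.readyState !== WebSocket.OPEN) return false;
  ws.send(frame);
  return true;
}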
Security Considerations
- Always use WSS (WebSocket Secure) in production
- Validate JWT tokens if authentication enabled
- Sanitize user inputs (usernames, room IDs)
- Implement rate limiting to prevent abuse
- Monitor connection count to prevent DoS
- Encrypt sensitive data in messages