# WebSocket Protocol

## Overview

The Voice-to-Voice Translator uses a WebSocket-based protocol for real-time, bidirectional communication between clients and the server. Control messages are sent as JSON in text frames; audio is sent as raw data in binary frames.

## Connection Endpoint

```
ws://host:port/ws
```

### Connection Parameters

Query parameters (optional):
- `token`: JWT authentication token (used when authentication is enabled)
- `client_id`: Unique client identifier

Example:
```
ws://localhost:8000/ws?token=eyJhbGc...&client_id=client123
```
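
Because a JWT can contain characters that need escaping in a query string, it may be safer to build the URL programmatically than by string concatenation. The following is a minimal sketch using the standard `URL` API; `buildWsUrl` is a hypothetical helper name, not part of the protocol.

```javascript
// Hypothetical helper: builds the connection URL with the optional
// `token` and `client_id` query parameters, URL-encoding their values.
function buildWsUrl(base, { token, clientId } = {}) {
  const url = new URL(base);
  if (token) url.searchParams.set('token', token);
  if (clientId) url.searchParams.set('client_id', clientId);
  return url.toString();
}

// Usage:
// const ws = new WebSocket(buildWsUrl('ws://localhost:8000/ws', { clientId: 'client123' }));
```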

## Message Types

All text messages follow this structure:

```json
{
  "type": "message_type",
  "payload": { ... },
  "timestamp": "2025-12-17T10:30:00Z",
  "message_id": "uuid-v4"
}
```
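
A client could wrap every outgoing message in this envelope with a small helper. This is a sketch under assumptions: `buildMessage` is a hypothetical name, the timestamp is trimmed to whole seconds to match the example above, and whether the server requires `timestamp` and `message_id` on every message is not stated here.

```javascript
// Hypothetical helper: serializes a message in the common envelope.
// `id` and `now` can be injected (for tests); by default a UUID v4 is
// generated with crypto.randomUUID() and the current time is used.
function buildMessage(type, payload = {}, { id = crypto.randomUUID(), now = new Date() } = {}) {
  return JSON.stringify({
    type,
    payload,
    // toISOString() includes milliseconds; strip them to match the example format
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, 'Z'),
    message_id: id,
  });
}

// Usage:
// ws.send(buildMessage('ping'));
```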

### Client → Server Messages

#### 1. JOIN_ROOM

Join or create a translation room.

```json
{
  "type": "join_room",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "username": "John Doe",
    "source_lang": "en",
    "target_lang": "hi"
  }
}
```

**Fields**:
- `room_id`: Unique room identifier
- `user_id`: Unique user identifier
- `username`: Display name
- `source_lang`: User's speaking language (ISO 639-1 code)
- `target_lang`: Language to translate into (same code format as `source_lang`)

**Response**: `ROOM_JOINED` or `ERROR`
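
Checking the payload on the client before sending can avoid a round trip that ends in `ERROR`. The sketch below is illustrative only: `validateJoinPayload` is a hypothetical helper, and it assumes both language fields are two-letter lowercase ISO 639-1 codes and that the three identifier fields are non-empty strings. The server's actual validation rules may differ.

```javascript
// Assumption: language codes are two lowercase letters (ISO 639-1).
const ISO_639_1 = /^[a-z]{2}$/;

// Hypothetical helper: returns a list of problems; an empty list means
// the payload looks well-formed under the assumptions above.
function validateJoinPayload(p) {
  if (p === null || typeof p !== 'object') return ['payload must be an object'];
  const errors = [];
  for (const field of ['room_id', 'user_id', 'username']) {
    if (typeof p[field] !== 'string' || p[field].length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  for (const field of ['source_lang', 'target_lang']) {
    if (typeof p[field] !== 'string' || !ISO_639_1.test(p[field])) {
      errors.push(`${field} must be a two-letter ISO 639-1 code`);
    }
  }
  return errors;
}
```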

#### 2. LEAVE_ROOM

Leave current room.

```json
{
  "type": "leave_room",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
```

**Response**: `ROOM_LEFT` or `ERROR`

#### 3. AUDIO_START

Signal start of audio stream.

```json
{
  "type": "audio_start",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "audio_config": {
      "sample_rate": 16000,
      "channels": 1,
      "format": "PCM16",
      "chunk_size": 4096
    }
  }
}
```

**Response**: `AUDIO_START_ACK`

#### 4. AUDIO_STOP

Signal end of audio stream.

```json
{
  "type": "audio_stop",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
```

**Response**: `AUDIO_STOP_ACK`

#### 5. TEXT_MESSAGE

Send text message (for chat or corrections).

```json
{
  "type": "text_message",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "text": "Hello, how are you?",
    "lang": "en"
  }
}
```

**Response**: `TEXT_MESSAGE` (broadcast to room)

#### 6. PING

Heartbeat message.

```json
{
  "type": "ping",
  "payload": {}
}
```

**Response**: `PONG`
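
A periodic heartbeat could be sketched as follows. `startHeartbeat` is a hypothetical helper; the 30-second default follows the client best practices in this document, and the scheduling functions are injectable only so the logic can be exercised without real timers.

```javascript
// Hypothetical helper: sends a PING at a fixed interval while the socket
// is open, and returns a function that stops the heartbeat.
function startHeartbeat(
  ws,
  intervalMs = 30000,
  schedule = (fn, ms) => setInterval(fn, ms),
  cancel = (handle) => clearInterval(handle),
) {
  const handle = schedule(() => {
    // readyState 1 means OPEN in the WebSocket API
    if (ws.readyState === 1) {
      ws.send(JSON.stringify({ type: 'ping', payload: {} }));
    }
  }, intervalMs);
  return () => cancel(handle);
}

// Usage:
// const stopHeartbeat = startHeartbeat(ws);
// ws.onclose = () => stopHeartbeat();
```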

#### 7. GET_ROOM_INFO

Request current room state.

```json
{
  "type": "get_room_info",
  "payload": {
    "room_id": "room123"
  }
}
```

**Response**: `ROOM_INFO`

### Server → Client Messages

#### 1. ROOM_JOINED

Confirmation of room join.

```json
{
  "type": "room_joined",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "users": [
      {
        "user_id": "user1",
        "username": "John Doe",
        "source_lang": "en",
        "target_lang": "hi"
      },
      {
        "user_id": "user2",
        "username": "Jane Smith",
        "source_lang": "hi",
        "target_lang": "en"
      }
    ]
  }
}
```

#### 2. ROOM_LEFT

Confirmation of room leave.

```json
{
  "type": "room_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
```

#### 3. USER_JOINED

Broadcast when another user joins.

```json
{
  "type": "user_joined",
  "payload": {
    "room_id": "room123",
    "user": {
      "user_id": "user2",
      "username": "Jane Smith",
      "source_lang": "hi",
      "target_lang": "en"
    }
  }
}
```

#### 4. USER_LEFT

Broadcast when another user leaves.

```json
{
  "type": "user_left",
  "payload": {
    "room_id": "room123",
    "user_id": "user2",
    "username": "Jane Smith"
  }
}
```

#### 5. TRANSCRIPTION

Transcription result from speech-to-text (STT). Intermediate results carry `is_final: false`.

```json
{
  "type": "transcription",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "text": "Hello how are you",
    "lang": "en",
    "is_final": false,
    "confidence": 0.85
  }
}
```
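
A client UI typically shows interim text while a user is speaking and commits it once the result is final. The reducer below is one possible way to track that; it assumes each new interim result replaces the previous one for the same user and that a result with `is_final: true` supersedes it. Neither behavior is specified here, and `applyTranscription` is a hypothetical name.

```javascript
// Hypothetical reducer: returns a new state without mutating the old one.
// state.interim maps user_id -> latest interim text;
// state.finals is the ordered list of committed utterances.
function applyTranscription(state, payload) {
  const { user_id, text, is_final } = payload;
  const interim = { ...state.interim };
  const finals = [...state.finals];
  if (is_final) {
    finals.push({ user_id, text });
    delete interim[user_id]; // the final result supersedes the interim text
  } else {
    interim[user_id] = text;
  }
  return { interim, finals };
}
```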

#### 6. TRANSLATION

Translation result.

```json
{
  "type": "translation",
  "payload": {
    "room_id": "room123",
    "source_user_id": "user1",
    "target_user_id": "user2",
    "original_text": "Hello, how are you?",
    "translated_text": "नमस्ते, आप कैसे हैं?",
    "source_lang": "en",
    "target_lang": "hi",
    "confidence": 0.92
  }
}
```

#### 7. AUDIO_START_ACK

Acknowledgment of audio start.

```json
{
  "type": "audio_start_ack",
  "payload": {
    "room_id": "room123",
    "user_id": "user1",
    "ready": true
  }
}
```

#### 8. AUDIO_STOP_ACK

Acknowledgment of audio stop.

```json
{
  "type": "audio_stop_ack",
  "payload": {
    "room_id": "room123",
    "user_id": "user1"
  }
}
```

#### 9. PONG

Heartbeat response.

```json
{
  "type": "pong",
  "payload": {
    "timestamp": "2025-12-17T10:30:00Z"
  }
}
```

#### 10. ROOM_INFO

Room state information.

```json
{
  "type": "room_info",
  "payload": {
    "room_id": "room123",
    "created_at": "2025-12-17T10:00:00Z",
    "users": [ ... ],
    "active_speakers": ["user1"],
    "supported_languages": ["en", "hi"]
  }
}
```

#### 11. ERROR

Error notification.

```json
{
  "type": "error",
  "payload": {
    "code": "INVALID_ROOM",
    "message": "Room does not exist",
    "details": "Room 'room123' not found",
    "recoverable": true
  }
}
```

**Error Codes**:
- `INVALID_ROOM`: Room not found
- `ROOM_FULL`: Maximum users reached
- `INVALID_MESSAGE`: Malformed message
- `AUTH_FAILED`: Authentication failed
- `RATE_LIMIT`: Too many requests
- `INTERNAL_ERROR`: Server error
- `UNSUPPORTED_LANGUAGE`: Language not available
- `AUDIO_ERROR`: Audio processing error
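
One way a client might act on an `ERROR` message is to branch on `code` first and fall back to the `recoverable` flag. The action names and the per-code policy below are assumptions for illustration; the protocol defines only the fields themselves.

```javascript
// Hypothetical policy: maps an ERROR payload to a client-side action.
function nextActionForError(payload) {
  if (payload.code === 'RATE_LIMIT') return 'retry_later';
  if (payload.code === 'AUTH_FAILED') return 'reauthenticate';
  // For every other code, trust the server's `recoverable` flag
  return payload.recoverable ? 'continue' : 'close';
}

// Usage:
// if (msg.type === 'error' && nextActionForError(msg.payload) === 'close') ws.close();
```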

## Binary Audio Messages

Audio data is sent as binary WebSocket frames.

### Client → Server (Audio Input)

Binary message structure:
```
[Header (16 bytes)][Audio Data (variable)]
```

**Header Format**:
- Bytes 0-7: User ID (UTF-8, padded)
- Bytes 8-11: Sequence number (uint32, big-endian)
- Bytes 12-15: Timestamp (uint32, milliseconds)

**Audio Data**:
- Format: PCM16 (16-bit signed integer)
- Sample Rate: 16000 Hz (configurable via `audio_config.sample_rate` in `AUDIO_START`)
- Channels: 1 (mono)
- Byte Order: Little-endian
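
The 16-byte header can be assembled with a `DataView`. This sketch fills in two details the layout above leaves open, so treat them as assumptions: the user ID is padded with zero bytes, and the timestamp is written big-endian like the sequence number. `encodeClientFrame` is a hypothetical helper name.

```javascript
// Hypothetical helper: prepends the 16-byte header to a chunk of PCM16 samples.
// `pcm` is an Int16Array; typed arrays use the platform byte order, which is
// little-endian on virtually all current hardware.
function encodeClientFrame(userId, sequence, timestampMs, pcm) {
  const idBytes = new TextEncoder().encode(userId);
  if (idBytes.length > 8) throw new RangeError('user ID exceeds 8 bytes in UTF-8');
  const frame = new Uint8Array(16 + pcm.byteLength);
  frame.set(idBytes, 0); // unused ID bytes stay 0x00 (assumed padding)
  const view = new DataView(frame.buffer);
  view.setUint32(8, sequence >>> 0, false);     // big-endian, per the header format
  view.setUint32(12, timestampMs >>> 0, false); // byte order assumed; wraps at 2^32 ms
  frame.set(new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength), 16);
  return frame;
}

// Usage:
// ws.send(encodeClientFrame('user1', seq++, elapsedMs, samples));
```

Note that an 8-byte ID field cannot hold longer identifiers, so this sketch rejects them rather than truncating silently.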

### Server → Client (Translated Audio)

Binary message structure:
```
[Header (24 bytes)][Audio Data (variable)]
```

**Header Format**:
- Bytes 0-7: Source User ID (UTF-8, padded)
- Bytes 8-15: Target User ID (UTF-8, padded)
- Bytes 16-19: Sequence number (uint32, big-endian)
- Bytes 20-23: Timestamp (uint32, milliseconds)

**Audio Data**:
- Same format as input
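
On the receiving side, the 24-byte header can be parsed in a similar way. As with the client frame, zero-byte padding of the ID fields and a big-endian timestamp are assumptions, and `decodeServerFrame` is a hypothetical helper name.

```javascript
// Hypothetical helper: splits a translated-audio frame (ArrayBuffer)
// into its header fields and the PCM16 audio bytes.
function decodeServerFrame(buffer) {
  if (buffer.byteLength < 24) throw new RangeError('frame shorter than 24-byte header');
  const bytes = new Uint8Array(buffer);
  const view = new DataView(buffer);
  const readId = (start) => {
    const field = bytes.subarray(start, start + 8);
    const end = field.indexOf(0); // stop at the first padding byte (assumed 0x00)
    return new TextDecoder().decode(end === -1 ? field : field.subarray(0, end));
  };
  return {
    sourceUserId: readId(0),
    targetUserId: readId(8),
    sequence: view.getUint32(16, false),    // big-endian, per the header format
    timestampMs: view.getUint32(20, false), // byte order assumed
    audio: buffer.slice(24),                // copy of the audio bytes
  };
}
```

In a browser, binary frames arrive as `Blob` by default; setting `ws.binaryType = 'arraybuffer'` or calling `await blob.arrayBuffer()` yields the `ArrayBuffer` this function expects.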

## Connection Lifecycle

### 1. Connection Establishment

```
Client                          Server
  │                               │
  ├─────── WebSocket Connect ────►│
  │                               │
  │◄────── Connection Open ───────┤
  │                               │
```

### 2. Room Join

```
Client                          Server
  │                               │
  ├────────── JOIN_ROOM ─────────►│
  │                               │
  │◄───────── ROOM_JOINED ────────┤
  │                               │
  │◄───────── USER_JOINED ────────┤ (broadcast to others)
  │                               │
```

### 3. Audio Streaming

```
Client                          Server                     Other Client
  │                               │                              │
  ├───── AUDIO_START ────────────►│                              │
  │                               │                              │
  │◄──── AUDIO_START_ACK ─────────┤                              │
  │                               │                              │
  ├─── Binary Audio Chunk 1 ─────►│                              │
  ├─── Binary Audio Chunk 2 ─────►│                              │
  │                               │                              │
  │◄─────── TRANSCRIPTION ────────┤                              │
  │                               │                              │
  │◄─────── TRANSLATION ──────────┤                              │
  │                               │                              │
  │                               ├───► Binary Audio Chunk ─────►│
  │                               │                              │
  ├───── AUDIO_STOP ─────────────►│                              │
  │                               │                              │
  │◄──── AUDIO_STOP_ACK ──────────┤                              │
  │                               │                              │
```

### 4. Disconnection

```
Client                          Server
  │                               │
  ├────────── LEAVE_ROOM ────────►│
  │                               │
  │◄───────── ROOM_LEFT ──────────┤
  │                               │
  ├─────── Close Connection ─────►│
  │                               │
  │◄─────── Close Confirm ────────┤
  │                               │
```

## Rate Limiting

Default limits:
- Join room: 10 requests per minute
- Audio streaming: No fixed request limit (throttled based on quality)
- Text messages: 30 per minute
- Room info requests: 60 per minute
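
A client can stay under these limits by throttling itself before sending. The sketch below uses a sliding window; how the server actually measures its window (fixed or sliding) is not stated here, so this is a conservative assumption. `createRateLimiter` is a hypothetical helper.

```javascript
// Hypothetical helper: returns a function that reports whether another
// request is allowed within the last `windowMs` milliseconds.
function createRateLimiter(limit, windowMs, now = () => Date.now()) {
  let sent = []; // timestamps of requests inside the current window
  return function tryAcquire() {
    const cutoff = now() - windowMs;
    sent = sent.filter((t) => t > cutoff); // drop entries that left the window
    if (sent.length >= limit) return false;
    sent.push(now());
    return true;
  };
}

// Usage (text messages: 30 per minute):
// const canSendText = createRateLimiter(30, 60000);
// if (canSendText()) ws.send(textMessageJson);
```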

## Reconnection Strategy

1. **Exponential Backoff**: 1s, 2s, 4s, 8s, 16s, 30s (max)
2. **Session Recovery**: Send previous `room_id` and `user_id` on reconnect
3. **State Sync**: Server sends current room state after reconnection
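
The backoff schedule in step 1 is a doubling delay capped at 30 seconds. A minimal sketch, with `backoffDelayMs` as a hypothetical helper name:

```javascript
// Delay before reconnect attempt number `attempt` (0-based):
// doubles from baseMs on each attempt and is capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n))
// → [1000, 2000, 4000, 8000, 16000, 30000]
```

Adding random jitter to each delay is a common refinement when many clients may reconnect at once, though it is not part of the schedule described here.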

## Best Practices

### Client Implementation

1. **Always send AUDIO_START before binary audio**
2. **Buffer audio before sending** (chunks of at least 100 ms)
3. **Include sequence numbers** for ordering
4. **Handle ERROR messages** gracefully
5. **Implement heartbeat** (PING every 30 seconds)
6. **Reconnect automatically** on disconnect

### Server Implementation

1. **Validate all messages** before processing
2. **Broadcast state changes** to all room members
3. **Clean up resources** on disconnect
4. **Log all errors** with context
5. **Rate limit** per connection and IP
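
Message validation (item 1) might look like the following sketch. It is written in JavaScript for consistency with the client example; the server's implementation language is not stated here. `parseClientMessage` is a hypothetical helper, and marking `INVALID_MESSAGE` as recoverable is an assumption.

```javascript
// Client → server message types defined in this document
const CLIENT_TYPES = new Set([
  'join_room', 'leave_room', 'audio_start', 'audio_stop',
  'text_message', 'ping', 'get_room_info',
]);

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Builds an ERROR payload in the shape documented above
function invalidMessage(details) {
  return { code: 'INVALID_MESSAGE', message: 'Malformed message', details, recoverable: true };
}

// Hypothetical helper: parses a text frame and checks the envelope
// before any type-specific processing.
function parseClientMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { ok: false, error: invalidMessage('Message is not valid JSON') };
  }
  if (!isPlainObject(msg)) {
    return { ok: false, error: invalidMessage('Message must be a JSON object') };
  }
  if (!CLIENT_TYPES.has(msg.type)) {
    return { ok: false, error: invalidMessage(`Unknown message type: ${String(msg.type)}`) };
  }
  if (!isPlainObject(msg.payload)) {
    return { ok: false, error: invalidMessage('payload must be a JSON object') };
  }
  return { ok: true, message: msg };
}
```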

## Example Client Flow (JavaScript)

```javascript
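const ws = new WebSocket('ws://localhost:8000/ws');

// Once the connection is open, join a room
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'join_room',
    payload: {
      room_id: 'room123',
      user_id: 'user1',
      username: 'John',
      source_lang: 'en',
      target_lang: 'hi'
    }
  }));
};

// Placeholder handlers; replace with application logic
function handleAudio(blob) {
  console.log(`Received ${blob.size} bytes of translated audio`);
}

function handleMessage(msg) {
  if (msg.type === 'error') {
    console.error(`${msg.payload.code}: ${msg.payload.message}`);
  } else {
    console.log(`Received ${msg.type}`, msg.payload);
  }
}

// Route incoming frames: binary frames carry audio, text frames carry JSON
ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Browsers deliver binary frames as Blob unless ws.binaryType is changed
    handleAudio(event.data);
  } else {
    const msg = JSON.parse(event.data);
    handleMessage(msg);
  }
};

// Send one binary audio frame. audioBuffer must already include the
// 16-byte header, and AUDIO_START must have been sent first.
function sendAudio(audioBuffer) {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(audioBuffer);
  }
}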
```

## Security Considerations

1. **Always use WSS (WebSocket Secure)** in production
2. **Validate JWT tokens** if authentication is enabled
3. **Sanitize user inputs** (usernames, room IDs)
4. **Implement rate limiting** to prevent abuse
5. **Monitor connection count** to prevent DoS
6. **Encrypt sensitive data** in messages