
TASTE Voice Bot Frontend

Browser-based frontend that connects directly to the orchestrator using binary protobuf over WebSocket.

Quick Start

# 1. Install dependencies
npm install

# 2. Generate protobuf files (already done; the generated files are in proto/)
# npm run proto:generate

# 3. Start server
npm run serve

# 4. Port-forward setup (optional)

# 5. Open browser
# Go to http://localhost:3000

Architecture

Browser Frontend  ←→  FastAPI Orchestrator (ws://localhost:8000/ws/orchestrator)
     ↓                         ↓
  AudioChunk              AgentResponse
  (protobuf)               (protobuf)

File Structure

frontend/
├── index.html                 # Main UI
├── css/
│   └── styles.css            # UI styling
├── js/
│   ├── main.js               # App controller
│   ├── orchestrator-client.js # WebSocket + protobuf client
│   ├── audio-manager.js      # Audio recording/playback
│   ├── audio-processor.js    # AudioWorklet processor (100ms chunks)
│   └── config.js             # Configuration (chunk duration, sample rate)
├── proto/
│   ├── data_pb.js            # Generated protobuf (ES6)
│   └── data_pb.d.ts          # TypeScript definitions
└── package.json              # Dependencies and scripts

How It Works

1. Connect to Orchestrator

// Connects to ws://localhost:8000/ws/orchestrator
await orchestratorClient.connect();
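
As a rough illustration of what a connect step like this could involve, the sketch below builds the endpoint URL and opens a binary WebSocket. It is not the actual code in js/orchestrator-client.js; `buildOrchestratorUrl` and `connectToOrchestrator` are hypothetical names, and the WebSocket constructor is passed in so the sketch also runs outside a browser.

```javascript
// Hypothetical helper: builds the endpoint URL from host and port.
function buildOrchestratorUrl(host = 'localhost', port = 8000) {
    return `ws://${host}:${port}/ws/orchestrator`;
}

// Hypothetical helper: opens the socket and resolves once it is connected.
// WebSocketImpl is injectable; in a browser the native WebSocket is used.
function connectToOrchestrator(url, WebSocketImpl = globalThis.WebSocket) {
    return new Promise((resolve, reject) => {
        const socket = new WebSocketImpl(url);
        // Deliver binary frames as ArrayBuffer so they can be decoded as protobuf.
        socket.binaryType = 'arraybuffer';
        socket.onopen = () => resolve(socket);
        socket.onerror = () => reject(new Error(`WebSocket connection to ${url} failed`));
    });
}
```

In a browser, `await connectToOrchestrator(buildOrchestratorUrl())` would resolve with the open socket, or reject if the orchestrator is unreachable.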

2. Send Audio (Microphone → Orchestrator)

// AudioChunk protobuf: { sessionId, audioData }
const audioChunk = AudioChunk.create({
    sessionId: sessionId,
    audioData: new Uint8Array(pcm16Audio)
});
orchestratorClient.sendAudioChunk(audioChunk);
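
The `pcm16Audio` buffer above has to come from somewhere: the Web Audio API delivers microphone samples as 32-bit floats in the range [-1, 1]. The following is a minimal sketch of one common way to convert them to PCM16. The helper names are hypothetical and may differ from what js/audio-manager.js actually does.

```javascript
// Hypothetical helper: converts Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(float32Samples) {
    const pcm16 = new Int16Array(float32Samples.length);
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp first so out-of-range input cannot wrap around.
        const s = Math.max(-1, Math.min(1, float32Samples[i]));
        // Negative samples scale to -32768, positive samples to 32767.
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    return pcm16;
}

// Hypothetical helper: exposes the PCM16 samples as raw bytes for the
// `bytes audio_data` field. Byte order follows the platform
// (little-endian on all mainstream browsers and CPUs).
function pcm16ToBytes(pcm16) {
    return new Uint8Array(pcm16.buffer, pcm16.byteOffset, pcm16.byteLength);
}
```

Each sample occupies two bytes, so a 1600-sample chunk becomes a 3200-byte `audioData` payload.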

3. Receive Audio (Orchestrator → Speaker)

// AgentResponse protobuf: { sessionId, tokenChunks, audioData }
const agentResponse = AgentResponse.decode(binaryData);
audioManager.playAudio(agentResponse.audioData);
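
Before the Web Audio API can play the received bytes, they need to be turned back into float samples. The sketch below shows one way to do that, assuming the PCM16 data is little-endian (the proto only says "PCM16", so the byte order is an assumption). `pcm16BytesToFloat32` is a hypothetical name, not necessarily what `playAudio` does internally.

```javascript
// Hypothetical helper: decodes PCM16 bytes into Float32 samples in [-1, 1).
// Assumes little-endian sample order.
function pcm16BytesToFloat32(bytes) {
    // A DataView is used because the Uint8Array produced by a protobuf decoder
    // may start at an odd byte offset, which an Int16Array view cannot handle.
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const sampleCount = Math.floor(bytes.byteLength / 2);
    const samples = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
        samples[i] = view.getInt16(i * 2, true) / 0x8000;
    }
    return samples;
}
```

The resulting `Float32Array` can be copied into an `AudioBuffer` created with a 16000 Hz sample rate for playback.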

Protobuf Messages

Send:

message AudioChunk {
  string session_id = 1;
  bytes audio_data = 2;  // PCM16, 16kHz, mono
}

Receive:

message AgentResponse {
  string session_id = 1;
  repeated TokenChunk token_chunks = 2;
  bytes audio_data = 3;  // PCM16, 16kHz, mono
}
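
Note that the .proto fields are snake_case (`session_id`, `audio_data`) while the JavaScript snippets above use camelCase (`sessionId`, `audioData`). protobufjs converts field names to camelCase by default unless the keep-case option is used. The sketch below only illustrates that mapping; `protoFieldToJsName` is a hypothetical helper, not part of protobufjs.

```javascript
// Hypothetical helper illustrating the snake_case -> camelCase field mapping.
function protoFieldToJsName(name) {
    // Replace each "_x" with "X", leaving the first character untouched.
    return name.replace(/_([a-z])/g, (_, letter) => letter.toUpperCase());
}

for (const field of ['session_id', 'token_chunks', 'audio_data']) {
    console.log(`${field} -> ${protoFieldToJsName(field)}`);
}
```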

Development

Generate Protobuf

npm run proto:generate

Generates ES6 modules from ../data/data.proto using protobufjs.

Run Tests

npm test

Build for Production

npm run build

Configuration

Audio Chunk Configuration

The frontend sends exactly 100ms audio chunks (1600 samples at 16kHz = 3200 bytes) to match the backend's VAD configuration.

This is configured in js/config.js:

chunkDurationMs: 100,  // 100ms chunks (matches backend config.py)
micSampleRate: 16000,  // 16kHz
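
The chunk size follows directly from these two values. As a quick sanity check, the arithmetic can be sketched like this (`chunkSize` is a hypothetical helper; 2 bytes per sample corresponds to PCM16):

```javascript
// Hypothetical helper: derives samples and bytes per chunk from the config values.
function chunkSize({ chunkDurationMs, micSampleRate, bytesPerSample = 2 }) {
    const samples = Math.round((micSampleRate * chunkDurationMs) / 1000);
    return { samples, bytes: samples * bytesPerSample };
}

// 16000 samples/s * 0.1 s = 1600 samples; 1600 * 2 bytes = 3200 bytes
console.log(chunkSize({ chunkDurationMs: 100, micSampleRate: 16000 }));
```

If either value is changed here, the backend's expected chunk duration has to change with it.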

Implementation: Uses AudioWorklet (js/audio-processor.js) to accumulate samples and send exact chunk sizes, overcoming the power-of-2 limitation of ScriptProcessorNode.
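
To make the accumulation idea concrete: an AudioWorklet processor receives audio in small render blocks (128 frames each), and 1600 is not a multiple of 128, so samples have to be buffered across calls. The sketch below shows the buffering logic on its own, outside of any AudioWorkletProcessor. `ChunkAccumulator` is a hypothetical class and may not match js/audio-processor.js.

```javascript
// Hypothetical sketch: collects arbitrary-sized blocks and emits fixed-size chunks.
class ChunkAccumulator {
    constructor(chunkSamples, onChunk) {
        this.buffer = new Float32Array(chunkSamples);
        this.filled = 0;          // number of valid samples currently buffered
        this.onChunk = onChunk;   // called with a copy of each full chunk
    }

    push(block) {
        let offset = 0;
        while (offset < block.length) {
            // Copy as much as fits into the remaining space of the current chunk.
            const n = Math.min(block.length - offset, this.buffer.length - this.filled);
            this.buffer.set(block.subarray(offset, offset + n), this.filled);
            this.filled += n;
            offset += n;
            if (this.filled === this.buffer.length) {
                this.onChunk(this.buffer.slice()); // copy, since the buffer is reused
                this.filled = 0;
            }
        }
    }
}
```

Inside a worklet, `push(inputs[0][0])` would be called from `process()`, and `onChunk` would post the chunk to the main thread through the processor's message port.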

Orchestrator URL

To change the orchestrator URL, edit the constructor defaults in js/orchestrator-client.js:

constructor(host = 'localhost', port = 8000)

Or pass the host and port explicitly in js/main.js:

this.orchestratorClient = new OrchestratorClient('localhost', 8000);

Troubleshooting

Connection fails

  • Check orchestrator is running: localhost:8000
  • Verify WebSocket endpoint: /ws/orchestrator
  • Check browser console for errors

No audio playback

  • Verify audioData is not empty in AgentResponse
  • Check sample rate is 16kHz
  • Allow microphone permissions

Protobuf errors

# Regenerate protobuf files
npm run proto:generate

# Check protobufjs is installed
npm list protobufjs

Tech Stack

  • Protobuf: protobufjs (ES6 modules, browser-native)
  • Audio: Web Audio API (16kHz PCM16)
  • WebSocket: Native WebSocket API (binary mode)
  • Module System: ES6 modules (no bundler needed for dev)

TODOs

✅ Audio chunk length mismatch (RESOLVED)

The audio chunks sent from the frontend to the orchestrator had a length mismatch: their duration did not match what the orchestrator expects. The relevant audiolength variable is in ../service.py.

Resolution: Implemented AudioWorklet processor (js/audio-processor.js) that accumulates audio samples and sends exactly 100ms chunks (1600 samples at 16kHz = 3200 bytes), matching the backend's vad_config['chunk_duration_ms'] in config.py.

Audio Player Queue (done)

Reduces the effect of delayed audio chunks. The audio player buffers incoming chunks in a FIFO queue and starts fetching from it once the number of queued chunks reaches a threshold.
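
A minimal sketch of such a threshold-gated FIFO queue is shown below. `PlaybackQueue`, the default threshold of 3, and the choice to re-buffer after the queue runs empty are all assumptions made for illustration, not a description of the existing implementation.

```javascript
// Hypothetical sketch: FIFO queue that only releases chunks once enough are buffered.
class PlaybackQueue {
    constructor(startThreshold = 3) {
        this.chunks = [];
        this.startThreshold = startThreshold;
        this.started = false;
    }

    enqueue(chunk) {
        this.chunks.push(chunk);
        if (!this.started && this.chunks.length >= this.startThreshold) {
            this.started = true; // enough audio buffered, playback may begin
        }
    }

    // Returns the next chunk in arrival order, or null while still buffering.
    dequeue() {
        if (this.chunks.length === 0) {
            this.started = false; // underrun: wait for the threshold again
            return null;
        }
        if (!this.started) return null;
        return this.chunks.shift();
    }
}
```

A higher threshold smooths out more network jitter at the cost of extra latency before the first chunk plays.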

Response Control for Multi-Turn Conversation

When the user interrupts the agent while it is speaking, the frontend will (in the future) receive a stop signal. The audio player must then stop playback immediately and clear the queue.
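
Since the stop signal does not exist yet, the following is only a sketch of how the player side could react once it does. `InterruptiblePlayer` and `handleStopSignal` are hypothetical names; the current source is assumed to be any object with a `stop()` method, such as an `AudioBufferSourceNode` that has already been started.

```javascript
// Hypothetical sketch: stops the chunk being played and drops everything queued.
class InterruptiblePlayer {
    constructor() {
        this.queue = [];           // chunks waiting to be played
        this.currentSource = null; // source currently playing, if any
    }

    enqueue(chunk) {
        this.queue.push(chunk);
    }

    setCurrentSource(source) {
        this.currentSource = source;
    }

    handleStopSignal() {
        if (this.currentSource) {
            // Only call stop() on a source that has been started.
            this.currentSource.stop();
            this.currentSource = null;
        }
        this.queue.length = 0; // discard the rest of the interrupted response
    }
}
```

Chunks of the interrupted response that arrive after the stop signal would also need to be discarded, which this sketch does not cover.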

Text Response Visualization

Display the content of the agent's audio response as text, instead of just the length of the token chunks.
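
One possible shape for this, assuming each TokenChunk carries its text in a string field named `text` (a hypothetical field name; check data.proto for the real one):

```javascript
// Hypothetical sketch: joins the text of all token chunks into one string.
// The `text` field name is an assumption, not taken from data.proto.
function tokenChunksToText(tokenChunks) {
    return tokenChunks.map((chunk) => chunk.text ?? '').join('');
}

console.log(tokenChunksToText([{ text: 'Hel' }, { text: 'lo' }]));
```

The returned string could then be appended to the transcript element in the UI as responses arrive.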

Backend Connection

The connection to the backend is unreliable, possibly due to a timeout and reconnection issue.
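
If the cause does turn out to be dropped connections, one common mitigation is to reconnect with exponential backoff. The sketch below is only an illustration under that assumption; the function names and the delay values (500 ms base, 10 s cap, 5 attempts) are hypothetical.

```javascript
// Hypothetical helper: delay before reconnect attempt N (0-based), doubled each time.
function reconnectDelayMs(attempt, baseMs = 500, maxMs = 10000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: retries `connect` (an async function) until it succeeds
// or maxAttempts is reached, waiting longer after each failure.
async function connectWithRetry(
    connect,
    maxAttempts = 5,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
    let lastError;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await connect();
        } catch (error) {
            lastError = error;
            // No need to wait after the final failed attempt.
            if (attempt < maxAttempts - 1) await sleep(reconnectDelayMs(attempt));
        }
    }
    throw lastError;
}
```

For example, `connectWithRetry(() => orchestratorClient.connect())` would retry the existing connect call with delays of 500 ms, 1 s, 2 s, and 4 s between attempts.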