# TASTE Voice Bot Frontend

Browser-based frontend that connects directly to the orchestrator using binary protobuf over WebSocket.
## Quick Start

```bash
# 1. Install dependencies
npm install

# 2. Generate protobuf files (already generated; rerun only if the schema changes)
# npm run proto:generate

# 3. Start the server
npm run serve

# 4. (Optional) Set up port forwarding if the orchestrator runs remotely

# 5. Open the browser at http://localhost:3000
```
## Architecture

```
Browser Frontend  <─────────>  FastAPI Orchestrator (ws://localhost:8000/ws/orchestrator)
        │                              │
   AudioChunk                    AgentResponse
   (protobuf)                     (protobuf)
```
## File Structure

```
frontend/
├── index.html                  # Main UI
├── css/
│   └── styles.css              # UI styling
├── js/
│   ├── main.js                 # App controller
│   ├── orchestrator-client.js  # WebSocket + protobuf client
│   ├── audio-manager.js        # Audio recording/playback
│   ├── audio-processor.js      # AudioWorklet processor (100ms chunks)
│   └── config.js               # Configuration (chunk duration, sample rate)
├── proto/
│   ├── data_pb.js              # Generated protobuf (ES6)
│   └── data_pb.d.ts            # TypeScript definitions
└── package.json                # Dependencies and scripts
```
## How It Works

### 1. Connect to Orchestrator

```javascript
// Connects to ws://localhost:8000/ws/orchestrator
await orchestratorClient.connect();
```
### 2. Send Audio (Microphone → Orchestrator)

```javascript
// AudioChunk protobuf: { sessionId, audioData }
const audioChunk = AudioChunk.create({
  sessionId: sessionId,
  audioData: new Uint8Array(pcm16Audio)
});
orchestratorClient.sendAudioChunk(audioChunk);
```
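The `pcm16Audio` bytes above have to be produced from the Float32 samples ([-1, 1]) that the Web Audio API delivers. A minimal conversion sketch (the helper name `floatToPcm16` is illustrative, not part of the codebase):

```javascript
// Convert Web Audio Float32 samples ([-1, 1]) to little-endian PCM16 bytes.
// Hypothetical helper; the real conversion lives in the audio pipeline
// (audio-processor.js / audio-manager.js).
function floatToPcm16(float32Samples) {
  const out = new Uint8Array(float32Samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true /* little-endian */);
  }
  return out;
}
```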
### 3. Receive Audio (Orchestrator → Speaker)

```javascript
// AgentResponse protobuf: { sessionId, tokenChunks, audioData }
const agentResponse = AgentResponse.decode(binaryData);
audioManager.playAudio(agentResponse.audioData);
```
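Before `audioData` can be copied into an AudioBuffer for playback, the PCM16 bytes need the inverse conversion back to Float32. A sketch (the helper name is illustrative; the real logic lives in audio-manager.js):

```javascript
// Convert little-endian PCM16 bytes back to Float32 samples for Web Audio playback.
// Hypothetical helper, not part of the codebase.
function pcm16ToFloat(pcmBytes) {
  const view = new DataView(pcmBytes.buffer, pcmBytes.byteOffset, pcmBytes.byteLength);
  const out = new Float32Array(pcmBytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // Divide by 0x8000 so the result lands in [-1, 1).
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}
```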
## Protobuf Messages

Send:

```protobuf
message AudioChunk {
  string session_id = 1;
  bytes audio_data = 2;  // PCM16, 16kHz, mono
}
```

Receive:

```protobuf
message AgentResponse {
  string session_id = 1;
  repeated TokenChunk token_chunks = 2;
  bytes audio_data = 3;  // PCM16, 16kHz, mono
}
```
## Development

### Generate Protobuf

```bash
npm run proto:generate
```

Generates ES6 modules from `../data/data.proto` using protobufjs.

### Run Tests

```bash
npm test
```

### Build for Production

```bash
npm run build
```
## Configuration

### Audio Chunk Configuration

The frontend sends exactly 100ms audio chunks (1600 samples at 16kHz = 3200 bytes of PCM16) to match the backend's VAD configuration. This is set in `js/config.js`:

```javascript
chunkDurationMs: 100,  // 100ms chunks (matches backend config.py)
micSampleRate: 16000,  // 16kHz
```

Implementation: an AudioWorklet (`js/audio-processor.js`) accumulates samples and sends exact chunk sizes, overcoming the power-of-2 buffer-size limitation of ScriptProcessorNode.
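AudioWorklet processors receive fixed 128-sample render quanta, so the processor must buffer them until a full 1600-sample chunk is available. A simplified sketch of that accumulation logic (the class name is illustrative; the real implementation is in `js/audio-processor.js`):

```javascript
// Accumulates arbitrary-sized sample blocks and emits fixed-size chunks.
// In the worklet, push() would be called with each 128-sample render quantum.
class ChunkAccumulator {
  constructor(chunkSize = 1600) { // 100 ms at 16 kHz
    this.chunkSize = chunkSize;
    this.buffer = new Float32Array(chunkSize);
    this.filled = 0;
  }

  // Feed in a block of samples; returns an array of completed chunks (possibly empty).
  push(samples) {
    const chunks = [];
    let offset = 0;
    while (offset < samples.length) {
      // Copy as many samples as fit into the current chunk.
      const n = Math.min(samples.length - offset, this.chunkSize - this.filled);
      this.buffer.set(samples.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.chunkSize) {
        chunks.push(this.buffer.slice()); // copy out the full chunk
        this.filled = 0;
      }
    }
    return chunks;
  }
}
```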
### Orchestrator URL

To change the orchestrator URL, edit `js/orchestrator-client.js`:

```javascript
constructor(host = 'localhost', port = 8000)
```

Or pass it explicitly in `js/main.js`:

```javascript
this.orchestratorClient = new OrchestratorClient('localhost', 8000);
```
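How the host/port parameters map to the WebSocket endpoint can be sketched as follows (the helper name is hypothetical; the `/ws/orchestrator` path is fixed by the backend):

```javascript
// Hypothetical helper mirroring what OrchestratorClient presumably does internally:
// combine host and port with the fixed orchestrator WebSocket path.
function buildOrchestratorUrl(host = 'localhost', port = 8000) {
  return `ws://${host}:${port}/ws/orchestrator`;
}
```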
## Troubleshooting

### Connection fails

- Check the orchestrator is running on `localhost:8000`
- Verify the WebSocket endpoint: `/ws/orchestrator`
- Check the browser console for errors

### No audio playback

- Verify `audioData` is not empty in `AgentResponse`
- Check the sample rate is 16kHz
- Allow microphone permissions

### Protobuf errors

```bash
# Regenerate protobuf files
npm run proto:generate

# Check protobufjs is installed
npm list protobufjs
```
## Tech Stack

- Protobuf: protobufjs (ES6 modules, browser-native)
- Audio: Web Audio API (16kHz PCM16)
- WebSocket: Native WebSocket API (binary mode)
- Module System: ES6 modules (no bundler needed for dev)
## TODOs

### ✅ Audio chunk length mismatch (resolved)

The audio chunks sent from the frontend did not match the duration the orchestrator expects (see the `audiolength` variable in `../service.py`).

Resolution: Implemented an AudioWorklet processor (`js/audio-processor.js`) that accumulates audio samples and sends exactly 100ms chunks (1600 samples at 16kHz = 3200 bytes), matching the backend's `vad_config['chunk_duration_ms']` in `config.py`.
### ✅ Audio player queue (done)

Reduces the effect of audio chunk delay: the audio player starts fetching chunks from a FIFO queue only once the number of queued chunks reaches a threshold.
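The threshold-then-drain behavior described above can be sketched as follows (class name and default threshold are illustrative; the actual implementation presumably lives in `js/audio-manager.js`):

```javascript
// FIFO playback queue: playback begins only once `threshold` chunks are
// buffered, smoothing over jitter in chunk arrival times.
class AudioPlaybackQueue {
  constructor(threshold = 3, playChunk = () => {}) {
    this.threshold = threshold;
    this.playChunk = playChunk; // callback that actually plays one chunk
    this.queue = [];
    this.started = false;
  }

  enqueue(chunk) {
    this.queue.push(chunk);
    if (!this.started && this.queue.length >= this.threshold) {
      this.started = true; // enough buffered: begin draining
    }
    this.drain();
  }

  drain() {
    while (this.started && this.queue.length > 0) {
      this.playChunk(this.queue.shift()); // FIFO order
    }
  }
}
```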
### Response control for multi-turn conversation

When the user interrupts the agent while it is speaking, a stop signal will (in the future) be received; the audio player must then immediately stop playback and flush the queue.
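One way this interruption handling could look, entirely as a sketch (no stop signal exists in the protocol yet, and all names here are hypothetical):

```javascript
// Sketch of barge-in handling: on a (future) stop signal, halt playback
// immediately and discard everything still queued.
class InterruptiblePlayer {
  constructor() {
    this.queue = [];
    this.playing = false;
  }

  enqueue(chunk) {
    this.queue.push(chunk);
    this.playing = true;
  }

  // Would be called when the orchestrator signals a user interruption.
  handleStopSignal() {
    this.playing = false;
    this.queue.length = 0; // drop all pending audio
    // A real implementation would also stop the currently playing
    // AudioBufferSourceNode (e.g. source.stop()).
  }
}
```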
### Text response visualization

Show the response content as text instead of just the number of token chunks.
### Backend connection

Connection instability, possibly due to a timeout and reconnection issue; needs investigation.