Transcript Server
An MCP App Server for live speech transcription using the Web Speech API.
Features
- Live Transcription: Real-time speech-to-text using the browser's Web Speech API
- Transitional Model Context: Streams interim transcriptions to the model via ui/update-model-context, allowing the model to see what the user is saying as they speak
- Audio Level Indicator: Visual feedback showing microphone input levels
- Send to Host: Button to send completed transcriptions as a ui/message to the MCP host
- Start/Stop Control: Toggle listening on and off
- Clear Transcript: Reset the transcript area
Setup
Prerequisites
- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)
Installation
npm install
Running
# Development mode (with hot reload)
npm run dev
# Production build and serve
npm run start
Usage
The server exposes a single tool:
transcribe
Opens a live speech transcription interface.
Parameters: None
Example:
{
"name": "transcribe",
"arguments": {}
}
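On the wire, an MCP client would typically wrap this call in a JSON-RPC 2.0 tools/call request. The sketch below shows one way to build that envelope in TypeScript. The names ToolCallRequest and buildTranscribeCall are hypothetical and exist only for illustration; they are not part of this server.

```typescript
// Shape of a JSON-RPC 2.0 "tools/call" request as used by MCP clients.
// ToolCallRequest is an illustrative type, not something this server exports.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request for the `transcribe` tool, which takes no parameters.
function buildTranscribeCall(id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe", arguments: {} },
  };
}

console.log(JSON.stringify(buildTranscribeCall(1), null, 2));
```

Most MCP client libraries construct this envelope for you, so the object above is mainly useful for debugging raw traffic.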
How It Works
- Click Start to begin listening
- Speak into your microphone
- Watch your speech appear as text in real-time (interim text is streamed to model context via ui/update-model-context)
- Click Send to send the transcript as a ui/message to the host (clears the model context)
- Click Clear to reset the transcript
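The distinction between interim and completed text in the steps above comes from the Web Speech API, which marks each recognition result with an isFinal flag. The following is a minimal sketch of how a result list could be split into the two parts. It is not taken from src/mcp-app.ts: the type names and splitTranscript are assumptions, and the interfaces only mirror the fields of the browser's result list that the function reads, so the code also runs outside a browser.

```typescript
// Structural stand-ins for the browser's recognition result objects.
// These names are hypothetical; only the fields read below are modeled.
interface AlternativeLike {
  transcript: string;
}
interface ResultLike {
  isFinal: boolean;
  length: number;
  [index: number]: AlternativeLike;
}
interface ResultListLike {
  length: number;
  [index: number]: ResultLike;
}

interface TranscriptParts {
  finalText: string;   // text the recognizer has committed to
  interimText: string; // text that may still change as the user speaks
}

// Walk the result list and concatenate the top alternative of each result
// into either the final or the interim bucket.
function splitTranscript(results: ResultListLike): TranscriptParts {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (!result) continue;
    const best = result[0]?.transcript ?? "";
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText, interimText };
}
```

In a browser, a function like this would be called from the recognizer's result handler with event.results. The interim part is what would be streamed to the model context, and the final part is what accumulates in the transcript.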
Architecture
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
Notes
- Microphone Permission: Requires allow="microphone" on the sandbox iframe (configured via permissions: { microphone: {} } in the resource _meta.ui)
- Browser Support: The Web Speech API is well supported in Chrome and Edge and is also available in Safari. Firefox has limited support.
- Continuous Mode: Recognition automatically restarts when it ends, for seamless transcription
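
The continuous-mode behavior noted above can be pictured as a small controller that restarts the recognizer whenever a session ends while the user still wants to listen. This is a hedged sketch rather than the code in src/mcp-app.ts: RestartableRecognizer and createListeningController are hypothetical names, and the interface models only a subset of what a browser recognizer exposes, so a real recognizer would need a thin adapter.

```typescript
// Hypothetical subset of a speech recognizer's surface used by this sketch.
interface RestartableRecognizer {
  continuous: boolean;
  interimResults: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

function createListeningController(recognizer: RestartableRecognizer) {
  // Tracks the user's intent, independent of the recognizer's own state.
  let listening = false;

  recognizer.continuous = true;
  recognizer.interimResults = true;

  // A session can end on its own (for example after a stretch of silence).
  // Restart it only if the user has not pressed Stop.
  recognizer.onend = () => {
    if (listening) recognizer.start();
  };

  return {
    start(): void {
      if (listening) return;
      listening = true;
      recognizer.start();
    },
    stop(): void {
      listening = false;
      recognizer.stop();
    },
    isListening: (): boolean => listening,
  };
}
```

Keeping the intent flag separate from the recognizer means a Stop click is never undone by a late end event, because the flag is cleared before stop() is called.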
Future Enhancements
- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle
