# Transcript Server

An MCP App Server for live speech transcription using the Web Speech API.
## Features
- **Live Transcription**: Real-time speech-to-text using the browser's Web Speech API
- **Transitional Model Context**: Streams interim transcriptions to the model via `ui/update-model-context`, allowing the model to see what the user is saying as they speak
- **Audio Level Indicator**: Visual feedback showing microphone input levels
- **Send to Host**: Button to send completed transcriptions as a `ui/message` to the MCP host
- **Start/Stop Control**: Toggle listening on and off
- **Clear Transcript**: Reset the transcript area
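
As an illustration of the audio level indicator, the sketch below computes a root-mean-square (RMS) level from a block of time-domain samples. It is not taken from `src/mcp-app.ts`; the function name `rmsLevel` is hypothetical, and it assumes the samples are floats in the range -1 to 1, such as those a Web Audio `AnalyserNode` fills in through `getFloatTimeDomainData`.

```typescript
// Hypothetical helper: RMS level of a block of time-domain samples in [-1, 1].
// Returns 0 for an empty or silent buffer and is capped at 1.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}

// Example: map the level to a percentage width for a meter element.
const meterPercent = Math.round(rmsLevel(new Float32Array([0.5, -0.5])) * 100);
console.log(meterPercent);
```

RMS is only one reasonable choice here; a peak detector or a smoothed average would also work for a visual meter.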
## Setup
### Prerequisites
- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)
### Installation
```bash
npm install
```
### Running
```bash
# Development mode (with hot reload)
npm run dev
# Production build and serve
npm run start
```
## Usage
The server exposes a single tool:
### `transcribe`
Opens a live speech transcription interface.
**Parameters:** None
**Example:**
```json
{
  "name": "transcribe",
  "arguments": {}
}
```
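
For context, an MCP client sends the name and arguments above inside a JSON-RPC `tools/call` request. The following is a minimal sketch of building that envelope; the helper name `buildToolCall` and the `ToolCallRequest` type are hypothetical, and a real client would normally let its MCP SDK construct and send the request.

```typescript
// Shape of a JSON-RPC 2.0 "tools/call" request as used by MCP.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The transcribe tool takes no parameters, so the arguments object stays empty.
const transcribeRequest = buildToolCall(1, "transcribe");
console.log(JSON.stringify(transcribeRequest, null, 2));
```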
## How It Works
1. Click **Start** to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real time (interim text is streamed to the model context via `ui/update-model-context`)
4. Click **Send** to send the transcript as a `ui/message` to the host, which also clears the model context
5. Click **Clear** to reset the transcript
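
The steps above rely on the Web Speech API distinguishing interim results from final ones. The sketch below shows one way to fold a recognition result event into a transcript state. It is an assumption about how such logic could look, not the code in `src/mcp-app.ts`: the types are structural stand-ins for the `resultIndex`, `results`, `isFinal`, and `transcript` members of `SpeechRecognitionEvent`, and `applyResults` is a hypothetical name.

```typescript
// Structural stand-ins for the parts of SpeechRecognitionEvent read here.
interface AlternativeLike { transcript: string }
interface ResultLike extends ArrayLike<AlternativeLike> { isFinal: boolean }
interface ResultEventLike { resultIndex: number; results: ArrayLike<ResultLike> }

interface TranscriptState {
  finalText: string;   // text the recognizer has committed
  interimText: string; // text that may still change
}

// Hypothetical helper: append newly finalized text and replace the interim text.
function applyResults(state: TranscriptState, event: ResultEventLike): TranscriptState {
  let finalText = state.finalText;
  let interimText = "";
  // Results before resultIndex are unchanged since the previous event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // first alternative is the top hypothesis
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

In this sketch, `interimText` is the candidate for streaming through `ui/update-model-context`, and `finalText` is what **Send** would deliver as a `ui/message`. The calls that talk to the host are left out because they depend on the MCP Apps SDK in use.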
## Architecture
```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```
## Notes
- **Microphone Permission**: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource `_meta.ui`)
- **Browser Support**: The Web Speech API is well supported in Chrome and Edge and is also available in Safari. Firefox has limited support.
- **Continuous Mode**: Recognition automatically restarts when it ends, for seamless transcription
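
The continuous-mode restart can be sketched as a small wrapper that calls `start()` again whenever the recognizer reports that it has ended, unless the user has pressed Stop. This is an illustrative assumption rather than the app's actual code: `RecognizerLike` is a stand-in exposing the `start`, `stop`, and `onend` members that `SpeechRecognition` also has, and `ContinuousListener` is a hypothetical name.

```typescript
// Stand-in for the subset of SpeechRecognition used by this sketch.
interface RecognizerLike {
  start(): void;
  stop(): void;
  onend: (() => void) | null;
}

// Hypothetical wrapper that keeps recognition running until stop() is called.
class ContinuousListener {
  private listening = false;
  private readonly recognizer: RecognizerLike;

  constructor(recognizer: RecognizerLike) {
    this.recognizer = recognizer;
    this.recognizer.onend = () => {
      // The engine ended on its own (for example after a pause in speech):
      // restart only while the user still wants to listen.
      if (this.listening) this.recognizer.start();
    };
  }

  start(): void {
    if (this.listening) return; // ignore repeated Start clicks
    this.listening = true;
    this.recognizer.start();
  }

  stop(): void {
    this.listening = false; // set first so the resulting end event does not restart
    this.recognizer.stop();
  }
}
```

Clearing the flag before calling `stop()` matters because stopping the recognizer also produces an end event, which would otherwise trigger another restart.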
## Future Enhancements
- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle