# Transcript Server

![Screenshot](screenshot.png)

An MCP App Server for live speech transcription using the Web Speech API.

## Features

- **Live Transcription**: Real-time speech-to-text using the browser's Web Speech API
- **Transitional Model Context**: Streams interim transcriptions to the model via `ui/update-model-context`, allowing the model to see what the user is saying as they speak
- **Audio Level Indicator**: Visual feedback showing microphone input levels
- **Send to Host**: Button to send completed transcriptions as a `ui/message` to the MCP host
- **Start/Stop Control**: Toggle listening on and off
- **Clear Transcript**: Reset the transcript area
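
One way an audio level indicator like the one above could derive its value is a root-mean-square over a buffer of microphone samples. The sketch below is an illustration only: `rmsLevel` is a hypothetical helper, and it assumes the samples come from something like a Web Audio `AnalyserNode` (via `getFloatTimeDomainData`), which this README does not confirm.

```typescript
// Hypothetical helper, not taken from this repository's source.
// Computes a 0..1 level from time-domain samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  // Sum of squared amplitudes across the buffer.
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  const rms = Math.sqrt(sumSquares / samples.length);
  // Clamp so out-of-range input cannot overflow the meter.
  return Math.min(1, rms);
}

// A square wave at half amplitude has an RMS of 0.5.
console.log(rmsLevel(new Float32Array([0.5, -0.5, 0.5, -0.5])));
```

The returned value could then drive the width or height of a meter element on each animation frame.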

## Setup

### Prerequisites

- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)

### Installation

```bash
npm install
```

### Running

```bash
# Development mode (with hot reload)
npm run dev

# Production build and serve
npm run start
```

## Usage

The server exposes a single tool:

### `transcribe`

Opens a live speech transcription interface.

**Parameters:** None

**Example:**

```json
{
  "name": "transcribe",
  "arguments": {}
}
```
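
On the wire, an MCP client wraps the name and arguments above in a JSON-RPC 2.0 `tools/call` request. The sketch below builds that envelope; `buildTranscribeCall` and the `ToolCallRequest` type are hypothetical names used for illustration, and the request `id` is chosen by the client.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request for the parameterless `transcribe` tool.
function buildTranscribeCall(id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe", arguments: {} },
  };
}

console.log(JSON.stringify(buildTranscribeCall(1), null, 2));
```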

## How It Works

1. Click **Start** to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real time (interim text is streamed to the model context via `ui/update-model-context`)
4. Click **Send** to send the transcript as a `ui/message` to the host (clears the model context)
5. Click **Clear** to reset the transcript
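
The distinction between interim and final text in the steps above comes from the Web Speech API itself: each recognition result carries an `isFinal` flag, and its first alternative holds the best transcript. The following is a minimal sketch of separating the two, not the app's actual code. `splitTranscript` and the `ResultLike` types are hypothetical stand-ins that mirror the shape of `SpeechRecognitionResultList` so the function can run outside a browser.

```typescript
// Minimal structural stand-ins for the Web Speech API result types (assumed names).
interface AlternativeLike {
  transcript: string;
}
type ResultLike = ArrayLike<AlternativeLike> & { isFinal: boolean };

// Hypothetical helper: walks the results from `resultIndex` and separates
// finalized text from text the recognizer may still revise.
function splitTranscript(
  results: ArrayLike<ResultLike>,
  resultIndex = 0,
): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    const best = result?.[0]; // first alternative is the most likely one
    if (!result || !best) continue;
    if (result.isFinal) finalText += best.transcript;
    else interimText += best.transcript;
  }
  return { finalText, interimText };
}
```

In a browser, this could be called from a recognition `onresult` handler with `event.results` and `event.resultIndex`. The interim string is the part that would be streamed to the model context, while the final string accumulates into the transcript that **Send** delivers.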

## Architecture

```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```

## Notes

- **Microphone Permission**: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource `_meta.ui`)
- **Browser Support**: The Web Speech API's speech recognition works best in Chrome and Edge and is also available in Safari. Firefox support is limited.
- **Continuous Mode**: Recognition automatically restarts when it ends, for seamless transcription
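
The automatic restart described under Continuous Mode can be sketched as an `onend` handler that calls `start()` again while the user still wants to listen. This is an illustration under assumptions, not the app's implementation: `keepListening` and `RecognitionLike` are hypothetical names, and the interface is a reduced stand-in for the browser's `SpeechRecognition` object so the logic can be exercised without a browser.

```typescript
// Reduced stand-in for the browser's SpeechRecognition object (assumed shape).
interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical helper: restarts recognition whenever it ends,
// unless the user has explicitly stopped it.
function keepListening(recognition: RecognitionLike): { start(): void; stop(): void } {
  let wanted = false; // true while the user expects transcription to be running

  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onend = () => {
    // Recognition can end on its own (for example after silence); resume if still wanted.
    if (wanted) recognition.start();
  };

  return {
    start() {
      wanted = true;
      recognition.start();
    },
    stop() {
      wanted = false; // cleared first so the onend handler does not restart
      recognition.stop();
    },
  };
}
```

Tracking intent in a separate flag, rather than restarting unconditionally, is what lets the **Start/Stop** toggle actually stop the session.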

## Future Enhancements

- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle