File size: 2,505 Bytes

e1cc3bc

# Transcript Server

![Screenshot](screenshot.png)

An MCP App Server for live speech transcription using the Web Speech API.

## Features

- **Live Transcription**: Real-time speech-to-text using browser's Web Speech API
- **Transitional Model Context**: Streams interim transcriptions to the model via `ui/update-model-context`, allowing the model to see what the user is saying as they speak
- **Audio Level Indicator**: Visual feedback showing microphone input levels
- **Send to Host**: Button to send completed transcriptions as a `ui/message` to the MCP host
- **Start/Stop Control**: Toggle listening on and off
- **Clear Transcript**: Reset the transcript area

## Setup

### Prerequisites

- Node.js 18+
- Chrome, Edge, or Safari (Web Speech API support)

### Installation

```bash
npm install
```

### Running

```bash
# Development mode (with hot reload)
npm run dev

# Production build and serve
npm run start
```

## Usage

The server exposes a single tool:

### `transcribe`

Opens a live speech transcription interface.

**Parameters:** None

**Example:**

```json
{
  "name": "transcribe",
  "arguments": {}
}
```

## How It Works

1. Click **Start** to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real-time (interim text is streamed to model context via `ui/update-model-context`)
4. Click **Send** to send the transcript as a `ui/message` to the host (clears the model context)
5. Click **Clear** to reset the transcript

## Architecture

```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```

## Notes

- **Microphone Permission**: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource `_meta.ui`)
- **Browser Support**: Web Speech API is well-supported in Chrome/Edge, with Safari support. Firefox has limited support.
- **Continuous Mode**: Recognition automatically restarts when it ends, for seamless transcription

## Future Enhancements

- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle