# Transcript Server
![Screenshot](screenshot.png)
An MCP App Server for live speech transcription using the Web Speech API.
## Features
- **Live Transcription**: Real-time speech-to-text using the browser's Web Speech API
- **Transitional Model Context**: Streams interim transcriptions to the model via `ui/update-model-context`, allowing the model to see what the user is saying as they speak
- **Audio Level Indicator**: Visual feedback showing microphone input levels
- **Send to Host**: Button to send completed transcriptions as a `ui/message` to the MCP host
- **Start/Stop Control**: Toggle listening on and off
- **Clear Transcript**: Reset the transcript area
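The transitional model context depends on separating text the recognizer has committed (final results) from text it may still revise (interim results). The sketch below shows one way to do that split. It is a minimal sketch and not the code in `src/mcp-app.ts`. `RecognitionResult` and `RecognitionAlternative` are local stand-ins for the Web Speech API's result objects, so the code runs outside a browser. `splitTranscript` and `modelContextText` are hypothetical helpers.

```typescript
// Local stand-ins for the Web Speech API result shapes. In the app these
// objects would come from the recognizer's `result` event (`event.results`).
interface RecognitionAlternative {
  transcript: string;
}

interface RecognitionResult {
  isFinal: boolean;
  // Index 0 holds the most likely alternative.
  0: RecognitionAlternative;
}

// Hypothetical helper: separates committed (final) text from the
// still-changing (interim) text the user is currently speaking.
function splitTranscript(results: ArrayLike<RecognitionResult>): {
  finalText: string;
  interimText: string;
} {
  const finalParts: string[] = [];
  const interimParts: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const text = result[0].transcript.trim();
    if (text.length === 0) continue; // skip empty fragments
    (result.isFinal ? finalParts : interimParts).push(text);
  }
  return { finalText: finalParts.join(" "), interimText: interimParts.join(" ") };
}

// Hypothetical helper: the text the app might push through
// `ui/update-model-context` while the user is still speaking.
function modelContextText(finalText: string, interimText: string): string {
  return [finalText, interimText].filter((part) => part.length > 0).join(" ");
}
```

Trimming each fragment and joining on single spaces is a deliberate choice in this sketch, because browsers may differ in whether later results carry a leading space.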
## Setup
### Prerequisites
- Node.js 18+
- Chrome, Edge, or Safari (for Web Speech API support)
### Installation
```bash
npm install
```
### Running
```bash
# Development mode (with hot reload)
npm run dev
# Production build and serve
npm run start
```
## Usage
The server exposes a single tool:
### `transcribe`
Opens a live speech transcription interface.
**Parameters:** None
**Example:**
```json
{
"name": "transcribe",
"arguments": {}
}
```
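Over the wire, MCP carries this tool call inside a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds that envelope by hand. In practice an MCP client library normally constructs it for you. `buildTranscribeCall` and `ToolCallRequest` are hypothetical names used only for illustration.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps the `transcribe` call in a JSON-RPC envelope.
// The tool takes no parameters, so `arguments` is an empty object.
function buildTranscribeCall(id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe", arguments: {} },
  };
}

console.log(JSON.stringify(buildTranscribeCall(1)));
// prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"transcribe","arguments":{}}}
```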
## How It Works
1. Click **Start** to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real time (interim text is streamed to the model context via `ui/update-model-context`)
4. Click **Send** to send the transcript as a `ui/message` to the host (clears the model context)
5. Click **Clear** to reset the transcript
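The five steps above can be modeled as a small state machine. The sketch below uses a reducer-style design, which is an assumption and not something the source confirms. `TranscriptState`, `reduce`, and the `Effect` kinds are hypothetical names. Representing "clears the model context" as an empty `ui/update-model-context` update is also an assumption.

```typescript
// Hypothetical state for the transcript UI; field names are illustrative.
interface TranscriptState {
  listening: boolean;
  finalText: string;
  interimText: string;
}

type Action =
  | { type: "start" }
  | { type: "stop" }
  | { type: "result"; finalText: string; interimText: string }
  | { type: "send" }
  | { type: "clear" };

// Host requests the app would issue after a transition.
type Effect =
  | { kind: "updateModelContext"; text: string } // maps to ui/update-model-context
  | { kind: "sendMessage"; text: string }; // maps to ui/message

const initialState: TranscriptState = { listening: false, finalText: "", interimText: "" };

function joinText(state: TranscriptState): string {
  return [state.finalText, state.interimText].filter((s) => s.length > 0).join(" ");
}

function reduce(
  state: TranscriptState,
  action: Action,
): { state: TranscriptState; effects: Effect[] } {
  switch (action.type) {
    case "start": // Step 1: begin listening
      return { state: { ...state, listening: true }, effects: [] };
    case "stop":
      return { state: { ...state, listening: false }, effects: [] };
    case "result": {
      // Steps 2-3: new speech arrives and is streamed to the model context
      const next = { ...state, finalText: action.finalText, interimText: action.interimText };
      return { state: next, effects: [{ kind: "updateModelContext", text: joinText(next) }] };
    }
    case "send": {
      // Step 4: send the transcript as ui/message, then clear the model context
      const text = joinText(state);
      if (text.length === 0) return { state, effects: [] };
      return {
        state,
        effects: [
          { kind: "sendMessage", text },
          // Assumption: clearing is modeled as an empty context update.
          { kind: "updateModelContext", text: "" },
        ],
      };
    }
    case "clear": // Step 5: reset the transcript
      return { state: { ...state, finalText: "", interimText: "" }, effects: [] };
  }
}
```

Returning effects instead of calling the host directly keeps each transition testable without a running MCP host.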
## Architecture
```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```
## Notes
- **Microphone Permission**: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource `_meta.ui`)
- **Browser Support**: Web Speech API speech recognition is well supported in Chrome and Edge and is also available in Safari. Firefox support is limited.
- **Continuous Mode**: Recognition restarts automatically whenever it ends, so transcription stays continuous while listening is on
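The restart behavior described under **Continuous Mode** can be sketched as follows. This is a minimal sketch and not the app's actual code. `Recognizer` is a local stand-in for a subset of the Web Speech API's `SpeechRecognition` (`continuous`, `interimResults`, `onend`, `start()`, `stop()`), defined so the sketch runs outside a browser. `ContinuousListener` is a hypothetical wrapper.

```typescript
// Minimal stand-in for the SpeechRecognition members this sketch touches.
// In the browser, the object would come from `new SpeechRecognition()` or its
// `webkit`-prefixed variant.
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical controller: restarts recognition whenever the session ends,
// unless the user asked to stop.
class ContinuousListener {
  private readonly recognizer: Recognizer;
  private wantListening = false;

  constructor(recognizer: Recognizer) {
    this.recognizer = recognizer;
    recognizer.continuous = true; // keep one session open across pauses
    recognizer.interimResults = true; // deliver interim (non-final) results
    recognizer.onend = () => {
      // The browser may still end a session (e.g. after silence);
      // resume only if listening is still switched on.
      if (this.wantListening) this.recognizer.start();
    };
  }

  start(): void {
    this.wantListening = true;
    this.recognizer.start();
  }

  stop(): void {
    // Clear the flag first so the resulting `end` event does not restart.
    this.wantListening = false;
    this.recognizer.stop();
  }
}
```

The separate `wantListening` flag is what lets the Stop button work. An unconditional restart in `onend` would immediately undo every user-initiated stop.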
## Future Enhancements
- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle