Transcript Server

An MCP App Server for live speech transcription using the Web Speech API.

Features

  • Live Transcription: Real-time speech-to-text using the browser's Web Speech API
  • Transitional Model Context: Streams interim transcriptions to the model via ui/update-model-context, allowing the model to see what the user is saying as they speak
  • Audio Level Indicator: Visual feedback showing microphone input levels
  • Send to Host: Button to send completed transcriptions as a ui/message to the MCP host
  • Start/Stop Control: Toggle listening on and off
  • Clear Transcript: Reset the transcript area
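
As an illustration of how an audio level indicator can be driven, the sketch below computes a root-mean-square (RMS) level from one frame of samples and maps it to a meter percentage. It is not taken from this project's source: `rmsLevel`, `meterPercent`, and the `gain` factor are hypothetical names, and the samples are assumed to be floats in the range -1 to 1, such as those returned by the Web Audio API's `AnalyserNode.getFloatTimeDomainData()`.

```typescript
// Hypothetical helper: RMS level of one frame of audio samples.
// Assumes samples are floats in [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical helper: map the RMS level to a 0-100 meter value.
// `gain` is an arbitrary visual scaling factor (assumption), clamped at 100.
function meterPercent(samples: Float32Array, gain: number = 3): number {
  return Math.min(100, Math.round(rmsLevel(samples) * gain * 100));
}
```

In a browser, the returned percentage could be applied to the width of a meter element on each animation frame.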

Setup

Prerequisites

  • Node.js 18+
  • Chrome, Edge, or Safari (Web Speech API support)

Installation

npm install

Running

# Development mode (with hot reload)
npm run dev

# Production build and serve
npm run start

Usage

The server exposes a single tool:

transcribe

Opens a live speech transcription interface.

Parameters: None

Example:

{
  "name": "transcribe",
  "arguments": {}
}
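
On the wire, MCP tool calls travel as JSON-RPC 2.0 requests using the `tools/call` method, with the tool name and arguments placed under `params`. The sketch below shows how the example above could be wrapped in that envelope. `buildToolCall` and `ToolCallRequest` are hypothetical names used for illustration only; an MCP client library would normally build this message for you.

```typescript
// Shape of an MCP tools/call request as a JSON-RPC 2.0 message.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The transcribe tool takes no parameters, so arguments stays empty.
console.log(JSON.stringify(buildToolCall(1, "transcribe")));
// prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"transcribe","arguments":{}}}
```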

How It Works

  1. Click Start to begin listening
  2. Speak into your microphone
  3. Watch your speech appear as text in real time (interim text is streamed to model context via ui/update-model-context)
  4. Click Send to send the transcript as a ui/message to the host (clears the model context)
  5. Click Clear to reset the transcript
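
The distinction between interim and completed text in the steps above comes from the Web Speech API itself: each recognition result carries an `isFinal` flag, and the event's `resultIndex` marks the first result that changed. The sketch below separates the two kinds of text. It is a minimal illustration rather than the code in src/mcp-app.ts. `splitResults` and the `*Like` interfaces are hypothetical names that mirror only the fields of `SpeechRecognitionEvent` used here, so the function can run outside a browser.

```typescript
// Minimal structural types mirroring the parts of SpeechRecognitionEvent used here.
interface AlternativeLike { transcript: string }
interface ResultLike { isFinal: boolean; 0: AlternativeLike }
interface ResultEventLike { resultIndex: number; results: ArrayLike<ResultLike> }

// Hypothetical helper: split changed results into final and interim text.
function splitResults(event: ResultEventLike): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  // Only results from resultIndex onward have changed since the last event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top-ranked alternative
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

Under this sketch, `finalText` would be appended to the transcript shown in the UI, while `interimText` is the part that keeps changing as the user speaks and is the natural candidate for streaming to the model context.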

Architecture

transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)

Notes

  • Microphone Permission: Requires allow="microphone" on the sandbox iframe (configured via permissions: { microphone: {} } in the resource _meta.ui)
  • Browser Support: Speech recognition through the Web Speech API works best in Chrome and Edge and is also available in Safari. Firefox support is limited.
  • Continuous Mode: Recognition automatically restarts when it ends, for seamless transcription
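
The continuous-mode behavior described above can be sketched as a small wrapper that restarts recognition from the `onend` handler unless the user has asked to stop. This is an assumption about the general pattern, not a copy of the project's implementation. `keepListening` and `RecognizerLike` are hypothetical names, and the interface lists only the `SpeechRecognition` members the sketch touches.

```typescript
// Minimal structural type covering the SpeechRecognition members used below.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  onend: ((...args: any[]) => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical helper: keep a recognizer running until stop() is called.
function keepListening(recognizer: RecognizerLike): { start(): void; stop(): void } {
  let listening = false;
  recognizer.continuous = true;
  recognizer.interimResults = true;
  recognizer.onend = () => {
    // The browser may end a session on its own (for example after silence);
    // restart only if the user has not pressed Stop.
    if (listening) recognizer.start();
  };
  return {
    start() {
      listening = true;
      recognizer.start();
    },
    stop() {
      listening = false;
      recognizer.stop();
    },
  };
}
```

In a browser, the recognizer passed in would typically be created from `window.SpeechRecognition` or the prefixed `window.webkitSpeechRecognition`. Tracking the `listening` flag separately keeps the Stop button from triggering an unwanted restart.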

Future Enhancements

  • Language selection dropdown
  • Whisper-based offline transcription (see TRANSCRIPTION.md)
  • Export transcript to file
  • Timestamps toggle