# Transcript Server

![Screenshot](screenshot.png)

An MCP App Server for live speech transcription using the Web Speech API.

## Features

- Live Transcription: Real-time speech-to-text using the browser's Web Speech API
- Transitional Model Context: Streams interim transcriptions to the model via `ui/update-model-context`, so the model can see what the user is saying as they speak
- Audio Level Indicator: Visual feedback showing microphone input levels
- Send to Host: Button that sends the completed transcription to the MCP host as a `ui/message`
- Start/Stop Control: Toggle listening on and off
- Clear Transcript: Reset the transcript area
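
Separating final results from interim ones is what lets the transcript area and the transitional model context update independently. The following is a minimal sketch, not the code in `src/mcp-app.ts`. It assumes the result list has the shape of the Web Speech API's `SpeechRecognitionResultList`: array-like results, each with an `isFinal` flag and alternatives carrying a `transcript`. `splitTranscript` and the local interfaces are hypothetical names.

```typescript
// Local structural stand-ins for the Web Speech API result objects, declared
// here so the sketch runs without DOM typings (hypothetical names).
interface Alternative {
  transcript: string;
}
interface RecognitionResult {
  readonly isFinal: boolean;
  readonly length: number;
  [index: number]: Alternative;
}
interface RecognitionResultList {
  readonly length: number;
  [index: number]: RecognitionResult;
}

// Hypothetical helper: concatenates the top alternative of every result,
// sending finalized results to `finalText` and still-changing ones to `interimText`.
function splitTranscript(results: RecognitionResultList): {
  finalText: string;
  interimText: string;
} {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    // A result with no alternatives contributes nothing.
    const best = result[0]?.transcript ?? "";
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText, interimText };
}
```

Iterating over the whole list, rather than starting at the event's `resultIndex`, rebuilds the full text of the current recognition session on every `result` event. That keeps the helper stateless. The interim part is the piece that changes from one update to the next.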

## Setup

### Prerequisites

- Node.js 18+
- Chrome, Edge, or Safari (for Web Speech API support)
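
Browsers differ in whether they expose the standard `SpeechRecognition` constructor or the prefixed `webkitSpeechRecognition` one; Chromium-based browsers have historically used the prefixed name. Below is a minimal detection sketch. `findSpeechRecognition` is a hypothetical helper, and its `scope` parameter stands in for `window` so the logic can run outside a browser.

```typescript
// Any zero-argument constructor is accepted; the sketch only checks for presence.
type RecognitionCtor = new () => unknown;

// Stand-in for `window`: both properties are optional because support varies.
interface SpeechScope {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Hypothetical helper: prefer the standard name, fall back to the prefixed one,
// and return null when the browser offers neither.
function findSpeechRecognition(scope: SpeechScope): RecognitionCtor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}
```

In a browser it would be called as `findSpeechRecognition(window as unknown as SpeechScope)`. A `null` result could be used to disable the Start control.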

### Installation

```bash
npm install
```

### Running

```bash
# Development mode (with hot reload)
npm run dev

# Production build and serve
npm run start
```

## Usage

The server exposes a single tool:

### `transcribe`

Opens a live speech transcription interface.

Parameters: None

Example:

```json
{
  "name": "transcribe",
  "arguments": {}
}
```
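
Over MCP, a client invokes this tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the name and arguments shown above. The small sketch below builds that envelope. `buildToolCall` is a hypothetical helper, and the request `id` is arbitrary.

```typescript
// Shape of an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper wrapping a tool name and arguments in a JSON-RPC envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// `transcribe` takes no parameters, so the arguments object stays empty.
const request = buildToolCall(1, "transcribe");
console.log(JSON.stringify(request));
// prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"transcribe","arguments":{}}}
```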

## How It Works

1. Click Start to begin listening
2. Speak into your microphone
3. Watch your speech appear as text in real time (interim text is streamed to the model context via `ui/update-model-context`)
4. Click Send to send the transcript to the host as a `ui/message` (this also clears the model context)
5. Click Clear to reset the transcript
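
The steps above amount to a small state machine: text is streamed to the model context as it changes, and Send turns the transcript into a message before clearing that context. This is a minimal sketch under stated assumptions. `TranscriptController` and the `HostBridge` callbacks are hypothetical stand-ins for the app's `ui/update-model-context` and `ui/message` requests, not the MCP Apps SDK's API. The steps do not say whether Send also empties the visible transcript, so in this sketch only Clear does that.

```typescript
// Hypothetical bridge to the host; the real app issues `ui/update-model-context`
// and `ui/message` requests instead of calling these callbacks.
interface HostBridge {
  updateContext(text: string): void;
  sendMessage(text: string): void;
}

class TranscriptController {
  private readonly host: HostBridge;
  private listening = false;
  private committed = "";
  private interim = "";

  constructor(host: HostBridge) {
    this.host = host;
  }

  // Start/Stop: flip the listening state and report the new value.
  toggle(): boolean {
    this.listening = !this.listening;
    return this.listening;
  }

  // Displayed transcript: committed text plus the pending interim tail.
  get text(): string {
    return (this.committed + this.interim).trim();
  }

  // Interim result: replace the tail and stream the whole text to the model.
  setInterim(chunk: string): void {
    if (!this.listening) return;
    this.interim = chunk;
    this.host.updateContext(this.text);
  }

  // Final result: append it, drop the interim tail, and stream again.
  addFinal(chunk: string): void {
    if (!this.listening) return;
    this.committed += chunk;
    this.interim = "";
    this.host.updateContext(this.text);
  }

  // Send: post the transcript as a message, then clear the model context.
  send(): void {
    const text = this.text;
    if (text === "") return;
    this.host.sendMessage(text);
    this.host.updateContext("");
  }

  // Clear: reset the transcript area.
  clear(): void {
    this.committed = "";
    this.interim = "";
  }
}
```

To reproduce step 3, wire `setInterim` and `addFinal` to the interim and final parts of each recognition `result` event. The Send and Clear buttons call `send()` and `clear()`.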

## Architecture

```
transcript-server/
├── server.ts          # MCP server with transcribe tool
├── server-utils.ts    # HTTP transport utilities
├── mcp-app.html       # Transcript UI entry point
├── src/
│   ├── mcp-app.ts     # App logic, Web Speech API integration
│   ├── mcp-app.css    # Transcript UI styles
│   └── global.css     # Base styles
└── dist/              # Built output (single HTML file)
```

## Notes

- Microphone Permission: Requires `allow="microphone"` on the sandbox iframe (configured via `permissions: { microphone: {} }` in the resource's `_meta.ui`)
- Browser Support: The Web Speech API is well supported in Chrome and Edge and is also available in Safari; Firefox has limited support
- Continuous Mode: Recognition restarts automatically when it ends, keeping transcription continuous
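
The continuous-mode behavior can be expressed as an `onend` handler that calls `start()` again. This minimal sketch assumes a recognizer with the Web Speech API's `start()` method and `onend` handler. `keepAlive` is a hypothetical helper. Its `shouldRun` check is an added assumption, so that an explicit Stop is not immediately undone by the restart.

```typescript
// The subset of a Web Speech recognizer this sketch relies on.
interface Restartable {
  start(): void;
  onend: (() => void) | null;
}

// Hypothetical helper: calls start() again each time a session ends,
// but only while the caller still wants to listen.
function keepAlive(recognizer: Restartable, shouldRun: () => boolean): void {
  recognizer.onend = () => {
    if (shouldRun()) {
      recognizer.start();
    }
  };
}
```

Recognition sessions can end on their own, for example after a stretch of silence, even when `continuous` is set. That is what makes the restart necessary.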

## Future Enhancements

- Language selection dropdown
- Whisper-based offline transcription (see TRANSCRIPTION.md)
- Export transcript to file
- Timestamps toggle