# Quick Start Guide

## Running the Demo Locally

1. **Install dependencies** (skip if already installed):
   ```bash
   cd parakeet-web-demo
   npm install
   ```

2. **Start development server**:
   ```bash
   npm run dev
   ```

3. **Open browser**:
   - Navigate to: http://localhost:3000
   - Use a WebGPU-compatible browser (Chrome 113+ or Edge 113+)

4. **Use the demo**:
   - Click "Load Model" (downloads a ~2 GB ONNX model, one-time only)
   - Wait for the model to load (30 s to 2 min, depending on your connection)
   - Click "Start Recording" and grant microphone permissions
   - Speak and watch real-time progressive transcriptions!
   - Click "Stop Recording" when done

## What You'll See

### Color-Coded Transcription
- **Yellow text**: Fixed sentences (completed, locked, won't change)
- **Cyan text**: Active transcription (in-progress, updating in real-time)

### Performance Metrics
- **Latency**: Time taken to process one audio chunk
- **RTF (Real-time Factor)**: Processing time divided by the duration of the audio processed
  - <1.0 = faster than real-time ✓
  - >1.0 = slower than real-time ⚠️
- **Window State**:
  - "growing" (0-15s): Accumulating audio for accuracy
  - "sliding" (>15s): Smart sentence-aware windowing
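
As a rough illustration of how these metrics relate, the sketch below computes RTF and the window state from raw numbers. The function names `realTimeFactor` and `windowState` are hypothetical and are not taken from the demo's source; only the 15-second threshold comes from this guide.

```javascript
// Minimal sketch, not the demo's actual code: names and signatures are assumptions.

// Real-time factor: processing time divided by the duration of the audio processed.
// Values below 1.0 mean the chunk was processed faster than real time.
function realTimeFactor(processingMs, audioMs) {
  if (audioMs <= 0) {
    throw new RangeError('audioMs must be positive');
  }
  return processingMs / audioMs;
}

// Window state as described in this guide: "growing" up to 15 s of buffered
// audio, "sliding" beyond that. The threshold is a parameter so it can be tuned.
function windowState(bufferedSeconds, maxWindowSeconds = 15) {
  return bufferedSeconds <= maxWindowSeconds ? 'growing' : 'sliding';
}
```

For example, processing a 2000 ms chunk in 500 ms gives `realTimeFactor(500, 2000)`, which is `0.25`, comfortably faster than real time.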

## Browser Requirements

### ✅ Full Support (WebGPU)
- Chrome 113+
- Edge 113+

### ⚠️ CPU Fallback
- Firefox (no WebGPU yet)
- Safari (limited support)

Check your browser: https://caniuse.com/webgpu
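
You can also check for WebGPU programmatically. The sketch below uses the standard `navigator.gpu.requestAdapter()` call; the function name `pickBackend` and the `'webgpu'` / `'wasm'` labels are hypothetical and may not match how the demo selects its backend.

```javascript
// Minimal sketch, assuming a two-way choice between a WebGPU path and a CPU fallback.
// `pickBackend` and the returned labels are illustrative names, not the demo's API.
async function pickBackend(nav = globalThis.navigator) {
  // navigator.gpu is only defined in browsers that expose the WebGPU API.
  if (!nav || !nav.gpu) {
    return 'wasm';
  }
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return 'wasm';
  }
}
```

Calling `pickBackend()` from the browser console and logging the resolved value shows which path your browser would take under these assumptions.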

## Troubleshooting

### Model won't load
- Check your internet connection (the model is a ~2 GB download)
- Try refreshing the page
- Check browser console for errors

### No microphone access
- Grant microphone permissions when prompted
- Check browser settings (Settings → Privacy → Microphone)

### Slow performance
- Use Chrome or Edge with WebGPU support
- Close other tabs to free memory
- Check the performance metrics: RTF should stay below 1.0

### "Failed to start recording"
- Ensure microphone is connected
- Try using headphones with built-in mic
- Check if another app is using the microphone
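
The microphone problems above usually surface as a rejected `getUserMedia()` promise. As a sketch, the error's `name` can be mapped to the matching advice. `NotAllowedError`, `NotFoundError`, and `NotReadableError` are standard error names for this API; the helper names `explainMicError` and `startMicrophone` are hypothetical and not taken from `src/utils/audio.js`.

```javascript
// Minimal sketch, not the demo's actual code: helper names are assumptions.

// Translate a getUserMedia() failure into a hint matching the troubleshooting list.
function explainMicError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Microphone permission was denied. Allow access in the browser settings.';
    case 'NotFoundError':
      return 'No microphone was found. Ensure one is connected.';
    case 'NotReadableError':
      return 'The microphone could not be read. Another app may be using it.';
    default:
      return `Failed to start recording: ${err && err.message ? err.message : 'unknown error'}`;
  }
}

// Request an audio-only stream and rethrow failures with a readable message.
async function startMicrophone() {
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (err) {
    throw new Error(explainMicError(err));
  }
}
```

Logging the original error to the console alongside the friendly message keeps the full details available for debugging.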

## Building for Production

```bash
npm run build
npm run preview
```

The build output is written to the `dist/` folder.

## Next Steps

- Read the full [README.md](README.md) for technical details
- Check the implementation plan: [../../../.claude/plans/validated-hugging-book.md](../../../.claude/plans/validated-hugging-book.md)
- Compare with Python implementation: [../STT/smart_progressive_streaming.py](../STT/smart_progressive_streaming.py)

## Key Files

- `src/App.jsx` - Main application component
- `src/worker.js` - Web Worker for model inference
- `src/utils/progressive-streaming.js` - Smart streaming algorithm (ported from Python)
- `src/utils/audio.js` - Microphone capture and audio processing
- `src/components/TranscriptionDisplay.jsx` - Live transcription UI
- `src/components/PerformanceMetrics.jsx` - Developer metrics dashboard

Enjoy the demo! 🎤