andito (HF Staff) and Claude Sonnet 4.5 committed
Commit b830719 · 0 parents

Add Parakeet progressive streaming demo with source code


- React + Vite frontend with TypeScript
- Smart progressive streaming implementation
- WebGPU-accelerated inference via parakeet.js
- Real-time transcription with sentence-aware windowing
- Performance metrics and developer tools
- Built dist/ included for immediate deployment
- WASM files tracked with Git LFS

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

.eslintrc.cjs ADDED
@@ -0,0 +1,21 @@
+ module.exports = {
+   root: true,
+   env: { browser: true, es2020: true },
+   extends: [
+     'eslint:recommended',
+     'plugin:react/recommended',
+     'plugin:react/jsx-runtime',
+     'plugin:react-hooks/recommended',
+   ],
+   ignorePatterns: ['dist', '.eslintrc.cjs'],
+   parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
+   settings: { react: { version: '18.2' } },
+   plugins: ['react-refresh'],
+   rules: {
+     'react-refresh/only-export-components': [
+       'warn',
+       { allowConstantExport: true },
+     ],
+     'react/prop-types': 'off',
+   },
+ }
.gitattributes ADDED
@@ -0,0 +1 @@
+ *.wasm filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,22 @@
+ # Dependencies
+ node_modules/
+ 
+ # Build outputs (we'll commit dist/ for Hugging Face)
+ # dist/
+ 
+ # Logs
+ *.log
+ npm-debug.log*
+ 
+ # Editor
+ .DS_Store
+ .vscode/
+ *.swp
+ *.swo
+ 
+ # Backups
+ *.bak*
+ 
+ # Environment
+ .env
+ .env.local
DEPLOY.md ADDED
@@ -0,0 +1,122 @@
+ # Deployment Guide for Hugging Face Spaces
+ 
+ ## Quick Deployment Steps
+ 
+ ### Option 1: Using Hugging Face Web Interface (Recommended)
+ 
+ 1. **Create a new Space**:
+    - Go to https://huggingface.co/new-space
+    - Name: `parakeet-progressive-streaming` (or your preferred name)
+    - License: `mit`
+    - SDK: `static`
+    - Click "Create Space"
+ 
+ 2. **Upload files**:
+    - Upload the entire `dist/` folder contents to your Space
+    - Upload `README.md` to the root (this will be displayed on the Space page)
+ 
+    The file structure should look like:
+    ```
+    your-space/
+    ├── README.md
+    ├── index.html
+    └── assets/
+        ├── *.js
+        ├── *.css
+        └── *.wasm
+    ```
+ 
+ 3. **Done!** Your Space will be live at `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME`
+ 
+ ### Option 2: Using Git (Advanced)
+ 
+ 1. **Initialize a git repository** (if not already done):
+    ```bash
+    cd parakeet-web-demo
+    git init
+    git add dist/ README.md
+    git commit -m "Initial commit: Parakeet progressive streaming demo"
+    ```
+ 
+ 2. **Add the Hugging Face remote**:
+    ```bash
+    git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
+    ```
+ 
+ 3. **Push to Hugging Face**:
+    ```bash
+    git push hf main
+    ```
+ 
+ ## What Gets Deployed
+ 
+ The `dist/` folder contains:
+ - `index.html` - Main HTML entry point
+ - `assets/*.js` - JavaScript bundles (React app, worker, libraries)
+ - `assets/*.css` - Stylesheets
+ - `assets/*.wasm` - ONNX Runtime WebAssembly files
+ 
+ Total size: ~47MB (mostly WASM files for ONNX Runtime)
+ 
+ ## Post-Deployment
+ 
+ After deployment, your Space will:
+ 1. Load immediately (static site)
+ 2. Download the Parakeet model (~2.5GB) on first use
+ 3. Cache the model in the browser's IndexedDB for subsequent visits
+ 
+ ## Updating the Space
+ 
+ To update after making changes:
+ 
+ 1. **Rebuild**:
+    ```bash
+    npm run build
+    ```
+ 
+ 2. **Upload the new `dist/` contents** via the web interface, or:
+    ```bash
+    git add dist/
+    git commit -m "Update: description of changes"
+    git push hf main
+    ```
+ 
+ ## Testing Locally Before Deployment
+ 
+ ```bash
+ npm run preview
+ ```
+ 
+ This serves the production build locally at http://localhost:4173.
+ 
+ ## Troubleshooting
+ 
+ ### Space shows a blank page
+ - Check the browser console for errors
+ - Verify all files in `dist/` were uploaded
+ - Ensure `index.html` is in the root directory
+ 
+ ### Model fails to load
+ - Check that your browser supports WebGPU (Chrome 113+, Edge 113+)
+ - Verify CORS headers are set correctly (Hugging Face handles this automatically)
+ - Check the browser console for specific error messages
+ 
+ ### Performance is slow
+ - This is expected; see README.md for performance notes
+ - Ensure WebGPU is available (check the console logs)
+ - Try a different browser (Chrome/Edge recommended)
+ 
+ ## Browser Requirements
+ 
+ Recommended setup:
+ - **Chrome 113+** or **Edge 113+** for full WebGPU support
+ - A modern desktop/laptop (mobile may be very slow)
+ - A good internet connection for the initial model download
+ 
+ ## Privacy & Security
+ 
+ The demo:
+ - ✅ Runs entirely client-side (no server processing)
+ - ✅ Sends no data to any server
+ - ✅ Caches the model locally in the browser
+ - ✅ Requires microphone access (browser prompt)
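
The WebGPU troubleshooting step above comes down to the same feature test the bundled code performs (`'gpu' in navigator`). A minimal sketch you could paste into the browser console; the helper name is illustrative, not part of the demo's API:

```javascript
// Minimal WebGPU feature test, mirroring the bundle's `'gpu' in navigator`
// check. The helper name `hasWebGPU` is illustrative, not the demo's API.
function hasWebGPU(nav) {
  return !!nav && typeof nav === 'object' && 'gpu' in nav;
}

// In a real page you would call hasWebGPU(navigator); stubs shown here:
console.log(hasWebGPU({ gpu: {} })); // true  -> WebGPU path
console.log(hasWebGPU({}));          // false -> WASM fallback
```

If this returns `false`, the demo falls back to single- or multi-threaded WASM, which is why the "Performance is slow" section recommends Chrome or Edge.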
QUICKSTART.md ADDED
@@ -0,0 +1,99 @@
+ # Quick Start Guide
+ 
+ ## Running the Demo Locally
+ 
+ 1. **Install dependencies**:
+    ```bash
+    cd parakeet-web-demo
+    npm install
+    ```
+ 
+ 2. **Start the development server**:
+    ```bash
+    npm run dev
+    ```
+ 
+ 3. **Open a browser**:
+    - Navigate to http://localhost:3000
+    - Use a WebGPU-compatible browser (Chrome 113+ or Edge 113+)
+ 
+ 4. **Use the demo**:
+    - Click "Load Model" (~2.5GB ONNX model, one-time download)
+    - Wait for the model to load (30s-2min depending on connection)
+    - Click "Start Recording" and grant microphone permissions
+    - Speak and watch real-time progressive transcriptions!
+    - Click "Stop Recording" when done
+ 
+ ## What You'll See
+ 
+ ### Color-Coded Transcription
+ - **Yellow text**: Fixed sentences (completed, locked, won't change)
+ - **Cyan text**: Active transcription (in-progress, updating in real-time)
+ 
+ ### Performance Metrics
+ - **Latency**: Time to process an audio chunk
+ - **RTF (Real-time Factor)**: Processing speed vs. audio duration
+   - <1.0 = faster than real-time ✓
+   - >1.0 = slower than real-time ⚠️
+ - **Window State**:
+   - "growing" (0-15s): Accumulating audio for accuracy
+   - "sliding" (>15s): Smart sentence-aware windowing
+ 
+ ## Browser Requirements
+ 
+ ### ✅ Full Support (WebGPU)
+ - Chrome 113+
+ - Edge 113+
+ 
+ ### ⚠️ CPU Fallback
+ - Firefox (no WebGPU yet)
+ - Safari (limited support)
+ 
+ Check your browser: https://caniuse.com/webgpu
+ 
+ ## Troubleshooting
+ 
+ ### Model won't load
+ - Check your internet connection (~2.5GB download)
+ - Try refreshing the page
+ - Check the browser console for errors
+ 
+ ### No microphone access
+ - Grant microphone permissions when prompted
+ - Check browser settings (Settings → Privacy → Microphone)
+ 
+ ### Slow performance
+ - Use Chrome or Edge with WebGPU support
+ - Close other tabs to free memory
+ - Check the performance metrics - RTF should be <1.0
+ 
+ ### "Failed to start recording"
+ - Ensure a microphone is connected
+ - Try headphones with a built-in mic
+ - Check whether another app is using the microphone
+ 
+ ## Building for Production
+ 
+ ```bash
+ npm run build
+ npm run preview
+ ```
+ 
+ The build output will be in the `dist/` folder.
+ 
+ ## Next Steps
+ 
+ - Read the full [README.md](README.md) for technical details
+ - Check the implementation plan: [../../../.claude/plans/validated-hugging-book.md](../../../.claude/plans/validated-hugging-book.md)
+ - Compare with the Python implementation: [../STT/smart_progressive_streaming.py](../STT/smart_progressive_streaming.py)
+ 
+ ## Key Files
+ 
+ - `src/App.jsx` - Main application component
+ - `src/worker.js` - Web Worker for model inference
+ - `src/utils/progressive-streaming.js` - Smart streaming algorithm (ported from Python)
+ - `src/utils/audio.js` - Microphone capture and audio processing
+ - `src/components/TranscriptionDisplay.jsx` - Live transcription UI
+ - `src/components/PerformanceMetrics.jsx` - Developer metrics dashboard
+ 
+ Enjoy the demo! 🎤
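
The RTF figure in the metrics panel is processing time divided by audio duration, so values below 1.0 mean the model keeps up with the microphone. A tiny sketch of that computation; the function name is illustrative, not the demo's actual API:

```javascript
// Real-time factor: processing time / audio duration.
// RTF < 1.0 means a chunk was transcribed faster than it was spoken.
// `realTimeFactor` is an illustrative name, not the demo's real API.
function realTimeFactor(processingMs, audioDurationMs) {
  return processingMs / audioDurationMs;
}

console.log(realTimeFactor(400, 1000));  // 0.4 -> faster than real time
console.log(realTimeFactor(1500, 1000)); // 1.5 -> slower than real time
```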
README.md ADDED
@@ -0,0 +1,125 @@
+ ---
+ title: Parakeet STT Progressive Transcription
+ emoji: 🎤
+ colorFrom: blue
+ colorTo: purple
+ sdk: static
+ pinned: false
+ ---
+ 
+ # Parakeet STT Progressive Transcription Demo
+ 
+ Real-time speech recognition with smart progressive streaming, powered by **Parakeet TDT 0.6B v3** (ONNX) via [parakeet.js](https://github.com/ysdede/parakeet.js) and WebGPU acceleration.
+ 
+ ## Features
+ 
+ - **🎤 Parakeet TDT 0.6B v3**: NVIDIA's multilingual speech recognition model
+   - 25 European languages supported
+   - Word-level timestamps and confidence scores
+   - WebGPU-accelerated inference
+ 
+ - **⚡ Smart Progressive Streaming**: Intelligent window management with sentence-aware boundaries
+   - Growing window (0-15s) for accuracy
+   - Sentence-aware sliding window (>15s) to maintain context
+   - Real-time updates every 250ms
+ 
+ - **🔒 Privacy-First**: All processing happens locally in your browser - no data sent to servers
+ 
+ - **🎨 Visual Feedback**:
+   - Yellow text: Fixed sentences (completed, won't change)
+   - Cyan text: Active transcription (in-progress)
+ 
+ - **📊 Developer Metrics**: Real-time performance monitoring
+   - Latency and Real-time Factor (RTF)
+   - Window state visualization
+   - Memory usage tracking
+   - Confidence scores
+ 
+ ## Tech Stack
+ 
+ - **Model**: [Parakeet TDT 0.6B v3 (ONNX)](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+ - **Inference**: [parakeet.js](https://www.npmjs.com/package/parakeet.js) + [ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/)
+ - **Framework**: React 18 + Vite
+ - **Styling**: Tailwind CSS
+ 
+ ## Usage
+ 
+ 1. **Load Model**: Click "Load Model" to download Parakeet (~2.5GB, one-time download)
+ 2. **Start Recording**: Click "Start Recording" and grant microphone permissions
+ 3. **Speak**: Watch real-time progressive transcriptions appear
+ 4. **Stop Recording**: Click "Stop Recording" to finalize the transcription
+ 
+ ## How It Works
+ 
+ ### Progressive Streaming Algorithm
+ 
+ This demo implements the smart progressive streaming algorithm from the [speech-to-speech repository](https://github.com/huggingface/speech-to-speech):
+ 
+ 1. **Growing Window (0-15s)**:
+    - Accumulates audio for better accuracy
+    - Re-transcribes the entire buffer every 250ms
+ 
+ 2. **Sliding Window (>15s)**:
+    - Locks completed sentences as "fixed"
+    - Only re-transcribes the active portion (last 2s)
+    - Prevents memory growth and maintains accuracy
+ 
+ ### Architecture
+ 
+ ```
+ User Microphone
+        ↓
+ Web Audio API (16kHz)
+        ↓
+ Audio Processor (accumulate chunks)
+        ↓
+ Progressive Streaming Handler (250ms updates)
+        ↓
+ Web Worker → Parakeet ONNX Model (via parakeet.js + WebGPU)
+        ↓
+ Transcription Display (yellow fixed + cyan active)
+ ```
+ 
+ ## Model Information
+ 
+ - **Model**: Parakeet TDT 0.6B v3
+ - **Format**: ONNX (optimized for web via parakeet.js)
+ - **Size**: ~2.5GB
+ - **Languages**: 25 European languages (EN, DE, FR, ES, IT, PT, NL, PL, RU, UK, CS, SK, HU, RO, BG, HR, SL, SR, DA, NO, SV, FI, ET, LV, LT)
+ - **Sample Rate**: 16kHz
+ - **Architecture**: Conformer encoder + RNN-Transducer decoder
+ 
+ ## Browser Compatibility
+ 
+ | Browser | WebGPU Support | Status |
+ |---------|----------------|--------|
+ | Chrome 113+ | ✅ Yes | Full support |
+ | Edge 113+ | ✅ Yes | Full support |
+ | Firefox | ⚠️ Limited | WASM fallback |
+ | Safari | ⚠️ Limited | WASM fallback |
+ 
+ ## Performance
+ 
+ - **First result**: <500ms latency
+ - **Progressive updates**: 250ms cadence
+ - **RTF (Real-time Factor)**: ~0.3-0.5x with WebGPU
+ - **Model loading**: 1-2 minutes (one-time, cached locally)
+ 
+ **Note**: Browser-based inference is inherently slower than native implementations. For comparison, the Python MLX implementation achieves ~60x faster performance on Apple Silicon. This is a fundamental limitation of running large models in browsers.
+ 
+ ## Credits
+ 
+ - **Progressive Streaming Algorithm**: [speech-to-speech/STT/smart_progressive_streaming.py](https://github.com/huggingface/speech-to-speech/blob/main/STT/smart_progressive_streaming.py)
+ - **Parakeet.js**: [ysdede/parakeet.js](https://github.com/ysdede/parakeet.js)
+ - **ONNX Model**: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+ - **Original Model**: NVIDIA Parakeet TDT 0.6B v3
+ 
+ ## License
+ 
+ MIT
+ 
+ ## References
+ 
+ - [Parakeet.js Documentation](https://github.com/ysdede/parakeet.js)
+ - [Parakeet.js Live Demo](https://huggingface.co/spaces/ysdede/parakeet.js-demo)
+ - [Original Python Implementation](https://github.com/huggingface/speech-to-speech)
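
The growing/sliding window policy described in the README above can be sketched as a small pure function. The 15s growing threshold and 2s active tail come from the text; the function and field names (`pickWindow`, `state`, `startS`, `endS`) are illustrative, not the demo's actual API:

```javascript
// Sketch of the progressive-streaming window policy described above.
// Thresholds (15 s growing phase, 2 s active tail) come from the README;
// names like `pickWindow` are illustrative, not the demo's real API.
const GROW_LIMIT_S = 15; // grow the window until 15 s of audio buffered
const ACTIVE_TAIL_S = 2; // then only re-transcribe the last 2 s

function pickWindow(bufferedSeconds, lastSentenceEndS) {
  if (bufferedSeconds <= GROW_LIMIT_S) {
    // Growing phase: re-transcribe the entire buffer every 250 ms tick.
    return { state: 'growing', startS: 0, endS: bufferedSeconds };
  }
  // Sliding phase: text up to the last completed sentence is locked as
  // "fixed"; only the active tail after it is re-transcribed.
  const startS = Math.max(lastSentenceEndS, bufferedSeconds - ACTIVE_TAIL_S);
  return { state: 'sliding', startS, endS: bufferedSeconds };
}

console.log(pickWindow(10, 0));  // { state: 'growing', startS: 0, endS: 10 }
console.log(pickWindow(20, 17)); // { state: 'sliding', startS: 18, endS: 20 }
```

Locking the window start to a sentence boundary is what keeps fixed (yellow) text stable while the active (cyan) tail keeps updating.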
dist/assets/hub-BlMT648A.js ADDED
@@ -0,0 +1 @@
+ import{getModelConfig as P}from"./models-Dq2DCePq.js";const H="parakeet-cache-db",h="file-store";let B=null;const $=new Map;async function E(t,e="main"){const o=`${t}@${e}`;if($.has(o))return $.get(o);const r=`https://huggingface.co/api/models/${t}?revision=${e}`;try{const n=await fetch(r);if(!n.ok)throw new Error(`Failed to list repo files: ${n.status}`);const c=(await n.json()).siblings?.map(s=>s.rfilename)||[];return $.set(o,c),c}catch(n){return console.warn("[Hub] Could not fetch repo file list – falling back to optimistic fetch",n),$.set(o,[]),[]}}function U(){return B||(B=new Promise((t,e)=>{const o=indexedDB.open(H,1);o.onerror=()=>e("Error opening IndexedDB"),o.onsuccess=()=>t(o.result),o.onupgradeneeded=r=>{const n=r.target.result;n.objectStoreNames.contains(h)||n.createObjectStore(h)}})),B}async function S(t){const e=await U();return new Promise((o,r)=>{const c=e.transaction([h],"readonly").objectStore(h).get(t);c.onerror=()=>r("Error reading from DB"),c.onsuccess=()=>o(c.result)})}async function N(t,e){const o=await U();return new Promise((r,n)=>{const s=o.transaction([h],"readwrite").objectStore(h).put(e,t);s.onerror=()=>n("Error writing to DB"),s.onsuccess=()=>r(s.result)})}async function R(t,e,o={}){const{revision:r="main",subfolder:n="",progress:d}=o,c="https://huggingface.co",s=[t,"resolve",r];n&&s.push(n),s.push(e);const b=`${c}/${s.join("/")}`,w=`hf-${t}-${r}-${n}-${e}`;if(typeof indexedDB<"u")try{const a=await S(w);if(a)return console.log(`[Hub] Using cached ${e} from IndexedDB`),URL.createObjectURL(a)}catch(a){console.warn("[Hub] IndexedDB cache check failed:",a)}console.log(`[Hub] Downloading ${e} from ${t}...`);const i=await fetch(b);if(!i.ok)throw new Error(`Failed to download ${e}: ${i.status} ${i.statusText}`);const l=i.headers.get("content-length"),g=l?parseInt(l):0;let m=0;const y=i.body.getReader(),u=[];for(;;){const{done:a,value:p}=await y.read();if(a)break;u.push(p),m+=p.length,d&&g>0&&d({loaded:m,total:g,file:e})}const f=new 
Blob(u,{type:i.headers.get("content-type")||"application/octet-stream"});if(typeof indexedDB<"u")try{await N(w,f),console.log(`[Hub] Cached ${e} in IndexedDB`)}catch(a){console.warn("[Hub] Failed to cache in IndexedDB:",a)}return URL.createObjectURL(f)}async function C(t,e={}){const o=P(t),r=o?.repoId||t,n=o?.preprocessor||"nemo128",{encoderQuant:d="int8",decoderQuant:c="int8",preprocessor:s=n,preprocessorBackend:b="js",backend:w="webgpu",progress:i}=e;let l=d,g=c;w.startsWith("webgpu")&&l==="int8"&&(console.warn("[Hub] Forcing encoder to fp32 on WebGPU (int8 unsupported)"),l="fp32");const m=l==="int8"?".int8.onnx":".onnx",y=g==="int8"?".int8.onnx":".onnx",u=`encoder-model${m}`,f=`decoder_joint-model${y}`,a=await E(r,e.revision||"main"),p=[{key:"encoderUrl",name:u},{key:"decoderUrl",name:f},{key:"tokenizerUrl",name:"vocab.txt"}];b!=="js"?(p.push({key:"preprocessorUrl",name:`${s}.onnx`}),console.log(`[Hub] Preprocessor: ONNX — will download ${s}.onnx`)):console.log(`[Hub] Preprocessor: JS (mel.js) — skipping ${s}.onnx download`),a.includes(`${u}.data`)&&p.push({key:"encoderDataUrl",name:`${u}.data`}),a.includes(`${f}.data`)&&p.push({key:"decoderDataUrl",name:`${f}.data`});const x={urls:{},filenames:{encoder:u,decoder:f},quantisation:{encoder:l,decoder:g},modelConfig:o||null,preprocessorBackend:b};for(const{key:k,name:D}of p)try{const j=i?F=>i({...F,file:D}):void 0;x.urls[k]=await R(r,D,{...e,progress:j})}catch(j){if(k.endsWith("DataUrl"))console.warn(`[Hub] Optional external data file not found: ${D}. This is expected if the model is small.`),x.urls[k]=null;else throw j}return x}export{R as getModelFile,C as getParakeetModel};
dist/assets/index-BG0k6Qhd.css ADDED
@@ -0,0 +1 @@
+ *,:before,:after{--tw-border-spacing-x: 0;--tw-border-spacing-y: 0;--tw-translate-x: 0;--tw-translate-y: 0;--tw-rotate: 0;--tw-skew-x: 0;--tw-skew-y: 0;--tw-scale-x: 1;--tw-scale-y: 1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness: proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width: 0px;--tw-ring-offset-color: #fff;--tw-ring-color: rgb(59 130 246 / .5);--tw-ring-offset-shadow: 0 0 #0000;--tw-ring-shadow: 0 0 #0000;--tw-shadow: 0 0 #0000;--tw-shadow-colored: 0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: ;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }::backdrop{--tw-border-spacing-x: 0;--tw-border-spacing-y: 0;--tw-translate-x: 0;--tw-translate-y: 0;--tw-rotate: 0;--tw-skew-x: 0;--tw-skew-y: 0;--tw-scale-x: 1;--tw-scale-y: 1;--tw-pan-x: ;--tw-pan-y: ;--tw-pinch-zoom: ;--tw-scroll-snap-strictness: proximity;--tw-gradient-from-position: ;--tw-gradient-via-position: ;--tw-gradient-to-position: ;--tw-ordinal: ;--tw-slashed-zero: ;--tw-numeric-figure: ;--tw-numeric-spacing: ;--tw-numeric-fraction: ;--tw-ring-inset: ;--tw-ring-offset-width: 0px;--tw-ring-offset-color: #fff;--tw-ring-color: rgb(59 130 246 / .5);--tw-ring-offset-shadow: 0 0 #0000;--tw-ring-shadow: 0 0 #0000;--tw-shadow: 0 0 #0000;--tw-shadow-colored: 0 0 #0000;--tw-blur: ;--tw-brightness: ;--tw-contrast: ;--tw-grayscale: ;--tw-hue-rotate: ;--tw-invert: ;--tw-saturate: ;--tw-sepia: ;--tw-drop-shadow: ;--tw-backdrop-blur: 
;--tw-backdrop-brightness: ;--tw-backdrop-contrast: ;--tw-backdrop-grayscale: ;--tw-backdrop-hue-rotate: ;--tw-backdrop-invert: ;--tw-backdrop-opacity: ;--tw-backdrop-saturate: ;--tw-backdrop-sepia: ;--tw-contain-size: ;--tw-contain-layout: ;--tw-contain-paint: ;--tw-contain-style: }*,:before,:after{box-sizing:border-box;border-width:0;border-style:solid;border-color:#e5e7eb}:before,:after{--tw-content: ""}html,:host{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:ui-sans-serif,system-ui,sans-serif,"Apple Color Emoji","Segoe UI Emoji",Segoe UI Symbol,"Noto Color Emoji";font-feature-settings:normal;font-variation-settings:normal;-webkit-tap-highlight-color:transparent}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier 
New,monospace;font-feature-settings:normal;font-variation-settings:normal;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-feature-settings:inherit;font-variation-settings:inherit;font-size:100%;font-weight:inherit;line-height:inherit;letter-spacing:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,input:where([type=button]),input:where([type=reset]),input:where([type=submit]){-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type=search]{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dl,dd,h1,h2,h3,h4,h5,h6,hr,figure,p,pre{margin:0}fieldset{margin:0;padding:0}legend{padding:0}ol,ul,menu{list-style:none;margin:0;padding:0}dialog{padding:0}textarea{resize:vertical}input::-moz-placeholder,textarea::-moz-placeholder{opacity:1;color:#9ca3af}input::placeholder,textarea::placeholder{opacity:1;color:#9ca3af}button,[role=button]{cursor:pointer}:disabled{cursor:default}img,svg,video,canvas,audio,iframe,embed,object{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]:where(:not([hidden=until-found])){display:none}.fixed{position:fixed}.mx-auto{margin-left:auto;margin-right:auto}.mb-1{margin-bottom:.25rem}.mb-2{margin-bottom:.5rem}.mb-3{margin-bottom:.75rem}.mb-4{margin-bottom:1rem}.ml-1{margin-left:.25rem}.ml-4{margin-left:1rem}.mt-1{margin-top:.25rem}.mt-12{margin-top:3rem}.mt-2{margin-top:.5rem}.mt-4{margin-top:1rem}.mt-6{margin-top:1.5rem}.flex{disp
lay:flex}.grid{display:grid}.h-3{height:.75rem}.h-4{height:1rem}.h-5{height:1.25rem}.max-h-\[400px\]{max-height:400px}.min-h-\[200px\]{min-height:200px}.min-h-screen{min-height:100vh}.w-3{width:.75rem}.w-4{width:1rem}.w-5{width:1.25rem}.w-full{width:100%}.max-w-4xl{max-width:56rem}.max-w-6xl{max-width:72rem}@keyframes pulse{50%{opacity:.5}}.animate-pulse{animation:pulse 2s cubic-bezier(.4,0,.6,1) infinite}@keyframes spin{to{transform:rotate(360deg)}}.animate-spin{animation:spin 1s linear infinite}.list-inside{list-style-position:inside}.list-disc{list-style-type:disc}.grid-cols-1{grid-template-columns:repeat(1,minmax(0,1fr))}.grid-cols-2{grid-template-columns:repeat(2,minmax(0,1fr))}.items-center{align-items:center}.justify-between{justify-content:space-between}.gap-2{gap:.5rem}.gap-3{gap:.75rem}.gap-4{gap:1rem}.gap-6{gap:1.5rem}.space-y-1>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(.25rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(.25rem * var(--tw-space-y-reverse))}.space-y-3>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(.75rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(.75rem * var(--tw-space-y-reverse))}.space-y-8>:not([hidden])~:not([hidden]){--tw-space-y-reverse: 0;margin-top:calc(2rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(2rem * var(--tw-space-y-reverse))}.overflow-y-auto{overflow-y:auto}.rounded{border-radius:.25rem}.rounded-full{border-radius:9999px}.rounded-lg{border-radius:.5rem}.border{border-width:1px}.border-2{border-width:2px}.border-b{border-bottom-width:1px}.border-t{border-top-width:1px}.border-cyan-400{--tw-border-opacity: 1;border-color:rgb(34 211 238 / var(--tw-border-opacity, 1))}.border-gray-700{--tw-border-opacity: 1;border-color:rgb(55 65 81 / var(--tw-border-opacity, 1))}.border-gray-800{--tw-border-opacity: 1;border-color:rgb(31 41 55 / var(--tw-border-opacity, 1))}.border-green-700{--tw-border-opacity: 1;border-color:rgb(21 128 61 / 
var(--tw-border-opacity, 1))}.border-red-700{--tw-border-opacity: 1;border-color:rgb(185 28 28 / var(--tw-border-opacity, 1))}.border-t-transparent{border-top-color:transparent}.bg-cyan-400{--tw-bg-opacity: 1;background-color:rgb(34 211 238 / var(--tw-bg-opacity, 1))}.bg-gray-700{--tw-bg-opacity: 1;background-color:rgb(55 65 81 / var(--tw-bg-opacity, 1))}.bg-gray-800{--tw-bg-opacity: 1;background-color:rgb(31 41 55 / var(--tw-bg-opacity, 1))}.bg-gray-900{--tw-bg-opacity: 1;background-color:rgb(17 24 39 / var(--tw-bg-opacity, 1))}.bg-gray-950\/50{background-color:#0a0a0a80}.bg-green-900\/30{background-color:#14532d4d}.bg-red-500{--tw-bg-opacity: 1;background-color:rgb(239 68 68 / var(--tw-bg-opacity, 1))}.bg-red-900\/30{background-color:#7f1d1d4d}.bg-yellow-400{--tw-bg-opacity: 1;background-color:rgb(250 204 21 / var(--tw-bg-opacity, 1))}.bg-gradient-to-b{background-image:linear-gradient(to bottom,var(--tw-gradient-stops))}.bg-gradient-to-r{background-image:linear-gradient(to right,var(--tw-gradient-stops))}.from-cyan-400{--tw-gradient-from: #22d3ee var(--tw-gradient-from-position);--tw-gradient-to: rgb(34 211 238 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.from-cyan-500{--tw-gradient-from: #06b6d4 var(--tw-gradient-from-position);--tw-gradient-to: rgb(6 182 212 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.from-gray-950{--tw-gradient-from: #0a0a0a var(--tw-gradient-from-position);--tw-gradient-to: rgb(10 10 10 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.from-green-500{--tw-gradient-from: #22c55e var(--tw-gradient-from-position);--tw-gradient-to: rgb(34 197 94 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.from-red-500{--tw-gradient-from: #ef4444 var(--tw-gradient-from-position);--tw-gradient-to: rgb(239 68 68 / 0) 
var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.to-blue-500{--tw-gradient-to: #3b82f6 var(--tw-gradient-to-position)}.to-emerald-500{--tw-gradient-to: #10b981 var(--tw-gradient-to-position)}.to-gray-900{--tw-gradient-to: #111827 var(--tw-gradient-to-position)}.to-pink-500{--tw-gradient-to: #ec4899 var(--tw-gradient-to-position)}.bg-clip-text{-webkit-background-clip:text;background-clip:text}.p-3{padding:.75rem}.p-4{padding:1rem}.p-6{padding:1.5rem}.px-4{padding-left:1rem;padding-right:1rem}.px-6{padding-left:1.5rem;padding-right:1.5rem}.py-2{padding-top:.5rem;padding-bottom:.5rem}.py-3{padding-top:.75rem;padding-bottom:.75rem}.py-6{padding-top:1.5rem;padding-bottom:1.5rem}.py-8{padding-top:2rem;padding-bottom:2rem}.pb-4{padding-bottom:1rem}.pt-4{padding-top:1rem}.text-center{text-align:center}.font-mono{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace}.font-sans{font-family:ui-sans-serif,system-ui,sans-serif,"Apple Color Emoji","Segoe UI Emoji",Segoe UI Symbol,"Noto Color Emoji"}.text-2xl{font-size:1.5rem;line-height:2rem}.text-3xl{font-size:1.875rem;line-height:2.25rem}.text-lg{font-size:1.125rem;line-height:1.75rem}.text-sm{font-size:.875rem;line-height:1.25rem}.text-xl{font-size:1.25rem;line-height:1.75rem}.text-xs{font-size:.75rem;line-height:1rem}.font-bold{font-weight:700}.font-medium{font-weight:500}.font-semibold{font-weight:600}.uppercase{text-transform:uppercase}.italic{font-style:italic}.leading-relaxed{line-height:1.625}.tracking-wider{letter-spacing:.05em}.text-cyan-400{--tw-text-opacity: 1;color:rgb(34 211 238 / var(--tw-text-opacity, 1))}.text-gray-100{--tw-text-opacity: 1;color:rgb(243 244 246 / var(--tw-text-opacity, 1))}.text-gray-200{--tw-text-opacity: 1;color:rgb(229 231 235 / var(--tw-text-opacity, 1))}.text-gray-300{--tw-text-opacity: 1;color:rgb(209 213 219 / var(--tw-text-opacity, 1))}.text-gray-400{--tw-text-opacity: 1;color:rgb(156 163 
175 / var(--tw-text-opacity, 1))}.text-gray-500{--tw-text-opacity: 1;color:rgb(107 114 128 / var(--tw-text-opacity, 1))}.text-green-400{--tw-text-opacity: 1;color:rgb(74 222 128 / var(--tw-text-opacity, 1))}.text-transparent{color:transparent}.text-white{--tw-text-opacity: 1;color:rgb(255 255 255 / var(--tw-text-opacity, 1))}.text-yellow-400{--tw-text-opacity: 1;color:rgb(250 204 21 / var(--tw-text-opacity, 1))}.opacity-80{opacity:.8}.shadow-lg{--tw-shadow: 0 10px 15px -3px rgb(0 0 0 / .1), 0 4px 6px -4px rgb(0 0 0 / .1);--tw-shadow-colored: 0 10px 15px -3px var(--tw-shadow-color), 0 4px 6px -4px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.shadow-xl{--tw-shadow: 0 20px 25px -5px rgb(0 0 0 / .1), 0 8px 10px -6px rgb(0 0 0 / .1);--tw-shadow-colored: 0 20px 25px -5px var(--tw-shadow-color), 0 8px 10px -6px var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}.backdrop-blur{--tw-backdrop-blur: blur(8px);backdrop-filter:var(--tw-backdrop-blur) var(--tw-backdrop-brightness) var(--tw-backdrop-contrast) var(--tw-backdrop-grayscale) var(--tw-backdrop-hue-rotate) var(--tw-backdrop-invert) var(--tw-backdrop-opacity) var(--tw-backdrop-saturate) 
var(--tw-backdrop-sepia)}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(.4,0,.2,1);transition-duration:.15s}.duration-200{transition-duration:.2s}:root{font-family:Inter,system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:dark;color:#ffffffde;background-color:#0a0a0a;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}body{margin:0;min-width:320px;min-height:100vh}::-webkit-scrollbar{width:8px;height:8px}::-webkit-scrollbar-track{background:#1a1a1a}::-webkit-scrollbar-thumb{background:#444;border-radius:4px}::-webkit-scrollbar-thumb:hover{background:#555}.hover\:bg-red-900\/50:hover{background-color:#7f1d1d80}.hover\:from-cyan-600:hover{--tw-gradient-from: #0891b2 var(--tw-gradient-from-position);--tw-gradient-to: rgb(8 145 178 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.hover\:from-green-600:hover{--tw-gradient-from: #16a34a var(--tw-gradient-from-position);--tw-gradient-to: rgb(22 163 74 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.hover\:from-red-600:hover{--tw-gradient-from: #dc2626 var(--tw-gradient-from-position);--tw-gradient-to: rgb(220 38 38 / 0) var(--tw-gradient-to-position);--tw-gradient-stops: var(--tw-gradient-from), var(--tw-gradient-to)}.hover\:to-blue-600:hover{--tw-gradient-to: #2563eb var(--tw-gradient-to-position)}.hover\:to-emerald-600:hover{--tw-gradient-to: #059669 var(--tw-gradient-to-position)}.hover\:to-pink-600:hover{--tw-gradient-to: #db2777 var(--tw-gradient-to-position)}.hover\:text-cyan-300:hover{--tw-text-opacity: 1;color:rgb(103 232 249 / var(--tw-text-opacity, 1))}.hover\:shadow-xl:hover{--tw-shadow: 0 20px 25px -5px rgb(0 0 0 / .1), 0 8px 10px -6px rgb(0 0 0 / .1);--tw-shadow-colored: 0 20px 25px -5px var(--tw-shadow-color), 0 8px 10px -6px 
var(--tw-shadow-color);box-shadow:var(--tw-ring-offset-shadow, 0 0 #0000),var(--tw-ring-shadow, 0 0 #0000),var(--tw-shadow)}@media(min-width:768px){.md\:grid-cols-3{grid-template-columns:repeat(3,minmax(0,1fr))}.md\:grid-cols-4{grid-template-columns:repeat(4,minmax(0,1fr))}}
dist/assets/index-C6lwVqn6.js ADDED
The diff for this file is too large to render. See raw diff
 
dist/assets/models-Dq2DCePq.js ADDED
@@ -0,0 +1 @@
+ const a={"parakeet-tdt-0.6b-v2":{repoId:"ysdede/parakeet-tdt-0.6b-v2-onnx",displayName:"Parakeet TDT 0.6B v2 (English)",languages:["en"],defaultLanguage:"en",vocabSize:1025,featuresSize:128,preprocessor:"nemo128",subsampling:8,predHidden:640,predLayers:2},"parakeet-tdt-0.6b-v3":{repoId:"istupakov/parakeet-tdt-0.6b-v3-onnx",displayName:"Parakeet TDT 0.6B v3 (Multilingual)",languages:["en","fr","de","es","it","pt","nl","pl","ru","uk","ja","ko","zh"],defaultLanguage:"en",vocabSize:4097,featuresSize:128,preprocessor:"nemo128",subsampling:8,predHidden:640,predLayers:2}};function t(e){if(a[e])return a[e];for(const[r,n]of Object.entries(a))if(n.repoId===e)return n;return null}export{a as MODELS,t as getModelConfig};
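For readability, the minified registry above corresponds to the following un-minified sketch (variable names `MODELS`/`getModelConfig` come from the bundle's exports; the config keys and values are taken verbatim from it):

```javascript
// Un-minified sketch of dist/assets/models-Dq2DCePq.js: a registry of
// Parakeet model configs plus a lookup helper that accepts either the
// short model key or the full Hugging Face repo id.
const MODELS = {
  'parakeet-tdt-0.6b-v2': {
    repoId: 'ysdede/parakeet-tdt-0.6b-v2-onnx',
    displayName: 'Parakeet TDT 0.6B v2 (English)',
    languages: ['en'],
    defaultLanguage: 'en',
    vocabSize: 1025,
    featuresSize: 128,
    preprocessor: 'nemo128',
    subsampling: 8,
    predHidden: 640,
    predLayers: 2,
  },
  'parakeet-tdt-0.6b-v3': {
    repoId: 'istupakov/parakeet-tdt-0.6b-v3-onnx',
    displayName: 'Parakeet TDT 0.6B v3 (Multilingual)',
    languages: ['en', 'fr', 'de', 'es', 'it', 'pt', 'nl', 'pl', 'ru', 'uk', 'ja', 'ko', 'zh'],
    defaultLanguage: 'en',
    vocabSize: 4097,
    featuresSize: 128,
    preprocessor: 'nemo128',
    subsampling: 8,
    predHidden: 640,
    predLayers: 2,
  },
};

// Try the key first, then fall back to matching on repoId; null on miss.
function getModelConfig(idOrRepo) {
  if (MODELS[idOrRepo]) return MODELS[idOrRepo];
  for (const cfg of Object.values(MODELS)) {
    if (cfg.repoId === idOrRepo) return cfg;
  }
  return null;
}
```

The two-step lookup lets callers pass either the short UI key (`parakeet-tdt-0.6b-v3`) or the resolved repo id, which is what the worker does when it falls back to `a[s]?.repoId||s`.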
dist/assets/onnxruntime-l0sNRNKZ.js ADDED
@@ -0,0 +1 @@
+
dist/assets/ort-wasm-simd-threaded.jsep-6MnTkKum.wasm ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5febcf74919ee7ba3c7e838290c9ef2c03d6da297f06a8facfd7d22f623d7cd9
+ size 24911187
dist/assets/ort-wasm-simd-threaded.jsep-B0T3yYHD.wasm ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c46655e8a94afc45338d4cb2b840475f88e5012d524509916e505079c00bfa39
+ size 21596019
dist/assets/ort.bundle.min-LxnbbrqV.js ADDED
The diff for this file is too large to render. See raw diff
 
dist/assets/parakeet-l0sNRNKZ.js ADDED
@@ -0,0 +1 @@
+
dist/assets/parakeet-xcg-VHSn.js ADDED
@@ -0,0 +1 @@
+ async function Ne({backend:f="webgpu",wasmPaths:e,numThreads:s}={}){let t;try{const n=await import("./ort.bundle.min-LxnbbrqV.js");t=n.default||n,console.log("[Parakeet.js] ORT structure:",{hasDefault:!!n.default,hasEnv:!!t.env,hasWasm:!!t.env?.wasm,hasWebgpu:!!t.env?.webgpu,keys:Object.keys(t).slice(0,10)}),t.env||(console.log("[Parakeet.js] Trying alternative access patterns..."),console.log("[Parakeet.js] ortModule keys:",Object.keys(n)),n.ort&&(t=n.ort,console.log("[Parakeet.js] Found ort in ortModule.ort")))}catch(n){throw console.error("[Parakeet.js] Failed to import onnxruntime-web:",n),new Error("Failed to load ONNX Runtime Web. Please check your network connection.")}if(!t||!t.env)throw new Error("ONNX Runtime Web loaded but env is not available. This might be a bundling issue.");if(!t.env.wasm.wasmPaths){const n="1.22.0-dev.20250409-89f8206ba4";t.env.wasm.wasmPaths=`https://cdn.jsdelivr.net/npm/onnxruntime-web@${n}/dist/`}if((f==="wasm"||f==="webgpu")&&(typeof SharedArrayBuffer<"u"?(t.env.wasm.numThreads=s||navigator.hardwareConcurrency||4,t.env.wasm.simd=!0,console.log(`[Parakeet.js] WASM configured with ${t.env.wasm.numThreads} threads, SIMD enabled`)):(console.warn("[Parakeet.js] SharedArrayBuffer not available - using single-threaded WASM"),t.env.wasm.numThreads=1),t.env.wasm.proxy=!1),f==="webgpu"){const n="gpu"in navigator;if(console.log(`[Parakeet.js] WebGPU supported: ${n}`),n)try{console.log("[Parakeet.js] WebGPU will be initialized automatically when creating session")}catch(r){console.warn("[Parakeet.js] WebGPU initialization failed:",r),console.warn("[Parakeet.js] Falling back to WASM"),f="wasm"}else console.warn("[Parakeet.js] WebGPU not supported – falling back to WASM"),f="wasm"}return typeof globalThis<"u"&&(globalThis.ort=t),typeof self<"u"&&(self.ort=t),t}async function Ue(f){const e=await fetch(f);if(!e.ok)throw new Error(`Failed to fetch ${f}: ${e.status}`);return e.text()}class 
Se{constructor(e){this.id2token=e,this.blankToken="<blk>",this.blankId=e.findIndex(s=>s==="<blk>"),this.blankId===-1&&(console.warn("[ParakeetTokenizer] Blank token <blk> not found in vocabulary, defaulting to 1024"),this.blankId=1024)}static async fromUrl(e){const t=(await Ue(e)).split(/\r?\n/).filter(Boolean),n=[];for(const r of t){const[c,i]=r.split(/\s+/),a=parseInt(i,10);n[a]=c}return new Se(n)}decode(e){const s=[];for(const n of e){const r=this.id2token[n];r!==void 0&&r!==this.blankToken&&s.push(r.replace(/\u2581/g," "))}let t=s.join("");return t=t.replace(/^\s+/,""),t=t.replace(/\s+(?=[^\w\s])/g,""),t=t.replace(/\s+/g," "),t.trim()}}class Ge{constructor(e,s={}){this.modelUrl=e,this.opts=s,this.opts.enableGraphCapture===void 0&&(this.opts.enableGraphCapture=this.opts.backend==="wasm"),this.session=null,this.ort=null}async _ensureSession(){if(!this.session){this.ort=await Ne(this.opts);const e=this.opts.enableGraphCapture?{enableProfiling:this.opts.enableProfiling||!1,enableGraphCapture:!0}:{enableProfiling:this.opts.enableProfiling||!1},s=async()=>{try{return await this.ort.InferenceSession.create(this.modelUrl,e)}catch(t){const n=(t.message||"")+"";if(e.enableGraphCapture&&n.includes("graph capture"))return console.warn("[Preprocessor] Graph capture unsupported, retrying without it"),await this.ort.InferenceSession.create(this.modelUrl,{...e,enableGraphCapture:!1});throw t}};this.session=await s()}}async process(e){await this._ensureSession();let s;e instanceof Float32Array?s=e.byteOffset===0||e.byteLength===e.length*4?e:new Float32Array(e):s=new Float32Array(e);const t=new this.ort.Tensor("float32",s,[1,s.length]),n=new BigInt64Array([BigInt(s.length)]),r=new this.ort.Tensor("int64",n,[1]),c={waveforms:t,waveforms_lens:r},i=await this.session.run(c),a=i.features,l=i.features_lens;return{features:a.data,length:Number(l.data[0])}}}const Xe=16e3,O=512,xe=400,ee=160,Ce=.97,je=2**-24,B=(O>>1)+1,ye=200/3,me=1e3,Pe=me/ye,Be=Math.log(6.4)/27;function Oe(f){return 
f>=me?Pe+Math.log(f/me)/Be:f/ye}function He(f){return f>=Pe?me*Math.exp(Be*(f-Pe)):f*ye}function Ve(f){const s=Xe/2,t=new Float64Array(B);for(let o=0;o<B;o++)t[o]=s*o/(B-1);const n=Oe(0),r=Oe(s),c=f+2,i=new Float64Array(c);for(let o=0;o<c;o++)i[o]=He(n+(r-n)*o/(c-1));const a=new Float64Array(c-1);for(let o=0;o<c-1;o++)a[o]=i[o+1]-i[o];const l=new Float32Array(f*B);for(let o=0;o<f;o++){const m=2/(i[o+2]-i[o]),k=o*B;for(let d=0;d<B;d++){const u=(t[d]-i[o])/a[o],x=(i[o+2]-t[d])/a[o+1];l[k+d]=Math.max(0,Math.min(u,x))*m}}return l}function Je(){const f=new Float64Array(O),e=O-xe>>1;for(let s=0;s<xe;s++)f[e+s]=.5*(1-Math.cos(2*Math.PI*s/(xe-1)));return f}function qe(f){const e=f>>1,s=new Float64Array(e),t=new Float64Array(e);for(let n=0;n<e;n++){const r=-2*Math.PI*n/f;s[n]=Math.cos(r),t[n]=Math.sin(r)}return{cos:s,sin:t}}function Le(f,e,s,t){let n=0;for(let r=0;r<s-1;r++){if(r<n){let i=f[r];f[r]=f[n],f[n]=i,i=e[r],e[r]=e[n],e[n]=i}let c=s>>1;for(;c<=n;)n-=c,c>>=1;n+=c}for(let r=2;r<=s;r<<=1){const c=r>>1,i=s/r;for(let a=0;a<s;a+=r)for(let l=0;l<c;l++){const o=l*i,m=t.cos[o],k=t.sin[o],d=a+l,u=d+c,x=f[u]*m-e[u]*k,A=f[u]*k+e[u]*m;f[u]=f[d]-x,e[u]=e[d]-A,f[d]+=x,e[d]+=A}}}class oe{constructor(e={}){this.nMels=e.nMels||128,this.melFilterbank=Ve(this.nMels),this.hannWindow=Je(),this.twiddles=qe(O),this._fftRe=new Float64Array(O),this._fftIm=new Float64Array(O),this._powerBuf=new Float32Array(B)}process(e){const s=e.length;if(s===0)return{features:new Float32Array(0),length:0};const t=new Float32Array(s);t[0]=e[0];for(let p=1;p<s;p++)t[p]=e[p]-Ce*e[p-1];const n=O>>1,r=s+2*n,c=new Float64Array(r);for(let p=0;p<s;p++)c[n+p]=t[p];const i=Math.floor((r-O)/ee)+1,a=Math.floor(s/ee);if(a===0)return{features:new Float32Array(0),length:0};const l=new Float32Array(this.nMels*i),o=this._fftRe,m=this._fftIm,k=this._powerBuf,d=this.hannWindow,u=this.melFilterbank,x=this.nMels,A=this.twiddles;for(let p=0;p<i;p++){const _=p*ee;for(let w=0;w<O;w++)o[w]=c[_+w]*d[w],m[w]=0;Le(o,m,O,A);for(let 
w=0;w<B;w++)k[w]=o[w]*o[w]+m[w]*m[w];for(let w=0;w<x;w++){let S=0;const T=w*B;for(let P=0;P<B;P++)S+=k[P]*u[T+P];l[w*i+p]=Math.log(S+je)}}const F=new Float32Array(x*a);for(let p=0;p<x;p++){const _=p*i,w=p*a;let S=0;for(let b=0;b<a;b++)S+=l[_+b];const T=S/a;let P=0;for(let b=0;b<a;b++){const v=l[_+b]-T;P+=v*v}const z=a>1?1/(Math.sqrt(P/(a-1))+1e-5):0;for(let b=0;b<a;b++)F[w+b]=(l[_+b]-T)*z}return{features:F,length:a}}computeRawMel(e){const s=e.length;if(s===0)return{rawMel:new Float32Array(0),nFrames:0,featuresLen:0};const t=new Float32Array(s);t[0]=e[0];for(let F=1;F<s;F++)t[F]=e[F]-Ce*e[F-1];const n=O>>1,r=s+2*n,c=new Float64Array(r);for(let F=0;F<s;F++)c[n+F]=t[F];const i=Math.floor((r-O)/ee)+1,a=Math.floor(s/ee);if(a===0)return{rawMel:new Float32Array(0),nFrames:0,featuresLen:0};const l=new Float32Array(this.nMels*i),o=this._fftRe,m=this._fftIm,k=this._powerBuf,d=this.hannWindow,u=this.melFilterbank,x=this.nMels,A=this.twiddles;for(let F=0;F<i;F++){const p=F*ee;for(let _=0;_<O;_++)o[_]=c[p+_]*d[_],m[_]=0;Le(o,m,O,A);for(let _=0;_<B;_++)k[_]=o[_]*o[_]+m[_]*m[_];for(let _=0;_<x;_++){let w=0;const S=_*B;for(let T=0;T<B;T++)w+=k[T]*u[S+T];l[_*i+F]=Math.log(w+je)}}return{rawMel:l,nFrames:i,featuresLen:a}}normalizeFeatures(e,s,t){const n=this.nMels,r=new Float32Array(n*t);for(let c=0;c<n;c++){const i=c*s,a=c*t;let l=0;for(let d=0;d<t;d++)l+=e[i+d];const o=l/t;let m=0;for(let d=0;d<t;d++){const u=e[i+d]-o;m+=u*u}const k=t>1?1/(Math.sqrt(m/(t-1))+1e-5):0;for(let d=0;d<t;d++)r[a+d]=(e[i+d]-o)*k}return r}}class $e{constructor(e={}){this.preprocessor=new oe({nMels:e.nMels||128}),this.nMels=this.preprocessor.nMels,this.boundaryFrames=e.boundaryFrames||3,this._cachedRawMel=null,this._cachedNFrames=0,this._cachedAudioLen=0,this._cachedFeaturesLen=0}reset(){this._cachedRawMel=null,this._cachedNFrames=0,this._cachedAudioLen=0,this._cachedFeaturesLen=0}process(e,s=0){const t=e.length;if(t===0)return{features:new 
Float32Array(0),length:0,cached:!1,cachedFrames:0,newFrames:0};if(!(s>0&&this._cachedRawMel!==null&&s<=this._cachedAudioLen)){const m=this.preprocessor.process(e),{rawMel:k,nFrames:d,featuresLen:u}=this.preprocessor.computeRawMel(e);return this._cachedRawMel=k,this._cachedNFrames=d,this._cachedAudioLen=t,this._cachedFeaturesLen=u,{...m,cached:!1,cachedFrames:0,newFrames:u}}const r=Math.floor(s/ee),c=Math.max(0,Math.min(r-this.boundaryFrames,this._cachedFeaturesLen)),{rawMel:i,nFrames:a,featuresLen:l}=this.preprocessor.computeRawMel(e);if(c>0&&this._cachedRawMel)for(let m=0;m<this.nMels;m++){const k=m*this._cachedNFrames,d=m*a;for(let u=0;u<c;u++)i[d+u]=this._cachedRawMel[k+u]}const o=this.preprocessor.normalizeFeatures(i,a,l);return this._cachedRawMel=i,this._cachedNFrames=a,this._cachedAudioLen=t,this._cachedFeaturesLen=l,{features:o,length:l,cached:!0,cachedFrames:c,newFrames:l-c}}clear(){this._cachedRawMel=null,this._cachedNFrames=0,this._cachedAudioLen=0,this._cachedFeaturesLen=0}}class Re{constructor({tokenizer:e,encoderSession:s,joinerSession:t,preprocessor:n,ort:r,subsampling:c=8,windowStride:i=.01,normalizer:a=m=>m,onnxPreprocessor:l=null,nMels:o}){this.tokenizer=e,this.encoderSession=s,this.joinerSession=t,this.preprocessor=n,this.ort=r,this._onnxPreprocessor=l,this._jsPreprocessor=n instanceof oe?n:null,this._incrementalMel=n instanceof oe?new $e({nMels:n.nMels}):null,this.blankId=e.blankId,this.predHidden=640,this.predLayers=2,this.maxTokensPerStep=10;const m=this.predLayers,k=this.predHidden,d=m*1*k,u=new Float32Array(d);this._combState1=new r.Tensor("float32",u,[m,1,k]),this._combState2=new r.Tensor("float32",u.slice(),[m,1,k]),this._normalizer=a,this.subsampling=c,this.windowStride=i,this._nMels=o||128,this._targetIdArray=new Int32Array(1),this._targetTensor=new r.Tensor("int32",this._targetIdArray,[1,1]),this._targetLenArray=new Int32Array([1]),this._targetLenTensor=new 
r.Tensor("int32",this._targetLenArray,[1]),this._encoderFrameBuffer=null,this._encoderFrameTensor=null,this._incrementalCache=new Map}static async fromUrls(e){const{encoderUrl:s,decoderUrl:t,tokenizerUrl:n,preprocessorUrl:r,encoderDataUrl:c,decoderDataUrl:i,filenames:a,backend:l="webgpu-hybrid",wasmPaths:o,subsampling:m=8,windowStride:k=.01,verbose:d=!1,enableProfiling:u=!1,enableGraphCapture:x,cpuThreads:A=void 0,preprocessorBackend:F="js",nMels:p}=e,_=F==="js";if(console.log(`[Parakeet.js] Preprocessor backend requested: '${F}' → ${_?"JS (mel.js)":"ONNX"}`),!s||!t||!n||!r&&!_)throw new Error('fromUrls requires encoderUrl, decoderUrl, tokenizerUrl and preprocessorUrl (preprocessorUrl not needed if preprocessorBackend="js")');let w=l;l.startsWith("webgpu")&&(w="webgpu");const S=await Ne({backend:w,wasmPaths:o,numThreads:A}),P={executionProviders:[],graphOptimizationLevel:"all",executionMode:"parallel",enableCpuMemArena:!0,enableMemPattern:!0,enableProfiling:u,enableGraphCapture:!!x&&l==="webgpu-strict",logSeverityLevel:d?0:2};l==="webgpu-hybrid"?P.executionProviders=[{name:"webgpu",deviceType:"gpu",powerPreference:"high-performance"},"wasm"]:l==="webgpu-strict"?P.executionProviders=[{name:"webgpu",deviceType:"gpu",powerPreference:"high-performance"}]:l==="wasm"&&(P.executionProviders=["wasm"]),console.log(`[Parakeet.js] Creating ONNX sessions with execution mode '${l}'. 
Providers:`,P.executionProviders),d&&console.log("[Parakeet.js] Verbose logging enabled for ONNX Runtime.");const z={...P};c&&a?.encoder&&(z.externalData=[{data:c,path:a.encoder+".data"}]);const b={...P};i&&a?.decoder&&(b.externalData=[{data:i,path:a.decoder+".data"}]),l.startsWith("webgpu")&&(b.executionProviders=["wasm"]);async function v(D,W){try{return await S.InferenceSession.create(D,W)}catch(ie){const ce=(ie.message||"")+"";if(W.enableGraphCapture&&ce.includes("graph capture")){console.warn("[Parakeet] Graph-capture unsupported for this model/backend; retrying without it");const Z={...W,enableGraphCapture:!1};return await S.InferenceSession.create(D,Z)}throw ie}}const H=Se.fromUrl(n),I=p||128,te=new oe({nMels:I});let V=null;!_&&r?(V=new Ge(r,{backend:"wasm",wasmPaths:o,enableProfiling:u,enableGraphCapture:!1,numThreads:A}),console.log(`[Parakeet.js] ONNX preprocessor session created (${I} mel bins)`)):!_&&!r&&console.warn("[Parakeet.js] ONNX preprocessor requested but no URL provided — falling back to JS");const U=_?te:V||te,pe=Promise.resolve(U);console.log(`[Parakeet.js] Active preprocessor: ${(U===te?"js":"onnx")==="js"?"JS (mel.js) — no ONNX preprocessor needed":"ONNX (nemo128.onnx)"}, ${I} mel bins`);let ae,L;l==="webgpu-hybrid"?(ae=await v(s,z),L=await v(t,b)):[ae,L]=await Promise.all([v(s,z),v(t,b)]);const[C,R]=await Promise.all([H,pe]);try{const D=new Float32Array(1600);await R.process(D),d&&console.log("[Parakeet.js] Preprocessor warmed up")}catch(D){console.warn("[Parakeet.js] Preprocessor warm-up failed (non-fatal):",D.message)}return new Re({tokenizer:C,encoderSession:ae,joinerSession:L,preprocessor:R,ort:S,subsampling:m,windowStride:k,onnxPreprocessor:V!==R?V:null,nMels:I})}async _runCombinedStep(e,s,t=null){const n=typeof s=="number"?s:this.blankId;this._targetIdArray[0]=n;const 
r=t?.state1||this._combState1,c=t?.state2||this._combState2,i={encoder_outputs:e,targets:this._targetTensor,target_length:this._targetLenTensor,input_states_1:r,input_states_2:c},a=await this.joinerSession.run(i),l=a.outputs,o=this.tokenizer.id2token.length,m=l.dims[3],k=l.data,d=k.slice(0,o),u=k.slice(o,m);let x=0;if(u.length){let F=-1/0;for(let p=0;p<u.length;++p)u[p]>F&&(F=u[p],x=p)}const A={state1:a.output_states_1||r,state2:a.output_states_2||c};return{tokenLogits:d,step:x,newState:A}}_snapshotDecoderState(e){if(!e)return null;const s=e.state1,t=e.state2;return{s1:new Float32Array(s.data),s2:new Float32Array(t.data),dims1:s.dims.slice(),dims2:t.dims.slice()}}_restoreDecoderState(e){if(!e)return null;const s=new this.ort.Tensor("float32",new Float32Array(e.s1),e.dims1),t=new this.ort.Tensor("float32",new Float32Array(e.s2),e.dims2);return{state1:s,state2:t}}async computeFeatures(e,s=16e3,t={}){const{prefixSamples:n=0}=t;if(this._incrementalMel&&n>0){const a=this._incrementalMel.process(e,n),l=a.length;return{features:a.features,T:l,melBins:this._nMels,cached:a.cached,cachedFrames:a.cachedFrames,newFrames:a.newFrames}}const{features:r,length:c}=await this.preprocessor.process(e),i=r.length/this._nMels;return{features:r,T:i,melBins:this._nMels,validLength:c}}setPreprocessorBackend(e){if(e==="onnx"){if(!this._onnxPreprocessor)throw new Error("ONNX preprocessor not available. Load model with preprocessorUrl to enable ONNX backend.");this.preprocessor=this._onnxPreprocessor,this._incrementalMel=null,console.log("[Parakeet.js] Switched to ONNX preprocessor")}else if(e==="js")this._jsPreprocessor||(this._jsPreprocessor=new oe({nMels:128})),this.preprocessor=this._jsPreprocessor,this._incrementalMel=new $e({nMels:this._jsPreprocessor.nMels}),console.log("[Parakeet.js] Switched to JS preprocessor (incremental caching enabled)");else throw new Error(`Unknown preprocessor backend: ${e}. 
Use 'js' or 'onnx'.`)}getPreprocessorBackend(){return this.preprocessor instanceof oe?"js":"onnx"}resetMelCache(){this._incrementalMel&&this._incrementalMel.reset()}getFrameTimeStride(){return this.subsampling*this.windowStride}frameToTime(e,s=0){return s+e*this.getFrameTimeStride()}getStreamingConstants(){return{subsampling:this.subsampling,windowStride:this.windowStride,frameTimeStride:this.getFrameTimeStride(),melBins:80,blankId:this.blankId,maxTokensPerStep:this.maxTokensPerStep}}async transcribe(e,s=16e3,t={}){const{returnTimestamps:n=!1,returnConfidences:r=!1,temperature:c=1,debug:i=!1,skipCMVN:a=!1,frameStride:l=1,previousDecoderState:o=null,returnDecoderState:m=!1,timeOffset:k=0,returnTokenIds:d=!1,returnFrameIndices:u=!1,returnLogProbs:x=!1,returnTdtSteps:A=!1,prefixSamples:F=0,precomputedFeatures:p=null}=t,_=!0;let w,S=0,T=0,P=0,z=0;w=performance.now();let b,v,H,I,te,V=p?"mel-worker":this.getPreprocessorBackend();if(p)b=p.features,v=p.T,H=p.melBins,I={},console.log(`[Parakeet] Preprocessor: mel-worker (precomputed ${v} frames × ${H} mel bins, 0 ms)`);else{const h=performance.now();({features:b,T:v,melBins:H,validLength:te,...I}=await this.computeFeatures(e,s,{prefixSamples:F})),S=performance.now()-h;const g=I?.cached?` (cached: ${I.cachedFrames} frames, new: ${I.newFrames} frames)`:"";console.log(`[Parakeet] Preprocessor: ${V}, ${v} frames × ${H} mel bins, ${S.toFixed(1)} ms${g}`)}if(!b||!b.length||v<=0||H<=0)return{utterance_text:"",words:[],tokens:[],confidence_scores:{overall_log_prob:null,frame:null,frame_avg:null},metrics:_?{preprocess_ms:+S.toFixed(1),encode_ms:0,decode_ms:0,tokenize_ms:0,total_ms:+(performance.now()-w).toFixed(1),rtf:0}:null,is_final:!t?.incremental};const U=e?e.length/s:v*160/s,pe=new this.ort.Tensor("float32",b,[1,H,v]),Me=te??v,ae=new this.ort.Tensor("int64",BigInt64Array.from([BigInt(Me)]),[1]);let L;{const h=performance.now(),g=await 
this.encoderSession.run({audio_signal:pe,length:ae});T=performance.now()-h,L=g.outputs??Object.values(g)[0]}const[,C,R]=L.dims;let D;if(L.dims.length===3&&L.dims[0]===1&&L.dims[1]===C&&L.dims[2]===R){D=new Float32Array(R*C);const h=L.data,g=Math.min(64,C);for(let M=0;M<C;M+=g){const $=Math.min(M+g,C);for(let j=0;j<R;j++){const X=j*C;for(let N=M;N<$;N++)D[X+N]=h[N*R+j]}}}else console.warn("[Parakeet] Unexpected encoder output format:",L.dims),D=new Float32Array(L.data);(!this._encoderFrameBuffer||this._encoderFrameBuffer.length!==C)&&(this._encoderFrameBuffer=new Float32Array(C),this._encoderFrameTensor=new this.ort.Tensor("float32",this._encoderFrameBuffer,[1,C,1]));const W=[],ie=[],ce=[],Z=[];let Te=0;const we=[],_e=[],ge=[],de=this.subsampling*this.windowStride;let ke=0,Fe=k,Q=null;o&&(Q=this._restoreDecoderState(o),i&&console.log("[Parakeet] Restored decoder state from previous chunk"));let G=0;const J=t.incremental;if(J&&J.cacheKey){G=Math.max(0,Math.min(R,Math.floor(((J.prefixSeconds||0)+1e-6)/de)));const h=this._incrementalCache.get(J.cacheKey);h&&h.prefixFrames===G&&h.D===C&&(ke=G,Fe=k+G*de,Q=this._restoreDecoderState(h.state),i&&console.log(`[Parakeet] Incremental cache hit: skipping ${G}/${R} frames (${(G/R*100).toFixed(0)}%)`))}let he=0;const We=performance.now();let ve=ke>0||G===0;for(let h=ke;h<R;){const g=h*C;for(let y=0;y<C;y++)this._encoderFrameBuffer[y]=D[g+y];const M=W.length?W[W.length-1]:this.blankId,{tokenLogits:$,step:j,newState:X}=await this._runCombinedStep(this._encoderFrameTensor,M,Q);let N=-1/0,E=0;for(let y=0;y<$.length;y++){const K=$[y]/c;K>N&&(N=K,E=y)}let re=1,ue=0;if(r||x){let y=0;for(let K=0;K<$.length;K++)y+=Math.exp($[K]/c-N);re=1/y,ue=$[E]/c-N-Math.log(y),r&&(Z.push(re),Te+=Math.log(re))}if(E!==this.blankId){if(Q=X,W.push(E),u&&we.push(h),x&&_e.push(ue),A&&ge.push(j),n){const 
y=j>0?j:1,K=Fe+h*de,Ee=Fe+(h+y)*de;ie.push([K,Ee])}r&&ce.push(re),he+=1}if(j>0?(h+=j,he=0):(E===this.blankId||he>=this.maxTokensPerStep)&&(h+=l,he=0),J&&J.cacheKey&&!ve&&h>=G){const y=this._snapshotDecoderState(Q);this._incrementalCache.set(J.cacheKey,{state:y,prefixFrames:G,D:C}),ve=!0}}P=performance.now()-We;let Ie;Ie=performance.now();const Ae=this._normalizer(this.tokenizer.decode(W));if(z=performance.now()-Ie,!n&&!r){{const M=performance.now()-w,$=U/(M/1e3);console.log(`[Perf] RTF: ${$.toFixed(2)}x (audio ${U.toFixed(2)} s, time ${(M/1e3).toFixed(2)} s)`),console.table({Preprocess:`${S.toFixed(1)} ms`,Encode:`${T.toFixed(1)} ms`,Decode:`${P.toFixed(1)} ms`,Tokenize:`${z.toFixed(1)} ms`,Total:`${M.toFixed(1)} ms`})}const h=_?{preprocess_ms:+S.toFixed(1),encode_ms:+T.toFixed(1),decode_ms:+P.toFixed(1),tokenize_ms:+z.toFixed(1),total_ms:+(performance.now()-w).toFixed(1),rtf:+(U/((performance.now()-w)/1e3)).toFixed(2),preprocessor_backend:V,mel_cache:I?.cached?{cached_frames:I.cachedFrames,new_frames:I.newFrames}:null}:null,g={utterance_text:Ae,words:[],metrics:h,is_final:!o};return m&&(g.decoderState=this._snapshotDecoderState(Q)),d&&(g.tokenIds=W.slice()),u&&(g.frameIndices=we.slice()),x&&(g.logProbs=_e.slice()),A&&(g.tdtSteps=ge.slice()),g}const Y=[],le=[];let se="",be=0,fe=0,q=[];if(W.forEach((h,g)=>{const M=this.tokenizer.id2token[h];if(M===this.tokenizer.blankToken)return;const $=M.startsWith("▁"),j=$?M.slice(1):M,X=ie[g]||[null,null],N=ce[g],E={token:j,raw_token:M,is_word_start:$};if(n&&(E.start_time=+X[0].toFixed(3),E.end_time=+X[1].toFixed(3)),r&&(E.confidence=+N.toFixed(4)),le.push(E),$){if(se){const re=q.length?q.reduce((ue,y)=>ue+y,0)/q.length:0;Y.push({text:se,start_time:+be.toFixed(3),end_time:+fe.toFixed(3),confidence:+re.toFixed(4)})}se=j,n&&(be=X[0],fe=X[1]),q=r?[N]:[]}else se+=j,n&&(fe=X[1]),r&&q.push(N)}),se){const 
h=q.length?q.reduce((g,M)=>g+M,0)/q.length:0;Y.push({text:se,start_time:+be.toFixed(3),end_time:+fe.toFixed(3),confidence:+h.toFixed(4)})}const ze=Y.length&&r?Y.reduce((h,g)=>h+g.confidence,0)/Y.length:null,De=le.length&&r?le.reduce((h,g)=>h+(g.confidence||0),0)/le.length:null;{const h=performance.now()-w,g=U/(h/1e3);console.log(`[Perf] RTF: ${g.toFixed(2)}x (audio ${U.toFixed(2)} s, time ${(h/1e3).toFixed(2)} s)`),console.table({Preprocess:`${S.toFixed(1)} ms`,Encode:`${T.toFixed(1)} ms`,Decode:`${P.toFixed(1)} ms`,Tokenize:`${z.toFixed(1)} ms`,Total:`${h.toFixed(1)} ms`})}const ne={utterance_text:Ae,words:Y,tokens:le,confidence_scores:r?{token:ce.map(h=>+h.toFixed(4)),token_avg:+De?.toFixed(4),word:Y.map(h=>h.confidence),word_avg:+ze?.toFixed(4),frame:Z.map(h=>+h.toFixed(4)),frame_avg:Z.length?+(Z.reduce((h,g)=>h+g,0)/Z.length).toFixed(4):null,overall_log_prob:+Te.toFixed(6)}:{overall_log_prob:null,frame:null,frame_avg:null},metrics:_?{preprocess_ms:+S.toFixed(1),encode_ms:+T.toFixed(1),decode_ms:+P.toFixed(1),tokenize_ms:+z.toFixed(1),total_ms:+(performance.now()-w).toFixed(1),rtf:+(U/((performance.now()-w)/1e3)).toFixed(2),preprocessor_backend:V,mel_cache:I?.cached?{cached_frames:I.cachedFrames,new_frames:I.newFrames}:null}:null,is_final:!J&&!o};return m&&(ne.decoderState=this._snapshotDecoderState(Q)),d&&(ne.tokenIds=W.slice()),u&&(ne.frameIndices=we.slice()),x&&(ne.logProbs=_e.map(h=>+h.toFixed(6))),A&&(ne.tdtSteps=ge.slice()),ne}createStreamingTranscriber(e={}){return new Ke(this,e)}endProfiling(){try{this.encoderSession?.endProfiling()}catch{}try{this.joinerSession?.endProfiling()}catch{}const e=this.ort?.env?.wasm?.FS;if(!e)return console.warn("[Parakeet] Profiling FS not accessible"),null;const s=e.readdir("/tmp").filter(n=>n.startsWith("profile_")&&n.endsWith(".json"));if(!s.length)return console.warn("[Parakeet] No profiling files found. 
Was profiling enabled?"),null;const t={};for(const n of s)try{const r=e.readFile("/tmp/"+n,{encoding:"utf8"}),c=JSON.parse(r);let i=0,a=0;for(const l of c)if(l.cat==="Node"){const o=l.args?.provider;o==="webgpu"?i+=l.dur:o&&(a+=l.dur)}t[n]={gpu_us:i,cpu_us:a,total_us:i+a}}catch(r){console.warn("[Parakeet] Failed to parse profile file",n,r)}return console.table(t),t}}class Ke{constructor(e,s={}){this.model=e,this.opts={returnTimestamps:s.returnTimestamps??!0,returnConfidences:s.returnConfidences??!1,returnTokenIds:s.returnTokenIds??!1,sampleRate:s.sampleRate??16e3,debug:s.debug??!1},this._decoderState=null,this._currentOffset=0,this._totalWords=[],this._totalTokenIds=[],this._chunkCount=0,this._isFinalized=!1}async processChunk(e){if(this._isFinalized)throw new Error("Streamer is finalized. Create a new instance to process more audio.");const s=e.length/this.opts.sampleRate,t=await this.model.transcribe(e,this.opts.sampleRate,{returnTimestamps:this.opts.returnTimestamps,returnConfidences:this.opts.returnConfidences,returnTokenIds:this.opts.returnTokenIds,previousDecoderState:this._decoderState,returnDecoderState:!0,timeOffset:this._currentOffset});return this._decoderState=t.decoderState,this._currentOffset+=s,this._chunkCount++,t.words&&t.words.length>0&&this._totalWords.push(...t.words),this.opts.returnTokenIds&&t.tokenIds&&this._totalTokenIds.push(...t.tokenIds),this.opts.debug&&console.log(`[Streamer] Chunk ${this._chunkCount}: "${t.utterance_text}" (${t.words?.length||0} words, offset: ${this._currentOffset.toFixed(2)}s)`),{chunkText:t.utterance_text,chunkWords:t.words||[],text:this._totalWords.map(n=>n.text).join(" "),words:this._totalWords.slice(),totalDuration:this._currentOffset,chunkCount:this._chunkCount,is_final:!1,...this.opts.returnTokenIds?{tokenIds:this._totalTokenIds.slice()}:{},...this.opts.returnConfidences&&t.confidence_scores?{confidence_scores:t.confidence_scores}:{},metrics:t.metrics}}finalize(){return 
this._isFinalized=!0,{text:this._totalWords.map(e=>e.text).join(" "),words:this._totalWords.slice(),totalDuration:this._currentOffset,chunkCount:this._chunkCount,is_final:!0,...this.opts.returnTokenIds?{tokenIds:this._totalTokenIds.slice()}:{}}}reset(){this._decoderState=null,this._currentOffset=0,this._totalWords=[],this._totalTokenIds=[],this._chunkCount=0,this._isFinalized=!1}getState(){return{hasDecoderState:this._decoderState!==null,currentOffset:this._currentOffset,wordCount:this._totalWords.length,chunkCount:this._chunkCount,isFinalized:this._isFinalized}}}export{Re as ParakeetModel,Ke as StatefulStreamingTranscriber};
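The preprocessor in the bundle above builds its filterbank on a Slaney-style mel scale: linear below 1 kHz, logarithmic above. De-minified, the Hz↔mel pair (the bundle's `Oe`/`He`) looks like the sketch below; the constant names are reconstructed, but the values match the minified source (`200/3`, `1000`, `ln(6.4)/27`):

```javascript
// Slaney-style mel scale as used by the bundled mel preprocessor:
// linear below 1 kHz (200/3 Hz per mel), logarithmic above.
const LINEAR_SLOPE = 200 / 3;              // Hz per mel in the linear region
const BREAK_HZ = 1000;                     // linear/log breakpoint in Hz
const BREAK_MEL = BREAK_HZ / LINEAR_SLOPE; // = 15 mel at the breakpoint
const LOG_STEP = Math.log(6.4) / 27;       // log-region growth factor

function hzToMel(hz) {
  return hz >= BREAK_HZ
    ? BREAK_MEL + Math.log(hz / BREAK_HZ) / LOG_STEP
    : hz / LINEAR_SLOPE;
}

function melToHz(mel) {
  return mel >= BREAK_MEL
    ? BREAK_HZ * Math.exp(LOG_STEP * (mel - BREAK_MEL))
    : mel * LINEAR_SLOPE;
}
```

The filterbank builder (`Ve` in the bundle) places 128 + 2 edge points evenly in mel space between `hzToMel(0)` and `hzToMel(8000)`, then maps them back to Hz with `melToHz` to form the triangular filters.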
dist/assets/worker-BE5R_Ila.js ADDED
@@ -0,0 +1 @@
+ async function g(s,r={}){const{getParakeetModel:e}=await import("./hub-BlMT648A.js"),{ParakeetModel:t}=await import("./parakeet-xcg-VHSn.js"),{MODELS:a}=await import("./models-Dq2DCePq.js"),o=a[s]?.repoId||s,n=await e(o,r);return t.fromUrls({...n.urls,filenames:n.filenames,preprocessorBackend:n.preprocessorBackend,...r})}let i=null,c=!1;async function m(s="parakeet-tdt-0.6b-v3",r={}){if(c)return{status:"loading",message:"Model is already loading..."};if(i)return{status:"ready",message:"Model already loaded"};try{c=!0;const e=r.device==="webgpu"?"webgpu":"wasm";self.postMessage({status:"loading",message:`Downloading Parakeet ${s}... (~2.5GB, this may take 1-2 minutes)`}),console.log(`[Worker] Loading model with backend: ${e}`),i=await g(s,{backend:e});const t=i.session?.executionProviders?.[0]||e;console.log(`[Worker] Model loaded. Requested: ${e}, Actual provider: ${t}`),self.postMessage({status:"loading",message:"Model downloaded, warming up..."});const a=new Float32Array(16e3);return await i.transcribe(a,16e3),self.postMessage({status:"ready",message:`Parakeet ${s} loaded successfully!`,device:e,modelVersion:s}),{status:"ready",device:e}}catch(e){return console.error("Failed to load model:",e),self.postMessage({status:"error",message:`Failed to load model: ${e.message}`,error:e.toString()}),{status:"error",error:e.toString()}}finally{c=!1}}async function f(s,r=null){if(!i)throw new Error("Model not loaded. 
Call load() first.");try{const e=performance.now(),t=await i.transcribe(s,16e3,{returnTimestamps:!0,returnConfidences:!0,temperature:1}),o=(performance.now()-e)/1e3,n=s.length/16e3,u=o/n;console.log("[Worker] Parakeet words:",t.words?.length||0,"words"),t.words&&t.words.length>0&&console.log("[Worker] First 5 words:",t.words.slice(0,5).map(l=>`"${l.text}" (${l.start_time?.toFixed(1)}-${l.end_time?.toFixed(1)})`));const d=p(t.words||[]);return console.log("[Worker] Grouped into",d.length,"sentences"),{text:t.utterance_text||"",sentences:d,words:t.words||[],chunks:t.words||[],metadata:{latency:o,audioDuration:n,rtf:u,language:r,confidence:t.confidence_scores,metrics:t.metrics}}}catch(e){throw console.error("Transcription error:",e),e}}function p(s){if(!s||s.length===0)return[];const r=[];let e=[],t=s[0].start_time||0;for(let a=0;a<s.length;a++){const o=s[a];e.push(o.text),(/[.!?]$/.test(o.text)||a===s.length-1)&&(r.push({text:e.join(" ").trim(),start:t,end:o.end_time||o.start_time||0}),a<s.length-1&&(e=[],t=s[a+1].start_time||o.end_time||0))}return r}self.onmessage=async s=>{const{type:r,data:e}=s.data;try{switch(r){case"load":await m(e?.modelVersion,e?.options||{});break;case"transcribe":const t=await f(e.audio,e.language);self.postMessage({status:"transcription",result:t});break;case"ping":self.postMessage({status:"pong"});break;default:self.postMessage({status:"error",message:`Unknown message type: ${r}`})}}catch(t){self.postMessage({status:"error",message:t.message,error:t.toString()})}};
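The worker's sentence grouping (the minified helper `p` above) walks the timestamped word list and cuts a sentence at terminal punctuation or at the final word. A behavior-equivalent, de-minified sketch (the function name is reconstructed):

```javascript
// Group timestamped words into sentences, splitting after any word that
// ends in ., !, or ? — mirroring the worker's minified `p` helper.
function groupIntoSentences(words) {
  if (!words || words.length === 0) return [];
  const sentences = [];
  let current = [];
  let start = words[0].start_time || 0;
  for (let i = 0; i < words.length; i++) {
    const w = words[i];
    current.push(w.text);
    // Close the sentence on terminal punctuation or at the last word.
    if (/[.!?]$/.test(w.text) || i === words.length - 1) {
      sentences.push({
        text: current.join(' ').trim(),
        start,
        end: w.end_time || w.start_time || 0,
      });
      if (i < words.length - 1) {
        current = [];
        start = words[i + 1].start_time || w.end_time || 0;
      }
    }
  }
  return sentences;
}
```

These sentence spans are what drives the progressive display: each carries its own `start`/`end` time, so the UI can window and replace sentences independently as new chunks arrive.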
dist/index.html ADDED
@@ -0,0 +1,15 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="description" content="Real-time speech recognition with Parakeet STT and WebGPU acceleration. Progressive transcription demo." />
+ <title>Parakeet STT Progressive Transcription | WebGPU Demo</title>
+ <script type="module" crossorigin src="/assets/index-C6lwVqn6.js"></script>
+ <link rel="stylesheet" crossorigin href="/assets/index-BG0k6Qhd.css">
+ </head>
+ <body>
+ <div id="root"></div>
+ </body>
+ </html>
index.html ADDED
@@ -0,0 +1,14 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8" />
+ <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+ <meta name="description" content="Real-time speech recognition with Parakeet STT and WebGPU acceleration. Progressive transcription demo." />
+ <title>Parakeet STT Progressive Transcription | WebGPU Demo</title>
+ </head>
+ <body>
+ <div id="root"></div>
+ <script type="module" src="/src/main.jsx"></script>
+ </body>
+ </html>
package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
package.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "name": "parakeet-web-demo",
+ "private": true,
+ "version": "0.1.0",
+ "type": "module",
+ "description": "Browser-based Parakeet STT demo with progressive transcription and WebGPU acceleration",
+ "scripts": {
+ "dev": "vite",
+ "build": "vite build",
+ "preview": "vite preview",
+ "lint": "eslint . --ext js,jsx --report-unused-disable-directives --max-warnings 0"
+ },
+ "dependencies": {
+ "@huggingface/transformers": "^3.7.1",
+ "onnxruntime-web": "^1.20.0",
+ "parakeet.js": "^1.1.2",
+ "react": "^18.2.0",
+ "react-dom": "^18.2.0"
+ },
+ "devDependencies": {
+ "@types/react": "^18.2.66",
+ "@types/react-dom": "^18.2.22",
+ "@vitejs/plugin-react": "^4.2.1",
+ "autoprefixer": "^10.4.19",
+ "eslint": "^8.57.0",
+ "eslint-plugin-react": "^7.34.1",
+ "eslint-plugin-react-hooks": "^4.6.0",
+ "eslint-plugin-react-refresh": "^0.4.6",
+ "postcss": "^8.4.38",
+ "tailwindcss": "^3.4.3",
+ "vite": "^6.2.0"
+ }
+ }
postcss.config.js ADDED
@@ -0,0 +1,6 @@
+ export default {
+   plugins: {
+     tailwindcss: {},
+     autoprefixer: {},
+   },
+ }
src/App.jsx ADDED
@@ -0,0 +1,371 @@
+ /**
+  * Main Application Component
+  *
+  * Parakeet STT Progressive Transcription Demo with WebGPU
+  */
+
+ import { useState, useEffect, useRef } from 'react';
+ import TranscriptionDisplay from './components/TranscriptionDisplay';
+ import PerformanceMetrics from './components/PerformanceMetrics';
+ import { AudioRecorder, AudioProcessor } from './utils/audio';
+ import { SmartProgressiveStreamingHandler } from './utils/progressive-streaming';
+
+ // Import worker
+ import WorkerUrl from './worker.js?worker&url';
+
+ function App() {
+   // Model state
+   const [modelStatus, setModelStatus] = useState('not_loaded');
+   const [modelMessage, setModelMessage] = useState('');
+   const [device, setDevice] = useState(null);
+
+   // Recording state
+   const [isRecording, setIsRecording] = useState(false);
+   const [fixedText, setFixedText] = useState('');
+   const [activeText, setActiveText] = useState('');
+   const [timestamp, setTimestamp] = useState(0);
+
+   // Performance metrics
+   const [latency, setLatency] = useState(null);
+   const [rtf, setRtf] = useState(null);
+   const [audioDuration, setAudioDuration] = useState(null);
+   const [windowState, setWindowState] = useState(null);
+
+   // Refs
+   const workerRef = useRef(null);
+   const recorderRef = useRef(null);
+   const audioProcessorRef = useRef(null);
+   const streamingHandlerRef = useRef(null);
+   const progressiveIntervalRef = useRef(null);
+
+   // Initialize worker
+   useEffect(() => {
+     workerRef.current = new Worker(WorkerUrl, { type: 'module' });
+
+     workerRef.current.onmessage = (event) => {
+       const { status, message, result, device: deviceType } = event.data;
+
+       if (status === 'loading') {
+         setModelStatus('loading');
+         setModelMessage(message);
+       } else if (status === 'ready') {
+         setModelStatus('ready');
+         setModelMessage(message);
+         setDevice(deviceType);
+       } else if (status === 'error') {
+         setModelStatus('error');
+         setModelMessage(message);
+         console.error('Worker error:', event.data);
+       } else if (status === 'transcription' && result) {
+         // Update performance metrics
+         if (result.metadata) {
+           setLatency(result.metadata.latency);
+           setRtf(result.metadata.rtf);
+           setAudioDuration(result.metadata.audioDuration);
+         }
+       }
+     };
+
+     return () => {
+       if (workerRef.current) {
+         workerRef.current.terminate();
+       }
+     };
+   }, []);
+
+   const loadModel = async () => {
+     if (modelStatus === 'loading' || modelStatus === 'ready') return;
+
+     setModelStatus('loading');
+     setModelMessage('Initializing model...');
+
+     workerRef.current.postMessage({
+       type: 'load',
+       data: {
+         modelVersion: "parakeet-tdt-0.6b-v3", // Multilingual Parakeet
+         options: {
+           device: 'wasm', // Use WASM to enable INT8 quantization (670MB vs 2.5GB)
+           // INT8 is the default for WASM - no need to specify encoderQuant/decoderQuant
+         },
+       },
+     });
+   };
+
+   const startRecording = async () => {
+     if (modelStatus !== 'ready') {
+       alert('Please load the model first');
+       return;
+     }
+
+     try {
+       // Reset state
+       setFixedText('');
+       setActiveText('');
+       setTimestamp(0);
+       setLatency(null);
+       setRtf(null);
+       setAudioDuration(null);
+
+       // Initialize audio processor
+       audioProcessorRef.current = new AudioProcessor();
+
+       // Create model wrapper for progressive streaming
+       const modelWrapper = {
+         transcribe: async (audio) => {
+           return new Promise((resolve) => {
+             const messageHandler = (event) => {
+               if (event.data.status === 'transcription') {
+                 workerRef.current.removeEventListener('message', messageHandler);
+                 resolve(event.data.result);
+               }
+             };
+
+             workerRef.current.addEventListener('message', messageHandler);
+             workerRef.current.postMessage({
+               type: 'transcribe',
+               data: { audio },
+             });
+           });
+         },
+       };
+
+       // Initialize progressive streaming handler
+       streamingHandlerRef.current = new SmartProgressiveStreamingHandler(modelWrapper, {
+         emissionInterval: 0.25, // 250ms
+         maxWindowSize: 15.0,
+         sentenceBuffer: 2.0,
+       });
+
+       // Start recording with callback for audio chunks
+       recorderRef.current = new AudioRecorder((audioChunk) => {
+         // Append PCM audio chunk directly (Float32Array)
+         console.log('Audio chunk received:', audioChunk.length, 'samples (~' + (audioChunk.length / 16000 * 1000).toFixed(0) + 'ms)');
+         audioProcessorRef.current.appendChunk(audioChunk);
+         console.log('Total buffer:', audioProcessorRef.current.getBuffer().length, 'samples');
+       });
+
+       await recorderRef.current.start();
+       setIsRecording(true);
+
+       // Start progressive transcription updates
+       let transcriptionInProgress = false;
+       progressiveIntervalRef.current = setInterval(async () => {
+         // Stop if recording stopped
+         if (!recorderRef.current || !recorderRef.current.isRecording) {
+           if (progressiveIntervalRef.current) {
+             clearInterval(progressiveIntervalRef.current);
+             progressiveIntervalRef.current = null;
+           }
+           return;
+         }
+
+         const audioBuffer = audioProcessorRef.current.getBuffer();
+         const duration = audioBuffer.length / 16000;
+
+         // Update timestamp even if not transcribing yet
+         setTimestamp(duration);
+
+         // Skip if previous transcription still in progress (matches Python MLX lock behavior)
+         if (transcriptionInProgress) {
+           console.debug('Skipping progressive update (previous transcription still running)');
+           return;
+         }
+
+         // Only transcribe if we have enough audio (at least 1 second)
+         if (audioBuffer.length >= 16000) {
+           try {
+             transcriptionInProgress = true;
+             const result = await streamingHandlerRef.current.transcribeIncremental(audioBuffer);
+
+             setFixedText(result.fixedText);
+             setActiveText(result.activeText);
+
+             // Update window state
+             setWindowState(duration >= 15 ? 'sliding' : 'growing');
+           } catch (error) {
+             console.error('Progressive transcription error:', error);
+             // Show error in UI
+             setActiveText(`Error: ${error.message}`);
+           } finally {
+             transcriptionInProgress = false;
+           }
+         } else {
+           // Not enough audio yet
+           setWindowState('growing');
+         }
+       }, 250); // 250ms updates
+     } catch (error) {
+       console.error('Failed to start recording:', error);
+       alert('Failed to start recording: ' + error.message);
+       setIsRecording(false);
+     }
+   };
+
+   const stopRecording = async () => {
+     if (!isRecording) return;
+
+     // Stop progressive updates
+     if (progressiveIntervalRef.current) {
+       clearInterval(progressiveIntervalRef.current);
+       progressiveIntervalRef.current = null;
+     }
+
+     // Stop recorder
+     if (recorderRef.current) {
+       try {
+         await recorderRef.current.stop();
+
+         // Final transcription
+         const audioBuffer = audioProcessorRef.current.getBuffer();
+         if (audioBuffer.length > 0 && streamingHandlerRef.current) {
+           const finalText = await streamingHandlerRef.current.finalize(audioBuffer);
+           setFixedText(finalText);
+           setActiveText('');
+         }
+       } catch (error) {
+         console.error('Error stopping recording:', error);
+       }
+     }
+
+     setIsRecording(false);
+     setWindowState(null);
+   };
+
+   return (
+     <div className="min-h-screen bg-gradient-to-b from-gray-950 to-gray-900 text-white">
+       {/* Header */}
+       <header className="border-b border-gray-800 bg-gray-950/50 backdrop-blur">
+         <div className="max-w-6xl mx-auto px-6 py-6">
+           <h1 className="text-3xl font-bold bg-gradient-to-r from-cyan-400 to-blue-500 bg-clip-text text-transparent">
+             🎤 Parakeet STT Progressive Transcription
+           </h1>
+           <p className="text-gray-400 mt-2">
+             Real-time speech recognition with smart progressive streaming • WebGPU accelerated
+           </p>
+         </div>
+       </header>
+
+       {/* Main Content */}
+       <main className="max-w-6xl mx-auto px-6 py-8 space-y-8">
+         {/* Model Status */}
+         <div className="bg-gray-900 rounded-lg border border-gray-700 p-6">
+           <div className="flex items-center justify-between">
+             <div>
+               <h2 className="text-lg font-semibold mb-2">Model Status</h2>
+               <p className="text-sm text-gray-400">{modelMessage || 'Ready to load model'}</p>
+             </div>
+             <div>
+               {modelStatus === 'not_loaded' && (
+                 <button
+                   onClick={loadModel}
+                   className="px-6 py-3 bg-gradient-to-r from-cyan-500 to-blue-500 hover:from-cyan-600 hover:to-blue-600 rounded-lg font-semibold transition-all duration-200 shadow-lg hover:shadow-xl"
+                 >
+                   Load Model (~670MB)
+                 </button>
+               )}
+               {modelStatus === 'loading' && (
+                 <div className="px-6 py-3 bg-gray-700 rounded-lg font-semibold flex items-center gap-3">
+                   <div className="w-5 h-5 border-2 border-cyan-400 border-t-transparent rounded-full animate-spin"></div>
+                   Loading...
+                 </div>
+               )}
+               {modelStatus === 'ready' && (
+                 <div className="flex items-center gap-4">
+                   <div className="px-4 py-2 bg-green-900/30 border border-green-700 rounded-lg text-green-400 text-sm font-semibold">
+                     ✓ Ready
+                   </div>
+                   {!isRecording ? (
+                     <button
+                       onClick={startRecording}
+                       className="px-6 py-3 bg-gradient-to-r from-green-500 to-emerald-500 hover:from-green-600 hover:to-emerald-600 rounded-lg font-semibold transition-all duration-200 shadow-lg hover:shadow-xl"
+                     >
+                       Start Recording
+                     </button>
+                   ) : (
+                     <button
+                       onClick={stopRecording}
+                       className="px-6 py-3 bg-gradient-to-r from-red-500 to-pink-500 hover:from-red-600 hover:to-pink-600 rounded-lg font-semibold transition-all duration-200 shadow-lg hover:shadow-xl"
+                     >
+                       Stop Recording
+                     </button>
+                   )}
+                 </div>
+               )}
+               {modelStatus === 'error' && (
+                 <button
+                   onClick={loadModel}
+                   className="px-6 py-3 bg-red-900/30 border border-red-700 hover:bg-red-900/50 rounded-lg font-semibold transition-all duration-200"
+                 >
+                   Retry
+                 </button>
+               )}
+             </div>
+           </div>
+         </div>
+
+         {/* Transcription Display */}
+         <TranscriptionDisplay
+           fixedText={fixedText}
+           activeText={activeText}
+           timestamp={timestamp}
+           isRecording={isRecording}
+         />
+
+         {/* Performance Metrics */}
+         <PerformanceMetrics
+           latency={latency}
+           rtf={rtf}
+           audioDuration={audioDuration}
+           windowState={windowState}
+           device={device}
+           updateInterval={250}
+         />
+
+         {/* Info Section */}
+         <div className="bg-gray-900 rounded-lg border border-gray-700 p-6">
+           <h2 className="text-lg font-semibold mb-3">About This Demo</h2>
+           <div className="space-y-3 text-sm text-gray-400">
+             <p>
+               <strong className="text-gray-200">Smart Progressive Streaming:</strong> This demo showcases
+               real-time transcription with intelligent window management. As you speak:
+             </p>
+             <ul className="list-disc list-inside space-y-1 ml-4">
+               <li><strong>0-15s:</strong> Growing window accumulates audio for better accuracy</li>
+               <li><strong>&gt;15s:</strong> Sentence-aware sliding window locks completed sentences</li>
+               <li><strong>Updates:</strong> Partial transcriptions every 250ms for real-time feedback</li>
+             </ul>
+             <p>
+               <strong className="text-gray-200">Privacy:</strong> All processing happens locally in your
+               browser using WebAssembly or WebGPU acceleration. No audio is sent to any server.
+             </p>
+             <p className="text-xs">
+               Model: Parakeet TDT 0.6B v3 (ONNX via parakeet.js) • 25 languages supported • ~670MB INT8 download (cached locally)
+               <br/>
+               <strong className="text-yellow-400">Note:</strong> Inference currently runs on WebAssembly
+               with INT8 quantization to keep the download small.
+             </p>
+           </div>
+         </div>
+       </main>
+
+       {/* Footer */}
+       <footer className="border-t border-gray-800 mt-12 py-6">
+         <div className="max-w-6xl mx-auto px-6 text-center text-sm text-gray-500">
+           <p>
+             Built with Transformers.js, ONNX Runtime Web, React, and Vite •{' '}
+             <a
+               href="https://github.com/huggingface/speech-to-speech"
+               className="text-cyan-400 hover:text-cyan-300"
+               target="_blank"
+               rel="noopener noreferrer"
+             >
+               View Source
+             </a>
+           </p>
+         </div>
+       </footer>
+     </div>
+   );
+ }
+
+ export default App;
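App.jsx turns each worker round-trip into a promise by attaching a one-shot message listener before posting the request. A standalone sketch of that pattern, runnable outside the browser (the `FakeWorker` stand-in and all names here are illustrative, not part of the app):

```javascript
// Promise-based request/response over a Worker-like object: attach a one-shot
// listener, post the request, resolve when a message with the expected status
// arrives, then detach the listener.
function requestFromWorker(worker, message, matchStatus) {
  return new Promise((resolve) => {
    const handler = (event) => {
      if (event.data.status === matchStatus) {
        worker.removeEventListener('message', handler);
        resolve(event.data.result);
      }
    };
    worker.addEventListener('message', handler);
    worker.postMessage(message);
  });
}

// Minimal stand-in for a Web Worker so the pattern runs in plain Node:
// it replies asynchronously with a canned transcription result.
class FakeWorker {
  constructor() { this.handlers = new Set(); }
  addEventListener(type, fn) { this.handlers.add(fn); }
  removeEventListener(type, fn) { this.handlers.delete(fn); }
  postMessage() {
    const reply = { data: { status: 'transcription', result: { text: 'hello world' } } };
    setTimeout(() => this.handlers.forEach((fn) => fn(reply)), 0);
  }
}

async function demo() {
  const result = await requestFromWorker(new FakeWorker(), { type: 'transcribe' }, 'transcription');
  return result.text;
}
```

The real app also keeps a persistent `onmessage` handler for status updates; the one-shot listener coexists with it because `addEventListener` supports multiple listeners on the same worker.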
src/components/PerformanceMetrics.jsx ADDED
@@ -0,0 +1,180 @@
+ /**
+  * Performance Metrics Component
+  *
+  * Developer-focused dashboard showing:
+  * - Real-time inference speed (tokens/sec, RTF)
+  * - Progressive update latency
+  * - Window state (growing vs sliding)
+  * - Memory usage
+  */
+
+ import { useState, useEffect } from 'react';
+
+ export default function PerformanceMetrics({
+   latency,
+   rtf,
+   audioDuration,
+   windowState,
+   device,
+   updateInterval,
+ }) {
+   // Tailwind's JIT only generates classes it sees verbatim, so map color names
+   // to full class names instead of interpolating `text-${color}-400`.
+   const colorClasses = {
+     gray: 'text-gray-400',
+     cyan: 'text-cyan-400',
+     blue: 'text-blue-400',
+     purple: 'text-purple-400',
+     green: 'text-green-400',
+     yellow: 'text-yellow-400',
+     red: 'text-red-400',
+   };
+
+   const [memoryUsage, setMemoryUsage] = useState(null);
+
+   useEffect(() => {
+     // Monitor memory usage if available (non-standard, Chrome-only API)
+     if (performance.memory) {
+       const interval = setInterval(() => {
+         const memory = performance.memory;
+         setMemoryUsage({
+           used: (memory.usedJSHeapSize / 1024 / 1024).toFixed(1),
+           total: (memory.totalJSHeapSize / 1024 / 1024).toFixed(1),
+           limit: (memory.jsHeapSizeLimit / 1024 / 1024).toFixed(1),
+         });
+       }, 1000);
+
+       return () => clearInterval(interval);
+     }
+   }, []);
+
+   const MetricCard = ({ label, value, unit, color = 'gray' }) => (
+     <div className="bg-gray-800 rounded-lg p-4 border border-gray-700">
+       <div className="text-xs text-gray-400 uppercase tracking-wider mb-1">
+         {label}
+       </div>
+       <div className={`text-2xl font-bold ${colorClasses[color] || colorClasses.gray} font-mono`}>
+         {value !== null && value !== undefined ? value : '—'}
+         {unit && <span className="text-sm ml-1 text-gray-500">{unit}</span>}
+       </div>
+     </div>
+   );
+
+   const getRTFColor = (rtf) => {
+     if (rtf === null || rtf === undefined) return 'gray';
+     if (rtf < 0.3) return 'green';
+     if (rtf < 0.7) return 'yellow';
+     return 'red';
+   };
+
+   const getWindowStateIcon = (state) => {
+     if (state === 'growing') return '📈';
+     if (state === 'sliding') return '↔️';
+     return '⏸️';
+   };
+
+   return (
+     <div className="w-full max-w-4xl mx-auto mt-6">
+       <div className="bg-gray-900 rounded-lg border border-gray-700 p-6 shadow-xl">
+         <h2 className="text-xl font-semibold text-gray-100 mb-4 flex items-center gap-2">
+           <span>📊</span> Performance Metrics
+         </h2>
+
+         {/* Metrics Grid */}
+         <div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4">
+           <MetricCard
+             label="Latency"
+             value={latency ? latency.toFixed(2) : null}
+             unit="s"
+             color="cyan"
+           />
+           <MetricCard
+             label="Real-time Factor"
+             value={rtf ? rtf.toFixed(2) : null}
+             unit="x"
+             color={getRTFColor(rtf)}
+           />
+           <MetricCard
+             label="Audio Duration"
+             value={audioDuration ? audioDuration.toFixed(1) : null}
+             unit="s"
+             color="blue"
+           />
+           <MetricCard
+             label="Update Rate"
+             value={updateInterval ? (1000 / updateInterval).toFixed(1) : null}
+             unit="Hz"
+             color="purple"
+           />
+         </div>
+
+         {/* Additional Info */}
+         <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
+           {/* Window State */}
+           <div className="bg-gray-800 rounded-lg p-4 border border-gray-700">
+             <div className="text-xs text-gray-400 uppercase tracking-wider mb-1">
+               Window State
+             </div>
+             <div className="text-lg font-semibold text-gray-200">
+               {getWindowStateIcon(windowState)} {windowState || 'idle'}
+             </div>
+             <div className="text-xs text-gray-500 mt-1">
+               {windowState === 'growing' && 'Building context (0-15s)'}
+               {windowState === 'sliding' && 'Sentence-aware sliding (>15s)'}
+               {!windowState && 'Not recording'}
+             </div>
+           </div>
+
+           {/* Device */}
+           <div className="bg-gray-800 rounded-lg p-4 border border-gray-700">
+             <div className="text-xs text-gray-400 uppercase tracking-wider mb-1">
+               Acceleration
+             </div>
+             <div className="text-lg font-semibold text-gray-200">
+               {device === 'webgpu' && '🚀 WebGPU'}
+               {device === 'wasm' && '⚙️ WebAssembly'}
+               {device === 'cpu' && '🖥️ CPU'}
+               {!device && '—'}
+             </div>
+             <div className="text-xs text-gray-500 mt-1">
+               {device === 'webgpu' && 'Hardware accelerated'}
+               {device === 'wasm' && 'Software optimized'}
+               {device === 'cpu' && 'Fallback mode'}
+             </div>
+           </div>
+
+           {/* Memory (if available) */}
+           {memoryUsage && (
+             <div className="bg-gray-800 rounded-lg p-4 border border-gray-700">
+               <div className="text-xs text-gray-400 uppercase tracking-wider mb-1">
+                 Memory Usage
+               </div>
+               <div className="text-lg font-semibold text-gray-200">
+                 {memoryUsage.used} MB
+               </div>
+               <div className="text-xs text-gray-500 mt-1">
+                 of {memoryUsage.total} MB allocated
+               </div>
+             </div>
+           )}
+         </div>
+
+         {/* RTF Explanation */}
+         {rtf !== null && rtf !== undefined && (
+           <div className="mt-4 p-3 bg-gray-800 border border-gray-700 rounded text-xs text-gray-400">
+             <strong>Real-time Factor (RTF):</strong> Ratio of processing time to audio duration.
+             {rtf < 1 && ' ✓ Faster than real-time'}
+             {rtf >= 1 && ' ⚠️ Slower than real-time'}
+             {' (Lower is better)'}
+           </div>
+         )}
+       </div>
+
+       {/* Technical Info */}
+       <div className="mt-4 text-xs text-gray-500 text-center space-y-1">
+         <p>Model: Parakeet TDT 0.6B v3 (ONNX) | Sample Rate: 16kHz</p>
+         <p>Progressive updates every 250ms | Smart window management (15s max)</p>
+       </div>
+     </div>
+   );
+ }
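The RTF metric shown above is simply processing time divided by audio duration; values below 1.0 mean transcription keeps up with live audio. A standalone sketch with the same thresholds as the component's `getRTFColor` (function names here are illustrative):

```javascript
// Real-time factor: seconds spent processing per second of audio captured.
function computeRTF(processingSeconds, audioSeconds) {
  return processingSeconds / audioSeconds;
}

// Same thresholds as getRTFColor in the component.
function rtfColor(rtf) {
  if (rtf === null || rtf === undefined) return 'gray';
  if (rtf < 0.3) return 'green';  // comfortably faster than real time
  if (rtf < 0.7) return 'yellow'; // keeping up, limited headroom
  return 'red';                   // at risk of falling behind live audio
}
```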
src/components/TranscriptionDisplay.jsx ADDED
@@ -0,0 +1,106 @@
+ /**
+  * Transcription Display Component
+  *
+  * Shows progressive transcription with:
+  * - Yellow text for fixed sentences (completed, won't change)
+  * - Cyan dim text for active transcription (in-progress)
+  */
+
+ import { useEffect, useRef } from 'react';
+
+ export default function TranscriptionDisplay({ fixedText, activeText, timestamp, isRecording }) {
+   const containerRef = useRef(null);
+
+   // Auto-scroll to bottom when new text appears
+   useEffect(() => {
+     if (containerRef.current) {
+       containerRef.current.scrollTop = containerRef.current.scrollHeight;
+     }
+   }, [fixedText, activeText]);
+
+   const formatTimestamp = (seconds) => {
+     const mins = Math.floor(seconds / 60);
+     const secs = (seconds % 60).toFixed(1);
+     return `${mins}:${secs.padStart(4, '0')}`;
+   };
+
+   return (
+     <div className="w-full max-w-4xl mx-auto">
+       <div className="bg-gray-900 rounded-lg border border-gray-700 p-6 shadow-xl">
+         {/* Header */}
+         <div className="flex items-center justify-between mb-4 pb-4 border-b border-gray-700">
+           <h2 className="text-xl font-semibold text-gray-100">
+             Live Transcription
+           </h2>
+           <div className="flex items-center gap-4">
+             {isRecording && (
+               <div className="flex items-center gap-2">
+                 <div className="w-3 h-3 bg-red-500 rounded-full animate-pulse"></div>
+                 <span className="text-sm text-gray-300">Recording</span>
+               </div>
+             )}
+             {timestamp > 0 && (
+               <span className="text-sm text-gray-400 font-mono">
+                 {formatTimestamp(timestamp)}
+               </span>
+             )}
+           </div>
+         </div>
+
+         {/* Transcription Text */}
+         <div
+           ref={containerRef}
+           className="min-h-[200px] max-h-[400px] overflow-y-auto font-sans text-lg leading-relaxed"
+         >
+           {!fixedText && !activeText && !isRecording && (
+             <p className="text-gray-500 italic">
+               Click "Start Recording" to begin transcription...
+             </p>
+           )}
+
+           {!fixedText && !activeText && isRecording && (
+             <p className="text-gray-500 italic animate-pulse">
+               Listening...
+             </p>
+           )}
+
+           {/* Fixed text (yellow) - sentences that won't change */}
+           {fixedText && (
+             <span className="text-yellow-400 font-medium">
+               {fixedText}
+             </span>
+           )}
+
+           {/* Space between fixed and active */}
+           {fixedText && activeText && ' '}
+
+           {/* Active text (cyan dim) - current partial transcription */}
+           {activeText && (
+             <span className="text-cyan-400 opacity-80">
+               {activeText}
+             </span>
+           )}
+         </div>
+
+         {/* Legend */}
+         <div className="mt-4 pt-4 border-t border-gray-700 flex gap-6 text-sm">
+           <div className="flex items-center gap-2">
+             <div className="w-4 h-4 bg-yellow-400 rounded"></div>
+             <span className="text-gray-300">Fixed sentences</span>
+           </div>
+           <div className="flex items-center gap-2">
+             <div className="w-4 h-4 bg-cyan-400 opacity-80 rounded"></div>
+             <span className="text-gray-300">Active transcription</span>
+           </div>
+         </div>
+       </div>
+
+       {/* Technical Details */}
+       <div className="mt-4 text-xs text-gray-500 text-center">
+         <p>
+           Smart progressive streaming: Growing window (0-15s) → Sentence-aware sliding (&gt;15s)
+         </p>
+       </div>
+     </div>
+   );
+ }
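For reference, the component's `formatTimestamp` helper as a standalone function: zero-padding the seconds keeps the `m:ss.s` display width stable while recording.

```javascript
// Format seconds as m:ss.s, e.g. 75.34 -> "1:15.3". padStart keeps the
// seconds field a constant four characters wide ("05.0", "15.3", ...).
function formatTimestamp(seconds) {
  const mins = Math.floor(seconds / 60);
  const secs = (seconds % 60).toFixed(1);
  return `${mins}:${secs.padStart(4, '0')}`;
}
```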
src/index.css ADDED
@@ -0,0 +1,43 @@
+ @tailwind base;
+ @tailwind components;
+ @tailwind utilities;
+
+ :root {
+   font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
+   line-height: 1.5;
+   font-weight: 400;
+
+   color-scheme: dark;
+   color: rgba(255, 255, 255, 0.87);
+   background-color: #0a0a0a;
+
+   font-synthesis: none;
+   text-rendering: optimizeLegibility;
+   -webkit-font-smoothing: antialiased;
+   -moz-osx-font-smoothing: grayscale;
+ }
+
+ body {
+   margin: 0;
+   min-width: 320px;
+   min-height: 100vh;
+ }
+
+ /* Custom scrollbar */
+ ::-webkit-scrollbar {
+   width: 8px;
+   height: 8px;
+ }
+
+ ::-webkit-scrollbar-track {
+   background: #1a1a1a;
+ }
+
+ ::-webkit-scrollbar-thumb {
+   background: #444;
+   border-radius: 4px;
+ }
+
+ ::-webkit-scrollbar-thumb:hover {
+   background: #555;
+ }
src/main.jsx ADDED
@@ -0,0 +1,10 @@
+ import React from 'react';
+ import ReactDOM from 'react-dom/client';
+ import App from './App.jsx';
+ import './index.css';
+
+ ReactDOM.createRoot(document.getElementById('root')).render(
+   <React.StrictMode>
+     <App />
+   </React.StrictMode>,
+ );
src/utils/audio.js ADDED
@@ -0,0 +1,189 @@
+ /**
+  * Audio capture and processing utilities
+  *
+  * Uses Web Audio API with ScriptProcessorNode for real-time PCM audio capture
+  */
+
+ const WHISPER_SAMPLING_RATE = 16000;
+
+ export class AudioRecorder {
+   constructor(onDataAvailable) {
+     this.onDataAvailable = onDataAvailable;
+     this.audioContext = null;
+     this.stream = null;
+     this.source = null;
+     this.processor = null;
+     this.isRecording = false;
+     this.audioChunks = [];
+   }
+
+   async start() {
+     /**
+      * Start recording audio from microphone using Web Audio API
+      */
+     try {
+       // Request microphone access
+       this.stream = await navigator.mediaDevices.getUserMedia({
+         audio: {
+           channelCount: 1,
+           sampleRate: WHISPER_SAMPLING_RATE,
+           echoCancellation: true,
+           noiseSuppression: true,
+         }
+       });
+
+       // Create AudioContext with 16kHz sample rate
+       this.audioContext = new AudioContext({ sampleRate: WHISPER_SAMPLING_RATE });
+
+       // Create source from stream
+       this.source = this.audioContext.createMediaStreamSource(this.stream);
+
+       // Create ScriptProcessorNode (deprecated but works everywhere)
+       // 4096 samples = ~256ms at 16kHz
+       const bufferSize = 4096;
+       this.processor = this.audioContext.createScriptProcessor(bufferSize, 1, 1);
+
+       this.processor.onaudioprocess = (event) => {
+         if (!this.isRecording) return;
+
+         const inputData = event.inputBuffer.getChannelData(0);
+         // Copy the data (important! buffer is reused)
+         const audioChunk = new Float32Array(inputData);
+
+         this.audioChunks.push(audioChunk);
+
+         if (this.onDataAvailable) {
+           this.onDataAvailable(audioChunk);
+         }
+       };
+
+       // Connect: source -> processor -> destination
+       this.source.connect(this.processor);
+       this.processor.connect(this.audioContext.destination);
+
+       this.isRecording = true;
+       console.log('Recording started with ScriptProcessorNode');
+
+       return true;
+     } catch (error) {
+       console.error('Failed to start recording:', error);
+       throw error;
+     }
+   }
+
+   requestData() {
+     /**
+      * No-op for ScriptProcessor (data comes automatically)
+      */
+     // Data is emitted automatically via onaudioprocess
+   }
+
+   async stop() {
+     /**
+      * Stop recording and return complete audio as Float32Array
+      */
+     return new Promise((resolve) => {
+       this.isRecording = false;
+
+       // Disconnect nodes
+       if (this.processor) {
+         this.processor.disconnect();
+         this.processor = null;
+       }
+
+       if (this.source) {
+         this.source.disconnect();
+         this.source = null;
+       }
+
+       // Concatenate all chunks
+       let totalLength = 0;
+       for (const chunk of this.audioChunks) {
+         totalLength += chunk.length;
+       }
+
+       const completeAudio = new Float32Array(totalLength);
+       let offset = 0;
+       for (const chunk of this.audioChunks) {
+         completeAudio.set(chunk, offset);
+         offset += chunk.length;
+       }
+
+       // Clean up
+       this.cleanup();
+
+       resolve(completeAudio);
+     });
+   }
+
+   cleanup() {
+     /**
+      * Clean up resources
+      */
+     if (this.stream) {
+       this.stream.getTracks().forEach(track => track.stop());
+       this.stream = null;
+     }
+
+     if (this.audioContext && this.audioContext.state !== 'closed') {
+       this.audioContext.close();
+       this.audioContext = null;
+     }
+
+     this.audioChunks = [];
+     this.isRecording = false;
+   }
+ }
+
+ export class AudioProcessor {
+   /**
+    * Process audio chunks for real-time transcription
+    */
+   constructor(sampleRate = WHISPER_SAMPLING_RATE) {
+     this.sampleRate = sampleRate;
+     this.audioBuffer = new Float32Array(0);
+   }
+
+   appendChunk(chunk) {
+     /**
+      * Append new audio chunk to buffer
+      */
+     const newBuffer = new Float32Array(this.audioBuffer.length + chunk.length);
+     newBuffer.set(this.audioBuffer);
+     newBuffer.set(chunk, this.audioBuffer.length);
+     this.audioBuffer = newBuffer;
+   }
+
+   getBuffer() {
+     /**
+      * Get current audio buffer
+      */
+     return this.audioBuffer;
+   }
+
+   getDuration() {
+     /**
+      * Get current buffer duration in seconds
+      */
+     return this.audioBuffer.length / this.sampleRate;
+   }
+
+   reset() {
+     /**
+      * Clear audio buffer
+      */
+     this.audioBuffer = new Float32Array(0);
+   }
+
+   trimToSize(maxDuration) {
+     /**
+      * Trim buffer to maximum duration (in seconds)
+      */
+     const maxSamples = Math.floor(maxDuration * this.sampleRate);
+     if (this.audioBuffer.length > maxSamples) {
+       this.audioBuffer = this.audioBuffer.slice(-maxSamples);
+     }
+   }
+ }
+
+ export { WHISPER_SAMPLING_RATE };
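AudioProcessor's buffer bookkeeping, shown as standalone pure functions (assuming 16 kHz mono PCM as above): appending reallocates the typed array, and trimming keeps only the most recent `maxDuration` seconds.

```javascript
// Typed-array buffer bookkeeping, mirroring AudioProcessor.appendChunk and
// trimToSize. Float32Array has no in-place append, so each append allocates
// a new array and copies both halves into it.
const SAMPLE_RATE = 16000;

function appendChunk(buffer, chunk) {
  const out = new Float32Array(buffer.length + chunk.length);
  out.set(buffer);
  out.set(chunk, buffer.length);
  return out;
}

// Keep only the newest maxDuration seconds of samples.
function trimToSize(buffer, maxDuration, sampleRate = SAMPLE_RATE) {
  const maxSamples = Math.floor(maxDuration * sampleRate);
  return buffer.length > maxSamples ? buffer.slice(-maxSamples) : buffer;
}
```

Repeated reallocation is O(n) per append; a ring buffer would avoid the copies, but at roughly four 4096-sample chunks per second the cost stays negligible here.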
src/utils/progressive-streaming.js ADDED
@@ -0,0 +1,204 @@
+ /**
+  * Smart Progressive Streaming Handler
+  *
+  * JavaScript port of STT/smart_progressive_streaming.py
+  *
+  * Provides frequent partial transcriptions (every 250ms) with:
+  * - Growing window up to 15s for accuracy
+  * - Sentence-boundary-aware window sliding for audio > 15s
+  * - Fixed sentences + active transcription
+  */
+
+ export class PartialTranscription {
+   constructor(fixedText, activeText, timestamp, isFinal) {
+     this.fixedText = fixedText;   // Sentences that won't change
+     this.activeText = activeText; // Current partial transcription
+     this.timestamp = timestamp;   // Current position in audio
+     this.isFinal = isFinal;       // True if this is the last update
+   }
+ }
+
+ /**
+  * Smart progressive streaming with sentence-aware window management.
+  *
+  * Strategy:
+  * 1. Emit partial transcriptions every 250ms
+  * 2. Use a growing window (up to 15s) for better accuracy
+  * 3. When audio > 15s, slide the window using sentence boundaries:
+  *    - Keep completed sentences as "fixed"
+  *    - Only re-transcribe the "active" portion
+  */
+ export class SmartProgressiveStreamingHandler {
+   constructor(model, options = {}) {
+     this.model = model;
+     this.emissionInterval = options.emissionInterval || 0.25; // 250ms
+     this.maxWindowSize = options.maxWindowSize || 15.0;       // 15 seconds
+     this.sentenceBuffer = options.sentenceBuffer || 2.0;      // 2 seconds
+     this.sampleRate = options.sampleRate || 16000;
+
+     // State for incremental streaming
+     this.reset();
+   }
+
+   /**
+    * Reset state for a new streaming session.
+    */
+   reset() {
+     this.fixedSentences = [];
+     this.fixedEndTime = 0.0;
+     this.lastTranscribedLength = 0;
+   }
+
+   /**
+    * Transcribe audio incrementally (for live streaming).
+    *
+    * Call this repeatedly with a growing audio buffer (Float32Array).
+    * Returns a single PartialTranscription for the current state.
+    *
+    * @param {Float32Array} audio - Growing audio buffer
+    * @returns {Promise<PartialTranscription>}
+    */
+   async transcribeIncremental(audio) {
+     // Skip if not enough new audio
+     const currentLength = audio.length;
+     if (currentLength < this.sampleRate * 0.5) { // Need at least 500ms
+       return new PartialTranscription(
+         this.fixedSentences.join(" "),
+         "",
+         currentLength / this.sampleRate,
+         false
+       );
+     }
+
+     // Skip if no new audio since the last transcription
+     if (currentLength === this.lastTranscribedLength) {
+       return new PartialTranscription(
+         this.fixedSentences.join(" "),
+         "",
+         currentLength / this.sampleRate,
+         false
+       );
+     }
+
+     this.lastTranscribedLength = currentLength;
+
+     // Extract window for transcription (from the last fixed sentence to the end)
+     const windowStartSamples = Math.floor(this.fixedEndTime * this.sampleRate);
+     const audioWindow = audio.slice(windowStartSamples);
+
+     // Check whether the window exceeds maxWindowSize
+     const windowDuration = audioWindow.length / this.sampleRate;
+
+     // Transcribe the current window
+     let result = await this.model.transcribe(audioWindow);
+
+     console.log('[Progressive] Window duration:', windowDuration.toFixed(2), 's, Sentences:', result.sentences?.length || 0);
+     if (result.sentences && result.sentences.length > 0) {
+       console.log('[Progressive] Sentence times:', result.sentences.map(s => `"${s.text.slice(0, 20)}..." (${s.start.toFixed(1)}-${s.end.toFixed(1)}s)`));
+     }
+
+     if (windowDuration >= this.maxWindowSize && result.sentences && result.sentences.length > 1) {
+       // Window is too large - fix some sentences
+       const cutoffTime = windowDuration - this.sentenceBuffer;
+
+       // Find sentences to fix
+       const newFixedSentences = [];
+       let newFixedEndTime = this.fixedEndTime;
+
+       for (const sentence of result.sentences) {
+         // Sentence times are relative to the window start, so offset by fixedEndTime
+         const sentenceAbsTime = this.fixedEndTime + sentence.end;
+
+         if (sentence.end < cutoffTime) {
+           // Fix this sentence
+           newFixedSentences.push(sentence.text.trim());
+           newFixedEndTime = sentenceAbsTime;
+         } else {
+           break;
+         }
+       }
+
+       if (newFixedSentences.length > 0) {
+         this.fixedSentences.push(...newFixedSentences);
+         this.fixedEndTime = newFixedEndTime;
+
+         // Re-transcribe from the new fixed point
+         const newWindowStartSamples = Math.floor(this.fixedEndTime * this.sampleRate);
+         const newAudioWindow = audio.slice(newWindowStartSamples);
+         result = await this.model.transcribe(newAudioWindow);
+       }
+     }
+
+     // Build output
+     const fixedText = this.fixedSentences.join(" ");
+     const activeText = result.text ? result.text.trim() : "";
+     const timestamp = audio.length / this.sampleRate;
+
+     return new PartialTranscription(
+       fixedText,
+       activeText,
+       timestamp,
+       false
+     );
+   }
+
+   /**
+    * Transcribe audio with smart progressive emissions.
+    *
+    * Yields PartialTranscription with:
+    * - fixedText: Completed sentences (won't change)
+    * - activeText: Current partial transcription
+    * - timestamp: Current position
+    *
+    * @param {Float32Array} audio - Complete audio buffer
+    * @yields {PartialTranscription}
+    */
+   async *transcribeProgressive(audio) {
+     const totalDuration = audio.length / this.sampleRate;
+     let currentTime = 0;
+
+     this.reset();
+
+     while (currentTime < totalDuration) {
+       currentTime += this.emissionInterval;
+       const currentSamples = Math.min(
+         Math.floor(currentTime * this.sampleRate),
+         audio.length
+       );
+
+       const currentAudio = audio.slice(0, currentSamples);
+       const result = await this.transcribeIncremental(currentAudio);
+
+       yield result;
+
+       // Small delay to simulate real-time playback
+       await new Promise(resolve => setTimeout(resolve, this.emissionInterval * 1000));
+     }
+
+     // Final transcription
+     const finalResult = await this.transcribeIncremental(audio);
+     yield new PartialTranscription(
+       finalResult.fixedText,
+       finalResult.activeText,
+       finalResult.timestamp,
+       true // isFinal = true
+     );
+   }
+
+   /**
+    * Get the final transcription by combining fixed + active text.
+    *
+    * @param {Float32Array} audio - Complete audio buffer
+    * @returns {Promise<string>} Final complete transcription
+    */
+   async finalize(audio) {
+     const result = await this.transcribeIncremental(audio);
+
+     const parts = [];
+     if (result.fixedText) parts.push(result.fixedText);
+     if (result.activeText) parts.push(result.activeText);
+
+     return parts.join(" ");
+   }
+ }
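The core of the window-sliding step in `transcribeIncremental` is a cutoff decision: once the window exceeds `maxWindowSize`, every sentence ending before `windowDuration - sentenceBuffer` gets frozen as fixed text. A standalone sketch of that decision (the `sentencesToFix` helper is hypothetical, extracted here for illustration):

```javascript
// Decide which sentences to freeze when the transcription window grows too large.
// Sentence `end` times are in seconds, relative to the window start.
function sentencesToFix(sentences, windowDuration, maxWindowSize = 15.0, sentenceBuffer = 2.0) {
  if (windowDuration < maxWindowSize || sentences.length <= 1) return [];
  const cutoffTime = windowDuration - sentenceBuffer;
  const fixed = [];
  for (const s of sentences) {
    if (s.end < cutoffTime) fixed.push(s);
    else break; // sentences are time-ordered, so stop at the first survivor
  }
  return fixed;
}

const sentences = [
  { text: 'Hello there.', end: 4.0 },
  { text: 'How are you?', end: 9.5 },
  { text: 'I am', end: 15.8 },
];
// 16s window, cutoff at 14s: the first two sentences are frozen,
// and only the trailing "I am" portion is re-transcribed.
console.log(sentencesToFix(sentences, 16.0).map(s => s.text));
```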
src/worker.js ADDED
@@ -0,0 +1,208 @@
+ /**
+  * Web Worker for Parakeet ONNX Model Inference
+  *
+  * Handles model loading and transcription in a separate thread using parakeet.js
+  * https://github.com/ysdede/parakeet.js
+  */
+
+ import { fromHub } from 'parakeet.js';
+
+ let model = null;
+ let isLoading = false;
+
+ /**
+  * Load the Parakeet model using parakeet.js
+  */
+ async function loadModel(modelVersion = 'parakeet-tdt-0.6b-v3', options = {}) {
+   if (isLoading) {
+     return { status: 'loading', message: 'Model is already loading...' };
+   }
+
+   if (model) {
+     return { status: 'ready', message: 'Model already loaded' };
+   }
+
+   try {
+     isLoading = true;
+
+     const backend = options.device === 'webgpu' ? 'webgpu' : 'wasm';
+
+     self.postMessage({
+       status: 'loading',
+       message: `Downloading Parakeet ${modelVersion}... (~2.5GB, this may take 1-2 minutes)`,
+     });
+
+     // Load the model using the parakeet.js fromHub helper
+     console.log(`[Worker] Loading model with backend: ${backend}`);
+     model = await fromHub(modelVersion, { backend });
+
+     // Check the actual backend in use (parakeet.js may have fallen back)
+     const actualBackend = model.session?.executionProviders?.[0] || backend;
+     console.log(`[Worker] Model loaded. Requested: ${backend}, Actual provider: ${actualBackend}`);
+
+     self.postMessage({
+       status: 'loading',
+       message: 'Model downloaded, warming up...',
+     });
+
+     // Warm-up inference (recommended by parakeet.js)
+     const dummyAudio = new Float32Array(16000); // 1 second of silence
+     await model.transcribe(dummyAudio, 16000);
+
+     self.postMessage({
+       status: 'ready',
+       message: `Parakeet ${modelVersion} loaded successfully!`,
+       device: backend,
+       modelVersion,
+     });
+
+     return { status: 'ready', device: backend };
+   } catch (error) {
+     console.error('Failed to load model:', error);
+
+     self.postMessage({
+       status: 'error',
+       message: `Failed to load model: ${error.message}`,
+       error: error.toString(),
+     });
+
+     return { status: 'error', error: error.toString() };
+   } finally {
+     isLoading = false;
+   }
+ }
+
+ /**
+  * Transcribe an audio chunk using Parakeet
+  */
+ async function transcribe(audio, language = null) {
+   if (!model) {
+     throw new Error('Model not loaded. Call load() first.');
+   }
+
+   try {
+     const startTime = performance.now();
+
+     // Transcribe with parakeet.js
+     const result = await model.transcribe(audio, 16000, {
+       returnTimestamps: true,  // Get word-level timestamps
+       returnConfidences: true, // Get confidence scores
+       temperature: 1.0,        // Greedy decoding
+     });
+
+     const endTime = performance.now();
+     const latency = (endTime - startTime) / 1000; // seconds
+     const audioDuration = audio.length / 16000;
+     const rtf = latency / audioDuration; // Real-time factor
+
+     // Convert the parakeet.js word format to our sentence format
+     console.log('[Worker] Parakeet words:', result.words?.length || 0, 'words');
+     if (result.words && result.words.length > 0) {
+       console.log('[Worker] First 5 words:', result.words.slice(0, 5).map(w => `"${w.text}" (${w.start_time?.toFixed(1)}-${w.end_time?.toFixed(1)})`));
+     }
+
+     const sentences = groupWordsIntoSentences(result.words || []);
+     console.log('[Worker] Grouped into', sentences.length, 'sentences');
+
+     return {
+       text: result.utterance_text || '',
+       sentences,
+       words: result.words || [],
+       chunks: result.words || [], // For compatibility
+       metadata: {
+         latency,
+         audioDuration,
+         rtf,
+         language,
+         confidence: result.confidence_scores,
+         metrics: result.metrics,
+       },
+     };
+   } catch (error) {
+     console.error('Transcription error:', error);
+     throw error;
+   }
+ }
+
+ /**
+  * Group words into sentences based on punctuation.
+  *
+  * Note: This is a simplified implementation, since parakeet.js provides word-level
+  * alignments but not sentence-level ones. The Python implementation uses model-provided
+  * sentence boundaries. We split on sentence-ending punctuation (.!?) to approximate
+  * sentence boundaries for the progressive-streaming window management.
+  */
+ function groupWordsIntoSentences(words) {
+   if (!words || words.length === 0) {
+     return [];
+   }
+
+   const sentences = [];
+   let currentWords = [];
+   let currentStart = words[0].start_time || 0;
+
+   for (let i = 0; i < words.length; i++) {
+     const word = words[i];
+     currentWords.push(word.text);
+
+     // Check if this word ends a sentence (only period, question mark, exclamation mark)
+     // Note: commas are explicitly ignored - they don't end sentences
+     const endsWithTerminalPunctuation = /[.!?]$/.test(word.text);
+
+     if (endsWithTerminalPunctuation || i === words.length - 1) {
+       // Create a sentence
+       sentences.push({
+         text: currentWords.join(' ').trim(),
+         start: currentStart,
+         end: word.end_time || (word.start_time || 0),
+       });
+
+       // Start a new sentence if there are more words
+       if (i < words.length - 1) {
+         currentWords = [];
+         currentStart = words[i + 1].start_time || (word.end_time || 0);
+       }
+     }
+   }
+
+   return sentences;
+ }
+
+ /**
+  * Message handler
+  */
+ self.onmessage = async (event) => {
+   const { type, data } = event.data;
+
+   try {
+     switch (type) {
+       case 'load':
+         await loadModel(data?.modelVersion, data?.options || {});
+         break;
+
+       case 'transcribe': {
+         // Block scope avoids a lexical declaration directly inside a case clause
+         const result = await transcribe(data.audio, data.language);
+         self.postMessage({
+           status: 'transcription',
+           result,
+         });
+         break;
+       }
+
+       case 'ping':
+         self.postMessage({ status: 'pong' });
+         break;
+
+       default:
+         self.postMessage({
+           status: 'error',
+           message: `Unknown message type: ${type}`,
+         });
+     }
+   } catch (error) {
+     self.postMessage({
+       status: 'error',
+       message: error.message,
+       error: error.toString(),
+     });
+   }
+ };
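The sentence grouping in `groupWordsIntoSentences` hinges on one heuristic: a token ends a sentence only when it ends in `.`, `!`, or `?`, so commas never split. A minimal standalone demonstration of that heuristic on a plain token stream (illustration only, not part of the commit):

```javascript
// Terminal-punctuation test: only . ! ? end a sentence; commas do not.
const endsSentence = (token) => /[.!?]$/.test(token);

const tokens = ['Hello,', 'world.', 'How', 'are', 'you?', 'Fine'];

// Split the token stream at sentence-ending tokens
const sentences = [];
let current = [];
for (const t of tokens) {
  current.push(t);
  if (endsSentence(t)) {
    sentences.push(current.join(' '));
    current = [];
  }
}
if (current.length) sentences.push(current.join(' ')); // trailing partial sentence

console.log(sentences); // [ 'Hello, world.', 'How are you?', 'Fine' ]
```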
tailwind.config.js ADDED
@@ -0,0 +1,17 @@
+ /** @type {import('tailwindcss').Config} */
+ export default {
+   content: [
+     "./index.html",
+     "./src/**/*.{js,ts,jsx,tsx}",
+   ],
+   theme: {
+     extend: {
+       colors: {
+         gray: {
+           950: '#0a0a0a',
+         },
+       },
+     },
+   },
+   plugins: [],
+ }
vite.config.js ADDED
@@ -0,0 +1,32 @@
+ import { defineConfig } from 'vite';
+ import react from '@vitejs/plugin-react';
+
+ // https://vitejs.dev/config/
+ export default defineConfig({
+   plugins: [react()],
+   server: {
+     port: 3000,
+     headers: {
+       // Cross-origin isolation, required for SharedArrayBuffer/WebGPU use
+       'Cross-Origin-Opener-Policy': 'same-origin',
+       'Cross-Origin-Embedder-Policy': 'require-corp',
+     },
+   },
+   optimizeDeps: {
+     exclude: ['@huggingface/transformers', 'parakeet.js'],
+   },
+   worker: {
+     format: 'es', // Use ES modules for workers
+   },
+   build: {
+     target: 'esnext',
+     rollupOptions: {
+       output: {
+         manualChunks: {
+           'parakeet': ['parakeet.js'],
+           'onnxruntime': ['onnxruntime-web'],
+         },
+       },
+     },
+   },
+ });