---
title: Voice Latency Lab
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Voice chat app with STT, Qwen 1.5B, and Kokoro.
---

# Voice Latency Lab

This is the Hugging Face Docker Space copy of the app.

It is intentionally separate from the desktop version and tuned for a **free CPU-first Space**:

- STT: `faster-whisper` on CPU
- LLM: `Qwen/Qwen2.5-1.5B-Instruct` through local `transformers`
- TTS: Kokoro on CPU

An optional STT backend is also available for comparison:

- `nvidia/parakeet-tdt-0.6b-v3` through NeMo on CPU

## Important constraints

- This is a **cold-starting Space** on free hardware.
- It is expected to be **slower** than the local GPU version.
- The local model is configured to **hide reasoning output** and answer briefly for voice use.
- Model loading is deliberately done up front at startup so the reply watchdog does not fire during model warmup.

## Runtime profile

The default Space profile lives in:

```bash
.env.hf-free.example
```

At startup, `start.sh` copies that file to `.env` if no `.env` already exists.
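
The copy-if-missing rule above can be sketched as follows. This is a Python rendering for illustration only: the real logic lives in the shell script `start.sh`, and the function name `ensure_env` is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_env(example: Path, target: Path) -> bool:
    """Copy `example` to `target` unless `target` already exists.

    Returns True if a copy was made, False if `target` was left untouched.
    Hypothetical helper; it mirrors the behaviour described for start.sh.
    """
    if target.exists():
        # An existing .env always wins, so local overrides survive restarts.
        return False
    shutil.copyfile(example, target)
    return True


if __name__ == "__main__":
    example = Path(".env.hf-free.example")
    if example.exists():
        created = ensure_env(example, Path(".env"))
        print("created .env" if created else "kept existing .env")
```

The important property is that an existing `.env` is never overwritten, so edits such as a changed STT backend persist across restarts of the same container.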

To compare Whisper with Parakeet v3, set:

```bash
VOICE_LAB_STT_BACKEND=parakeet-tdt-v3
```

Leave the default Whisper values in place so you can switch back quickly.

You can also test Parakeet without Silero gating:

```bash
VOICE_LAB_PARAKEET_BYPASS_SILERO_VAD=true
```

This keeps the app's turn-taking logic but switches the speech gate to a simpler RMS-based path, for the Parakeet backend only.
If assistant interruption becomes too sensitive, adjust the Parakeet-specific barge-in settings:

```bash
VOICE_LAB_PARAKEET_BARGE_IN_RMS_THRESHOLD=0.02
VOICE_LAB_PARAKEET_BARGE_IN_START_MS=180
```
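
To give a feel for what these two settings control, here is a minimal sketch of an RMS-based barge-in gate. It is not the app's actual implementation: the class name `BargeInGate`, the frame format (float samples in `[-1.0, 1.0]`), and the rule that loud audio must be continuous for the full start window are all assumptions made for illustration.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class BargeInGate:
    """Hypothetical gate: fire once audio stays above the RMS threshold
    for at least `start_ms` of consecutive frames."""

    def __init__(self, rms_threshold: float = 0.02, start_ms: float = 180.0) -> None:
        self.rms_threshold = rms_threshold
        self.start_ms = start_ms
        self._loud_ms = 0.0  # duration of the current uninterrupted loud run

    def push(self, frame: Sequence[float], frame_ms: float) -> bool:
        """Feed one frame; return True when the barge-in condition is met."""
        if rms(frame) >= self.rms_threshold:
            self._loud_ms += frame_ms
        else:
            # Any quiet frame resets the run (an assumption of this sketch).
            self._loud_ms = 0.0
        return self._loud_ms >= self.start_ms


if __name__ == "__main__":
    gate = BargeInGate()
    loud = [0.1] * 480  # one 30 ms frame at an assumed 16 kHz sample rate
    results = [gate.push(loud, 30.0) for _ in range(6)]
    print(results)
```

Under these assumptions, raising the RMS threshold makes the gate ignore quieter sounds, and raising the start window requires a longer stretch of speech before the assistant is interrupted.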

## What this Space is for

- waking from sleep when someone visits
- testing a remote phone-friendly voice loop
- keeping the desktop-only stack separate

## What this Space is not for

- matching the latency of the local GPU setup
- running Parakeet v3 quickly or cheaply on free hardware
- running the desktop-only `my-agent-cli` backend

## Later upgrade path

If you switch this Space to paid GPU hardware later, this copy is the place to test:

- different STT backends
- Parakeet variants
- a different local LLM profile

without disturbing the original local app.