pgits Claude committed on
Commit 9dffff0 · 1 Parent(s): bdfd367

v4.7.1-fix-buffering-messages: Fix streaming buffer status messages

- Return None from transcribe_audio_stream() when buffering instead of status text
- Modify WebSocket handler to only send transcription messages when result is not None
- Stop buffering status messages from appearing as transcription text
- Cleaner streaming experience: only actual transcription results are sent to the client
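The pattern in the bullets above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the project's code: only the name `transcribe_audio_stream()` and the 1920-sample / 24000 Hz chunking (visible in the log) come from this commit; the `StreamingTranscriber` class, the 0.5 s buffering threshold, the JSON message shape and the `handle_stream` loop standing in for the WebSocket handler are hypothetical.

```python
import asyncio
import json
from typing import Awaitable, Callable, Iterable, List, Optional

SAMPLE_RATE = 24000                    # the log reports 24000Hz audio
MIN_BUFFER_SAMPLES = SAMPLE_RATE // 2  # hypothetical: buffer 0.5 s before decoding


class StreamingTranscriber:
    """Hypothetical wrapper: accumulates chunks until enough audio is buffered."""

    def __init__(self, transcribe_fn: Callable[[List[float]], str],
                 min_samples: int = MIN_BUFFER_SAMPLES) -> None:
        self._transcribe_fn = transcribe_fn
        self._min_samples = min_samples
        self._buffer: List[float] = []

    def transcribe_audio_stream(self, chunk: List[float]) -> Optional[str]:
        self._buffer.extend(chunk)
        if len(self._buffer) < self._min_samples:
            # Still buffering: return None rather than a status string,
            # so the caller can never mistake it for transcribed text.
            return None
        audio, self._buffer = self._buffer, []
        return self._transcribe_fn(audio)


async def handle_stream(chunks: Iterable[List[float]],
                        transcriber: StreamingTranscriber,
                        send: Callable[[str], Awaitable[None]]) -> None:
    """Stand-in for the WebSocket receive loop (hypothetical message format)."""
    for chunk in chunks:
        result = transcriber.transcribe_audio_stream(chunk)
        if result is not None:  # only real transcription results reach the client
            await send(json.dumps({"type": "transcription", "text": result}))


async def main() -> List[str]:
    sent: List[str] = []

    async def fake_send(message: str) -> None:
        sent.append(message)

    # Fake model that just reports how many samples it received.
    transcriber = StreamingTranscriber(lambda audio: f"{len(audio)} samples")
    chunks = [[0.0] * 1920 for _ in range(13)]  # 1920-sample chunks, as in the log
    await handle_stream(chunks, transcriber, fake_send)
    return sent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a 12000-sample threshold, thirteen 1920-sample chunks produce exactly one message (for the 13440 samples buffered by the seventh chunk); the other twelve calls return `None` and nothing is sent. A strict `is not None` check still forwards an empty string, such as the blank `streaming STT result:` lines in the log; whether the real handler also filters those is not stated in the commit.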

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1)
  1. test.out +604 -0
test.out ADDED
@@ -0,0 +1,604 @@
+ Spaces: pgits/stt-gpu-service-v5 · Logs (container)
+
+ ===== Application Startup at 2025-09-06 17:56:48 =====
+
+ INFO: Started server process [1]
+ INFO: Waiting for application startup.
+ INFO:app:Loading speech models...
+ INFO:app:Using device: cuda
+ INFO:app:Cache directory: ./hf_cache
+ INFO:app:GPU memory before loading: 0.00 GB
+ INFO:app:Loading Transformers STT model (kyutai/stt-2.6b-en-trfs)...
+ /usr/local/lib/python3.10/site-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+ warnings.warn(
+ INFO:app:Loading STT processor and model using Transformers integration...
+ INFO:app:✅ AutoProcessor loaded successfully
+ Fetching 2 files: 0%| | 0/2 [00:00<?, ?it/s]
+ Fetching 2 files: 50%|█████ | 1/2 [00:52<00:52, 52.52s/it]
+ Fetching 2 files: 100%|██████████| 2/2 [00:52<00:00, 26.26s/it]
+ Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
+ Loading checkpoint shards: 50%|█████ | 1/2 [00:00<00:00, 3.31it/s]
+ Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s]
+ Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.41it/s]
+ INFO:app:✅ AutoModelForSpeechSeq2Seq loaded successfully
+ INFO:app:Moving model to device: cuda
+ INFO:app:✅ Model moved to device successfully
+ INFO:app:✅ Transformers STT model loaded successfully - no fallback needed
+ INFO:app:GPU memory after loading: 5.23 GB
+ INFO:app:🎉 Transformers STT model loaded successfully!
+ INFO: Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
+ INFO: 10.16.47.58:47060 - "GET /?logs=container HTTP/1.1" 200 OK
+ INFO:app:Loaded audio file: 370404 samples at 24000Hz from amen.wav
+ INFO:app:🔍 File audio stats: min=-0.622533, max=0.636370, mean=0.000001
+ INFO:app:Starting file transcription - Audio length: 370404 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for file transcription...
+ INFO:app:🔍 FILE Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 FILE Using **inputs (official HF pattern)
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers STT result: this is a test. we are testing our voice and seeing what we can see
+ INFO: 10.16.14.52:38612 - "POST /api/transcribe HTTP/1.1" 200 OK
+ INFO: ('10.16.14.52', 18209) - "WebSocket /ws/stream" [accepted]
+ INFO:app:STT WebSocket connection established
+ INFO:app:🚀 Initializing persistent streaming context for WebSocket session...
+ INFO:app:✅ Transformers STT model - no persistent streaming context needed
+ INFO: connection open
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.048586, max=0.037305, mean=0.000074
+ INFO:app:🔊 Amplified audio: min=-4.858568, max=3.730510, mean=0.007435
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.039278, max=0.041980, mean=-0.000063
+ INFO:app:🔊 Amplified audio: min=-3.927773, max=4.198004, mean=-0.006328
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.010576, max=0.009606, mean=0.000004
+ INFO:app:🔊 Amplified audio: min=-1.057559, max=0.960555, mean=0.000400
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.007496, max=0.007499, mean=0.000006
+ INFO:app:🔊 Amplified audio: min=-0.749598, max=0.749913, mean=0.000633
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.012840, max=0.011649, mean=0.000002
+ INFO:app:🔊 Amplified audio: min=-1.284047, max=1.164852, mean=0.000242
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.020823, max=0.022249, mean=-0.000148
+ INFO:app:🔊 Amplified audio: min=-2.082276, max=2.224907, mean=-0.014812
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.026440, max=0.047457, mean=-0.000109
+ INFO:app:🔊 Amplified audio: min=-2.644033, max=4.745659, mean=-0.010857
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.030757, max=0.030710, mean=0.000025
+ INFO:app:🔊 Amplified audio: min=-3.075743, max=3.070996, mean=0.002525
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.012824, max=0.011835, mean=-0.000005
+ INFO:app:🔊 Amplified audio: min=-1.282417, max=1.183518, mean=-0.000458
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.006210, max=0.007050, mean=0.000051
+ INFO:app:🔊 Amplified audio: min=-0.621018, max=0.704980, mean=0.005057
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.004548, max=0.005576, mean=-0.000016
+ INFO:app:🔊 Amplified audio: min=-0.454812, max=0.557579, mean=-0.001610
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.046061, max=0.045076, mean=0.000045
+ INFO:app:🔊 Amplified audio: min=-4.606120, max=4.507604, mean=0.004472
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.023628, max=0.026821, mean=-0.000014
+ INFO:app:🔊 Amplified audio: min=-2.362792, max=2.682130, mean=-0.001412
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.009469, max=0.008941, mean=0.000008
+ INFO:app:🔊 Amplified audio: min=-0.946919, max=0.894060, mean=0.000835
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.004617, max=0.004328, mean=0.000002
+ INFO:app:🔊 Amplified audio: min=-0.461711, max=0.432806, mean=0.000190
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.002112, max=0.002095, mean=0.000013
+ INFO:app:🔊 Amplified audio: min=-0.211236, max=0.209467, mean=0.001254
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.002045, max=0.002118, mean=0.000001
+ INFO:app:🔊 Amplified audio: min=-0.204470, max=0.211755, mean=0.000096
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005168, max=0.004811, mean=-0.000014
+ INFO:app:🔊 Amplified audio: min=-0.516791, max=0.481129, mean=-0.001379
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.999198, max=0.706607, mean=0.007170
+ INFO:app:🔊 Amplified audio: min=-99.919769, max=70.660713, mean=0.716962
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-1.009716, max=0.736448, mean=-0.000202
+ INFO:app:🔊 Amplified audio: min=-100.971581, max=73.644768, mean=-0.020214
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.386286, max=0.331127, mean=-0.001304
+ INFO:app:🔊 Amplified audio: min=-38.628555, max=33.112747, mean=-0.130388
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result: Yeah.
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.048396, max=0.050660, mean=-0.000232
+ INFO:app:🔊 Amplified audio: min=-4.839588, max=5.065973, mean=-0.023221
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result: Yeah.
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.022664, max=0.028425, mean=-0.000023
+ INFO:app:🔊 Amplified audio: min=-2.266417, max=2.842537, mean=-0.002315
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.015459, max=0.012684, mean=0.000018
+ INFO:app:🔊 Amplified audio: min=-1.545864, max=1.268409, mean=0.001751
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005943, max=0.005332, mean=-0.000009
+ INFO:app:🔊 Amplified audio: min=-0.594318, max=0.533158, mean=-0.000864
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005905, max=0.006380, mean=-0.000015
+ INFO:app:🔊 Amplified audio: min=-0.590518, max=0.638035, mean=-0.001548
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.007763, max=0.007384, mean=0.000008
+ INFO:app:🔊 Amplified audio: min=-0.776282, max=0.738397, mean=0.000841
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result: Yeah.
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.032269, max=0.027695, mean=0.000006
+ INFO:app:🔊 Amplified audio: min=-3.226939, max=2.769472, mean=0.000641
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.035382, max=0.033045, mean=-0.000012
+ INFO:app:🔊 Amplified audio: min=-3.538235, max=3.304517, mean=-0.001195
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.015723, max=0.016396, mean=0.000009
+ INFO:app:🔊 Amplified audio: min=-1.572296, max=1.639588, mean=0.000932
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.009426, max=0.007130, mean=0.000005
+ INFO:app:🔊 Amplified audio: min=-0.942614, max=0.713000, mean=0.000502
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.216286, max=0.176500, mean=0.000438
+ INFO:app:🔊 Amplified audio: min=-21.628649, max=17.649954, mean=0.043760
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result: Okay.
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.312537, max=0.237393, mean=-0.000163
+ INFO:app:🔊 Amplified audio: min=-31.253742, max=23.739323, mean=-0.016306
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.385227, max=0.259278, mean=-0.000210
+ INFO:app:🔊 Amplified audio: min=-38.522701, max=25.927757, mean=-0.021037
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
+ INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.685157, max=0.379691, mean=0.000326
+ INFO:app:🔊 Amplified audio: min=-68.515724, max=37.969090, mean=0.032562
+ INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+ INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers streaming STT result:
384
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.365367, max=0.266523, mean=-0.000539
385
+ INFO:app:πŸ”Š Amplified audio: min=-36.536671, max=26.652321, mean=-0.053918
386
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
387
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
388
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
389
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
390
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
391
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
392
+ INFO:app:βœ… Transformers streaming STT result:
393
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.162448, max=0.126396, mean=0.000251
394
+ INFO:app:πŸ”Š Amplified audio: min=-16.244793, max=12.639629, mean=0.025095
395
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
396
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
397
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
398
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
399
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
400
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
401
+ INFO:app:βœ… Transformers streaming STT result:
402
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.309009, max=0.214467, mean=-0.001591
403
+ INFO:app:πŸ”Š Amplified audio: min=-30.900850, max=21.446678, mean=-0.159052
404
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
405
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
406
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
407
+ INFO:app:οΏ½οΏ½ STREAMING Using **inputs (official HF pattern)
408
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
409
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
410
+ INFO:app:βœ… Transformers streaming STT result:
411
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.417965, max=0.334323, mean=0.002206
412
+ INFO:app:πŸ”Š Amplified audio: min=-41.796520, max=33.432289, mean=0.220560
413
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
414
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
415
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
416
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
417
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
418
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
419
+ INFO:app:βœ… Transformers streaming STT result:
420
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.316443, max=0.253125, mean=-0.000574
421
+ INFO:app:πŸ”Š Amplified audio: min=-31.644335, max=25.312542, mean=-0.057435
422
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
423
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
424
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
425
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
426
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
427
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
428
+ INFO:app:βœ… Transformers streaming STT result:
429
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.156280, max=0.102331, mean=-0.000296
430
+ INFO:app:πŸ”Š Amplified audio: min=-15.627993, max=10.233070, mean=-0.029559
431
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
432
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
433
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
434
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
435
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
436
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
437
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
438
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.078169, max=0.066128, mean=0.000112
439
+ INFO:app:πŸ”Š Amplified audio: min=-7.816890, max=6.612817, mean=0.011213
440
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
441
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
442
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
443
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
444
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
445
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
446
+ INFO:app:βœ… Transformers streaming STT result:
447
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.217433, max=0.300689, mean=-0.000123
448
+ INFO:app:πŸ”Š Amplified audio: min=-21.743336, max=30.068880, mean=-0.012314
449
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
450
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
451
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
452
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
453
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
454
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
455
+ INFO:app:βœ… Transformers streaming STT result:
456
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.108164, max=0.212096, mean=0.000849
457
+ INFO:app:πŸ”Š Amplified audio: min=-10.816364, max=21.209583, mean=0.084887
458
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
459
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
460
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
461
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
462
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
463
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
464
+ INFO:app:βœ… Transformers streaming STT result:
465
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.088860, max=0.183484, mean=0.000520
466
+ INFO:app:πŸ”Š Amplified audio: min=-8.885959, max=18.348396, mean=0.051968
467
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
468
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
469
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
470
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
471
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
472
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
473
+ INFO:app:βœ… Transformers streaming STT result:
474
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.104860, max=0.219530, mean=-0.000913
475
+ INFO:app:πŸ”Š Amplified audio: min=-10.486033, max=21.953033, mean=-0.091296
476
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
477
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
478
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
479
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
480
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
481
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
482
+ INFO:app:βœ… Transformers streaming STT result:
483
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.094694, max=0.209158, mean=0.000242
484
+ INFO:app:πŸ”Š Amplified audio: min=-9.469412, max=20.915785, mean=0.024152
485
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
486
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
487
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
488
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
489
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
490
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
491
+ INFO:app:βœ… Transformers streaming STT result:
492
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.091275, max=0.180358, mean=-0.000044
493
+ INFO:app:πŸ”Š Amplified audio: min=-9.127468, max=18.035841, mean=-0.004403
494
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
495
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
496
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
497
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
498
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
499
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
500
+ INFO:app:βœ… Transformers streaming STT result:
501
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.091177, max=0.137886, mean=0.000350
502
+ INFO:app:πŸ”Š Amplified audio: min=-9.117688, max=13.788627, mean=0.035010
503
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
504
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
505
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
506
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
507
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
508
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
509
+ INFO:app:βœ… Transformers streaming STT result:
510
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.066525, max=0.104910, mean=-0.000634
511
+ INFO:app:πŸ”Š Amplified audio: min=-6.652477, max=10.491001, mean=-0.063372
512
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
513
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
514
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
515
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
516
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
517
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
518
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
519
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.030459, max=0.033373, mean=-0.000320
520
+ INFO:app:πŸ”Š Amplified audio: min=-3.045944, max=3.337340, mean=-0.032031
521
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
522
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
523
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
524
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
525
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
526
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
527
+ INFO:app:βœ… Transformers streaming STT result:
528
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.056069, max=0.061884, mean=-0.000423
529
+ INFO:app:πŸ”Š Amplified audio: min=-5.606874, max=6.188380, mean=-0.042286
530
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
531
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
532
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
533
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
534
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
535
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
536
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
537
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.074319, max=0.069445, mean=-0.000011
538
+ INFO:app:πŸ”Š Amplified audio: min=-7.431859, max=6.944468, mean=-0.001129
539
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
540
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
541
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
542
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
543
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
544
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
545
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
546
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.083021, max=0.084014, mean=0.000265
547
+ INFO:app:πŸ”Š Amplified audio: min=-8.302102, max=8.401437, mean=0.026454
548
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
549
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
550
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
551
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
552
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
553
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
554
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
555
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.084696, max=0.084850, mean=0.000086
556
+ INFO:app:πŸ”Š Amplified audio: min=-8.469565, max=8.485031, mean=0.008553
557
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
558
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
559
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
560
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
561
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
562
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
563
+ INFO:app:βœ… Transformers streaming STT result:
564
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.097942, max=0.090344, mean=0.000149
565
+ INFO:app:πŸ”Š Amplified audio: min=-9.794166, max=9.034397, mean=0.014860
566
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
567
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
568
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
569
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
570
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
571
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
572
+ INFO:app:βœ… Transformers streaming STT result:
573
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.050666, max=0.031321, mean=0.000240
574
+ INFO:app:πŸ”Š Amplified audio: min=-5.066576, max=3.132134, mean=0.024006
575
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
576
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
577
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
578
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
579
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
580
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
581
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
582
+ INFO:app:πŸ” Audio data stats: shape=(1920,), min=-0.081810, max=0.083186, mean=0.000352
583
+ INFO:app:πŸ”Š Amplified audio: min=-8.181005, max=8.318566, mean=0.035191
584
+ INFO:app:πŸŽ™οΈ Starting transcription - Audio length: 1920 samples at 24000Hz
585
+ INFO:app:πŸš€ Using Transformers Speech-to-Text API for streaming transcription...
586
+ INFO:app:πŸ” STREAMING Processor input keys: ['padding_mask', 'input_values']
587
+ INFO:app:πŸ” STREAMING Using **inputs (official HF pattern)
588
+ `max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
589
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
590
+ INFO:app:βœ… Transformers streaming STT result: Yeah.
591
+ INFO:app:STT WebSocket connection closed
+ INFO:app:✅ Closed Transformers STT streaming context
+ INFO: connection closed
+ INFO: 10.16.14.52:2040 - "GET / HTTP/1.1" 200 OK
+ INFO:app:📁 Loaded audio file: 370404 samples at 24000Hz from amen.wav
+ INFO:app:🔍 File audio stats: min=-0.622533, max=0.636370, mean=0.000001
+ INFO:app:📁 Starting file transcription - Audio length: 370404 samples at 24000Hz
+ INFO:app:🚀 Using Transformers Speech-to-Text API for file transcription...
+ INFO:app:🔍 FILE Processor input keys: ['padding_mask', 'input_values']
+ INFO:app:🔍 FILE Using **inputs (official HF pattern)
+ The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+ INFO:app:✅ Transformers STT result: this is a test. we are testing our voice and seeing what we can see
+ INFO: 10.16.0.208:62787 - "POST /api/transcribe HTTP/1.1" 200 OK
+
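Two patterns in the streaming log above can be checked with a little arithmetic. Each streaming chunk is 1920 samples at 24000Hz, which is 80 ms of audio. Each "Amplified audio" line matches the "Audio data stats" line before it scaled by roughly 100; for example, min=-0.216286 becomes min=-21.628649. The sketch below reproduces those two debug lines under that assumption. The gain of 100 is inferred from the logged numbers, not read from the app's source, and the helper names `chunk_stats` and `amplify` are hypothetical.

```python
# Sketch that reproduces the per-chunk debug stats seen in the log.
# ASSUMPTION: GAIN = 100.0 is inferred from the logged before/after values,
# not taken from the app's source; chunk_stats and amplify are hypothetical names.

SAMPLE_RATE = 24_000   # Hz, as logged
CHUNK_SAMPLES = 1_920  # samples per streaming chunk, as logged
GAIN = 100.0           # inferred gain (assumption)


def chunk_duration_s(n_samples: int, sample_rate: int) -> float:
    """Length of a chunk in seconds."""
    return n_samples / sample_rate


def chunk_stats(samples: list[float]) -> tuple[float, float, float]:
    """Return (min, max, mean), the fields printed on the 'Audio data stats' lines."""
    return min(samples), max(samples), sum(samples) / len(samples)


def amplify(samples: list[float], gain: float = GAIN) -> list[float]:
    """Scale every sample by a constant gain; no clipping is applied."""
    return [s * gain for s in samples]


if __name__ == "__main__":
    duration_ms = chunk_duration_s(CHUNK_SAMPLES, SAMPLE_RATE) * 1000
    print(f"chunk duration: {duration_ms:.0f} ms")
    # Raw and amplified min/max copied from the first full chunk in this excerpt
    raw_min, raw_max = -0.216286, 0.176500
    amp_min, amp_max = -21.628649, 17.649954
    print(f"implied gain: {amp_min / raw_min:.2f} (min), {amp_max / raw_max:.2f} (max)")
```

With this gain, amplified samples in this excerpt reach magnitudes of about 68 (min=-68.515724). That is well outside the [-1, 1] range that float PCM audio normally occupies.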