Spaces: pgits/stt-gpu-service-python-v4 (Runtime error)

pgits and Claude committed on Sep 6, 2025

Commit 9dffff0

1 Parent(s): bdfd367

v4.7.1-fix-buffering-messages: Fix streaming buffer status messages

- Return None from transcribe_audio_stream() when buffering instead of status text
- Modify WebSocket handler to only send transcription messages when result is not None
- Eliminate buffering status messages appearing as transcription text
- Clean streaming experience: only send actual transcription results to client

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
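The two changes described in this commit follow a sentinel-return pattern. The stream function returns `None` while it is still accumulating audio, and the WebSocket handler forwards only non-`None` results, so no status text can be mistaken for a transcript. The sketch below is a minimal illustration under stated assumptions. The class `StreamingTranscriber`, the `min_samples` threshold, the `handle_stream` loop and the `{"type": "transcription", "text": ...}` message shape are all hypothetical and are not taken from the app's code. The model call is replaced by a stand-in callable, and audio is passed as plain lists of floats so the example has no dependencies; the app itself uses NumPy arrays.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Sequence

SAMPLE_RATE = 24000   # the log shows streaming chunks at 24000 Hz
CHUNK_SAMPLES = 1920  # and 1920 samples (80 ms) per WebSocket message


class StreamingTranscriber:
    """Hypothetical buffer: transcribe only once enough audio has accumulated."""

    def __init__(self, transcribe: Callable[[List[float]], str],
                 min_samples: int = SAMPLE_RATE) -> None:
        self._transcribe = transcribe    # stand-in for the real model call
        self._min_samples = min_samples  # assumed threshold (1 s by default)
        self._buffer: List[float] = []

    def transcribe_audio_stream(self, chunk: Sequence[float]) -> Optional[str]:
        """Return transcript text, or None while still buffering."""
        self._buffer.extend(chunk)
        if len(self._buffer) < self._min_samples:
            return None  # buffering: no status text, the caller sends nothing
        # Hand the whole buffer to the model and start a fresh one.
        audio, self._buffer = self._buffer, []
        return self._transcribe(audio)


async def handle_stream(chunks: AsyncIterator[Sequence[float]],
                        send_json: Callable[[dict], Awaitable[None]],
                        transcriber: StreamingTranscriber) -> None:
    """WebSocket-style loop (hypothetical): forward only real results."""
    async for chunk in chunks:
        result = transcriber.transcribe_audio_stream(chunk)
        if result is not None:
            await send_json({"type": "transcription", "text": result})
```

With the default `min_samples=24000`, twelve 1920-sample chunks (23040 samples) are not enough. The first message goes out on the 13th chunk and covers 24960 samples. This check filters only `None`. An empty transcript, like the blank `streaming STT result:` lines in the log below, would still be sent unless the handler also tests for empty text.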

Files changed (1)

test.out +604 -0

test.out ADDED

	@@ -0,0 +1,604 @@

+Spaces
+Hugging Face's logo
+pgits
+/
+stt-gpu-service-v5
+like
+0
+Logs
+App
+Files
+Community
+Settings
+Logs
+build
+container
+===== Application Startup at 2025-09-06 17:56:48 =====
+INFO:     Started server process [1]
+INFO:     Waiting for application startup.
+INFO:app:Loading speech models...
+INFO:app:Using device: cuda
+INFO:app:Cache directory: ./hf_cache
+INFO:app:GPU memory before loading: 0.00 GB
+INFO:app:Loading Transformers STT model (kyutai/stt-2.6b-en-trfs)...
+/usr/local/lib/python3.10/site-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
+  warnings.warn(
+INFO:app:Loading STT processor and model using Transformers integration...
+INFO:app:✅ AutoProcessor loaded successfully
+Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]
+Fetching 2 files:  50%|█████     | 1/2 [00:52<00:52, 52.52s/it]
+Fetching 2 files: 100%|██████████| 2/2 [00:52<00:00, 26.26s/it]
+Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
+Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00,  3.31it/s]
+Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  4.69it/s]
+Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  4.41it/s]
+INFO:app:✅ AutoModelForSpeechSeq2Seq loaded successfully
+INFO:app:Moving model to device: cuda
+INFO:app:✅ Model moved to device successfully
+INFO:app:✅ Transformers STT model loaded successfully - no fallback needed
+INFO:app:GPU memory after loading: 5.23 GB
+INFO:app:🎉 Transformers STT model loaded successfully!
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
+INFO:     10.16.47.58:47060 - "GET /?logs=container HTTP/1.1" 200 OK
+INFO:app:📁 Loaded audio file: 370404 samples at 24000Hz from amen.wav
+INFO:app:🔍 File audio stats: min=-0.622533, max=0.636370, mean=0.000001
+INFO:app:📁 Starting file transcription - Audio length: 370404 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for file transcription...
+INFO:app:🔍 FILE Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 FILE Using **inputs (official HF pattern)
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers STT result: this is a test. we are testing our voice and seeing what we can see
+INFO:     10.16.14.52:38612 - "POST /api/transcribe HTTP/1.1" 200 OK
+INFO:     ('10.16.14.52', 18209) - "WebSocket /ws/stream" [accepted]
+INFO:app:STT WebSocket connection established
+INFO:app:🚀 Initializing persistent streaming context for WebSocket session...
+INFO:app:✅ Transformers STT model - no persistent streaming context needed
+INFO:     connection open
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.048586, max=0.037305, mean=0.000074
+INFO:app:🔊 Amplified audio: min=-4.858568, max=3.730510, mean=0.007435
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.039278, max=0.041980, mean=-0.000063
+INFO:app:🔊 Amplified audio: min=-3.927773, max=4.198004, mean=-0.006328
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.010576, max=0.009606, mean=0.000004
+INFO:app:🔊 Amplified audio: min=-1.057559, max=0.960555, mean=0.000400
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.007496, max=0.007499, mean=0.000006
+INFO:app:🔊 Amplified audio: min=-0.749598, max=0.749913, mean=0.000633
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.012840, max=0.011649, mean=0.000002
+INFO:app:🔊 Amplified audio: min=-1.284047, max=1.164852, mean=0.000242
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.020823, max=0.022249, mean=-0.000148
+INFO:app:🔊 Amplified audio: min=-2.082276, max=2.224907, mean=-0.014812
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.026440, max=0.047457, mean=-0.000109
+INFO:app:🔊 Amplified audio: min=-2.644033, max=4.745659, mean=-0.010857
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.030757, max=0.030710, mean=0.000025
+INFO:app:🔊 Amplified audio: min=-3.075743, max=3.070996, mean=0.002525
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.012824, max=0.011835, mean=-0.000005
+INFO:app:🔊 Amplified audio: min=-1.282417, max=1.183518, mean=-0.000458
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.006210, max=0.007050, mean=0.000051
+INFO:app:🔊 Amplified audio: min=-0.621018, max=0.704980, mean=0.005057
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.004548, max=0.005576, mean=-0.000016
+INFO:app:🔊 Amplified audio: min=-0.454812, max=0.557579, mean=-0.001610
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.046061, max=0.045076, mean=0.000045
+INFO:app:🔊 Amplified audio: min=-4.606120, max=4.507604, mean=0.004472
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.023628, max=0.026821, mean=-0.000014
+INFO:app:🔊 Amplified audio: min=-2.362792, max=2.682130, mean=-0.001412
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.009469, max=0.008941, mean=0.000008
+INFO:app:🔊 Amplified audio: min=-0.946919, max=0.894060, mean=0.000835
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.004617, max=0.004328, mean=0.000002
+INFO:app:🔊 Amplified audio: min=-0.461711, max=0.432806, mean=0.000190
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.002112, max=0.002095, mean=0.000013
+INFO:app:🔊 Amplified audio: min=-0.211236, max=0.209467, mean=0.001254
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.002045, max=0.002118, mean=0.000001
+INFO:app:🔊 Amplified audio: min=-0.204470, max=0.211755, mean=0.000096
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005168, max=0.004811, mean=-0.000014
+INFO:app:🔊 Amplified audio: min=-0.516791, max=0.481129, mean=-0.001379
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.999198, max=0.706607, mean=0.007170
+INFO:app:🔊 Amplified audio: min=-99.919769, max=70.660713, mean=0.716962
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-1.009716, max=0.736448, mean=-0.000202
+INFO:app:🔊 Amplified audio: min=-100.971581, max=73.644768, mean=-0.020214
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.386286, max=0.331127, mean=-0.001304
+INFO:app:🔊 Amplified audio: min=-38.628555, max=33.112747, mean=-0.130388
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.048396, max=0.050660, mean=-0.000232
+INFO:app:🔊 Amplified audio: min=-4.839588, max=5.065973, mean=-0.023221
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.022664, max=0.028425, mean=-0.000023
+INFO:app:🔊 Amplified audio: min=-2.266417, max=2.842537, mean=-0.002315
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.015459, max=0.012684, mean=0.000018
+INFO:app:🔊 Amplified audio: min=-1.545864, max=1.268409, mean=0.001751
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005943, max=0.005332, mean=-0.000009
+INFO:app:🔊 Amplified audio: min=-0.594318, max=0.533158, mean=-0.000864
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.005905, max=0.006380, mean=-0.000015
+INFO:app:🔊 Amplified audio: min=-0.590518, max=0.638035, mean=-0.001548
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.007763, max=0.007384, mean=0.000008
+INFO:app:🔊 Amplified audio: min=-0.776282, max=0.738397, mean=0.000841
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.032269, max=0.027695, mean=0.000006
+INFO:app:🔊 Amplified audio: min=-3.226939, max=2.769472, mean=0.000641
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.035382, max=0.033045, mean=-0.000012
+INFO:app:🔊 Amplified audio: min=-3.538235, max=3.304517, mean=-0.001195
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.015723, max=0.016396, mean=0.000009
+INFO:app:🔊 Amplified audio: min=-1.572296, max=1.639588, mean=0.000932
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.009426, max=0.007130, mean=0.000005
+INFO:app:🔊 Amplified audio: min=-0.942614, max=0.713000, mean=0.000502
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.216286, max=0.176500, mean=0.000438
+INFO:app:🔊 Amplified audio: min=-21.628649, max=17.649954, mean=0.043760
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Okay.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.312537, max=0.237393, mean=-0.000163
+INFO:app:🔊 Amplified audio: min=-31.253742, max=23.739323, mean=-0.016306
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.385227, max=0.259278, mean=-0.000210
+INFO:app:🔊 Amplified audio: min=-38.522701, max=25.927757, mean=-0.021037
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.685157, max=0.379691, mean=0.000326
+INFO:app:🔊 Amplified audio: min=-68.515724, max=37.969090, mean=0.032562
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.365367, max=0.266523, mean=-0.000539
+INFO:app:🔊 Amplified audio: min=-36.536671, max=26.652321, mean=-0.053918
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.162448, max=0.126396, mean=0.000251
+INFO:app:🔊 Amplified audio: min=-16.244793, max=12.639629, mean=0.025095
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.309009, max=0.214467, mean=-0.001591
+INFO:app:🔊 Amplified audio: min=-30.900850, max=21.446678, mean=-0.159052
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.417965, max=0.334323, mean=0.002206
+INFO:app:🔊 Amplified audio: min=-41.796520, max=33.432289, mean=0.220560
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.316443, max=0.253125, mean=-0.000574
+INFO:app:🔊 Amplified audio: min=-31.644335, max=25.312542, mean=-0.057435
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.156280, max=0.102331, mean=-0.000296
+INFO:app:🔊 Amplified audio: min=-15.627993, max=10.233070, mean=-0.029559
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.078169, max=0.066128, mean=0.000112
+INFO:app:🔊 Amplified audio: min=-7.816890, max=6.612817, mean=0.011213
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.217433, max=0.300689, mean=-0.000123
+INFO:app:🔊 Amplified audio: min=-21.743336, max=30.068880, mean=-0.012314
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.108164, max=0.212096, mean=0.000849
+INFO:app:🔊 Amplified audio: min=-10.816364, max=21.209583, mean=0.084887
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.088860, max=0.183484, mean=0.000520
+INFO:app:🔊 Amplified audio: min=-8.885959, max=18.348396, mean=0.051968
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.104860, max=0.219530, mean=-0.000913
+INFO:app:🔊 Amplified audio: min=-10.486033, max=21.953033, mean=-0.091296
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.094694, max=0.209158, mean=0.000242
+INFO:app:🔊 Amplified audio: min=-9.469412, max=20.915785, mean=0.024152
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.091275, max=0.180358, mean=-0.000044
+INFO:app:🔊 Amplified audio: min=-9.127468, max=18.035841, mean=-0.004403
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.091177, max=0.137886, mean=0.000350
+INFO:app:🔊 Amplified audio: min=-9.117688, max=13.788627, mean=0.035010
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.066525, max=0.104910, mean=-0.000634
+INFO:app:🔊 Amplified audio: min=-6.652477, max=10.491001, mean=-0.063372
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.030459, max=0.033373, mean=-0.000320
+INFO:app:🔊 Amplified audio: min=-3.045944, max=3.337340, mean=-0.032031
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.056069, max=0.061884, mean=-0.000423
+INFO:app:🔊 Amplified audio: min=-5.606874, max=6.188380, mean=-0.042286
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.074319, max=0.069445, mean=-0.000011
+INFO:app:🔊 Amplified audio: min=-7.431859, max=6.944468, mean=-0.001129
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.083021, max=0.084014, mean=0.000265
+INFO:app:🔊 Amplified audio: min=-8.302102, max=8.401437, mean=0.026454
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.084696, max=0.084850, mean=0.000086
+INFO:app:🔊 Amplified audio: min=-8.469565, max=8.485031, mean=0.008553
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.097942, max=0.090344, mean=0.000149
+INFO:app:🔊 Amplified audio: min=-9.794166, max=9.034397, mean=0.014860
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result:
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.050666, max=0.031321, mean=0.000240
+INFO:app:🔊 Amplified audio: min=-5.066576, max=3.132134, mean=0.024006
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:🔍 Audio data stats: shape=(1920,), min=-0.081810, max=0.083186, mean=0.000352
+INFO:app:🔊 Amplified audio: min=-8.181005, max=8.318566, mean=0.035191
+INFO:app:🎙️ Starting transcription - Audio length: 1920 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for streaming transcription...
+INFO:app:🔍 STREAMING Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 STREAMING Using **inputs (official HF pattern)
+`max_new_tokens` (128) is greater than the maximum number of audio frames (57).Setting `max_new_tokens` to 57.
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers streaming STT result: Yeah.
+INFO:app:STT WebSocket connection closed
+INFO:app:✅ Closed Transformers STT streaming context
+INFO:     connection closed
+INFO:     10.16.14.52:2040 - "GET / HTTP/1.1" 200 OK
+INFO:app:📁 Loaded audio file: 370404 samples at 24000Hz from amen.wav
+INFO:app:🔍 File audio stats: min=-0.622533, max=0.636370, mean=0.000001
+INFO:app:📁 Starting file transcription - Audio length: 370404 samples at 24000Hz
+INFO:app:🚀 Using Transformers Speech-to-Text API for file transcription...
+INFO:app:🔍 FILE Processor input keys: ['padding_mask', 'input_values']
+INFO:app:🔍 FILE Using **inputs (official HF pattern)
+The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
+INFO:app:✅ Transformers STT result: this is a test. we are testing our voice and seeing what we can see
+INFO:     10.16.0.208:62787 - "POST /api/transcribe HTTP/1.1" 200 OK