Spaces: shreyas-joshi / CoreReader

shreyas-joshi Cursor committed on Feb 19

Commit 8eeaf9e

1 Parent(s): 7348977

Fix audio jitter: remove real-time sleep, add ms_start to sentence events

- Remove per-frame asyncio.sleep pacing. Frames are now sent as fast as
  synthesis allows, letting the client buffer audio ahead of playback.
- Track cumulative_samples per chapter and include ms_start (milliseconds
  from chapter start) in every sentence JSON event, so the client can fire
  highlights at the correct playback position via getStreamTimeConsumed.
- Raise the default prefetch in the client to 6 (was 3).

Co-authored-by: Cursor <cursoragent@cursor.com>
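
The ms_start bookkeeping described in the message can be sketched in isolation. This is only an illustration, not the code in backend/server.py: `ChapterClock` and `stamp_sentences` are hypothetical names, the 24000 Hz rate in the usage note is an assumed value, and frames are assumed to be mono int16 PCM (2 bytes per sample), as the commit's own comment states.

```python
from dataclasses import dataclass


@dataclass
class ChapterClock:
    """Hypothetical helper: tracks how much audio has been sent in one chapter."""

    sample_rate: int
    cumulative_samples: int = 0

    def ms_start(self) -> int:
        # Offset of the next frame from the chapter start, in whole milliseconds.
        return (self.cumulative_samples * 1000) // self.sample_rate

    def add_frame(self, audio_frame: bytes) -> None:
        # Assumes mono int16 PCM: 2 bytes per sample.
        self.cumulative_samples += len(audio_frame) // 2


def stamp_sentences(sentences, sample_rate):
    """Yield (text, ms_start) for an iterable of (text, frames) pairs."""
    clock = ChapterClock(sample_rate)
    for text, frames in sentences:
        # Stamp the sentence before counting its own audio.
        yield text, clock.ms_start()
        for frame in frames:
            clock.add_frame(frame)
```

With an assumed rate of 24000 Hz, a first sentence made of two 24000-byte frames covers one second of audio, so the second sentence would be stamped with an ms_start of 1000 no matter how quickly the frames were actually sent.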

Files changed (1)

backend/server.py +16 -7

@@ -372,6 +372,11 @@ async def websocket_endpoint(websocket: WebSocket):
                     )
                     last_key = None
+                    # Cumulative samples sent so far — used to stamp ms_start on each
+                    # sentence event so the client can fire highlights at the right
+                    # playback position rather than at message-arrival time.
+                    cumulative_samples = 0
+                    sample_rate = app.state.tts.sample_rate
                     try:
                         control_task: asyncio.Task[str] | None = asyncio.create_task(websocket.receive_text())
@@ -423,26 +428,30 @@ async def websocket_endpoint(websocket: WebSocket):
                             if cancel_event.is_set():
                                 break
                             key = (p_idx + start_paragraph, s_idx, sentence)
                             if key != last_key:
                                 last_key = key
+                                # ms_start lets the client fire this highlight exactly when
+                                # the audio reaches this sentence, regardless of buffering.
+                                ms_start = (cumulative_samples * 1000) // sample_rate
                                 await websocket.send_json(
                                     {
                                         "type": "sentence",
                                         "text": sentence,
                                         "paragraph_index": int(p_idx + start_paragraph),
                                         "sentence_index": int(s_idx),
+                                        "ms_start": ms_start,
                                     }
                                 )
                             await websocket.send_bytes(audio_frame)
-                            # Pace frames close to real-time so UI updates (sentence highlighting)
-                            # match what is audible, even when synthesis runs faster than realtime.
-                            if realtime:
-                                try:
-                                    await asyncio.sleep(len(audio_frame) / (2 * app.state.tts.sample_rate))
-                                except Exception:
-                                    pass
+                            # Track cumulative audio sent (int16 = 2 bytes per sample).
+                            cumulative_samples += len(audio_frame) // 2
+                            # No real-time sleep: send frames as fast as synthesis allows.
+                            # The client buffers audio and fires highlights via ms_start +
+                            # getStreamTimeConsumed, so no pacing is needed here.
+                            # For offline downloads (realtime=False) the same path applies.
                         if control_task is not None:
                             control_task.cancel()
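
The client side of this change is not part of the diff. As a rough sketch of how a consumer of these events might use ms_start, the following models the scheduling logic in Python. `HighlightScheduler` and its methods are hypothetical, and `consumed_ms` stands in for whatever playback position the audio player reports (the commit message names getStreamTimeConsumed for that role). It assumes events arrive in non-decreasing ms_start order within a chapter and that the queue is reset when a new chapter starts.

```python
from collections import deque


class HighlightScheduler:
    """Hypothetical client-side helper: queue sentence events, release them by playback time."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    def reset(self) -> None:
        # ms_start is measured from the chapter start, so clear the queue per chapter.
        self._pending.clear()

    def on_sentence(self, event: dict) -> None:
        # Events can arrive well ahead of playback because frames are no longer paced.
        self._pending.append(event)

    def due(self, consumed_ms: int) -> list:
        # Release every queued sentence whose audio has started playing.
        fired = []
        while self._pending and self._pending[0]["ms_start"] <= consumed_ms:
            fired.append(self._pending.popleft())
        return fired
```

A client would call `due` on a timer or animation tick with the current playback position and highlight the last sentence returned. Highlights then follow audible playback rather than message arrival, which is the behaviour the removed sleep was approximating.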