Quran-multi-aligner

hetchyy Claude Opus 4.6 committed on Feb 28

Commit: 84de10e

Parent: bddcc14

Simplify client API docs and rename endpoints

Rewrite client_api.md descriptions to be less technical (remove internal
details like storage formats, VAD/ASR jargon). Rename API endpoints:
resegment_session → resegment, retranscribe_session → retranscribe,
mfa_timestamps_session → timestamps, mfa_timestamps_direct →
timestamps_direct. Update function names, event wiring, and api.md to
match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
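
Because the old endpoint names are removed rather than aliased in this commit, existing client code that passes the old `api_name` values would presumably need updating. A minimal sketch of a client-side migration helper follows. The mapping is taken from the commit message; the names `RENAMED_ENDPOINTS` and `migrate_api_name` are hypothetical and not part of this repository.

```python
# Old endpoint name -> new endpoint name, as listed in the commit message.
# RENAMED_ENDPOINTS and migrate_api_name are illustrative names only.
RENAMED_ENDPOINTS = {
    "resegment_session": "resegment",
    "retranscribe_session": "retranscribe",
    "mfa_timestamps_session": "timestamps",
    "mfa_timestamps_direct": "timestamps_direct",
}


def migrate_api_name(api_name: str) -> str:
    """Return the post-rename api_name, keeping a leading slash if present."""
    prefix = "/" if api_name.startswith("/") else ""
    bare = api_name.lstrip("/")
    # Endpoints that were not renamed pass through unchanged.
    return prefix + RENAMED_ENDPOINTS.get(bare, bare)
```

A caller could wrap existing calls as `client.predict(..., api_name=migrate_api_name("/resegment_session"))` while old names are being phased out of client code.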

Files changed (3)

docs/client_api.md +50 -52
src/api/session_api.py +10 -10
src/ui/event_wiring.py +13 -13

docs/client_api.md CHANGED

@@ -24,10 +24,10 @@ result = client.predict(
 audio_id = result["audio_id"]
 # Re-segment with different params (reuses cached audio + VAD)
-result = client.predict(audio_id, 600, 1500, 300, "Base", "GPU", api_name="/resegment_session")
 # Re-transcribe with a different model (reuses cached segments)
-result = client.predict(audio_id, "Large", "GPU", api_name="/retranscribe_session")
 # Realign with custom timestamps
 result = client.predict(
@@ -37,14 +37,14 @@ result = client.predict(
     api_name="/realign_from_timestamps"
 )
-# Compute MFA word timestamps (uses stored session segments)
-mfa = client.predict(audio_id, None, "words", api_name="/mfa_timestamps_session")
-# Compute MFA word + letter timestamps
-mfa = client.predict(audio_id, None, "words+chars", api_name="/mfa_timestamps_session")
-# Direct MFA timestamps (no session needed)
-mfa = client.predict("recitation.mp3", result["segments"], "words", api_name="/mfa_timestamps_direct")
 ```
 ---
@@ -55,13 +55,13 @@ The first call returns an `audio_id` (32-character hex string). Pass it to subse
 **What the server caches per session:**
-| Data | Storage | Mutates on follow-up calls? |
-|---|---|---|
-| Preprocessed audio (16kHz float32) | Disk (npy) | No |
-| Raw VAD speech intervals | Disk (pickle) | No |
-| Cleaned segment boundaries | Disk (JSON) | Yes (resegment / realign) |
-| Model name | Disk (JSON) | Yes (retranscribe) |
-| Alignment segments | Disk (JSON) | Yes (any alignment call) |
 If `audio_id` is missing, expired, or invalid:
 ```json
@@ -74,13 +74,13 @@ If `audio_id` is missing, expired, or invalid:
 ### `POST /estimate_duration`
-Estimate how long a processing endpoint will take before calling it.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 | `endpoint` | str | required | Target endpoint name (e.g. `"process_audio_session"`) |
 | `audio_duration_s` | float | `None` | Audio length in seconds. Required if no `audio_id` |
-| `audio_id` | str | `None` | Session ID — infers audio duration from session metadata |
 | `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
@@ -97,11 +97,11 @@ est = client.predict(
 print(f"Estimated time: {est['estimated_duration_s']}s")
 ```
-**Example — with existing session (e.g. before MFA):**
 ```python
 est = client.predict(
-    "mfa_timestamps_session",  # endpoint
-    None,                      # audio_duration_s (inferred from session)
     audio_id,                  # audio_id
     "Base",                    # model_name
     "GPU",                     # device
@@ -122,18 +122,18 @@ est = client.predict(
 ### `POST /process_audio_session`
-Full pipeline: preprocess → VAD → ASR → alignment. Creates a server-side session.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
-| `audio` | file | required | Audio file (any format, converted to 16kHz mono) |
 | `min_silence_ms` | int | 200 | Minimum silence gap to split segments |
 | `min_speech_ms` | int | 1000 | Minimum speech duration to keep a segment |
 | `pad_ms` | int | 100 | Padding added to each side of a segment |
-| `model_name` | str | `"Base"` | `"Base"` (95M, faster) or `"Large"` (1B, more accurate, slower) |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
-If GPU quota is exhausted, automatically falls back to CPU processing rather than throwing an error. When this happens, a `"warning"` field is included in the response (see [GPU Fallback Warning](#gpu-fallback-warning) below).
 **Segmentation presets:**
@@ -185,8 +185,8 @@ If GPU quota is exhausted, automatically falls back to CPU processing rather tha
 | `ref_from` | str | First matched word as `"surah:ayah:word"`. Empty string for special segments |
 | `ref_to` | str | Last matched word as `"surah:ayah:word"`. Empty string for special segments |
 | `matched_text` | str | Quran text for the matched range (or special segment text) |
-| `confidence` | float | 0.0–1.0 (3 decimal places) |
-| `has_missing_words` | bool | True if the alignment detected skipped/missing words in the recitation |
 | `special_type` | str | Only present for special (non-Quranic) segments — see below. Absent for normal segments |
 | `error` | str? | Error message if alignment failed, else `null` |
@@ -208,7 +208,7 @@ Non-Quranic segments detected within recitations. When `special_type` is present
 ## GPU Fallback Warning
-When GPU quota is exhausted and the server falls back to CPU, all endpoints include a `"warning"` field in the response:
 ```json
 {
@@ -231,18 +231,18 @@ All errors follow the same shape: `{"error": "...", "segments": []}`. Endpoints
 | Session not found or expired | `"Session not found or expired"` | No |
 | No speech detected (process) | `"No speech detected in audio"` | No (no session created) |
 | No segments after resegment | `"No segments with these settings"` | Yes |
-| Retranscribe with same model | `"Model and boundaries unchanged. Change model_name or call /resegment_session first."` | Yes |
 | Retranscription failed | `"Retranscription failed"` | Yes |
 | Realignment failed | `"Alignment failed"` | Yes |
-| No segments in session (MFA) | `"No segments found in session"` | Yes |
-| MFA alignment failed | `"MFA alignment failed: ..."` | Yes (session) / No (direct) |
-| No segments provided (MFA direct) | `"No segments provided"` | No |
 ---
-### `POST /resegment_session`
-Re-cleans VAD boundaries with new segmentation parameters and re-runs ASR. Skips audio upload, preprocessing, and VAD inference.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
@@ -257,9 +257,9 @@ Re-cleans VAD boundaries with new segmentation parameters and re-runs ASR. Skips
 ---
-### `POST /retranscribe_session`
-Re-runs ASR with a different model on the current segment boundaries. Skips audio upload, preprocessing, VAD, and segmentation.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
@@ -269,13 +269,13 @@ Re-runs ASR with a different model on the current segment boundaries. Skips audi
 **Response:** Same shape as `/process_audio_session`. Session model and results are updated.
-> **Note:** Returns an error if `model_name` is the same as the current session's model. To re-run with the same model on different boundaries, use `/resegment_session` or `/realign_from_timestamps` instead (they already include ASR + alignment).
 ---
 ### `POST /realign_from_timestamps`
-Accepts arbitrary `(start, end)` timestamp pairs and runs ASR + alignment on each slice. The client defines segment boundaries directly — use this for manual splitting, merging, or dragging boundaries in a timeline editor.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
@@ -300,13 +300,11 @@ Accepts arbitrary `(start, end)` timestamp pairs and runs ASR + alignment on eac
 **Response:** Same shape as `/process_audio_session`. Session boundaries are replaced with the provided timestamps.
-This endpoint subsumes split, merge, and boundary adjustment — the client computes the desired timestamps locally and sends them in one call.
 ---
-### `POST /mfa_timestamps_session`
-Compute word-level (and optionally letter-level) MFA timestamps using session audio. Segments come from the stored session or can be overridden.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
@@ -320,7 +318,7 @@ result = client.predict(
     "a1b2c3d4e5f67890a1b2c3d4e5f67890",  # audio_id
     None,                                # segments (null = use stored)
     "words",                             # granularity
-    api_name="/mfa_timestamps_session",
 )
 ```
@@ -332,8 +330,8 @@ result = client.predict(
         {"time_from": 0.48, "time_to": 2.88, "ref_from": "112:1:1", "ref_to": "112:1:4"},
         {"time_from": 3.12, "time_to": 5.44, "ref_from": "112:2:1", "ref_to": "112:2:3"},
     ],
-    "words+chars",
-    api_name="/mfa_timestamps_session",
 )
 ```
@@ -347,8 +345,8 @@ result = client.predict(
 | Field | Type | Required | Description |
 |---|---|---|---|
-| `time_from` | float | yes | Start time in seconds (used to slice audio) |
-| `time_to` | float | yes | End time in seconds (used to slice audio) |
 | `ref_from` | str | yes | First word as `"surah:ayah:word"`. Empty for special segments |
 | `ref_to` | str | yes | Last word as `"surah:ayah:word"`. Empty for special segments |
 | `segment` | int | no | 1-indexed segment number. Auto-assigned from position if omitted |
@@ -389,17 +387,17 @@ With `granularity="words+chars"`, each word includes a 4th element — letter ti
 ---
-### `POST /mfa_timestamps_direct`
-Compute MFA timestamps with a provided audio file and segments. No session required — standalone endpoint.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
-| `audio` | file | required | Audio file (any format) |
 | `segments` | list | required | Segment list with `time_from`/`time_to` boundaries |
 | `granularity` | str | `"words"` | `"words"` or `"words+chars"` |
-**Response:** Same shape as `/mfa_timestamps_session` but without `audio_id`.
 **Example (minimal):**
 ```python
@@ -410,8 +408,8 @@ result = client.predict(
         {"time_from": 3.12, "time_to": 5.44, "ref_from": "112:2:1", "ref_to": "112:2:3"},
     ],
     "words+chars",
-    api_name="/mfa_timestamps_direct",
 )
 ```
-Segment input format is the same as for `/mfa_timestamps_session` — see above.

 audio_id = result["audio_id"]
 # Re-segment with different params (reuses cached audio + VAD)
+result = client.predict(audio_id, 600, 1500, 300, "Base", "GPU", api_name="/resegment")
 # Re-transcribe with a different model (reuses cached segments)
+result = client.predict(audio_id, "Large", "GPU", api_name="/retranscribe")
 # Realign with custom timestamps
 result = client.predict(
     api_name="/realign_from_timestamps"
 )
+# Get word-level timestamps (uses stored session segments)
+mfa = client.predict(audio_id, None, "words", api_name="/timestamps")
+# Get word + letter timestamps
+mfa = client.predict(audio_id, None, "words+chars", api_name="/timestamps")
+# Get timestamps without a session (standalone)
+mfa = client.predict("recitation.mp3", result["segments"], "words", api_name="/timestamps_direct")
 ```
 ---
 **What the server caches per session:**
+| Data | Updated by |
+|---|---|
+| Preprocessed audio | — |
+| Raw VAD speech intervals | — |
+| Cleaned segment boundaries | `/resegment`, `/realign_from_timestamps` |
+| Model name | `/retranscribe` |
+| Alignment segments | Any alignment call |
 If `audio_id` is missing, expired, or invalid:
 ```json
 ### `POST /estimate_duration`
+Estimate processing time before starting a request.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 | `endpoint` | str | required | Target endpoint name (e.g. `"process_audio_session"`) |
 | `audio_duration_s` | float | `None` | Audio length in seconds. Required if no `audio_id` |
+| `audio_id` | str | `None` | Session ID — looks up audio duration from the session |
 | `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 print(f"Estimated time: {est['estimated_duration_s']}s")
 ```
+**Example — with existing session (e.g. before getting timestamps):**
 ```python
 est = client.predict(
+    "timestamps",              # endpoint
+    None,                      # audio_duration_s (looked up from session)
     audio_id,                  # audio_id
     "Base",                    # model_name
     "GPU",                     # device
 ### `POST /process_audio_session`
+Processes a recitation audio file: detects speech segments, recognizes text, and aligns with the Quran. Creates a session for follow-up calls.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
+| `audio` | file | required | Audio file (any common format) |
 | `min_silence_ms` | int | 200 | Minimum silence gap to split segments |
 | `min_speech_ms` | int | 1000 | Minimum speech duration to keep a segment |
 | `pad_ms` | int | 100 | Padding added to each side of a segment |
+| `model_name` | str | `"Base"` | `"Base"` (faster) or `"Large"` (more accurate) |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
+If the GPU is temporarily unavailable, processing continues on CPU (slower). When this happens, a `"warning"` field is included in the response (see [GPU Fallback Warning](#gpu-fallback-warning) below).
 **Segmentation presets:**
 | `ref_from` | str | First matched word as `"surah:ayah:word"`. Empty string for special segments |
 | `ref_to` | str | Last matched word as `"surah:ayah:word"`. Empty string for special segments |
 | `matched_text` | str | Quran text for the matched range (or special segment text) |
+| `confidence` | float | 0.0–1.0 — how well the segment matched the Quran text |
+| `has_missing_words` | bool | Whether some expected words were not found in the audio |
 | `special_type` | str | Only present for special (non-Quranic) segments — see below. Absent for normal segments |
 | `error` | str? | Error message if alignment failed, else `null` |
 ## GPU Fallback Warning
+When the server's GPU is temporarily unavailable, processing continues on CPU (slower). All endpoints include a `"warning"` field in the response:
 ```json
 {
 | Session not found or expired | `"Session not found or expired"` | No |
 | No speech detected (process) | `"No speech detected in audio"` | No (no session created) |
 | No segments after resegment | `"No segments with these settings"` | Yes |
+| Retranscribe with same model | `"Model and boundaries unchanged. Change model_name or call /resegment first."` | Yes |
 | Retranscription failed | `"Retranscription failed"` | Yes |
 | Realignment failed | `"Alignment failed"` | Yes |
+| No segments in session (timestamps) | `"No segments found in session"` | Yes |
+| Timestamp alignment failed | `"MFA alignment failed: ..."` | Yes (session) / No (direct) |
+| No segments provided (timestamps direct) | `"No segments provided"` | No |
 ---
+### `POST /resegment`
+Re-splits the audio into segments using different silence/speech settings, then re-aligns. Reuses the uploaded audio.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 ---
+### `POST /retranscribe`
+Re-recognizes text using a different model on the same segments, then re-aligns.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 **Response:** Same shape as `/process_audio_session`. Session model and results are updated.
+> **Note:** Returns an error if `model_name` is the same as the current session's model. To re-run with the same model on different boundaries, use `/resegment` or `/realign_from_timestamps` instead (they already include recognition + alignment).
 ---
 ### `POST /realign_from_timestamps`
+Aligns audio using custom time boundaries you provide. Useful for manually adjusting where segments start and end.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 **Response:** Same shape as `/process_audio_session`. Session boundaries are replaced with the provided timestamps.
 ---
+### `POST /timestamps`
+Gets precise word-level (and optionally letter-level) timing for each word in the aligned segments.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
     "a1b2c3d4e5f67890a1b2c3d4e5f67890",  # audio_id
     None,                                # segments (null = use stored)
     "words",                             # granularity
+    api_name="/timestamps",
 )
 ```
         {"time_from": 0.48, "time_to": 2.88, "ref_from": "112:1:1", "ref_to": "112:1:4"},
         {"time_from": 3.12, "time_to": 5.44, "ref_from": "112:2:1", "ref_to": "112:2:3"},
     ],
+    "words+chars",
+    api_name="/timestamps",
 )
 ```
 | Field | Type | Required | Description |
 |---|---|---|---|
+| `time_from` | float | yes | Start time in seconds |
+| `time_to` | float | yes | End time in seconds |
 | `ref_from` | str | yes | First word as `"surah:ayah:word"`. Empty for special segments |
 | `ref_to` | str | yes | Last word as `"surah:ayah:word"`. Empty for special segments |
 | `segment` | int | no | 1-indexed segment number. Auto-assigned from position if omitted |
 ---
+### `POST /timestamps_direct`
+Same as `/timestamps` but accepts an audio file directly — no session needed.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
+| `audio` | file | required | Audio file (any common format) |
 | `segments` | list | required | Segment list with `time_from`/`time_to` boundaries |
 | `granularity` | str | `"words"` | `"words"` or `"words+chars"` |
+**Response:** Same shape as `/timestamps` but without `audio_id`.
 **Example (minimal):**
 ```python
         {"time_from": 3.12, "time_to": 5.44, "ref_from": "112:2:1", "ref_to": "112:2:3"},
     ],
     "words+chars",
+    api_name="/timestamps_direct",
 )
 ```
+Segment input format is the same as for `/timestamps` — see above.
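
The updated docs describe a common error shape (`{"error": "...", "segments": []}`) and an optional top-level `"warning"` field set on GPU fallback. A client might centralize handling of both with something like the sketch below. `AlignerError` and `unpack_response` are hypothetical names, and the sketch assumes `"error"` is absent or empty on success, which the diff does not state explicitly.

```python
from typing import Optional


class AlignerError(RuntimeError):
    """Hypothetical exception type for a top-level error in an API response."""


def unpack_response(result: dict) -> tuple[list, Optional[str]]:
    """Split a response into (segments, warning), raising on a top-level error.

    Assumes a successful response has no truthy "error" key.
    """
    error = result.get("error")
    if error:
        raise AlignerError(error)
    # "warning" is only present when the server fell back to CPU.
    return result.get("segments", []), result.get("warning")
```

Note that this only inspects the top-level `error`; per-segment `error` fields inside `segments` would still need to be checked separately.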

src/api/session_api.py CHANGED

@@ -188,14 +188,14 @@ _SESSION_ERROR = {"error": "Session not found or expired", "segments": []}
 _ESTIMABLE_ENDPOINTS = {
     "process_audio_session",
-    "resegment_session",
-    "retranscribe_session",
     "realign_from_timestamps",
-    "mfa_timestamps_session",
-    "mfa_timestamps_direct",
 }
-_MFA_ENDPOINTS = {"mfa_timestamps_session", "mfa_timestamps_direct"}
 _VAD_ENDPOINTS = {"process_audio_session"}
@@ -347,7 +347,7 @@ def process_audio_session(audio_data, min_silence_ms, min_speech_ms, pad_ms,
     return _format_response(audio_id, json_output, warning=quota_warning)
-def resegment_session(audio_id, min_silence_ms, min_speech_ms, pad_ms,
                        model_name="Base", device="GPU",
                        request: gr.Request = None):
     """Re-clean VAD boundaries with new params and re-run ASR + alignment."""
@@ -383,7 +383,7 @@ def resegment_session(audio_id, min_silence_ms, min_speech_ms, pad_ms,
     return _format_response(audio_id, json_output, warning=quota_warning)
-def retranscribe_session(audio_id, model_name="Base", device="GPU",
                           request: gr.Request = None):
     """Re-run ASR with a different model on current segment boundaries."""
     session = load_session(audio_id)
@@ -395,7 +395,7 @@ def retranscribe_session(audio_id, model_name="Base", device="GPU",
             and _intervals_hash(session["intervals"]) == session["intervals_hash"]):
         return {
             "audio_id": audio_id,
-            "error": "Model and boundaries unchanged. Change model_name or call /resegment_session first.",
             "segments": [],
         }
@@ -557,7 +557,7 @@ def _normalize_segments(segments):
 # MFA timestamp endpoints
 # ---------------------------------------------------------------------------
-def mfa_timestamps_session(audio_id, segments_json=None, granularity="words"):
     """Compute MFA word/letter timestamps using session audio."""
     session = load_session(audio_id)
     if session is None:
@@ -590,7 +590,7 @@ def mfa_timestamps_session(audio_id, segments_json=None, granularity="words"):
     return result
-def mfa_timestamps_direct(audio_data, segments_json, granularity="words"):
     """Compute MFA word/letter timestamps with provided audio and segments."""
     # Parse segments
     if isinstance(segments_json, str):

 _ESTIMABLE_ENDPOINTS = {
     "process_audio_session",
+    "resegment",
+    "retranscribe",
     "realign_from_timestamps",
+    "timestamps",
+    "timestamps_direct",
 }
+_MFA_ENDPOINTS = {"timestamps", "timestamps_direct"}
 _VAD_ENDPOINTS = {"process_audio_session"}
     return _format_response(audio_id, json_output, warning=quota_warning)
+def resegment(audio_id, min_silence_ms, min_speech_ms, pad_ms,
                        model_name="Base", device="GPU",
                        request: gr.Request = None):
     """Re-clean VAD boundaries with new params and re-run ASR + alignment."""
     return _format_response(audio_id, json_output, warning=quota_warning)
+def retranscribe(audio_id, model_name="Base", device="GPU",
                           request: gr.Request = None):
     """Re-run ASR with a different model on current segment boundaries."""
     session = load_session(audio_id)
             and _intervals_hash(session["intervals"]) == session["intervals_hash"]):
         return {
             "audio_id": audio_id,
+            "error": "Model and boundaries unchanged. Change model_name or call /resegment first.",
             "segments": [],
         }
 # MFA timestamp endpoints
 # ---------------------------------------------------------------------------
+def timestamps(audio_id, segments_json=None, granularity="words"):
     """Compute MFA word/letter timestamps using session audio."""
     session = load_session(audio_id)
     if session is None:
     return result
+def timestamps_direct(audio_data, segments_json, granularity="words"):
     """Compute MFA word/letter timestamps with provided audio and segments."""
     # Parse segments
     if isinstance(segments_json, str):

src/ui/event_wiring.py CHANGED

@@ -9,9 +9,9 @@ from src.pipeline import (
 )
 from src.api.session_api import (
     estimate_duration,
-    process_audio_session, resegment_session,
-    retranscribe_session, realign_from_timestamps,
-    mfa_timestamps_session, mfa_timestamps_direct,
 )
 from src.mfa import compute_mfa_timestamps
 from src.ui.progress_bar import pipeline_progress_bar_html
@@ -186,7 +186,7 @@ def _wire_resegment_chain(c):
                        request: gr.Request = None):
         # Compute estimate and show progress bar
         audio_dur = len(audio) / 16000 if audio is not None and hasattr(audio, '__len__') else None
-        est = estimate_duration("resegment_session", audio_dur, model_name=model, device=device)
         est_s = est.get("estimated_duration_s") or 15
         bar_html = pipeline_progress_bar_html(est_s)
@@ -261,7 +261,7 @@ def _wire_retranscribe_chain(c):
                           request: gr.Request = None):
         # Compute estimate and show progress bar
         audio_dur = len(audio) / 16000 if audio is not None and hasattr(audio, '__len__') else None
-        est = estimate_duration("retranscribe_session", audio_dur, model_name=model_name, device=device)
         est_s = est.get("estimated_duration_s") or 15
         bar_html = pipeline_progress_bar_html(est_s)
@@ -540,17 +540,17 @@ def _wire_api_endpoint(c):
         api_name="process_audio_session",
     )
     gr.Button(visible=False).click(
-        fn=resegment_session,
         inputs=[c.api_audio_id, c.api_silence, c.api_speech, c.api_pad,
                 c.api_model, c.api_device],
         outputs=[c.api_result],
-        api_name="resegment_session",
     )
     gr.Button(visible=False).click(
-        fn=retranscribe_session,
         inputs=[c.api_audio_id, c.api_model, c.api_device],
         outputs=[c.api_result],
-        api_name="retranscribe_session",
     )
     gr.Button(visible=False).click(
         fn=realign_from_timestamps,
@@ -559,16 +559,16 @@ def _wire_api_endpoint(c):
         api_name="realign_from_timestamps",
     )
     gr.Button(visible=False).click(
-        fn=mfa_timestamps_session,
         inputs=[c.api_audio_id, c.api_mfa_segments, c.api_mfa_granularity],
         outputs=[c.api_result],
-        api_name="mfa_timestamps_session",
     )
     gr.Button(visible=False).click(
-        fn=mfa_timestamps_direct,
         inputs=[c.api_audio, c.api_mfa_segments, c.api_mfa_granularity],
         outputs=[c.api_result],
-        api_name="mfa_timestamps_direct",
     )

 )
 from src.api.session_api import (
     estimate_duration,
+    process_audio_session, resegment,
+    retranscribe, realign_from_timestamps,
+    timestamps, timestamps_direct,
 )
 from src.mfa import compute_mfa_timestamps
 from src.ui.progress_bar import pipeline_progress_bar_html
                        request: gr.Request = None):
         # Compute estimate and show progress bar
         audio_dur = len(audio) / 16000 if audio is not None and hasattr(audio, '__len__') else None
+        est = estimate_duration("resegment", audio_dur, model_name=model, device=device)
         est_s = est.get("estimated_duration_s") or 15
         bar_html = pipeline_progress_bar_html(est_s)
                           request: gr.Request = None):
         # Compute estimate and show progress bar
         audio_dur = len(audio) / 16000 if audio is not None and hasattr(audio, '__len__') else None
+        est = estimate_duration("retranscribe", audio_dur, model_name=model_name, device=device)
         est_s = est.get("estimated_duration_s") or 15
         bar_html = pipeline_progress_bar_html(est_s)
         api_name="process_audio_session",
     )
     gr.Button(visible=False).click(
+        fn=resegment,
         inputs=[c.api_audio_id, c.api_silence, c.api_speech, c.api_pad,
                 c.api_model, c.api_device],
         outputs=[c.api_result],
+        api_name="resegment",
     )
     gr.Button(visible=False).click(
+        fn=retranscribe,
         inputs=[c.api_audio_id, c.api_model, c.api_device],
         outputs=[c.api_result],
+        api_name="retranscribe",
     )
     gr.Button(visible=False).click(
         fn=realign_from_timestamps,
         api_name="realign_from_timestamps",
     )
     gr.Button(visible=False).click(
+        fn=timestamps,
         inputs=[c.api_audio_id, c.api_mfa_segments, c.api_mfa_granularity],
         outputs=[c.api_result],
+        api_name="timestamps",
     )
     gr.Button(visible=False).click(
+        fn=timestamps_direct,
         inputs=[c.api_audio, c.api_mfa_segments, c.api_mfa_granularity],
         outputs=[c.api_result],
+        api_name="timestamps_direct",
     )