Quran-multi-aligner

hetchyy committed on Mar 1

Commit 3f29284 (parent: acac310)

Update API doc

Files changed (2)

docs/client_api.md +13 -6
src/api/session_api.py +24 -1

docs/client_api.md CHANGED

@@ -9,12 +9,18 @@
 ---
+> **GPU Usage & Access**
+>
+> - **Free Tier:** Every user receives **free daily zero-cost GPU quota**. Once your daily GPU quota is exhausted, you can continue using unlimited CPU processing for all endpoints.
+> - **Unlimited GPU Access:** If you need unlimited API access on GPU (e.g., for high-volume or production use), please get in touch to arrange a payment plan and higher limits.
+> - **Note:** CPU processing is always unlimited and available, but is much slower. When GPU quota is exceeded, requests will be automatically routed to CPU and a warning will appear in the response.
 ## Quick Start
 ```python
 from gradio_client import Client
-client = Client("https://your-space.hf.space")
+client = Client("https://hetchyy-quran-multi-aligner.hf.space")
 # Full pipeline
 result = client.predict(
@@ -87,7 +93,7 @@ Processes a recitation audio file: detects speech segments, recognizes text, and
 | `min_silence_ms` | int | 200 | Minimum silence gap to split segments |
 | `min_speech_ms` | int | 1000 | Minimum speech duration to keep a segment |
 | `pad_ms` | int | 100 | Padding added to each side of a segment |
-| `model_name` | str | `"Base"` | `"Base"` (faster) or `"Large"` (more accurate) |
+| `model_name` | str | `"Base"` | `"Base"` (faster) or `"Large"` (more accurate). **Only these two values are accepted** — any other value will cause an error |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 If the GPU is temporarily unavailable, processing continues on CPU (slower). When this happens, a `"warning"` field is included in the response (see [GPU Fallback Warning](#gpu-fallback-warning)).
@@ -146,7 +152,7 @@ Re-splits the audio into segments using different silence/speech settings, then
 | `min_silence_ms` | int | 200 | New minimum silence gap |
 | `min_speech_ms` | int | 1000 | New minimum speech duration |
 | `pad_ms` | int | 100 | New padding |
-| `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
+| `model_name` | str | `"Base"` | `"Base"` or `"Large"` only |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 **Response:** Same shape as `/process_audio_session`. Session boundaries are updated.
@@ -160,7 +166,7 @@ Re-recognizes text using a different model on the same segments, then re-aligns.
 | Parameter | Type | Default | Description |
 |---|---|---|---|
 | `audio_id` | str | required | Session ID from a previous call |
-| `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
+| `model_name` | str | `"Base"` | `"Base"` or `"Large"` only |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 **Response:** Same shape as `/process_audio_session`. Session model and results are updated.
@@ -177,7 +183,7 @@ Aligns audio using custom time boundaries you provide. Useful for manually adjus
 |---|---|---|---|
 | `audio_id` | str | required | Session ID from a previous call |
 | `timestamps` | list | required | Array of `{"start": float, "end": float}` in seconds |
-| `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
+| `model_name` | str | `"Base"` | `"Base"` or `"Large"` only |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 **Example request body:**
@@ -313,7 +319,7 @@ Estimate processing time before starting a request.
 | `endpoint` | str | required | Target endpoint name (e.g. `"process_audio_session"`) |
 | `audio_duration_s` | float | `None` | Audio length in seconds. Required if no `audio_id` |
 | `audio_id` | str | `None` | Session ID — looks up audio duration from the session |
-| `model_name` | str | `"Base"` | `"Base"` or `"Large"` |
+| `model_name` | str | `"Base"` | `"Base"` or `"Large"` only |
 | `device` | str | `"GPU"` | `"GPU"` or `"CPU"` |
 **Example — before first processing call:**
@@ -425,6 +431,7 @@ All errors follow the same shape: `{"error": "...", "segments": []}`. Endpoints
 | Session not found or expired | `"Session not found or expired"` | No |
 | No speech detected (process) | `"No speech detected in audio"` | No (no session created) |
 | No segments after resegment | `"No segments with these settings"` | Yes |
+| Invalid model name | `"Invalid model_name '...'. Must be one of: Base, Large"` | Depends on endpoint |
 | Retranscribe with same model | `"Model and boundaries unchanged. Change model_name or call /resegment first."` | Yes |
 | Retranscription failed | `"Retranscription failed"` | Yes |
 | Realignment failed | `"Alignment failed"` | Yes |
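The updated docs pin `model_name` to two values and describe a fixed error shape plus an optional `"warning"` on CPU fallback. A client can act on both without a round trip. The following is a minimal sketch that assumes responses arrive as plain dicts shaped as documented (`{"error": ..., "segments": []}`, with an optional `"warning"` key). The helpers `check_model_name` and `classify_response` are hypothetical and are not part of `gradio_client` or this Space's API.

```python
from typing import Any, Dict, Optional

# Accepted values per the updated docs; a hypothetical client-side mirror
# of the server's list, which could drift if the Space adds models.
VALID_MODEL_NAMES = ("Base", "Large")


def check_model_name(model_name: str) -> Optional[str]:
    """Return an error message for an unsupported model_name, else None.

    Failing fast locally avoids a network call (and any GPU quota) for a
    request the server would reject anyway.
    """
    if model_name not in VALID_MODEL_NAMES:
        valid = ", ".join(sorted(VALID_MODEL_NAMES))
        return f"Invalid model_name '{model_name}'. Must be one of: {valid}"
    return None


def classify_response(resp: Dict[str, Any]) -> str:
    """Label a response dict as 'error', 'cpu_fallback', or 'ok'.

    Assumes the documented shapes: errors carry a non-empty "error" string,
    and a GPU-to-CPU fallback adds a "warning" key to an otherwise normal result.
    """
    if resp.get("error"):
        return "error"
    if "warning" in resp:
        return "cpu_fallback"
    return "ok"


if __name__ == "__main__":
    # The check is case-sensitive, matching the server-side set membership test.
    print(check_model_name("large"))
    print(classify_response({"error": "Session not found or expired", "segments": []}))
```

Mirroring the list on the client is a trade-off. It saves a request, but the server remains the source of truth, so an `"Invalid model_name ..."` error still needs handling.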

src/api/session_api.py CHANGED

@@ -18,7 +18,7 @@ import uuid
 import gradio as gr
 import numpy as np
-from config import SESSION_DIR, SESSION_EXPIRY_SECONDS
+from config import SESSION_DIR, SESSION_EXPIRY_SECONDS, PHONEME_ASR_MODELS
 from src.core.zero_gpu import QuotaExhaustedError
 # ---------------------------------------------------------------------------
@@ -29,6 +29,14 @@ _last_cleanup_time = 0.0
 _CLEANUP_INTERVAL = 1800  # sweep at most every 30 min
 _VALID_ID = re.compile(r"^[0-9a-f]{32}$")
+_VALID_MODELS = set(PHONEME_ASR_MODELS.keys())
+def _validate_model_name(model_name):
+    """Return an error dict if model_name is invalid, else None."""
+    if model_name not in _VALID_MODELS:
+        valid = ", ".join(sorted(_VALID_MODELS))
+        return {"error": f"Invalid model_name '{model_name}'. Must be one of: {valid}", "segments": []}
 def _session_dir(audio_id: str):
@@ -332,6 +340,9 @@ def process_audio_session(audio_data, min_silence_ms, min_speech_ms, pad_ms,
                           model_name="Base", device="GPU",
                           request: gr.Request = None):
     """Full pipeline: preprocess -> VAD -> ASR -> alignment. Creates session."""
+    err = _validate_model_name(model_name)
+    if err:
+        return err
     from src.pipeline import process_audio
     quota_warning = None
@@ -368,6 +379,10 @@ def resegment(audio_id, min_silence_ms, min_speech_ms, pad_ms,
                        model_name="Base", device="GPU",
                        request: gr.Request = None):
     """Re-clean VAD boundaries with new params and re-run ASR + alignment."""
+    err = _validate_model_name(model_name)
+    if err:
+        err["audio_id"] = audio_id
+        return err
     session = load_session(audio_id)
     if session is None:
         return _SESSION_ERROR
@@ -403,6 +418,10 @@ def resegment(audio_id, min_silence_ms, min_speech_ms, pad_ms,
 def retranscribe(audio_id, model_name="Base", device="GPU",
                           request: gr.Request = None):
     """Re-run ASR with a different model on current segment boundaries."""
+    err = _validate_model_name(model_name)
+    if err:
+        err["audio_id"] = audio_id
+        return err
     session = load_session(audio_id)
     if session is None:
         return _SESSION_ERROR
@@ -446,6 +465,10 @@ def retranscribe(audio_id, model_name="Base", device="GPU",
 def realign_from_timestamps(audio_id, timestamps, model_name="Base", device="GPU",
                              request: gr.Request = None):
     """Run ASR + alignment on caller-provided timestamp intervals."""
+    err = _validate_model_name(model_name)
+    if err:
+        err["audio_id"] = audio_id
+        return err
     session = load_session(audio_id)
     if session is None:
         return _SESSION_ERROR