Spaces:

MuhammadHijazii
/

faster_whisper_large_v3_post_processwith_advanced

Sleeping

App Files Files Community

MuhammadHijazii commited on Aug 23

Commit

c113304

·

verified ·

1 Parent(s): 52e080c

Update README.md

Files changed (1) hide show

README.md +26 -15

README.md CHANGED Viewed

@@ -10,29 +10,40 @@ pinned: false
 license: apache-2.0
 ---
-# Faster-Whisper Large-v3 — Arabic (Minimal)
-- تفريغ الصوت باستخدام `faster-whisper` مع `word_timestamps=True`.
-- إرجاع:
-  - نص كامل مجمّع من الكلمات.
-  - نص المقاطع مع الزمن.
-  - جدول كلمات واحتمالاتها.
-  - تقرير JSON صغير.
-> **ملاحظة**: نموذج `large-v3` يفضّل GPU Space.
-## API
-Endpoint اسمه `/transcribe`:
 ```python
 from gradio_client import Client, file
 client = Client("<username>/<space_name>")
-transcript, segments, report, table = client.predict(
     audio=file("audio.wav"),
-    model_name="large-v3",
-    compute_type="float16",
     vad=True,
-    api_name="/transcribe"
 )
-print(report)

 license: apache-2.0
 ---
+# Samaali — Whisper ASR Post-Processing (Arabic)
+- Transcribes audio with **faster-whisper** (word timestamps + probabilities)
+- Aligns with the original text and distinguishes **ASR errors** vs **memorization errors**
+- Restores ASR errors to the ground-truth and computes:
+  - **Literal score** (Levenshtein + word-overlap + BLEU-1)
+  - **Semantic score** (SBERT + MARBERT-CLS)
+## Usage
+1. Upload/record audio and paste the **Original Text**.
+2. Pick Whisper size (`large-v3` on GPU, `small/medium` on CPU).
+3. Click **Transcribe & Evaluate**.
+Outputs:
+- **Corrected Transcript** (ASR-only corrections applied)
+- **Raw ASR Transcript**
+- **JSON Report** (scores & thresholds)
+- **Token-level decisions table**
+## API (Spaces Inference)
+Two endpoints are exposed:
+### 1) `/run/evaluate` (UI-equivalent)
+**Python**
 ```python
 from gradio_client import Client, file
 client = Client("<username>/<space_name>")
+corrected, asr_out, report, table = client.predict(
     audio=file("audio.wav"),
+    original_text="النص الأصلي...",
+    whisper_size="small",
+    compute_type="int8",
     vad=True,
+    use_marbert=False,   # True if GPU
+    api_name="/evaluate"
 )
+print(report)  # JSON