MuhammadHijazii committed on
Commit 301f516 · verified · 1 Parent(s): 15e02f4

Update README.md

  1. README.md +9 -34
README.md CHANGED
@@ -10,40 +10,15 @@ pinned: false
  license: apache-2.0
  ---

- # Samaali — Whisper ASR Post-Processing (Arabic)
+ ## Samaali — Whisper ASR Post-Processing (Arabic)

- - Transcribes audio with **faster-whisper** (word timestamps + probabilities)
- - Aligns with the original text and distinguishes **ASR errors** vs **memorization errors**
- - Restores ASR errors to the ground-truth and computes:
- - **Literal score** (Levenshtein + word-overlap + BLEU-1)
- - **Semantic score** (SBERT + MARBERT-CLS)
+ - Word-level timestamps & probabilities (faster-whisper)
+ - Alignment to GT + ASR-vs-Memorization classification
+ - Confidence gating + Numbers handling
+ - Literal & Semantic scores

- ## Usage
- 1. Upload/record audio and paste the **Original Text**.
- 2. Pick Whisper size (`large-v3` on GPU, `small/medium` on CPU).
- 3. Click **Transcribe & Evaluate**.
+ ### API
+ - `/run/evaluate` (UI outputs)
+ - `/run/predict` (JSON-only)

- Outputs:
- - **Corrected Transcript** (ASR-only corrections applied)
- - **Raw ASR Transcript**
- - **JSON Report** (scores & thresholds)
- - **Token-level decisions table**
-
- ## API (Spaces Inference)
- Two endpoints are exposed:
-
- ### 1) `/run/evaluate` (UI-equivalent)
- **Python**
- ```python
- from gradio_client import Client, file
- client = Client("<username>/<space_name>")
- corrected, asr_out, report, table = client.predict(
- audio=file("audio.wav"),
- original_text="النص الأصلي...",
- whisper_size="small",
- compute_type="int8",
- vad=True,
- use_marbert=False, # True if GPU
- api_name="/evaluate"
- )
- print(report) # JSON
+ **Note (CPU Spaces):** the app enforces `whisper=small`, `compute=int8`, and disables MARBERT by default to avoid OOM.
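
The CPU note in the new README describes a settings clamp. The commit does not show how the app applies it, so the following is only a minimal sketch. `EvalSettings`, `enforce_cpu_limits`, and the GPU-side defaults are hypothetical names and values chosen for illustration. Only `small`, `int8`, and MARBERT being off come from the README. The sketch also assumes all three are hard overrides on CPU, which the README does not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvalSettings:
    """Hypothetical container for the options named in the README."""
    whisper_size: str = "large-v3"   # assumed GPU default, not confirmed by the commit
    compute_type: str = "float16"    # assumed GPU default, not confirmed by the commit
    use_marbert: bool = True


def enforce_cpu_limits(settings: EvalSettings, has_gpu: bool) -> EvalSettings:
    """Return settings unchanged on GPU; clamp them to CPU-safe values otherwise."""
    if has_gpu:
        return settings
    # Values from the README note: whisper=small, compute=int8, MARBERT disabled.
    return replace(
        settings,
        whisper_size="small",
        compute_type="int8",
        use_marbert=False,
    )


if __name__ == "__main__":
    requested = EvalSettings(whisper_size="large-v3", use_marbert=True)
    print(enforce_cpu_limits(requested, has_gpu=False))
```

Returning a new frozen instance rather than mutating the request keeps the caller's original choice available, for example to report in the JSON output that a setting was overridden.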