Spaces: tvosch/Dutch-ASR-Leaderboard


tvosch committed 12 days ago

Commit dcdbf5e (1 parent: 7f2ce48)

RTX with batch size of 1

Files changed (1)

app.py +3 -4

@@ -128,7 +128,7 @@ Local models are benchmarked on **H100 HBM2e GPUs** for consistent performance m
 **Word Error Rate (WER)** measures the percentage of words transcribed incorrectly compared to a reference transcript. It is calculated as `(substitutions + deletions + insertions) / total reference words × 100`. Lower is better. Results are normalized before scoring: lowercase, no punctuation, digits expanded to words, fillers removed.
-**Real-Time Factor (RTF)** measures how fast a model transcribes relative to the audio duration. An RTF of 0.1 means 1 second of audio is processed in 100 ms. Lower is faster. RTF measured via HTTP API includes network overhead.
+**Real-Time Factor (RTF)** measures how fast a model transcribes relative to the audio duration. An RTF of 0.1 means 1 second of audio is processed in 100 ms. Lower is faster. Measured here at batch size 1; RTF measured via HTTP API includes network overhead.
 ## Datasets
@@ -145,10 +145,9 @@ with gr.Blocks(title="Dutch ASR Leaderboard", theme=gr.themes.Default()) as demo
         "# Dutch ASR Leaderboard\n"
         "**An independent, community-driven benchmark for Dutch automatic speech recognition.**  \n"
         "Models are evaluated on standardized public test sets. Lower WER is better. "
-        "Rankings serve as a proxy for comparison — performance on your data may differ.\n\n"
-        "> **Note:** Some models may be benchmaxxed — trained or fine-tuned on data that overlaps "
+        "Rankings serve as a proxy for comparison, performance on your data may differ.\n\n"
+        "> **Note:** Some models may be benchmaxxed: trained or fine-tuned on data that overlaps "
         "with these test sets. Treat results as indicative, not definitive. "
-        "How models compare here may not reflect how they perform on your specific domain, audio conditions, or use case.\n\n"
         "[Submit your model on GitHub →](https://github.com/tvosch/Dutch-ASR-leaderboard)"
     )
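
For readers who want to check the two metrics described in the changed text, here is a minimal sketch of how WER and batch-size-1 RTF could be computed. It is not taken from `app.py` or the leaderboard's evaluation code. The names `normalize`, `wer`, `real_time_factor`, and `measure_rtf` are hypothetical. The normalizer only lowercases and strips punctuation, so it omits the digit expansion and filler removal mentioned in the description. Summing processing time over all clips before dividing by total audio duration is also an assumption about aggregation.

```python
import re
import time
from typing import Callable, Sequence


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words.

    Simplified stand-in: digit expansion and filler removal are NOT done here.
    """
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent:
    (substitutions + deletions + insertions) / reference words * 100,
    computed as the word-level Levenshtein distance."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref) * 100


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration. Lower is faster."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    transcribe: Callable[[object], str],
    clips: Sequence[tuple[object, float]],
) -> float:
    """Time `transcribe` on one clip at a time (batch size 1).

    `clips` is a sequence of (audio, duration_in_seconds) pairs; `transcribe`
    is any callable supplied by the caller (hypothetical interface).
    """
    total_processing = 0.0
    total_audio = 0.0
    for audio, duration in clips:
        start = time.perf_counter()
        transcribe(audio)
        total_processing += time.perf_counter() - start
        total_audio += duration
    return real_time_factor(total_processing, total_audio)
```

With these definitions, `real_time_factor(0.1, 1.0)` gives 0.1, matching the "1 second of audio is processed in 100 ms" example in the description. If `transcribe` wraps an HTTP call, the measured time also includes network overhead, as the description notes.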