RTF with batch size of 1
app.py
CHANGED
@@ -128,7 +128,7 @@ Local models are benchmarked on **H100 HBM2e GPUs** for consistent performance m
 
 **Word Error Rate (WER)** measures the percentage of words transcribed incorrectly compared to a reference transcript. It is calculated as `(substitutions + deletions + insertions) / total reference words × 100`. Lower is better. Results are normalized before scoring: lowercase, no punctuation, digits expanded to words, fillers removed.
 
-**Real-Time Factor (RTF)** measures how fast a model transcribes relative to the audio duration. An RTF of 0.1 means 1 second of audio is processed in 100 ms. Lower is faster. RTF measured via HTTP API includes network overhead.
+**Real-Time Factor (RTF)** measures how fast a model transcribes relative to the audio duration. An RTF of 0.1 means 1 second of audio is processed in 100 ms. Lower is faster. Measured here at batch size 1; RTF measured via HTTP API includes network overhead.
 
 ## Datasets
 
@@ -145,10 +145,9 @@ with gr.Blocks(title="Dutch ASR Leaderboard", theme=gr.themes.Default()) as demo
 "# Dutch ASR Leaderboard\n"
 "**An independent, community-driven benchmark for Dutch automatic speech recognition.** \n"
 "Models are evaluated on standardized public test sets. Lower WER is better. "
-"Rankings serve as a proxy for comparison
-"> **Note:** Some models may be benchmaxxed
+"Rankings serve as a proxy for comparison, performance on your data may differ.\n\n"
+"> **Note:** Some models may be benchmaxxed: trained or fine-tuned on data that overlaps "
 "with these test sets. Treat results as indicative, not definitive. "
-"How models compare here may not reflect how they perform on your specific domain, audio conditions, or use case.\n\n"
 "[Submit your model on GitHub →](https://github.com/tvosch/Dutch-ASR-leaderboard)"
 )
 
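
For readers of this change, the two metrics described in the leaderboard text can be sketched as follows. This is only an illustration of the stated formulas, not the leaderboard's scoring code: the names `normalize`, `wer`, and `rtf` are hypothetical and do not come from `app.py`, and the normalization here covers only lowercasing and ASCII punctuation removal (digit expansion and filler removal are left out).

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace.

    Assumption: a simplified stand-in for the normalization described in
    the leaderboard text; digit expansion and filler removal are omitted.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (S + D + I) / reference words * 100."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, kept one row at a time.
    # prev[j] is the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref) * 100


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

With these definitions, `rtf(0.1, 1.0)` gives 0.1, which matches the "1 second of audio is processed in 100 ms" example in the text. Timing a single file per call corresponds to the batch size 1 setting this commit documents; batched inference would typically report a lower RTF for the same model.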