Update templates/index.html
- templates/index.html +66 -6
templates/index.html
CHANGED
|
@@ -57,16 +57,76 @@
|
|
| 57 |
<div class="container py-5">
|
| 58 |
<h3>Welcome to the Speech-to-Speech Model Evaluation</h3>
|
| 59 |
|
| 60 |
-
<div id="evaluation-info" class="mb-
|
| 61 |
-
<p>
|
| 62 |
<strong>Welcome to the Speech-to-Speech (S2S) Model Evaluation!</strong>
|
| 63 |
<br><br>
|
| 64 |
-
In this evaluation, you will assess the performance of
|
| 65 |
<strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>, and
|
| 66 |
-
<strong>Mini-Omni</strong>.
|
| 67 |
<br><br>
|
| 68 |
-
|
| 69 |
-
|
| 70 |
</p>
|
| 71 |
</div>
|
| 72 |
|
| 57 |
<div class="container py-5">
|
| 58 |
<h3>Welcome to the Speech-to-Speech Model Evaluation</h3>
|
| 59 |
|
| 60 |
+
<div id="evaluation-info" class="mb-5">
|
| 61 |
+
<p class="text-start">
|
| 62 |
<strong>Welcome to the Speech-to-Speech (S2S) Model Evaluation!</strong>
|
| 63 |
<br><br>
|
| 64 |
+
In this evaluation, you will assess the performance of 4 S2S models:
|
| 65 |
<strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>, and
|
| 66 |
+
<strong>Mini-Omni</strong>.
|
| 67 |
+
The goal is to evaluate how well these models handle various speech tasks across different domains.
|
| 68 |
<br><br>
|
| 69 |
+
Once you select a specific domain and task (e.g., <em>Educational Tutoring</em> and <em>Rhythm Control</em>),
|
| 70 |
+
you will proceed to the evaluation stage. In each round, you will be presented with an audio input.
|
| 71 |
+
For example:
|
| 72 |
+
<br><br>
|
| 73 |
+
|
| 74 |
+
<!-- Left-aligned Audio Sample and Audio Control -->
|
| 75 |
+
<span style="vertical-align: middle; line-height: 1.2; display: inline-block;"><strong>Audio Sample:</strong></span>
|
| 76 |
+
<audio controls style="vertical-align: middle;">
|
| 77 |
+
<source src="/static/audio/sample/input_audio.wav" type="audio/wav">
|
| 78 |
+
</audio>
|
| 79 |
+
|
| 80 |
+
<br><br>
|
| 81 |
+
The corresponding text is:
|
| 82 |
+
<em>"Say the following sentence at my speed first, then say it again very slowly:
|
| 83 |
+
'Artificial intelligence is changing the world in many ways.'" </em>
|
| 84 |
+
<small>(Note: the audio plays at 1.5x the normal speed.)</small>
|
| 85 |
+
<br><br>
|
| 86 |
+
The responses of different S2S models will be provided, and your task is to choose which response best follows
|
| 87 |
+
the instructions. For example <small>(Note: During the evaluation process, you will be provided with responses from only the two models that have the most comparative significance.)</small>:
|
| 88 |
+
<br><br>
|
| 89 |
+
|
| 90 |
+
<!-- ChatGPT-4o Output -->
|
| 91 |
+
<span><strong>ChatGPT-4o:</strong></span>
|
| 92 |
+
<audio controls style="vertical-align: middle;">
|
| 93 |
+
<source src="/static/audio/sample/4o_audio.wav" type="audio/wav">
|
| 94 |
+
</audio>
|
| 95 |
+
<p class="text-start" style="margin-left: 20px;">
|
| 96 |
+
<strong>Performance:</strong> Speech: Partially followed the instruction on speed. Semantics: Accurately followed the instruction, with no semantic deviation or missing information.
|
| 97 |
+
</p>
|
| 98 |
+
|
| 99 |
+
<!-- FunAudioLLM Output -->
|
| 100 |
+
<span><strong>FunAudioLLM:</strong></span>
|
| 101 |
+
<audio controls style="vertical-align: middle;">
|
| 102 |
+
<source src="/static/audio/sample/FunAudio_audio.wav" type="audio/wav">
|
| 103 |
+
</audio>
|
| 104 |
+
<p class="text-start" style="margin-left: 20px;">
|
| 105 |
+
<strong>Performance:</strong> Speech: Partially followed the instruction on speed. Semantics: Accurately followed the instruction, with no semantic deviation or missing information.
|
| 106 |
+
</p>
|
| 107 |
+
|
| 108 |
+
<!-- SpeechGPT Output -->
|
| 109 |
+
<span><strong>SpeechGPT:</strong></span>
|
| 110 |
+
<audio controls style="vertical-align: middle;">
|
| 111 |
+
<source src="/static/audio/sample/SpeechGPT.wav" type="audio/wav">
|
| 112 |
+
</audio>
|
| 113 |
+
<p class="text-start" style="margin-left: 20px;">
|
| 114 |
+
<strong>Performance:</strong> Speech: Did not follow the instruction on speed. Semantics: Partially followed the instruction, with minor semantic deviation and missing information.
|
| 115 |
+
</p>
|
| 116 |
+
|
| 117 |
+
<!-- Mini-Omni Output -->
|
| 118 |
+
<span><strong>Mini-Omni:</strong></span>
|
| 119 |
+
<audio controls style="vertical-align: middle;">
|
| 120 |
+
<source src="/static/audio/sample/mini-omni.wav" type="audio/wav">
|
| 121 |
+
</audio>
|
| 122 |
+
<p class="text-start" style="margin-left: 20px;">
|
| 123 |
+
<strong>Performance:</strong> Speech: Did not follow the instruction on speed. Semantics: Did not follow the instruction, with significant semantic deviation and missing information.
|
| 124 |
+
</p>
|
| 125 |
+
|
| 126 |
+
<p class="text-start">
|
| 127 |
+
After making your choice, you'll proceed to the next round.
|
| 128 |
+
</p>
|
| 129 |
+
<strong>Please enter your username and start the evaluation!</strong>
|
| 130 |
</p>
|
| 131 |
</div>
|
| 132 |
|