Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ metrics:
 # Description
 
 This model is a Reward Model trained on the [RobotsMali transcription scorer dataset](https://huggingface.co/datasets/RobotsMali/transcription-scorer), where the scores were assigned by human annotators.
-It predicts a continuous score between 0 and 1 for a pair (audio
+It predicts a continuous score between 0 and 1 for a pair (**audio**, **text**), representing how well the text matches the spoken audio.
 
 The model can be integrated as a Reward Model within RLHF pipelines to evaluate or fine-tune ASR models based on human preference scores.
 
@@ -34,7 +34,7 @@ The model consists of two main encoders — one for audio and one for text — f
 ### Audio Encoder
 
 Input: Raw waveform (16 kHz)
-Feature extraction: Mel-spectrogram computed from waveform using WhisperFeatureExtractor
+Feature extraction: Mel-spectrogram computed from waveform using ***WhisperFeatureExtractor***
 
 Parameters:
 - n_fft: 1024
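For readers of this hunk, the two values visible in the Audio Encoder section (16 kHz input, `n_fft: 1024`) already fix some properties of the spectrogram. The sketch below works through that arithmetic. It assumes the analysis window is as long as `n_fft` and that a one-sided FFT of a real signal is used. Neither is stated in the README lines shown here, and the helper name `stft_frame_properties` is hypothetical. Hop length and mel filterbank settings are not visible in this diff, so they are left out.

```python
SAMPLE_RATE_HZ = 16_000  # from the README: raw waveform at 16 kHz
N_FFT = 1024             # from the README: n_fft parameter


def stft_frame_properties(sample_rate_hz: int, n_fft: int) -> dict:
    """Return basic properties of one STFT analysis window.

    Assumption (not from the README): the window length equals n_fft
    and the FFT is one-sided, as is usual for real-valued audio.
    """
    return {
        # Duration of audio covered by a single FFT window, in milliseconds.
        "window_ms": 1000.0 * n_fft / sample_rate_hz,
        # Number of frequency bins in a one-sided FFT (DC through Nyquist).
        "freq_bins": n_fft // 2 + 1,
        # Spacing between adjacent frequency bins, in Hz.
        "bin_width_hz": sample_rate_hz / n_fft,
        # Highest frequency representable at this sample rate.
        "nyquist_hz": sample_rate_hz / 2,
    }


props = stft_frame_properties(SAMPLE_RATE_HZ, N_FFT)
print(props)
```

Under these assumptions each window spans 64 ms of audio and yields 513 linear-frequency bins spaced 15.625 Hz apart, before any mel projection is applied.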