Upload README.md with huggingface_hub
README.md
@@ -14,33 +14,52 @@ pipeline_tag: automatic-speech-recognition
# whisper-base-it

Fine-tuned [openai/whisper-base](https://huggingface.co/openai/whisper-base) (74M params) for Italian automatic speech recognition (ASR).

**Author:** Ettore Di Giacinto

Brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team. This model can be used directly with [LocalAI](https://localai.io).

## Results

Evaluated on the Common Voice 25.0 Italian test set (15,184 samples). Word error rate (WER, lower is better) by training step:

| Step | WER |
|------|-----|
| 1000 | 26.5% |
| 2000 | 24.0% |
| 3000 | 22.4% |
| 5000 | 20.6% |
| 7000 | 19.9% |
| 10000 | **19.2%** |

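WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below computes a corpus-level WER in plain Python by summing errors over all utterances before dividing, which is the common convention. It assumes no text normalization beyond whitespace splitting; this card does not state how transcripts were normalized before scoring, so the sketch may not reproduce the exact procedure behind the table above. The example sentences are illustrative only.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j]: distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref_word has no counterpart in hyp
                curr[j - 1] + 1,     # insertion: hyp_word has no counterpart in ref
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors / total reference words over (reference, hypothesis) pairs.

    Assumption: texts are only split on whitespace; no lowercasing or
    punctuation removal is applied.
    """
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


pairs = [
    ("il gatto è sul tavolo", "il gatto sul tavolo"),  # 1 deletion, 5 reference words
    ("buongiorno a tutti", "buon giorno a tutti"),      # 1 substitution + 1 insertion, 3 words
]
print(f"WER: {corpus_wer(pairs):.1%}")  # prints "WER: 37.5%" (3 errors / 8 words)
```

Summing errors across utterances weights long utterances more heavily than averaging per-utterance rates would, which is why the two approaches can give different numbers on the same test set.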
## Training Details

- **Base model:** openai/whisper-base (74M parameters)
- **Dataset:** Common Voice 25.0 Italian (173k train, 15k dev, 15k test)
- **Steps:** 10,000 (batch size 32, ~1.8 epochs)
- **Learning rate:** 1e-5 with 500 warmup steps
- **Precision:** bf16 on NVIDIA GB10

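The epoch figure follows from steps × batch size ÷ training-set size. The sketch below reproduces it, assuming the "173k" training split is exactly 173,000 utterances and that 32 is the effective batch size per optimizer step. It also computes the learning rate at a given step, assuming linear warmup to 1e-5 over 500 steps followed by linear decay to zero at step 10,000. The decay phase is an assumption (it matches the default `linear` scheduler of the Hugging Face `Trainer`); the card states only the peak rate and the warmup length.

```python
TRAIN_UTTERANCES = 173_000  # assumption: "173k" taken as exactly 173,000
BATCH_SIZE = 32             # assumption: effective batch per optimizer step
MAX_STEPS = 10_000
PEAK_LR = 1e-5
WARMUP_STEPS = 500


def epochs_seen(steps: int, batch_size: int, train_size: int) -> float:
    """Number of passes over the training set after `steps` optimizer steps."""
    return steps * batch_size / train_size


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then (assumed) linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


# 10,000 steps * 32 samples = 320,000 samples, about 1.85 passes over 173,000 utterances
print(f"epochs: {epochs_seen(MAX_STEPS, BATCH_SIZE, TRAIN_UTTERANCES):.2f}")
for step in (0, 250, 500, 5_250, 10_000):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
```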
## Usage

### Transformers

```python
from transformers import pipeline

# Download the fine-tuned checkpoint from the Hugging Face Hub
pipe = pipeline("automatic-speech-recognition", model="LocalAI-io/whisper-base-it")
# Decoding a file path requires ffmpeg; language/task force Italian transcription
# (no language auto-detection, no translation to English)
result = pipe("audio.mp3", generate_kwargs={"language": "it", "task": "transcribe"})
print(result["text"])
```

### CTranslate2 / faster-whisper

For optimized CPU inference, use the INT8-quantized version: [LocalAI-io/whisper-base-it-ct2-int8](https://huggingface.co/LocalAI-io/whisper-base-it-ct2-int8) (79 MB).

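As a back-of-envelope check on the download size, weight storage is roughly parameter count × bytes per parameter. The sketch below assumes all 74M parameters are stored at a single precision. It ignores tokenizer and vocabulary files, as well as any tensors the converter may keep at higher precision, which could account for the gap between the INT8 estimate and the 79 MB figure.

```python
PARAMS = 74_000_000  # from the card: whisper-base has ~74M parameters

# Storage cost per parameter for common weight formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (10**6 bytes) for one dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>8}: ~{weight_size_mb(PARAMS, dtype):.0f} MB")
```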
### LocalAI

This model is compatible with [LocalAI](https://github.com/mudler/LocalAI) for local, self-hosted AI inference.

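LocalAI exposes an OpenAI-compatible REST API that includes a `/v1/audio/transcriptions` endpoint accepting multipart form uploads. The sketch below builds such a request using only the Python standard library. It assumes a LocalAI server at `http://localhost:8080` and a model configured there under the hypothetical name `whisper-base-it`; the actual name depends on your LocalAI model configuration. The `language` field comes from the OpenAI transcription API, and the sketch assumes the server honors it.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8080"  # assumption: local LocalAI server address
MODEL_NAME = "whisper-base-it"      # hypothetical: name given to the model in LocalAI


def build_transcription_request(audio: bytes, filename: str, model: str = MODEL_NAME,
                                language: str = "it",
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style multipart/form-data transcription request (not sent)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str) -> str:
    """Send an audio file to a running server; assumes a JSON reply with a "text" field."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), filename=path.rsplit("/", 1)[-1])
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a server running and the model installed, `print(transcribe("audio.mp3"))` would print the Italian transcript. Building the request separately from sending it keeps the multipart encoding easy to inspect without a live server.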
## Links