---
language: it
license: mit
tags:
  - whisper
  - automatic-speech-recognition
  - italian
  - localai
datasets:
  - mozilla-foundation/common_voice_25_0
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# whisper-small-it

A fine-tuned version of `openai/whisper-small` (244M parameters) for Italian automatic speech recognition.

Author: Ettore Di Giacinto

Brought to you by the LocalAI team. This model can be used directly with LocalAI.

## Usage with LocalAI

This model is ready to use with LocalAI via the `whisperx` backend.

Save the following as `whisperx-small-it.yaml` in your LocalAI models directory:

```yaml
name: whisperx-small-it
backend: whisperx
known_usecases:
  - transcript
parameters:
  model: LocalAI-io/whisper-small-it-ct2-int8
  language: it
```

Then transcribe audio via the OpenAI-compatible endpoint:

```bash
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="whisperx-small-it"
```

## Results

Evaluated on the Common Voice 25.0 Italian test set (15,184 samples):

| Step  | WER    |
|-------|--------|
| 1000  | 18.36% |
| 3000  | 15.45% |
| 5000  | 14.58% |
| 7000  | 13.61% |
| 10000 | 13.00% |
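WER is the standard word-level edit-distance metric: substitutions, insertions, and deletions divided by the number of reference words. A minimal self-contained sketch of how such a score is computed (not the actual evaluation script, which typically also normalizes casing and punctuation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between the reference words seen so far and hyp[:j]
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1] / len(ref)

score = wer("il gatto dorme sul divano", "il gatto dorme sul tavolo")
print(f"WER: {score:.2%}")  # WER: 20.00%
```

One wrong word out of five reference words gives 20% WER, mirroring how the scores in the table above aggregate errors over the whole test set.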

## Training Details

- **Base model:** openai/whisper-small (244M parameters)
- **Dataset:** Common Voice 25.0 Italian (173k train, 15k dev, 15k test)
- **Steps:** 10,000
- **Precision:** bf16 on NVIDIA GB10

## Usage

### Transformers

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="LocalAI-io/whisper-small-it")
result = pipe("audio.mp3", generate_kwargs={"language": "it", "task": "transcribe"})
print(result["text"])
```

### CTranslate2 / faster-whisper

For optimized CPU inference, use the int8-quantized CTranslate2 export: `LocalAI-io/whisper-small-it-ct2-int8`.

## Links