Model Card for whisper-small-formosan-all

This model is a fine-tuned version of openai/whisper-small for Taiwanese indigenous (Formosan) languages.
Note: we use Indonesian ("id") as the Whisper language ID.

Training process

The model was trained with the following hyperparameters:

Batch size: 32*4 (on 4 L40S GPUs)
Gradient accumulation steps: 8
Total steps: 1600
Learning rate: 1.25e-5
Data augmentation: No
Optimizer: schedule_free_adamw
LR scheduler type: constant
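As a rough illustration of what these settings imply, the sketch below computes the effective batch size per optimizer update. It assumes that "32*4" means 32 samples on each of 4 GPUs and that "Total steps" counts optimizer updates; neither is stated explicitly above, and the variable names are illustrative only.

```python
# Values copied from the hyperparameter list above; names are illustrative.
per_device_batch_size = 32        # assumption: 32 samples per GPU
num_gpus = 4                      # 4 L40S GPUs
gradient_accumulation_steps = 8
total_steps = 1600                # assumption: counted in optimizer updates

# Samples contributing to a single optimizer update.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps

# Upper bound on samples processed over the whole run.
samples_seen = effective_batch_size * total_steps

print(effective_batch_size)  # 1024
print(samples_seen)          # 1638400
```

Under these assumptions, each update averages gradients over 1024 samples, which may help when adapting the recipe to a different number of GPUs.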

How to use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the fine-tuned model and its processor (tokenizer + feature extractor).
model_id = "formospeech/whisper-small-formosan-all"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Build a speech recognition pipeline that processes audio in 30-second chunks.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# The model uses Indonesian ("id") as the Whisper language ID.
generate_kwargs = {"language": "id"}
transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
print(transcription)
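
The pipeline returns a dictionary with a "text" key and, when called with return_timestamps=True, an additional "chunks" list of timestamped segments. The helper below is a minimal sketch for printing such a result; format_transcription and the sample dictionary are hypothetical and are not part of this repository or of transformers.

```python
def format_transcription(result):
    """Render an ASR pipeline result as plain text, one line per chunk if present."""
    chunks = result.get("chunks")
    if not chunks:
        # No timestamps requested: only the full text is available.
        return result["text"].strip()

    def fmt(t):
        # A timestamp can be None, e.g. when the audio ends mid-segment.
        return f"{t:.2f}" if t is not None else "?"

    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{fmt(start)} - {fmt(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Hypothetical result, shaped like the output of pipe(..., return_timestamps=True).
example = {
    "text": " first segment second segment",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " first segment"},
        {"timestamp": (2.5, None), "text": " second segment"},
    ],
}
print(format_transcription(example))
```

With a real result, the call would be format_transcription(transcription).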

Safetensors

Model size: 0.2B params
Tensor type: F32

Model tree for formospeech/whisper-small-formosan-all

Base model: openai/whisper-small