Model Card for whisper-small-formosan-all

This model is a version of openai/whisper-small fine-tuned on Taiwanese indigenous languages.
Note: Indonesian ("id") is used as the Whisper language ID.

Training process

The model was trained with the following hyperparameters:

  • Batch size: 32*4 (on 4 L40s GPUs)
  • Gradient accumulation steps: 8
  • Total steps: 1600
  • Learning rate: 1.25e-5
  • Data augmentation: No
  • Optimizer: schedule_free_adamw
  • LR scheduler type: constant
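
The batch settings above combine into an effective batch size per optimizer update. The following is a minimal arithmetic sketch, not the authors' training script. It assumes that "32" is the per-GPU batch size, that gradient accumulation multiplies on top of the 4-GPU batch, and that "total steps" counts optimizer updates; the variable and function names are illustrative.

```python
# Values taken from the hyperparameter list above.
per_device_batch_size = 32        # assumption: "32" is the per-GPU batch size
num_gpus = 4
gradient_accumulation_steps = 8
total_steps = 1600                # assumption: counted in optimizer updates


def effective_batch_size(per_device: int, gpus: int, accumulation: int) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_device * gpus * accumulation


samples_per_update = effective_batch_size(
    per_device_batch_size, num_gpus, gradient_accumulation_steps
)
samples_seen = samples_per_update * total_steps

print(samples_per_update)  # 1024
print(samples_seen)        # 1638400
```

Under these assumptions, each update averages gradients over 1024 samples, and the full run processes about 1.6 million samples (counting repeats across epochs).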

How to use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available; otherwise fall back to CPU and float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "formospeech/whisper-small-formosan-all"

# Load the fine-tuned checkpoint and move it to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Long audio is split into 30-second chunks and transcribed in batches of 16.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# The model expects the Indonesian language ID ("id"), as noted above.
generate_kwargs = {"language": "id"}
transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
print(transcription)
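
To transcribe several recordings in one call, the pipeline can also be given a list of paths, in which case it returns one result dict per input. The helper below is a small sketch of that pattern and is not part of the original card: `transcribe_files` is a hypothetical name, and it assumes `pipe` behaves like the pipeline built above (a list of inputs yields a list of dicts with a "text" key).

```python
from typing import Callable, Dict, List


def transcribe_files(
    pipe: Callable, paths: List[str], language: str = "id"
) -> Dict[str, str]:
    """Run an ASR pipeline over several files and map each path to its text.

    `pipe` is assumed to accept a list of audio paths plus `generate_kwargs`
    and to return one {"text": ...} dict per input, in the same order.
    """
    results = pipe(list(paths), generate_kwargs={"language": language})
    return {path: result["text"].strip() for path, result in zip(paths, results)}


# Example usage with the pipeline defined above (paths are placeholders):
# transcripts = transcribe_files(pipe, ["path/to/a.wav", "path/to/b.wav"])
# for path, text in transcripts.items():
#     print(path, text)
```

Keeping the language ID as a parameter with "id" as its default mirrors the note at the top of this card while leaving the helper reusable with other Whisper checkpoints.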