timestamp decoding

by StephennFernandes - opened Jul 12, 2023

Discussion

StephennFernandes

Jul 12, 2023

Hi there, is there a way to get timestamp decoding from MMS, similar to the OpenAI Whisper models?

sanchit-gandhi

Jul 13, 2023

Yep, easiest done with the pipeline. For character-level timestamps:

from transformers import pipeline

transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="char")

For word-level timestamps:

from transformers import pipeline

transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="word")

See docs for more details: https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline
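
For reference, with `return_timestamps` set, the pipeline returns a dict with a `text` string and a `chunks` list, where each chunk has a `text` and a `timestamp` (start, end) pair in seconds. Here is a minimal sketch of reading that structure. It uses a hand-written `result` instead of a real pipeline call, so the words and times are made up for illustration, and `format_chunks` is a hypothetical helper name:

```python
# Hypothetical result, shaped like the output of
# transcriber(audio, return_timestamps="word"); values are illustrative only.
result = {
    "text": "hello world",
    "chunks": [
        {"text": "hello", "timestamp": (0.50, 0.90)},
        {"text": "world", "timestamp": (1.10, 1.60)},
    ],
}


def format_chunks(result):
    """Return one 'start-end: text' line per chunk."""
    lines = []
    for chunk in result["chunks"]:
        start, end = chunk["timestamp"]  # both in seconds
        lines.append(f"{start:.2f}s-{end:.2f}s: {chunk['text']}")
    return lines


for line in format_chunks(result):
    print(line)
```

The same loop should work for `return_timestamps="char"`, where each chunk holds a single character instead of a word.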

StephennFernandes

Jul 13, 2023

Hey @sanchit-gandhi, thanks a ton for taking the time to reply.

Could you please tell me how I could do this in the regular inference mode as well?

e.g.:

inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
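
One possible approach for the manual path, offered as a sketch rather than a confirmed answer from this thread: the Wav2Vec2 CTC tokenizer's `decode` accepts `output_word_offsets=True` (and `output_char_offsets=True`), and the returned offsets are counted in model output frames. Multiplying them by `model.config.inputs_to_logits_ratio / sampling_rate` should convert them to seconds. The helper name `offsets_to_seconds` and the sample offsets below are made up for illustration, and the ratio of 320 is an assumption based on the standard Wav2Vec2 convolutional strides, so read it from the model config in practice.

```python
def offsets_to_seconds(word_offsets, inputs_to_logits_ratio, sampling_rate):
    """Convert CTC frame offsets into start/end times in seconds."""
    # Seconds of audio covered by one model output frame.
    time_per_frame = inputs_to_logits_ratio / sampling_rate
    return [
        {
            "word": item["word"],
            "start": round(item["start_offset"] * time_per_frame, 2),
            "end": round(item["end_offset"] * time_per_frame, 2),
        }
        for item in word_offsets
    ]


# With a real model, the inputs would come from something like:
#   out = processor.decode(ids, output_word_offsets=True)
#   word_offsets = out.word_offsets
#   ratio = model.config.inputs_to_logits_ratio
# The values below are illustrative stand-ins, not real model output.
sample_offsets = [
    {"word": "hello", "start_offset": 25, "end_offset": 45},
    {"word": "world", "start_offset": 55, "end_offset": 80},
]

timestamps = offsets_to_seconds(
    sample_offsets, inputs_to_logits_ratio=320, sampling_rate=16_000
)
print(timestamps)
```

Under these assumptions, each output frame covers 320 / 16000 = 0.02 seconds of audio, so a frame offset of 25 maps to 0.5 seconds.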
