timestamp decoding
#9
by StephennFernandes - opened
Hi there, is there a way to get timestamp decoding from MMS, similar to the OpenAI Whisper models?
Yep, this is easiest done with the pipeline. For character-level timestamps:
from transformers import pipeline
transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="char")
For word-level timestamps:
from transformers import pipeline
transcriber = pipeline(model="facebook/mms-1b-all")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", return_timestamps="word")
See docs for more details: https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline
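For reference, the pipeline call returns a dict with the full transcription under "text" and the timed segments under "chunks", each carrying a (start, end) tuple in seconds. A minimal sketch of reading that structure, using a hand-written stand-in for the pipeline output (the words and times below are made up for illustration, and format_chunks is a hypothetical helper, not part of Transformers):

```python
def format_chunks(result):
    """Turn an ASR pipeline result into 'start-end: text' lines."""
    lines = []
    for chunk in result["chunks"]:
        start, end = chunk["timestamp"]
        lines.append(f"{start:.2f}-{end:.2f}: {chunk['text']}")
    return lines


# Stand-in shaped like the output of transcriber(..., return_timestamps="word").
# The values are illustrative only, not real model output.
example = {
    "text": "hello world",
    "chunks": [
        {"text": "hello", "timestamp": (0.5, 0.9)},
        {"text": "world", "timestamp": (1.0, 1.4)},
    ],
}

for line in format_chunks(example):
    print(line)
```

With return_timestamps="char" the structure should be the same, only with one chunk per character instead of per word.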
Hey @sanchit-gandhi, thanks a ton for taking the time to reply.
Could you please tell me how I could do this in the regular inference mode as well?
E.g.:
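import torch

# processor and model are the AutoProcessor / AutoModelForCTC instances for facebook/mms-1b-all
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)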
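One possible way to get this without the pipeline is to ask the tokenizer for offsets when decoding: the Wav2Vec2 CTC tokenizer accepts output_word_offsets=True (or output_char_offsets=True). The offsets are counted in logit frames, so they need to be scaled by model.config.inputs_to_logits_ratio / sampling_rate to get seconds. A rough sketch, assuming processor.decode forwards these arguments to the tokenizer for this checkpoint; offsets_to_seconds and transcribe_with_word_timestamps are hypothetical helper names, not library functions:

```python
def offsets_to_seconds(word_offsets, inputs_to_logits_ratio, sampling_rate):
    """Convert frame-based offsets to start/end times in seconds."""
    time_per_frame = inputs_to_logits_ratio / sampling_rate
    return [
        {
            "word": item["word"],
            "start_time": round(item["start_offset"] * time_per_frame, 2),
            "end_time": round(item["end_offset"] * time_per_frame, 2),
        }
        for item in word_offsets
    ]


def transcribe_with_word_timestamps(model, processor, audio, sampling_rate=16_000):
    """Sketch: greedy CTC decoding with word offsets (not executed here)."""
    import torch  # imported lazily so the helper above works without torch

    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    # Assumption: decode() passes output_word_offsets through to the CTC tokenizer.
    decoded = processor.decode(ids, output_word_offsets=True)
    times = offsets_to_seconds(
        decoded.word_offsets, model.config.inputs_to_logits_ratio, sampling_rate
    )
    return decoded.text, times


# Frame offsets at 16 kHz with a 320-sample stride (values made up for illustration).
demo = offsets_to_seconds(
    [{"word": "hello", "start_offset": 10, "end_offset": 25}], 320, 16_000
)
print(demo)
```

Usage would be along the lines of text, times = transcribe_with_word_timestamps(model, processor, en_sample), with model and processor loaded as in the snippet above.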