Error with loading tokenizer.

by lilias - opened Oct 20, 2024

Oct 20, 2024

Hello,
I am using: AutoProcessor.from_pretrained("utter-project/mHuBERT-147")

I receive this error:
OSError: Can't load tokenizer for 'utter-project/mHuBERT-147'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'utter-project/mHuBERT-147' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.

mzboito

UTTER - Unified Transcription and Translation for Extended Reality org Oct 22, 2024

Hello,

Thanks for the interest in using our model.
There is no tokenizer associated with this release, as mHuBERT-147 is not an ASR model. It is a speech representation model.

You can, however, use it to train an ASR system, if you want. :)

All the best,

mzboito changed discussion status to closed Oct 22, 2024

lilias

Oct 22, 2024

Hello,
Thank you for your answer.
I don't use it as an ASR model.

So my question is:
self.feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("utter-project/mHuBERT-147")
self.hubert = HubertModel.from_pretrained("utter-project/mHuBERT-147")

Do I have to use it in this way?

mzboito

UTTER - Unified Transcription and Translation for Extended Reality org Oct 22, 2024

I'm not sure what you want to do, but the code you sent is correct. It loads the pretrained model correctly!
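
For anyone landing here, below is a minimal sketch of how those two lines could be used to get speech representations without a tokenizer. It is only an illustration under a few assumptions: the input is 16 kHz mono audio, `torch` and `transformers` are installed, and the helper names `make_sine` and `extract_features` are made up for this example (they are not part of the release).

```python
import math

MODEL_ID = "utter-project/mHuBERT-147"
SAMPLING_RATE = 16_000  # assumption: 16 kHz mono input


def make_sine(duration_s=1.0, sampling_rate=SAMPLING_RATE, freq_hz=440.0):
    """Hypothetical helper: synthetic mono waveform (list of floats) standing in for real audio."""
    n = int(duration_s * sampling_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sampling_rate) for t in range(n)]


def extract_features(waveform, sampling_rate=SAMPLING_RATE, model_id=MODEL_ID):
    """Hypothetical helper: return last hidden states, shape (1, frames, hidden_size)."""
    # Imported here so make_sine stays usable without torch/transformers installed.
    import torch
    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    # Same two calls as in the question: a feature extractor, not a tokenizer or processor.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = HubertModel.from_pretrained(model_id)
    model.eval()

    # The feature extractor prepares the raw waveform as a batched tensor.
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

Calling `extract_features(make_sine())` downloads the checkpoint on first use and should return one vector per audio frame. No tokenizer is involved at any point, which is why `AutoProcessor` is not needed here.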

clementruhm

Jan 30, 2025

I think they were trying to follow the generic Hubert class documentation: https://huggingface.co/docs/transformers/en/model_doc/hubert#transformers.HubertModel.forward.example
