mms-tts-lif

#1
by SystemSolution21 - opened

from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile
from IPython.display import Audio

Initialize the model and tokenizer

model = VitsModel.from_pretrained("facebook/mms-tts-lif")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-lif")

Input Limbu text

text = """
ᤛᤣᤘᤠᤖᤥ ᤀᤠᤍᤠᤱᤒᤠ ᤀᤠᤍᤠᤱᤔᤠᤛᤣ ᤀᤠᤏᤡ ᤕᤠᤰᤌᤢᤱ ᤛᤡᤰᤁᤢᤶ ᤏᤡᤱᤘᤠ ᤋᤅᤡ ᤖᤥ ᤀᤠᤏᤡ ᤕᤠᤰᤌᤢᤱ ᤐᤠᤏ᤻ᤍᤠᤱᤜᤠ ᤐᤠᤖᤡᤖᤥ ᤛᤰᤛᤰᤜᤠ
ᤏᤡᤖᤢᤶᤗᤥ ᤛᤡᤖᤡᤈᤱᤃᤠ ᤛᤠᤵᤔᤢᤴᤎᤢᤶᤜᤠ ᤜᤢ ᤀᤠᤛᤡᤖᤥ ᤛᤡᤖᤡᤈᤱᤃᤠ ᤐᤠᤏ᤻ᤍᤠᤱᤅᤡᤴᤃ ᤀᤠᤏᤡ ᤕᤠᤰᤌᤢᤱ ᤔᤠ᤺ᤐᤠᤏ᤻ᤗᤥ
"""
inputs = tokenizer(text, return_tensors="pt")

Call model

with torch.no_grad():
    output = model(**inputs).waveform

Save the output as a .wav file

scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output[0].cpu().numpy())

Display output

Audio(output[0].cpu().numpy(), rate=model.config.sampling_rate)

================================================================================================
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
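This error is a strong hint that the tokenizer returned an *empty* `input_ids` tensor rather than a mistyped one: the MMS character-level tokenizer silently drops characters it does not know, and an empty tensor is created as float32 by default. A minimal torch-only sketch of the mechanism (the tokenizer behaviour is an assumption inferred from this failure mode):

```python
import torch

# An empty batch of token ids, as produced when every input character
# is filtered out of the vocabulary: torch defaults to float32 here,
# which is exactly what the embedding layer rejects as a FloatTensor.
empty_ids = torch.empty((1, 0))
print(empty_ids.dtype)  # torch.float32

# Casting to long fixes the dtype complaint but not the emptiness,
# so the model then fails with the "input size 0" error.
print(empty_ids.long().dtype)  # torch.int64
print(empty_ids.shape[1])      # 0
```

So casting `input_ids` treats the symptom, not the cause.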

Cast input_ids to long tensor

inputs['input_ids'] = inputs['input_ids'].type(torch.LongTensor)

RuntimeError: The input size 0, plus negative padding 0 and 0 resulted in a negative output size, which is invalid. Check dimension 1 of your input.
