See Emotive Icelandic for more information about this model and the data it is trained on. The RepeaTTS series is trained on the same data as Emotive Icelandic, but without emotive content disclosure.
This model, level-1, is the version without any further refinement fine-tuning.
Use the code below to get started with the model.
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Run on the first GPU if one is available, otherwise on the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("atlithor/RepeaTTS-level-1").to(device)
# The prompt (text to be spoken) and the voice description use separate tokenizers.
tokenizer = AutoTokenizer.from_pretrained("atlithor/EmotiveIcelandic")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

prompt = "Þetta er frábær hugmynd!"  # English: "This is a great idea!"
description = "The recording is of very high quality, with Ingrid's voice sounding clear and very close up."

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform, move it to the CPU, and drop the batch dimension.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()

# Save the audio at the model's sampling rate.
sf.write("ingrid.wav", audio_arr, model.config.sampling_rate)
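If soundfile is not installed, the generated waveform can also be saved with Python's standard library. The following is a minimal sketch, not part of the original model card: the helper name write_wav_pcm16 is hypothetical, and it assumes the audio is a one-dimensional sequence of floats in the range [-1, 1], such as the list returned by audio_arr.tolist().

```python
import struct
import wave


def write_wav_pcm16(path, samples, sampling_rate):
    """Write a mono float waveform (values in [-1, 1]) as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the duration in seconds.
    """
    frames = bytearray()
    for s in samples:
        # Clip out-of-range values so they cannot overflow the 16-bit range.
        s = max(-1.0, min(1.0, float(s)))
        # Scale to a signed 16-bit integer and pack as little-endian.
        frames += struct.pack("<h", int(round(s * 32767)))

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)              # mono
        wf.setsampwidth(2)              # 2 bytes per sample = 16 bits
        wf.setframerate(sampling_rate)
        wf.writeframes(bytes(frames))

    return len(samples) / sampling_rate
```

With the variables from the snippet above, a call would look like write_wav_pcm16("ingrid.wav", audio_arr.tolist(), model.config.sampling_rate). Clipping before scaling is a deliberate choice here: it trades slight distortion on rare out-of-range samples for a file that is always valid.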
coming later
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Base model
parler-tts/parler-tts-mini-multilingual-v1.1