Choosing the same voice for all generated audio files

#58
by abesimi - opened

Hi,

I have recently been using csm-1b to generate audio in Python, and it works fine, but I cannot seem to keep the same speaker across different runs.

Each time I run the script, a different voice is produced.

Any feedback on how to keep the speaker consistent is welcome.

Ideally something like what Google Gemini offers: a set of enumerated speakers to pick from.
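One idea I have not verified yet is to put a reference turn from the same speaker (its text plus a clip of its audio) in front of the new text, so the model has a voice to continue. Below is a minimal sketch of how I would build such a conversation. `build_conversation` and its parameters are hypothetical names of mine, and I am assuming the chat template accepts an `{"type": "audio", "path": ...}` entry as context; please correct me if that is wrong.

```python
from typing import Any, Dict, List, Optional


def build_conversation(
    text: str,
    speaker_id: str = "0",
    reference_text: Optional[str] = None,
    reference_audio: Any = None,
) -> List[Dict[str, Any]]:
    """Build a conversation list, optionally prefixed with a reference turn.

    Hypothetical helper: the audio entry format is an assumption about what
    the processor's chat template accepts as context.
    """
    conversation: List[Dict[str, Any]] = []

    # Optional reference turn: same speaker, known text, known audio clip.
    if reference_text is not None and reference_audio is not None:
        conversation.append(
            {
                "role": speaker_id,
                "content": [
                    {"type": "text", "text": reference_text},
                    {"type": "audio", "path": reference_audio},
                ],
            }
        )

    # The new turn to synthesize, in the same shape as my current code.
    conversation.append(
        {"role": speaker_id, "content": [{"type": "text", "text": text}]}
    )
    return conversation
```

The result would be passed to `aprocessor.apply_chat_template` in place of the single-turn list, reusing the same reference clip on every run.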

Here's my code.

import os

# aprocessor, model, and device are set up elsewhere in the script
def getAudioFromText(text: str, tempID: str) -> bool:
    conversation = [
        {"role": "0", "content": [{"type": "text", "text": text}]},
    ]
    inputs = aprocessor.apply_chat_template(
        conversation,
        tokenize=True,
        return_dict=True,
    ).to(device)

    # Run inference and save the result as a WAV file
    try:
        audio = model.generate(**inputs, output_audio=True)
        audio_url = os.path.join(tempID, "output_audio.wav")

        aprocessor.save_audio(audio, audio_url, sampling_rate=24000, format="wav")
        # Why is the audio file missing the last word or second?
        # Possible fix: add 200 ms of silence at the end of the audio

        return True
    except Exception as e:
        # Report the failure instead of silently swallowing it
        print(f"Audio generation failed: {e}")
        return False
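For the truncated ending mentioned in the comments, this is roughly the padding I have in mind. It is a plain-Python sketch on a list of samples. `pad_with_silence` is a hypothetical helper, not part of the library, and the object returned by `model.generate` would need an equivalent step for its own type. I have not confirmed that padding actually recovers the missing word.

```python
from typing import List, Sequence


def pad_with_silence(
    samples: Sequence[float],
    sampling_rate: int = 24000,
    silence_ms: int = 200,
) -> List[float]:
    """Return the samples followed by `silence_ms` milliseconds of zeros.

    Hypothetical helper for illustration; assumes mono audio given as a
    flat sequence of float samples.
    """
    # Number of zero samples covering the requested duration.
    n_pad = sampling_rate * silence_ms // 1000
    return list(samples) + [0.0] * n_pad


if __name__ == "__main__":
    padded = pad_with_silence([0.1, -0.2, 0.3])
    # 200 ms at 24000 Hz is 4800 samples, so 3 + 4800 in total.
    print(len(padded))
```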
