Dev372/Medical_STT_Dataset_1.0
Viewer • Updated • 2.22k • 7 • 2
How to use khazarai/Medical-TTS with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/csm-1b")
model = PeftModel.from_pretrained(base_model, "khazarai/Medical-TTS")How to use khazarai/Medical-TTS with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-to-speech", model="khazarai/Medical-TTS") # Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("khazarai/Medical-TTS", dtype="auto")How to use khazarai/Medical-TTS with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for khazarai/Medical-TTS to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for khazarai/Medical-TTS to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for khazarai/Medical-TTS to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="khazarai/Medical-TTS",
max_seq_length=2048,
)This model is a fine-tuned version of csm-1B for medical text-to-speech tasks. It was trained on a curated dataset of ~2,000 medical text-to-speech pairs, focusing on clinical terminology, healthcare instructions, and patient–doctor communication scenarios.
Use the code below to get started with the model.
import torch
from transformers import CsmForConditionalGeneration, AutoProcessor
import soundfile as sf
from peft import PeftModel
model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/Medical-TTS")
text = "Mild dorsal angulation of the distal radius reflective of the fracture."
speaker_id = 0
conversation = [
{"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]
audio_values = model.generate(
**processor.apply_chat_template(
conversation,
tokenize=True,
return_dict=True,
).to("cuda"),
max_new_tokens=650,
# play with these parameters to tweak results
# depth_decoder_top_k=0,
# depth_decoder_top_p=0.9,
# depth_decoder_do_sample=True,
# depth_decoder_temperature=0.9,
# top_k=0,
# top_p=1.0,
# temperature=0.9,
# do_sample=True,
#########################################################
output_audio=True
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)