---
base_model: unsloth/csm-1b
library_name: peft
license: mit
datasets:
- Dev372/Medical_STT_Dataset_1.0
language:
- en
pipeline_tag: text-to-speech
tags:
- unsloth
- trl
- transformers
---

# Model Card for Model ID

## Model Details

### Model Description

This model is a fine-tuned version of csm-1b for medical text-to-speech tasks. It was trained on a curated dataset of ~2,000 medical text-to-speech pairs, focusing on clinical terminology, healthcare instructions, and patient–doctor communication scenarios.

- **Fine-tuned for:** Medical-domain text-to-speech synthesis
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** csm-1b

## Uses

### Direct Use

- Generating synthetic speech from medical text for research, prototyping, and educational purposes
- Assisting in medical transcription-to-speech applications
- Supporting voice-based healthcare assistants

## Bias, Risks, and Limitations

- The model is not a substitute for professional medical advice.
- It was trained on a relatively small dataset (~2K samples), so performance may be limited outside the fine-tuned domain.
- Bias and hallucinations: the model may mispronounce rare terms or produce inaccurate speech in critical scenarios.
- It should not be used in real clinical decision-making without proper validation.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import soundfile as sf
import torch
from peft import PeftModel
from transformers import AutoProcessor, CsmForConditionalGeneration

model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model and processor, then attach the fine-tuned PEFT adapter.
processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/Medical-TTS")

text = "Mild dorsal angulation of the distal radius reflective of the fracture."
speaker_id = 0
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

# Tokenize the conversation and move the inputs to the same device as the model.
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

audio_values = model.generate(
    **inputs,
    max_new_tokens=650,
    # play with these parameters to tweak results
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)

# Convert the first generated waveform to float32 and save it at 24 kHz.
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)
```

### Framework versions

- PEFT 0.15.2
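
### Example: reusable helpers

The getting-started code builds the conversation inline and writes the audio at 24000 Hz. When synthesizing several sentences, it can help to factor those two details out. The sketch below is only an illustration: `build_conversation` and `duration_seconds` are hypothetical helper names, not part of `transformers` or `peft`, and the sample rate is assumed to be the 24000 Hz value used in the getting-started code.

```python
# Hypothetical helpers (not part of any library) that mirror the
# structures used in the getting-started code.

SAMPLE_RATE = 24_000  # assumed: the rate passed to sf.write in the example


def build_conversation(text, speaker_id=0):
    """Wrap one utterance in the chat-style structure the processor expects."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    return [
        {"role": str(speaker_id), "content": [{"type": "text", "text": cleaned}]},
    ]


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Return the length in seconds of a waveform with num_samples samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


if __name__ == "__main__":
    conversation = build_conversation("Take one tablet twice daily with food.")
    print(conversation[0]["role"])   # speaker id as a string
    print(duration_seconds(48_000))  # length of a 48,000-sample clip in seconds
```

The result of `build_conversation(...)` can be passed to `processor.apply_chat_template` in place of the inline `conversation` list, and `duration_seconds(len(audio))` gives a quick check that the generated clip is not unexpectedly short or long before it is written to disk.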