---
base_model: unsloth/csm-1b
pipeline_tag: text-to-speech
tags:
- base_model:adapter:unsloth/csm-1b
- lora
- transformers
- unsloth
license: apache-2.0
language:
- el
new_version: moiraai2024/GreekTTS-1.5
---

# Description

Website: https://moira-ai.com/
Email: moira.ai2024@gmail.com
Report: https://moiraai2024.github.io/GreekTTS-demo/

Welcome to Moira.AI GreekTTS, a state-of-the-art text-to-speech model fine-tuned specifically for Greek language synthesis! The model is built on the sesame/csm-1b architecture and fine-tuned on Greek speech data to produce high-quality, natural-sounding speech.

Moira.AI delivers lifelike, expressive speech, making it well suited to a wide range of applications, including virtual assistants, audiobooks, and accessibility tools. By leveraging a large-scale transformer-based model, it achieves fluid prosody and accurate pronunciation of Greek text.

Key features:

- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Generates natural-sounding, expressive Greek speech.
- Easy to integrate into applications that need human-like Greek text-to-speech synthesis.

**Explore the model and see how it can enhance your Greek TTS applications!**

# How to use it

Set up the environment following the Unsloth conda guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install

```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```

```bash
conda activate unsloth_env
```

```bash
pip install unsloth
```

Load the base model and merge the fine-tuned LoRA adapter:

```python
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
from peft import PeftModel
import torch

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# --- 1. Load the base Unsloth model and processor ---
# This setup must be identical to your training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)

# --- 2. Identify and load your best LoRA checkpoint ---
# !!! IMPORTANT: change this path to your best checkpoint folder !!!
# (the one you found in trainer_state.json)
int_check = 30_000
final_int = 94_764
best_checkpoint_path = "./training_outputs_second_run/checkpoint-" + str(final_int)
print(f"\nLoading and merging the LoRA adapter from: {best_checkpoint_path}")

# This merges your trained adapter weights onto the base model
model = PeftModel.from_pretrained(model, best_checkpoint_path)
print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
```

Load the processor:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
```

Define some Greek test sentences:

```python
greek_sentences = [
    "Σου μιλάααανε!",
    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
]
```

```python
from IPython.display import Audio, display
import soundfile as sf
```

Generate speech for one of the sentences:

```python
# --- Configure the generation ---
int_ = 1
text_to_synthesize = greek_sentences[int_]
print(f"\nSynthesizing text: '{text_to_synthesize}'")

speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")

audio_values = model.generate(
    **inputs,
    max_new_tokens=125,  # 125 tokens is ~10 seconds of audio; increase this for longer speech
    # Play with these parameters to tweak results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)
```

Save and play the result:

```python
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))
```

# 📖 How to Cite This Model

```
@misc{moira2025greektts15,
  title  = {GreekTTS-1.0: A State-of-the-Art System for Greek Text-to-Speech Synthesis},
  author = {Moira.AI},
  year   = {2025},
  month  = {sep},
  day    = {22},
  url    = {https://moira-ai.com/},
  note   = {Demo report: https://moiraai2024.github.io/GreekTTS-demo/}
}
```
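# Tip: Synthesizing Longer Text

Since `max_new_tokens=125` caps each generation at roughly 10 seconds of audio, longer passages are best split into sentence-sized chunks and synthesized one call at a time. A minimal sketch of such a splitter (plain Python; `chunk_greek_text` is a hypothetical helper written for this card, not part of the model's API):

```python
import re

def chunk_greek_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into sentence-sized chunks of at most max_chars characters.

    Sentences are split after '.', '!', and ';' (the Greek question mark),
    then greedily packed into chunks. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!;])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed through the `processor` and `model.generate` call shown above, and the resulting audio arrays concatenated before writing a single WAV file.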