| | --- |
| | base_model: unsloth/csm-1b |
| | pipeline_tag: text-to-speech |
| | tags: |
| | - base_model:adapter:unsloth/csm-1b |
| | - lora |
| | - transformers |
| | - unsloth |
| | license: apache-2.0 |
| | language: |
| | - el |
| | new_version: moiraai2024/GreekTTS-1.5 |
| | --- |
| | |
| |
|
| | # Description |
| | Website: https://moira-ai.com/ |
| |
|
| | Email: moira.ai2024@gmail.com |
| |
|
| | Report: https://moiraai2024.github.io/GreekTTS-demo/ |
| |
|
| | Welcome to Moira.AI GreekTTS, a text-to-speech model fine-tuned for Greek speech synthesis. It is a LoRA adapter for the sesame/csm-1b architecture (loaded here as unsloth/csm-1b), trained on Greek speech data to produce high-quality, natural-sounding speech. |
| |
|
| | The model targets lifelike, expressive speech for applications such as virtual assistants, audiobooks, and accessibility tools. The underlying large transformer-based model is intended to give fluid prosody and accurate pronunciation of Greek text. |
| |
|
| | Key Features: |
| |
|
| | - Fine-tuned specifically for Greek TTS. |
| | - Built on the sesame/csm-1b model and distributed as a LoRA adapter for unsloth/csm-1b. |
| | - Capable of generating natural-sounding, expressive Greek speech. |
| | - Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek. |
| |
|
| | **Explore the model and see how it can enhance your Greek TTS applications!** |
| |
|
| |
|
| | # How to use it |
| | First install Unsloth in a conda environment, following the official guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install |
| |
|
| |
|
| | ```bash |
| | conda create --name unsloth_env \ |
| | python=3.11 \ |
| | pytorch-cuda=12.1 \ |
| | pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \ |
| | -y |
| | ``` |
| |
|
| | ``` |
| | conda activate unsloth_env |
| | ``` |
| | ``` |
| | pip install unsloth |
| | ``` |
| |
|
| | ```python |
| | from unsloth import FastModel |
| | from transformers import CsmForConditionalGeneration |
| | import torch |
| | |
| | gpu_stats = torch.cuda.get_device_properties(0) |
| | start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3) |
| | max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3) |
| | print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.") |
| | print(f"{start_gpu_memory} GB of memory reserved.") |
| | |
| | from peft import PeftModel |
| | from IPython.display import Audio |
| | |
| | # --- 1. Load the Base Unsloth Model and Processor --- |
| | # This setup must be identical to your training script. |
| | print("Loading the base model and processor...") |
| | model, processor = FastModel.from_pretrained( |
| |     model_name = "unsloth/csm-1b", |
| |     max_seq_length = 2048, |
| |     dtype = None, |
| |     auto_model = CsmForConditionalGeneration, |
| |     load_in_4bit = False, |
| | ) |
| | |
| | # --- 2. Identify and Load Your Best LoRA Checkpoint --- |
| | # !!! IMPORTANT: Change this path to your best checkpoint folder !!! |
| | # (The one you found in trainer_state.json) |
| | final_int = 94_764  # training step of the checkpoint to load |
| | best_checkpoint_path = f"./training_outputs_second_run/checkpoint-{final_int}" |
| | |
| | print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}") |
| | |
| | # Attach the trained adapter weights to the base model. This wraps the model |
| | # in a PeftModel; the weights are not merged unless you call merge_and_unload(). |
| | model = PeftModel.from_pretrained(model, best_checkpoint_path) |
| | |
| | print("\nFine-tuned model is ready for inference!") |
| | # Unsloth automatically handles moving the model to the GPU |
| | ``` |
| |
|
| | ```python |
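| | from transformers import AutoProcessor |
| | |
| | # Reload the processor from the base model; this replaces the one returned by FastModel.from_pretrained above. |
| | processor = AutoProcessor.from_pretrained("unsloth/csm-1b") |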
| | ``` |
| |
|
| | ```python |
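| | greek_sentences = [ |
| |     "Σου μιλάααανε!",  # "They're talking to yooou!" |
| |     "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",  # "Hello, I am Mira and today we will have a Greek lesson." |
| |     "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",  # "I was out with friends having drinks. I really like Alfa beer!" |
| |     "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",  # "When I opened my eyes again, I realized I was lying on a soft bed of blankets" |
| | ] |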
| | ``` |
| |
|
| | ```python |
| | from IPython.display import Audio, display |
| | import soundfile as sf |
| | ``` |
| |
|
| | ```python |
| | # --- Configure the Generation --- |
| | |
| | int_ = 1  # index into greek_sentences of the sentence to synthesize |
| | text_to_synthesize = greek_sentences[int_] |
| | |
| | print(f"\nSynthesizing text: '{text_to_synthesize}'") |
| | |
| | speaker_id = 0 |
| | inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda") |
| | |
| | audio_values = model.generate( |
| |     **inputs, |
| |     max_new_tokens=125,  # 125 tokens is 10 seconds of audio, for longer speech increase this |
| |     # play with these parameters to tweak results |
| |     # depth_decoder_top_k=0, |
| |     # depth_decoder_top_p=0.9, |
| |     # depth_decoder_do_sample=True, |
| |     # depth_decoder_temperature=0.9, |
| |     # top_k=0, |
| |     # top_p=1.0, |
| |     # temperature=0.9, |
| |     # do_sample=True, |
| |     output_audio=True, |
| | ) |
| | ``` |
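
The comment in the block above equates 125 new tokens with 10 seconds of audio, which implies about 12.5 tokens per second. Assuming that ratio holds for other lengths, a small helper can size `max_new_tokens` for a target duration. The name `max_new_tokens_for` and its `tokens_per_second` parameter are hypothetical, introduced only for illustration.

```python
import math

# Assumption: derived from the note above (125 tokens is about 10 s of audio).
TOKENS_PER_SECOND = 125 / 10  # 12.5


def max_new_tokens_for(seconds: float, tokens_per_second: float = TOKENS_PER_SECOND) -> int:
    """Return a token budget that covers `seconds` of audio, rounded up."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * tokens_per_second)


if __name__ == "__main__":
    # Print the budget for a few target durations.
    for target in (5, 10, 20):
        print(f"{target:>2} s -> max_new_tokens={max_new_tokens_for(target)}")
```

For instance, passing `max_new_tokens=max_new_tokens_for(20)` to `model.generate` would request roughly 20 seconds of speech under this assumption.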
| |
|
| | ```python |
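| | # Convert the first generated waveform to a float32 NumPy array on the CPU |
| | audio = audio_values[0].to(torch.float32).cpu().numpy() |
| | # Save and play back at 24 kHz |
| | sf.write("example_without_context.wav", audio, 24000) |
| | display(Audio(audio, rate=24000)) |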
| | ``` |
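
To sanity-check a result, the number of samples can be converted to a duration using the 24000 Hz rate from the snippet above and compared with the `max_new_tokens` budget. The sketch below assumes the 12.5 tokens per second implied by the generation comment (125 tokens for 10 seconds). `audio_duration_seconds` and `likely_hit_token_limit` are hypothetical helpers, and the check is only a heuristic for spotting speech that may have been cut off.

```python
SAMPLE_RATE = 24_000      # Hz, the rate used when writing the WAV file above
TOKENS_PER_SECOND = 12.5  # assumption: 125 tokens is about 10 s of audio


def audio_duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a mono waveform with `num_samples` samples."""
    return num_samples / sample_rate


def likely_hit_token_limit(num_samples: int, max_new_tokens: int,
                           tolerance_seconds: float = 0.25) -> bool:
    """Heuristic: True when the audio is about as long as the token budget allows."""
    budget_seconds = max_new_tokens / TOKENS_PER_SECOND
    return audio_duration_seconds(num_samples) >= budget_seconds - tolerance_seconds
```

With the array from the previous snippet, `likely_hit_token_limit(len(audio), 125)` returning `True` suggests the sentence may need a larger `max_new_tokens`.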
| |
|
| | # 📖 How to Cite This Model |
| | ``` |
| | @misc{moira2025greektts15, |
| | title = {GreekTTS-1.0: A State-of-the-Art System for Greek Text-to-Speech Synthesis}, |
| | author = {Moira.AI}, |
| | year = {2025}, |
| | month = {sep}, |
| | day = {22}, |
| | url = {https://moira-ai.com/}, |
| | note = {Demo report: https://moiraai2024.github.io/GreekTTS-demo/} |
| | } |
| | ``` |