---
license: apache-2.0
language:
- el
base_model:
- sesame/csm-1b
pipeline_tag: text-to-speech
---

# Description
Welcome to Moira.AI GreekTTS, a state-of-the-art text-to-speech model fine-tuned specifically for Greek! It is built on the powerful sesame/csm-1b architecture and fine-tuned on Greek speech data to provide high-quality, natural-sounding speech generation.

Moira.AI delivers lifelike, expressive speech, making it well suited to a wide range of applications, including virtual assistants, audiobooks, and accessibility tools. By leveraging a large-scale transformer-based model, Moira.AI ensures fluid prosody and accurate pronunciation of Greek text.

Key features:

- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Generates natural-sounding, expressive Greek speech.
- Easy to integrate into applications that need high-quality, human-like Greek speech synthesis.

Explore the model and see how it can enhance your Greek TTS applications!

# How to use it
Install Unsloth in a fresh conda environment (see https://docs.unsloth.ai/get-started/install-and-update/conda-install):

```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```

```bash
conda activate unsloth_env
```

```bash
pip install unsloth
```

```python
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
from peft import PeftModel
import torch

# Report the available GPU and currently reserved memory
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# --- 1. Load the base Unsloth model and processor ---
# This setup must be identical to your training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)

# --- 2. Identify and load your best LoRA checkpoint ---
# !!! IMPORTANT: change this path to your best checkpoint folder !!!
# (the one you found in trainer_state.json)
final_int = 94_764
best_checkpoint_path = "./training_outputs_second_run/checkpoint-" + str(final_int)

print(f"\nLoading and merging the LoRA adapter from: {best_checkpoint_path}")

# This merges your trained adapter weights onto the base model
model = PeftModel.from_pretrained(model, best_checkpoint_path)

print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
```

```python
# Alternatively, load the processor directly from the base model repo
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
```
89
+
90
+ ```
91
+ greek_sentences = [
92
+ "Σου μιλάααανε!",
93
+ "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
94
+ "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
95
+ "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
96
+ ]
97
+ ```
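
The generation step below prefixes each sentence with a `[speaker_id]` tag before passing it to the processor. A small helper (hypothetical, not part of the model's API) makes that format explicit:

```python
def build_csm_prompt(text: str, speaker_id: int = 0) -> str:
    """Prefix the text with the "[N]" speaker tag that the CSM processor expects."""
    return f"[{speaker_id}]{text}"

print(build_csm_prompt("Γεια σας!"))  # → [0]Γεια σας!
```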

```python
from IPython.display import Audio, display
import soundfile as sf
```

```python
# --- Configure the generation ---

sentence_index = 1
text_to_synthesize = greek_sentences[sentence_index]

print(f"\nSynthesizing text: '{text_to_synthesize}'")

# CSM expects the speaker id as a "[N]" prefix on the text
speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")

audio_values = model.generate(
    **inputs,
    max_new_tokens=125,  # 125 tokens is ~10 seconds of audio; increase this for longer speech
    # Play with these parameters to tweak the results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)
```
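
The comment above implies a rate of roughly 12.5 tokens per second of audio (125 tokens ≈ 10 s). A small convenience function (the name and constant are illustrative, derived only from that comment) turns a target duration into a `max_new_tokens` budget:

```python
import math

TOKENS_PER_SECOND = 12.5  # from the comment above: 125 tokens ≈ 10 seconds

def seconds_to_max_new_tokens(seconds: float) -> int:
    """Rough max_new_tokens budget for a target audio duration, rounded up."""
    return math.ceil(seconds * TOKENS_PER_SECOND)

print(seconds_to_max_new_tokens(10))  # → 125
print(seconds_to_max_new_tokens(30))  # → 375
```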

```python
# Convert to a float32 numpy array, save as a 24 kHz WAV file, and play inline
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))
```