nassimaODL committed on
Commit
d71ae55
·
0 Parent(s):

Duplicate from hi-paris/ssml-breaks2ssml-fr-lora

Files changed (5)
  1. .gitattributes +37 -0
  2. README.md +191 -0
  3. adapter_config.json +39 -0
  4. adapter_model.safetensors +3 -0
  5. notebook.ipynb +378 -0
.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ checkpoint-180/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,191 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B
+ library_name: peft
+ language:
+ - fr
+ tags:
+ - lora
+ - peft
+ - ssml
+ - text-to-speech
+ - qwen2.5
+ pipeline_tag: text-generation
+ ---
+
+ # 🗣️ French Breaks-to-SSML LoRA Model
+
+ **hi-paris/ssml-breaks2ssml-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to convert text with symbolic `<break/>` markers into rich SSML markup with prosody control (pitch, rate, volume) and precise break timing.
+
+ This is the **second stage** of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.
+
+ > 📄 **Paper**: *"Improving French Synthetic Speech Quality via SSML Prosody Control"*
+ > **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
+ > **Conference**: ICNLSP 2025
+ > 🔗 **Demo & Audio Samples**: https://hi-paris.github.io/DemoTTS/
+
+ ## 🧩 Pipeline Overview
+
+ | Stage | Model | Purpose |
+ |-------|-------|---------|
+ | 1️⃣ | [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) | Predicts natural pause locations |
+ | 2️⃣ | **hi-paris/ssml-breaks2ssml-fr-lora** | Converts breaks to full SSML with prosody |
+
+ ## ✨ Example
+
+ **Input:**
+ ```
+ Bonjour comment allez-vous ?<break/>
+ ```
+
+ **Output:**
+ ```xml
+ <prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous ?</prosody><break time="300ms"/>
+ ```
+
+ ## 🚀 Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install torch transformers peft accelerate
+ ```
+
+ ### Basic Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+ import torch
+
+ # Load base model and tokenizer
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B",
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
+
+ # Load LoRA adapter
+ model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-breaks2ssml-fr-lora")
+
+ # Prepare input (text with <break/> markers)
+ text_with_breaks = "Bonjour comment allez-vous ?<break/>"
+ formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text_with_breaks}\n\n### SSML:\n"
+
+ # Generate
+ inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=128,
+         # temperature is omitted: it has no effect under greedy decoding (do_sample=False)
+         do_sample=False,
+         pad_token_id=tokenizer.eos_token_id
+     )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ result = response.split("### SSML:\n")[-1].strip()
+ print(result)
+ ```
+
+ ### Production Usage (Recommended)
+
+ For production use with memory optimization, see our [inference repository](https://github.com/hi-paris/cascading_model):
+
+ ```python
+ from breaks2ssml_inference import Breaks2SSMLInference
+
+ # Memory-efficient shared-model approach
+ model = Breaks2SSMLInference()
+ result = model.predict("Bonjour comment allez-vous ?<break/>")
+ ```
+
+ ## 🔧 Full Cascade Example
+
+ ```python
+ from breaks2ssml_inference import CascadedInference
+
+ # Initialize the full pipeline (memory efficient: both stages share a single base model)
+ cascade = CascadedInference()
+
+ # Convert plain text directly to full SSML
+ text = "Bonjour comment allez-vous aujourd'hui ?"
+ ssml_output = cascade.predict(text)
+ print(ssml_output)
+ # Example output: <prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>
+ ```
+
+ ## 🧠 Model Details
+
+ - **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **LoRA Rank**: 8, Alpha: 16
+ - **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+ - **Training**: 5 epochs, batch size 1 with gradient accumulation
+ - **Language**: French
+ - **Model Size**: 7B base parameters (LoRA adapter: ~81 MB)
+ - **License**: Apache 2.0
+
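The rank, alpha, and target-module list above determine the adapter's footprint. As a sanity check of the ~81 MB figure, here is a back-of-the-envelope parameter count. The layer dimensions (hidden 3584, intermediate 18944, 512-dim grouped-query k/v projections, 28 layers) are taken from the public Qwen2.5-7B config, so treat them as assumptions rather than values stated on this card:

```python
# Rough size check for a rank-8 LoRA adapter over the seven target modules.
r = 8
hidden, inter, kv = 3584, 18944, 512   # assumed Qwen2.5-7B dimensions
modules = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv),
    "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}
# Each LoRA pair adds A (r x in) plus B (out x r) parameters.
per_layer = sum(r * (i + o) for i, o in modules.values())
total = per_layer * 28                  # 28 transformer layers
print(f"{total:,} trainable params")    # ~20.2M
print(f"{total * 4 / 1e6:.1f} MB in float32")
```

At 4 bytes per float32 parameter this lands at roughly 80.7 MB, which matches the 80,792,096-byte `adapter_model.safetensors` in this commit.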
+ ## 📊 Performance
+
+ | Metric | Score |
+ |--------|-------|
+ | Pause Insertion Accuracy | 87.3% |
+ | RMSE (pause duration) | 98.5 ms |
+ | MOS gain (vs. baseline) | +0.42 |
+
+ *Evaluation was performed on a held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) gains were assessed on TTS outputs rendered with the Azure `fr-FR-HenriNeural` voice, rated by 30 native French speakers.*
+
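For clarity on the second metric: RMSE here is the root-mean-square error between predicted and annotated pause durations, in milliseconds. A minimal sketch with hypothetical durations (the actual evaluation script is not part of this card):

```python
import math

# Hypothetical annotated vs. predicted pause durations, in ms.
ref_ms  = [300, 500, 250, 400]
pred_ms = [280, 610, 300, 350]

# Root-mean-square error over the pause pairs.
rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred_ms, ref_ms)) / len(ref_ms))
print(f"RMSE: {rmse:.1f} ms")
```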
+ ## 🎯 SSML Features Generated
+
+ - **Prosody Control**: Dynamic pitch, rate, and volume adjustments
+ - **Break Timing**: Precise pause durations (e.g., `<break time="300ms"/>`)
+ - **Contextual Adaptation**: Prosody values adapted to the semantic content
+
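Generated fragments like the one in the example above can be sanity-checked with the Python standard library before being sent to a TTS engine: wrapping the fragment in a root element makes it well-formed XML. The fragment below is this card's own example output; the `speak` wrapper name follows SSML convention but any root tag would do for the check:

```python
import xml.etree.ElementTree as ET

# A generated fragment has no single root, so wrap it before parsing.
fragment = ('<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">'
            'Bonjour comment allez-vous ?</prosody><break time="300ms"/>')
root = ET.fromstring(f"<speak>{fragment}</speak>")

# Pull the prosody attributes and the break duration back out.
prosody = root.find("prosody")
brk = root.find("break")
print(prosody.attrib["pitch"])   # +2.5%
print(brk.attrib["time"])        # 300ms
```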
+ ## ⚠️ Limitations
+
+ - Optimized primarily for Azure TTS voices (e.g., `fr-FR-HenriNeural`)
+ - Requires input text with `<break/>` markers (use the Stage 1 model for automatic prediction)
+ - Generates only `<break/>` tags and a `<prosody>` wrapper (pitch/rate/volume); no other SSML elements
+
+ ## 🔗 Resources
+
+ - **Full Pipeline Code**: https://github.com/hi-paris/cascading_model
+ - **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1K3bcLHRfbSy9syWRZR6D0hyTb5lqivGi)
+ - **Stage 1 Model**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora)
+
+ ## 📖 Paper
+
+ This model is part of the work described in:
+
+ [Improving French Synthetic Speech Quality via SSML Prosody Control](https://arxiv.org/abs/2508.17494)
+
+ If you use this model, please cite the paper:
+
+ ```bibtex
+ @inproceedings{ouali-etal-2025-improving,
+     title = "Improving {F}rench Synthetic Speech Quality via {SSML} Prosody Control",
+     author = "Ouali, Nassima Ould and
+       Sani, Awais Hussain and
+       Bueno, Ruben and
+       Dauvet, Jonah and
+       Horstmann, Tim Luka and
+       Moulines, Eric",
+     editor = "Abbas, Mourad and
+       Yousef, Tariq and
+       Galke, Lukas",
+     booktitle = "Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)",
+     month = aug,
+     year = "2025",
+     address = "Southern Denmark University, Odense, Denmark",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2025.icnlsp-1.30/",
+     pages = "302--314"
+ }
+ ```
+
+ ## 📜 License
+
+ Apache 2.0 (same license as the base Qwen2.5-7B model)
adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-7B",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 16,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "k_proj",
+     "up_proj",
+     "down_proj",
+     "o_proj",
+     "v_proj",
+     "gate_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_rslora": false
+ }
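One practical note on the values above: with `use_rslora` disabled, PEFT scales the LoRA update by `lora_alpha / r`, so this adapter runs with an effective scaling factor of 2. A minimal sketch reading those fields from an abridged copy of the config:

```python
import json

# Abridged excerpt of the adapter_config.json shown above.
cfg = json.loads("""
{"r": 8, "lora_alpha": 16, "lora_dropout": 0.1,
 "peft_type": "LORA", "task_type": "CAUSAL_LM"}
""")

# Standard (non-rsLoRA) scaling applied to the B@A update at inference.
scaling = cfg["lora_alpha"] / cfg["r"]
print(scaling)  # 2.0
```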
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:093bd69bafa2916e41174d18d4c3c47103f89e06d93758af20b4d42e08848a08
+ size 80792096
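The three lines above are a Git LFS pointer, not the weights themselves: a text stub whose `oid` and `size` fields describe the ~81 MB safetensors blob that LFS fetches on checkout. A quick sketch parsing such a pointer into its fields:

```python
# Each pointer line is "<key> <value>"; split on the first space only.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:093bd69bafa2916e41174d18d4c3c47103f89e06d93758af20b4d42e08848a08
size 80792096"""

fields = dict(line.split(" ", 1) for line in pointer.splitlines())
print(fields["size"])        # 80792096 (bytes, ~81 MB)
print(fields["oid"][:7])     # sha256:
```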
notebook.ipynb ADDED
@@ -0,0 +1,378 @@
+ {
+  "cells": [
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "# French SSML Cascade Models Demo\n",
+     "\n",
+     "<img src=\"https://www.hi-paris.fr/wp-content/uploads/2020/09/logo-hi-paris-retina.png\" alt=\"Hi! Paris\" width=\"200\"/>\n",
+     "\n",
+     "**Interactive demonstration of French SSML cascade models for improved text-to-speech prosody control.**\n",
+     "\n",
+     "This notebook demonstrates the complete pipeline from plain French text to rich SSML markup with prosody control.\n",
+     "\n",
+     "## 🧩 Pipeline Overview\n",
+     "\n",
+     "1. **Text-to-Breaks**: Predicts natural pause locations \n",
+     "2. **Breaks-to-SSML**: Adds prosody control (pitch, rate, volume) and precise timing\n",
+     "\n",
+     "📄 **Paper**: *Improving French Synthetic Speech Quality via SSML Prosody Control* (ICNLSP 2025) \n",
+     "🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/ \n",
+     "📚 **Models**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) • [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)\n",
+     "\n",
+     "---"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 🚀 Setup\n",
+     "\n",
+     "### Step 1: Mount Google Drive"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 34,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "a1jNj9uK7EoL",
+     "outputId": "76624289-061f-4700-e397-50da9da9ee6d"
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "Mounted at /content/drive\n"
+      ]
+     }
+    ],
+    "source": [
+     "from google.colab import drive\n",
+     "drive.mount('/content/drive', force_remount=True)"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### Step 2: Clone Repository"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 35,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "eE3iUaX_7OLG",
+     "outputId": "d621b296-b12f-489a-bc1f-c7240c21646b"
+    },
+    "outputs": [
+     {
+      "name": "stderr",
+      "output_type": "stream",
+      "text": [
+       "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
+       "chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
+       "Cloning into 'cascading_model'...\n"
+      ]
+     }
+    ],
+    "source": [
+     "%%bash\n",
+     "cd /content/drive/MyDrive/\n",
+     "git clone https://github.com/TimLukaHorstmann/cascading_model.git"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 36,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "vItNbMvh7ZNL",
+     "outputId": "31a31144-1261-4427-9d2e-089ae17689b2"
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "/content/drive/MyDrive/cascading_model\n"
+      ]
+     }
+    ],
+    "source": [
+     "%cd /content/drive/MyDrive/cascading_model/\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 37,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "JdeuCOX_7kae",
+     "outputId": "f8bad5e1-92d0-4531-fbe0-ca2f29a8efd8"
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "breaks2ssml_inference.py\n",
+       "demo.py\n",
+       "empty_ssml_creation.py\n",
+       "__init__.py\n",
+       "pyproject.toml\n",
+       "README.md\n",
+       "requirements.txt\n",
+       "shared_models.py\n",
+       "test_models.py\n",
+       "text2breaks_inference.py\n"
+      ]
+     }
+    ],
+    "source": [
+     "%%bash\n",
+     "ls"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 🧪 Testing & Demo\n",
+     "\n",
+     "### Step 3: Verify Installation"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 38,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "eaBx_eh-819B",
+     "outputId": "2c55f4fa-f17e-49b8-b032-74d670dcd34a"
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "2025-08-06 12:36:48.453347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+       "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
+       "E0000 00:00:1754483808.475278 35366 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+       "E0000 00:00:1754483808.481612 35366 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+       "============================================================\n",
+       "🧪 French SSML Models - Test Suite\n",
+       "============================================================\n",
+       "🔍 Testing imports...\n",
+       "  ✅ PyTorch 2.5.1+cu121\n",
+       "  ✅ Transformers 4.54.0\n",
+       "  ✅ PEFT 0.16.0\n",
+       "  ✅ All imports successful!\n",
+       "\n",
+       "🔧 Testing model loading...\n",
+       "  Loading text2breaks model...\n",
+       "Loading checkpoint shards: 100% 4/4 [01:33<00:00, 23.46s/it]\n",
+       "  ✅ Text2breaks model loaded\n",
+       "  Loading breaks2ssml model...\n",
+       "  ✅ Breaks2ssml model loaded\n",
+       "  ✅ All models loaded successfully!\n",
+       "\n",
+       "🧪 Testing inference...\n",
+       "  Input: Bonjour comment allez-vous ?\n",
+       "  Testing text2breaks...\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "  Step 1 result: Bonjour comment allez-vous ?<break/>\n",
+       "  Testing breaks2ssml...\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "  Step 2 result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+       "    Bonjour comment allez-vous ?\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "  ✅ Inference test successful!\n",
+       "\n",
+       "🔗 Testing full cascade...\n",
+       "  Input: Bonsoir comment ça va ?\n",
+       "  Cascade result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+       "    Bonsoir comment ça va ?\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "  ✅ Cascade test successful!\n",
+       "\n",
+       "============================================================\n",
+       "🎉 All tests passed! The models are working correctly.\n",
+       "============================================================\n",
+       "\n",
+       "You can now use:\n",
+       "  - python demo.py (for examples)\n",
+       "  - python demo.py --interactive (for interactive mode)\n",
+       "  - python text2breaks_inference.py --interactive\n",
+       "  - python breaks2ssml_inference.py --interactive\n"
+      ]
+     }
+    ],
+    "source": [
+     "!python test_models.py"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### Step 4: Interactive Demo\n",
+     "\n",
+     "Run the interactive demo to test the models with your own French text:"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 29,
+    "metadata": {
+     "colab": {
+      "base_uri": "https://localhost:8080/"
+     },
+     "id": "ZIeUY9atUhvV",
+     "outputId": "581f1395-fa70-424f-9c66-50b5e44547c3"
+    },
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "2025-08-06 12:21:35.541051: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+       "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
+       "E0000 00:00:1754482895.561958 31169 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+       "E0000 00:00:1754482895.568312 31169 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+       "================================================================================\n",
+       "Interactive French SSML Cascade\n",
+       "================================================================================\n",
+       "\n",
+       "Choose mode:\n",
+       "1. Full cascade (text → breaks → SSML)\n",
+       "2. Text to breaks only\n",
+       "3. Breaks to SSML only\n",
+       "\n",
+       "Select mode (1-3): 1\n",
+       "\n",
+       "Initializing models...\n",
+       "Loading checkpoint shards: 100% 4/4 [01:30<00:00, 22.70s/it]\n",
+       "Models loaded successfully!\n",
+       "\n",
+       "Enter French text (empty line to exit):\n",
+       "\n",
+       "> Je suis Luka.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+       "    Je suis Luka.\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "Time: 6.55s\n",
+       "\n",
+       "> Trés bien.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+       "    Trés bien.\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "Time: 5.64s\n",
+       "\n",
+       "> Je suis Bertrand Perier. Je suis avocat et vous écoutez ma masterclass.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+       "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+       "    Je suis Bertrand Perier.\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "\n",
+       "  <prosody pitch=\"+3.78%\" rate=\"-1.29%\" volume=\"-10.00%\">\n",
+       "    Je suis avocat et vous écoutez ma masterclass.\n",
+       "  </prosody>\n",
+       "  <break time=\"500ms\"/>\n",
+       "Time: 12.11s\n",
+       "\n",
+       "> Exception ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>\n",
+       "Traceback (most recent call last):\n",
+       "  File \"/usr/lib/python3.11/threading.py\", line 1541, in _shutdown\n",
+       "    def _shutdown():\n",
+       "    \n",
+       "KeyboardInterrupt: \n"
+      ]
+     }
+    ],
+    "source": [
+     "!python demo.py --interactive"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 🎯 Example Usage\n",
+     "\n",
+     "```python\n",
+     "from breaks2ssml_inference import CascadedInference\n",
+     "\n",
+     "# Initialize the full cascade\n",
+     "cascade = CascadedInference()\n",
+     "\n",
+     "# Convert plain French text to SSML\n",
+     "text = \"Bonjour comment allez-vous aujourd'hui ?\"\n",
+     "result = cascade.predict(text)\n",
+     "print(result)\n",
+     "```\n",
+     "\n",
+     "**Expected Output:**\n",
+     "```xml\n",
+     "<prosody pitch=\"+2.5%\" rate=\"-1.2%\" volume=\"-5.0%\">Bonjour comment allez-vous aujourd'hui ?</prosody><break time=\"300ms\"/>\n",
+     "```\n",
+     "\n",
+     "## 📚 Resources\n",
+     "\n",
+     "- **Audio Demos**: https://horstmann.tech/ssml-prosody-control/\n",
+     "- **GitHub Repository**: https://github.com/TimLukaHorstmann/cascading_model\n",
+     "- **Stage 1 Model**: https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora\n",
+     "- **Stage 2 Model**: https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora\n",
+     "\n",
+     "---\n",
+     "*Hi! Paris - Interdisciplinary Research Institute for Artificial Intelligence*"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": []
+   }
+  ],
+  "metadata": {
+   "accelerator": "GPU",
+   "colab": {
+    "gpuType": "T4",
+    "provenance": []
+   },
+   "kernelspec": {
+    "display_name": "Python 3",
+    "name": "python3"
+   },
+   "language_info": {
+    "name": "python"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+ }