xezpeleta committed on
Commit 36d34fd · verified · 1 Parent(s): dd827da

Add Antzoki TTS LoRA — Basque fine-tune of DramaBox (10k steps)

README.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ language:
+ - eu
+ license: apache-2.0
+ tags:
+ - text-to-speech
+ - tts
+ - lora
+ - basque
+ - euskera
+ - voice-cloning
+ - speech-synthesis
+ - dramabox
+ - ltx2
+ - expressive-tts
+ base_model: ResembleAI/DramaBox
+ datasets:
+ - openslr/openslr76
+ metrics: []
+ pipeline_tag: text-to-speech
+ library_name: peft
+ ---
+
+ # Antzoki TTS — Basque LoRA for DramaBox
+
+ **Antzoki TTS** is a LoRA adapter for [DramaBox](https://github.com/resemble-ai/DramaBox) (Resemble AI), fine-tuned on the [OpenSLR76](https://www.openslr.org/76/) Basque speech corpus to improve Basque-language synthesis quality.
+
+ > **Antzoki** (Basque) — *theatre*, *stage*
+
+ The base DramaBox model is a highly expressive, cinematic TTS system capable of voice cloning, dramatic acting, and detailed emotional direction. This LoRA shifts its phonetic prior toward Basque, reducing the English accent while preserving dramatic and expressive capabilities.
+
+ ---
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | **Base model** | DramaBox DiT v1 (`dev` schedule, non-distilled) |
+ | **Adapter type** | LoRA (PEFT) |
+ | **LoRA rank** | 128 |
+ | **LoRA alpha** | 128 |
+ | **Target modules** | `audio_attn1.{to_q,to_k,to_v,to_out.0}`, `audio_ff.{net.0.proj,net.2}` — 288 weight pairs across 48 transformer blocks |
+ | **Training steps** | 10 000 |
+ | **Learning rate** | 1e-4 (cosine schedule) |
+ | **Dataset** | OpenSLR76 — 7 136 utterances, 52 speakers (29 F / 23 M), ~13.9 h total audio |
+ | **Hardware** | NVIDIA L40S (46 GB VRAM) |
+ | **Training time** | ~6 hours |
+
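The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes the standard LoRA convention of scaling the low-rank update by `alpha / rank` (consistent with `use_rslora: false` in `adapter_config.json`) and one weight pair per targeted module per block; the variable names are illustrative only.

```python
# Figures copied from the Model Details table.
rank = 128
alpha = 128
blocks = 48
target_modules = [
    "audio_attn1.to_q", "audio_attn1.to_k", "audio_attn1.to_v",
    "audio_attn1.to_out.0", "audio_ff.net.0.proj", "audio_ff.net.2",
]

# Standard (non-rsLoRA) scaling factor applied to the low-rank update.
scale = alpha / rank

# One (A, B) weight pair per targeted module in every transformer block.
weight_pairs = len(target_modules) * blocks

print(f"scale={scale}, weight_pairs={weight_pairs}")
```

With rank equal to alpha the update is applied at unit scale, and six modules times 48 blocks gives the 288 pairs quoted in the table.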
+ ### Checkpoints included
+
+ | File | Description |
+ |---|---|
+ | `lora_step_10000.safetensors` | Final checkpoint (step 10 000) |
+ | `best_step_06850.safetensors` | Best validation loss checkpoint (step 6 850) |
+ | `adapter_config.json` | PEFT adapter configuration |
+
+ > **Recommended**: `best_step_06850.safetensors` for the best balance of Basque prosody and expressive acting range. `lora_step_10000.safetensors` may offer better Basque phonetics at the cost of some expressiveness.
+
+ ---
+
+ ## Usage
+
+ Requires [DramaBox](https://github.com/resemble-ai/DramaBox) to be set up locally.
+
+ ```bash
+ cd DramaBox
+
+ CUDA_VISIBLE_DEVICES=0 PYTHONPATH=ltx2 python src/inference.py \
+   --checkpoint dramabox-dit-v1.safetensors \
+   --full-checkpoint dramabox-audio-components.safetensors \
+   --lora /path/to/best_step_06850.safetensors \
+   --voice-sample /path/to/reference.wav \
+   --prompt "Your director-style prompt here" \
+   --output output.wav \
+   --cfg-scale 2.5 \
+   --stg-scale 1.5
+ ```
+
+ > The LoRA is **never merged** — always loaded via `--lora` at inference time.
+
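For batch generation it can be convenient to drive the same command from Python. The following is a minimal sketch, not part of DramaBox: `build_command` and `run_inference` are hypothetical helpers that only reassemble the flags shown in the command above. Treating `--voice-sample` as optional is an assumption, based on the README listing prompts marked "no voice clone".

```python
import os
import subprocess
from pathlib import Path


def build_command(lora, prompt, output, voice_sample=None,
                  cfg_scale=2.5, stg_scale=1.5):
    """Assemble the argument list for src/inference.py (flags from the README)."""
    cmd = [
        "python", "src/inference.py",
        "--checkpoint", "dramabox-dit-v1.safetensors",
        "--full-checkpoint", "dramabox-audio-components.safetensors",
        "--lora", str(lora),
        "--prompt", prompt,
        "--output", str(output),
        "--cfg-scale", str(cfg_scale),
        "--stg-scale", str(stg_scale),
    ]
    # Assumption: the voice sample may be omitted for non-cloned voices.
    if voice_sample is not None:
        cmd += ["--voice-sample", str(voice_sample)]
    return cmd


def run_inference(dramabox_dir, **kwargs):
    """Run inference from the DramaBox checkout with the README's environment."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0", PYTHONPATH="ltx2")
    subprocess.run(build_command(**kwargs), cwd=Path(dramabox_dir),
                   env=env, check=True)
```

A call such as `run_inference("DramaBox", lora="/path/to/best_step_06850.safetensors", prompt="...", output="output.wav")` would then mirror the shell command. Passing the arguments as a list avoids shell-quoting problems with prompts that contain quotation marks.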
+ ---
+
+ ## Prompt Format
+
+ DramaBox uses a **director-style prompt** format: narrative context outside quotes, spoken text inside quotes.
+
+ ```
+ A [character description], [action/emotion]. "[spoken text]"
+ ```
+
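The format is simple enough to assemble programmatically. This sketch uses a hypothetical helper, `director_prompt`, which is not part of DramaBox; it pairs each stage direction with its spoken line and rejects spoken text that contains a double quote, since that would blur the boundary between direction and speech.

```python
def director_prompt(beats):
    """Join (direction, spoken_text) pairs into a director-style prompt."""
    lines = []
    for direction, spoken in beats:
        spoken = spoken.strip()
        if '"' in spoken:
            raise ValueError("spoken text must not contain double quotes")
        # Narrative context stays outside the quotes; speech goes inside.
        lines.append(f'{direction.strip()} "{spoken}"')
    return "\n".join(lines)


prompt = director_prompt([
    ("A woman speaks in Basque,", "Kaixo, nola zaude gaur?"),
])
print(prompt)
```

Multi-beat prompts are built by passing several pairs; each pair becomes one line of the prompt.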
+ ### Example prompts
+
+ **Villain — dramatic menace (voice clone)**
+ ```
+ A shadowy villain speaks with cold menace, "Nire lurretan sartu zara, morroi"
+ He chuckles darkly, "Erruz ordainduko duzu."
+ His voice rises with fury, "Belaunikatu, edo suntsituko zaitut!!"
+ ```
+
+ **Documentary narrator — radio host (no voice clone)**
+ ```
+ A professional woman in her mid-thirties with a warm, rhythmic storyteller's voice
+ speaks with clear authority and growing excitement.
+ She leans into the microphone, her breath audible.
+ "Kaixo guztioi! Gaur denboran atzera egingo dugu, duela hirurogeita sei milioi urteko mundu harrigarri hartara."
+ She pauses for a moment, letting the tension build, then speaks with dramatic intensity.
+ "Bat-batean, zerua argitu zen. Asteroide erraldoi batek Lurra jo zuen eta dinosauroen erregealdia betiko amaitu zen!"
+ She chuckles softly, a smile evident in her tone.
+ "Nola aldatu zuen kolpe hark planetaren patua? Segituan kontatuko dizuegu!"
+ ```
+
+ **Joyful child — wonder and excitement (voice clone)**
+ ```
+ A bright-eyed girl spins in a field of wildflowers, her voice bubbling with pure, breathless wonder:
+ "Aizu, aitona! Entzun duzu?!"
+ She laughs, a sound as clear as a mountain stream.
+ "Makina batek hitz egiten duela dirudi, baina hain da erreala!"
+ She spreads her arms wide, looking up at the sky in disbelief.
+ "Sinestezina da... adimen artifizialak nire ahotsa sortu du!!"
+ ```
+
+ **Neutral Basque (simple wrapper)**
+ ```
+ A woman speaks in Basque, "Kaixo, nola zaude gaur?"
+ ```
+
+ ---
+
+ ## Limitations
+
+ - Trained exclusively on read speech (OpenSLR76). Expressive/dramatic output relies on DramaBox's pretrained prior.
+ - Accent reduction is significant but not complete — residual English prosody may appear in some phonetic contexts.
+ - Best results with voice cloning (`--voice-sample`) from a Basque speaker.
+ - Very short prompts (<3 s target duration) may produce less stable output.
+
+ ---
+
+ ## Training Data
+
+ [OpenSLR76](https://www.openslr.org/76/) — Crowdsourced Basque speech corpus:
+ - 7 136 utterances across 52 speakers (29 female, 23 male)
+ - 3–15.5 s per clip, mean ~7 s, ~13.9 h total
+ - Read speech style
+
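The corpus statistics are mutually consistent, which a quick calculation confirms. The sketch below only uses the numbers quoted in the list above.

```python
# Corpus figures quoted in the Training Data list.
utterances = 7136
total_hours = 13.9
female_speakers, male_speakers = 29, 23

# Mean clip length implied by total duration and utterance count.
mean_seconds = total_hours * 3600 / utterances
speakers = female_speakers + male_speakers

print(f"{mean_seconds:.2f} s per clip, {speakers} speakers")
```

The implied mean is roughly 7 s per clip, in line with the stated average, and the speaker split adds up to 52.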
+ ---
+
+ ## Acknowledgements
+
+ - **[DramaBox](https://github.com/resemble-ai/DramaBox)** — Resemble AI. The base TTS model this LoRA is trained on. DramaBox is built on the LTX-2 architecture.
+ - **[LTX-2](https://github.com/Lightricks/LTX-Video)** — Lightricks. The underlying DiT architecture powering DramaBox.
+ - **[OpenSLR76](https://www.openslr.org/76/)** — Crowdsourced Basque speech dataset used for fine-tuning.
+
+ ---
+
+ ## License
+
+ This LoRA adapter is released under **Apache 2.0**.
+ DramaBox base model weights are subject to [Resemble AI's terms](https://github.com/resemble-ai/DramaBox/blob/main/LICENSE).
+
+ ---
+
+ *Part of the [Itzune](https://huggingface.co/itzune) project — Basque-language AI tools.*
adapter_config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": {
+     "base_model_class": "LTXModel",
+     "parent_library": "ltx_core.model.transformer.model"
+   },
+   "base_model_name_or_path": null,
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 128,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "lora_ga_config": null,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.19.1",
+   "qalora_group_size": 16,
+   "r": 128,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "audio_attn1.to_k",
+     "audio_attn1.to_v",
+     "audio_ff.net.0.proj",
+     "audio_attn1.to_q",
+     "audio_attn1.to_out.0",
+     "audio_ff.net.2"
+   ],
+   "target_parameters": null,
+   "task_type": null,
+   "trainable_token_indices": null,
+   "use_bdlora": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
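To see at a glance which parts of the network the adapter touches, the target list can be grouped by parent module. This is a minimal sketch using only the standard library; it parses an excerpt of the config inline so that it runs on its own, and the reading that `audio_attn1` is attention and `audio_ff` is feed-forward is inferred from the names rather than confirmed by the source.

```python
import json
from collections import defaultdict

# Excerpt of adapter_config.json; with the real file on disk,
# json.load(open("adapter_config.json")) would replace json.loads here.
CONFIG_EXCERPT = """
{
  "inference_mode": true,
  "lora_alpha": 128,
  "lora_dropout": 0.1,
  "r": 128,
  "target_modules": [
    "audio_attn1.to_k", "audio_attn1.to_v", "audio_ff.net.0.proj",
    "audio_attn1.to_q", "audio_attn1.to_out.0", "audio_ff.net.2"
  ]
}
"""

config = json.loads(CONFIG_EXCERPT)

# Group each target by its parent module (the part before the first dot).
groups = defaultdict(list)
for name in config["target_modules"]:
    parent, _, child = name.partition(".")
    groups[parent].append(child)

for parent in sorted(groups):
    print(parent, sorted(groups[parent]))

# Every target name carries the audio_ prefix.
audio_only = all(n.startswith("audio_") for n in config["target_modules"])
print("audio-only targets:", audio_only)
```

The grouping matches the brace notation used in the README's Model Details table: four projections under `audio_attn1` and two under `audio_ff`.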
best_step_06850.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0ad66b9d6edb03a2ae660a0478a094bb00644689c82995fd996d1fd6047c0fc
+ size 906052856
lora_step_10000.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f42f6b94b99bb4844c021448b525e2655a9892f8316df0b8de6e8c8624480a6
+ size 906052856
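The two `.safetensors` entries are Git LFS pointer files: the repository stores only the SHA-256 digest and byte size, and the weights are fetched separately. A downloaded checkpoint can be checked against its pointer as sketched below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only Python's standard library is used.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:2f42f6b94b99bb4844c021448b525e2655a9892f8316df0b8de6e8c8624480a6
size 906052856
""")
print(f"{pointer['size'] / 2**20:.1f} MiB expected")
```

Both checkpoints have the same byte size (roughly 864 MiB), as expected for adapters that share rank and target modules; only the digests differ.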