zeekay committed on
Commit cc4882e · verified · 1 Parent(s): 9dfa701

Upload folder using huggingface_hub

Files changed (2)
  1. train_anchor.yaml +80 -0
  2. train_identity.yaml +51 -0
train_anchor.yaml ADDED
@@ -0,0 +1,80 @@
+ # News Anchor Voice Finetuning Configuration
+ # Optimized for broadcast-quality translation accuracy
+
+ model:
+   type: qwen3-omni
+   id_or_path: Qwen/Qwen3-Omni-30B-A3B-Instruct
+
+ training:
+   type: lora
+   epochs: 5 # More epochs for domain adaptation
+   batch_size: 1
+   gradient_accumulation: 16
+   learning_rate: 1.5e-5 # Slightly lower for fine-grained tuning
+   scheduler: cosine
+   warmup_ratio: 0.15
+
+ lora:
+   rank: 128 # Higher rank for more capacity
+   alpha: 256
+   dropout: 0.05
+   target_modules:
+     - q_proj
+     - k_proj
+     - v_proj
+     - o_proj
+     - gate_proj
+     - up_proj
+     - down_proj
+
+ data:
+   path: ./data/news_anchors/processed
+   max_length: 8192
+
+ output:
+   dir: ./outputs/zen-translator-anchor
+   save_steps: 100
+
+ # News anchor specific settings
+ anchor_config:
+   target_anchors:
+     - cnn
+     - bbc
+     - nhk
+     - dw
+     - france24
+     - aljazeera
+     - sky
+     - reuters
+     - bloomberg
+     - cctv
+
+   news_domains:
+     - politics
+     - economics
+     - technology
+     - sports
+     - weather
+     - breaking_news
+     - international
+
+   # Data augmentation for robustness
+   augmentation:
+     noise_levels: [0.01, 0.02, 0.05]
+     speed_factors: [0.9, 0.95, 1.0, 1.05, 1.1]
+
+ system_prompt: |
+   You are Zen Translator, specialized in news broadcast translation.
+
+   Your responsibilities:
+   - Translate news content with broadcast-quality accuracy
+   - Preserve the professional tone of news anchors
+   - Handle specialized vocabulary across news domains
+   - Maintain urgency and emphasis patterns in translations
+   - Process breaking news with appropriate gravity
+
+   Translation guidelines:
+   - Preserve proper nouns and names accurately
+   - Handle numbers, dates, and statistics precisely
+   - Maintain journalistic neutrality in tone
+   - Use formal register appropriate for news broadcasts
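Reviewer note: the training numbers in train_anchor.yaml imply a few derived quantities that are easy to check by hand. The sketch below works them out in Python. It assumes a single training device, the standard LoRA scaling of alpha / rank, and that every noise level is combined with every speed factor; none of these is confirmed by the commit itself.

```python
# Values copied from train_anchor.yaml in this commit.
batch_size = 1
gradient_accumulation = 16
lora_rank = 128
lora_alpha = 256
noise_levels = [0.01, 0.02, 0.05]
speed_factors = [0.9, 0.95, 1.0, 1.05, 1.1]

# Samples contributing to one optimizer step (assumption: one device).
effective_batch_size = batch_size * gradient_accumulation

# Standard LoRA multiplies the low-rank update by alpha / rank
# (assumption: the trainer does not use a variant such as rsLoRA).
lora_scaling = lora_alpha / lora_rank

# Assumption: augmentation pairs each noise level with each speed factor.
augmentation_variants = len(noise_levels) * len(speed_factors)

print(effective_batch_size, lora_scaling, augmentation_variants)
```

Under those assumptions the effective batch size is 16, the LoRA scaling factor is 2.0, and the augmentation grid has 15 combinations. The identity config (rank 64, alpha 128) has the same alpha / rank ratio.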
train_identity.yaml ADDED
@@ -0,0 +1,51 @@
+ # Zen Translator Identity Finetuning Configuration
+ # Uses ms-swift for efficient LoRA training
+
+ model:
+   type: qwen3-omni
+   id_or_path: Qwen/Qwen3-Omni-30B-A3B-Instruct
+
+ training:
+   type: lora
+   epochs: 3
+   batch_size: 1
+   gradient_accumulation: 16
+   learning_rate: 2.0e-5
+   scheduler: cosine
+   warmup_ratio: 0.1
+
+ lora:
+   rank: 64
+   alpha: 128
+   dropout: 0.05
+   target_modules:
+     - q_proj
+     - k_proj
+     - v_proj
+     - o_proj
+     - gate_proj
+     - up_proj
+     - down_proj
+
+ data:
+   path: ./data/identity
+   max_length: 8192
+
+ output:
+   dir: ./outputs/zen-translator-identity
+   save_steps: 100
+
+ system_prompt: |
+   You are Zen Translator, a real-time multilingual translation system created by Hanzo AI.
+
+   Your core capabilities:
+   - Real-time speech translation across 18 input languages and 10 output languages
+   - Voice cloning to preserve speaker characteristics
+   - Visual context understanding for improved accuracy
+   - News anchor voice adaptation for broadcast-quality translation
+
+   Personality traits:
+   - Professional and precise
+   - Culturally aware in translations
+   - Natural and fluent in all supported languages
+   - Maintains speaker intent and emotion
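Reviewer note: both configs pair `scheduler: cosine` with a `warmup_ratio`. A minimal sketch of what that combination commonly means (linear warmup to the peak learning rate, then cosine decay to zero) is given below. The function name `lr_at` and the `total_steps` value are hypothetical, and the exact schedule the trainer applies may differ.

```python
import math


def lr_at(step, total_steps, peak_lr=2.0e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Defaults mirror train_identity.yaml; `total_steps` is a hypothetical
    input because the commit does not state the dataset size.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Example with a hypothetical run of 1000 optimizer steps.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000))
```

For train_anchor.yaml the same function would be called with `peak_lr=1.5e-5` and `warmup_ratio=0.15`, which gives a longer ramp to a lower peak.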