ginipick committed
Commit 8973713 · verified · 1 Parent(s): 34e268c

Create components.py

Files changed (1)
  1. ui/components.py +1591 -0
ui/components.py ADDED
@@ -0,0 +1,1591 @@
+ """
+ ACE-Step: A Step Towards Music Generation Foundation Model
+
+ https://github.com/ace-step/ACE-Step
+
+ Apache 2.0 License
+ """
+
+ import gradio as gr
+ import librosa
+ import os
+ import random
+ import hashlib
+ import numpy as np
+ import json
+ from typing import Dict, List, Tuple, Optional
+ from openai import OpenAI
+
+ # Initialize the OpenAI client; fall back to None if it cannot be created
+ try:
+     client = OpenAI(api_key=os.getenv("LLM_API"))
+ except Exception:
+     client = None
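The client setup above swallows every error silently, so a missing `LLM_API` variable and a genuine failure look the same. A minimal sketch of a more explicit variant follows. The helper name `make_client` and its `factory` parameter are assumptions for illustration and are not part of the commit; in the real module the factory would be the `OpenAI` class.

```python
import os
from typing import Any, Callable, Optional


def make_client(factory: Callable[..., Any], env_var: str = "LLM_API") -> Optional[Any]:
    """Return factory(api_key=...) or None when no key is set or construction fails.

    `factory` is a hypothetical stand-in for the client class (e.g. OpenAI).
    """
    api_key = os.getenv(env_var)
    if not api_key:
        # No key configured: callers fall back to the default lyrics.
        return None
    try:
        return factory(api_key=api_key)
    except Exception as exc:
        # Report the reason instead of hiding it behind a bare except.
        print(f"LLM client initialization failed: {exc}")
        return None
```

With this shape the module-level line would read `client = make_client(OpenAI)`, and the rest of the file could keep testing `if not client`.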
+
+ TAG_DEFAULT = "funk, pop, soul, rock, melodic, guitar, drums, bass, keyboard, percussion, 105 BPM, energetic, upbeat, groovy, vibrant, dynamic, duet, male and female vocals"
+ LYRIC_DEFAULT = """[verse - male]
+ Neon lights they flicker bright
+ City hums in dead of night
+ Rhythms pulse through concrete veins
+ Lost in echoes of refrains
+
+ [verse - female]
+ Bassline groovin' in my chest
+ Heartbeats match the city's zest
+ Electric whispers fill the air
+ Synthesized dreams everywhere
+
+ [chorus - duet]
+ Turn it up and let it flow
+ Feel the fire let it grow
+ In this rhythm we belong
+ Hear the night sing out our song
+
+ [verse - male]
+ Guitar strings they start to weep
+ Wake the soul from silent sleep
+ Every note a story told
+ In this night we're bold and gold
+
+ [bridge - female]
+ Voices blend in harmony
+ Lost in pure cacophony
+ Timeless echoes timeless cries
+ Soulful shouts beneath the skies
+
+ [verse - duet]
+ Keyboard dances on the keys
+ Melodies on evening breeze
+ Catch the tune and hold it tight
+ In this moment we take flight
+ """
+
+ # Extended genre presets (original tags plus improved ones)
+ GENRE_PRESETS = {
+     "Modern Pop": "pop, synth, drums, guitar, 120 bpm, upbeat, catchy, vibrant, duet vocals, polished vocals, radio-ready, commercial, layered vocals",
+     "Rock": "rock, electric guitar, drums, bass, 130 bpm, energetic, rebellious, gritty, powerful vocals, raw vocals, power chords, driving rhythm",
+     "Hip Hop": "hip hop, 808 bass, hi-hats, synth, 90 bpm, bold, urban, intense, rhythmic vocals, trap beats, punchy drums",
+     "Country": "country, acoustic guitar, steel guitar, fiddle, 100 bpm, heartfelt, rustic, warm, twangy vocals, storytelling, americana",
+     "EDM": "edm, synth, bass, kick drum, 128 bpm, euphoric, pulsating, energetic, instrumental, progressive build, festival anthem, electronic",
+     "Reggae": "reggae, guitar, bass, drums, 80 bpm, chill, soulful, positive, smooth vocals, offbeat rhythm, island vibes",
+     "Classical": "classical, orchestral, strings, piano, 60 bpm, elegant, emotive, timeless, instrumental, dynamic range, sophisticated harmony",
+     "Jazz": "jazz, saxophone, piano, double bass, 110 bpm, smooth, improvisational, soulful, crooning vocals, swing feel, sophisticated",
+     "Metal": "metal, electric guitar, double kick drum, bass, 160 bpm, aggressive, intense, heavy, powerful vocals, distorted, powerful",
+     "R&B": "r&b, synth, bass, drums, 85 bpm, sultry, groovy, romantic, silky vocals, smooth production, neo-soul",
+     "K-Pop": "k-pop, synth, bass, drums, 128 bpm, catchy, energetic, polished, mixed vocals, electronic elements, danceable",
+     "Ballad": "ballad, piano, strings, acoustic guitar, 70 bpm, emotional, heartfelt, romantic, expressive vocals, orchestral arrangement"
+ }
+
+ # Song style options (the keys are the Korean labels used as dropdown values)
+ SONG_STYLES = {
+     "듀엣 (남녀 혼성)": "duet, male and female vocals, harmonious, call and response",  # duet (male and female)
+     "솔로 (남성)": "solo, male vocals, powerful voice",  # solo (male)
+     "솔로 (여성)": "solo, female vocals, emotional voice",  # solo (female)
+     "그룹 (혼성)": "group vocals, mixed gender, layered harmonies",  # group (mixed)
+     "합창": "choir, multiple voices, choral arrangement",  # choir
+     "랩/힙합": "rap vocals, rhythmic flow, urban style",  # rap / hip hop
+     "인스트루멘탈": "instrumental, no vocals"  # instrumental
+ }
+
+ # System prompt for AI lyric writing
+ LYRIC_SYSTEM_PROMPT = """You are an expert songwriter. Write song lyrics that match the topic and style the user provides.
+
+ Lyric-writing rules:
+ 1. Structure tags must be enclosed in "[ ]"
+ 2. Available structure tags: [verse], [chorus], [bridge], [intro], [outro], [pre-chorus]
+ 3. For a duet, mark the parts as [verse - male], [verse - female], [chorus - duet]
+ 4. Write the lyrics in the same language as the input
+ 5. Each section should be about 4-8 lines
+ 6. Write lyrics that fit the music genre and mood
+
+ Example format:
+ [verse - male]
+ First line of lyrics
+ Second line of lyrics
+ ...
+
+ [chorus - duet]
+ Chorus lyrics
+ ...
+ """
+
+ def generate_lyrics_with_ai(prompt: str, genre: str, song_style: str, language: str = "auto") -> str:
+     """Generate lyrics with an LLM; return LYRIC_DEFAULT if no client exists or the call fails."""
+     if not client:
+         return LYRIC_DEFAULT
+
+     try:
+         # Build a style hint from the selected song style; language detection is left to the model
+         style_info = ""
+         if "듀엣" in song_style:  # duet
+             style_info = "Split the parts in a male/female duet format."
+         elif "솔로 (남성)" in song_style:  # solo (male)
+             style_info = "Write lyrics for a male solo singer."
+         elif "솔로 (여성)" in song_style:  # solo (female)
+             style_info = "Write lyrics for a female solo singer."
+         elif "그룹" in song_style:  # group
+             style_info = "Split the parts for a group performance."
+
+         user_prompt = f"""
+         Topic: {prompt}
+         Genre: {genre}
+         Style: {style_info}
+
+         Write song lyrics based on the information above. Use the same language as the input and be sure to include structure tags.
+         """
+
+         response = client.chat.completions.create(
+             model="gpt-4-mini",  # NOTE: confirm this model ID is available for your API key
+             messages=[
+                 {"role": "system", "content": LYRIC_SYSTEM_PROMPT},
+                 {"role": "user", "content": user_prompt}
+             ],
+             temperature=0.8,
+             max_tokens=1000
+         )
+
+         return response.choices[0].message.content
+     except Exception as e:
+         print(f"AI lyric generation error: {e}")
+         return LYRIC_DEFAULT
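The system prompt asks the model for bracketed structure tags, but `generate_lyrics_with_ai` returns the model output unchecked. The sketch below shows one way such a check could look. `find_structure_tags` and `has_valid_structure` are hypothetical helpers, not part of the commit, and the allowed set is copied from rule 2 of the prompt.

```python
from typing import List

# Base tag names listed in the system prompt (rule 2).
ALLOWED_TAGS = {"verse", "chorus", "bridge", "intro", "outro", "pre-chorus"}


def find_structure_tags(lyrics: str) -> List[str]:
    """Return the base name of every line of the form "[tag]" or "[tag - part]"."""
    tags = []
    for line in lyrics.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            # "[verse - male]" -> "verse"; "[pre-chorus]" -> "pre-chorus"
            tags.append(line[1:-1].split(" - ")[0].strip().lower())
    return tags


def has_valid_structure(lyrics: str) -> bool:
    """True when at least one tag is present and every tag is an allowed one."""
    tags = find_structure_tags(lyrics)
    return bool(tags) and all(tag in ALLOWED_TAGS for tag in tags)
```

A caller could fall back to `LYRIC_DEFAULT`, or retry the request, when `has_valid_structure` returns `False`.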
+
+ # Quality preset system
+ QUALITY_PRESETS = {
+     "Draft (Fast)": {
+         "infer_step": 50,
+         "guidance_scale": 10.0,
+         "scheduler_type": "euler",
+         "omega_scale": 5.0,
+         "use_erg_diffusion": False,
+         "use_erg_tag": True,
+         "description": "Fast draft generation (1-2 min)"
+     },
+     "Standard": {
+         "infer_step": 150,
+         "guidance_scale": 15.0,
+         "scheduler_type": "euler",
+         "omega_scale": 10.0,
+         "use_erg_diffusion": True,
+         "use_erg_tag": True,
+         "description": "Standard quality (3-5 min)"
+     },
+     "High Quality": {
+         "infer_step": 200,
+         "guidance_scale": 18.0,
+         "scheduler_type": "heun",
+         "omega_scale": 15.0,
+         "use_erg_diffusion": True,
+         "use_erg_tag": True,
+         "description": "High-quality generation (8-12 min)"
+     },
+     "Ultra (Best)": {
+         "infer_step": 299,
+         "guidance_scale": 20.0,
+         "scheduler_type": "heun",
+         "omega_scale": 20.0,
+         "use_erg_diffusion": True,
+         "use_erg_tag": True,
+         "description": "Best quality (15-20 min)"
+     }
+ }
+
+ # Multi-seed generation options (label -> number of candidates)
+ MULTI_SEED_OPTIONS = {
+     "Single": 1,
+     "Best of 3": 3,
+     "Best of 5": 5,
+     "Best of 10": 10
+ }
+
+ class MusicGenerationCache:
+     """In-memory cache for generation results."""
+     def __init__(self):
+         self.cache = {}
+         self.max_cache_size = 50
+
+     def get_cache_key(self, params):
+         # Hash only the parameters that matter. Seeds and scheduler settings are
+         # not part of the key, so a repeated request returns the cached result.
+         key_params = {k: v for k, v in params.items()
+                       if k in ['prompt', 'lyrics', 'infer_step', 'guidance_scale', 'audio_duration']}
+         return hashlib.md5(str(sorted(key_params.items())).encode()).hexdigest()[:16]
+
+     def get_cached_result(self, params):
+         key = self.get_cache_key(params)
+         return self.cache.get(key)
+
+     def cache_result(self, params, result):
+         if len(self.cache) >= self.max_cache_size:
+             # Evict the oldest entry (dicts preserve insertion order)
+             oldest_key = next(iter(self.cache))
+             del self.cache[oldest_key]
+
+         key = self.get_cache_key(params)
+         self.cache[key] = result
+
+ # Global cache instance
+ generation_cache = MusicGenerationCache()
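To make the cache behaviour concrete, here is a small self-contained sketch of the same idea: a key built from a fixed subset of parameters, plus first-in-first-out eviction that relies on dict insertion order. `TinyCache`, `cache_key`, and `KEY_FIELDS` are illustrative names that do not appear in the commit, and the size limit of 2 is chosen only to keep the example short.

```python
import hashlib

# Same parameter subset the commit hashes; everything else is ignored.
KEY_FIELDS = ("prompt", "lyrics", "infer_step", "guidance_scale", "audio_duration")


def cache_key(params: dict) -> str:
    """Hash only the fields in KEY_FIELDS, in a stable (sorted) order."""
    key_params = {k: v for k, v in params.items() if k in KEY_FIELDS}
    return hashlib.md5(str(sorted(key_params.items())).encode()).hexdigest()[:16]


class TinyCache:
    """FIFO cache: when full, drop the entry that was inserted first."""

    def __init__(self, max_size: int = 2):
        self.cache = {}
        self.max_size = max_size

    def put(self, params: dict, result) -> None:
        if len(self.cache) >= self.max_size:
            del self.cache[next(iter(self.cache))]  # oldest inserted key
        self.cache[cache_key(params)] = result

    def get(self, params: dict):
        return self.cache.get(cache_key(params))


cache = TinyCache(max_size=2)
rock = {"prompt": "rock", "infer_step": 50}
jazz = {"prompt": "jazz", "infer_step": 50}
pop = {"prompt": "pop", "infer_step": 50}
for params, clip in ((rock, "A"), (jazz, "B"), (pop, "C")):
    cache.put(params, clip)

# "rock" was inserted first, so it has been evicted by the third put.
evicted = cache.get(rock)
# A different seed does not change the key, so this request is a cache hit.
same_key = cache_key({**jazz, "manual_seeds": "42"}) == cache_key(jazz)
```

The last line illustrates a consequence worth keeping in mind: because seeds are not hashed, asking for a new seed with otherwise identical settings returns the earlier audio.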
+
+ def enhance_prompt_with_genre(base_prompt: str, genre: str, song_style: str) -> str:
+     """Expand the prompt with genre- and style-specific tags."""
+     if genre == "Custom" or not genre:
+         enhanced_prompt = base_prompt
+     else:
+         # Extra enhancement tags per genre
+         genre_enhancements = {
+             "Modern Pop": ["polished production", "mainstream appeal", "hook-driven"],
+             "Rock": ["guitar-driven", "powerful drums", "energetic performance"],
+             "Hip Hop": ["rhythmic flow", "urban atmosphere", "bass-heavy"],
+             "Country": ["acoustic warmth", "storytelling melody", "authentic feel"],
+             "EDM": ["electronic atmosphere", "build-ups", "dance-friendly"],
+             "Reggae": ["laid-back groove", "tropical vibes", "rhythmic guitar"],
+             "Classical": ["orchestral depth", "musical sophistication", "timeless beauty"],
+             "Jazz": ["musical complexity", "improvisational spirit", "sophisticated harmony"],
+             "Metal": ["aggressive energy", "powerful sound", "intense atmosphere"],
+             "R&B": ["smooth groove", "soulful expression", "rhythmic sophistication"],
+             "K-Pop": ["catchy hooks", "dynamic arrangement", "polished production"],
+             "Ballad": ["emotional depth", "slow tempo", "heartfelt delivery"]
+         }
+
+         if genre in genre_enhancements:
+             additional_tags = ", ".join(genre_enhancements[genre])
+             enhanced_prompt = f"{base_prompt}, {additional_tags}"
+         else:
+             enhanced_prompt = base_prompt
+
+     # Append style tags
+     if song_style in SONG_STYLES:
+         style_tags = SONG_STYLES[song_style]
+         enhanced_prompt = f"{enhanced_prompt}, {style_tags}"
+
+     return enhanced_prompt
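`enhance_prompt_with_genre` appends tags unconditionally, so calling it twice on the same prompt duplicates them. A minimal sketch of an idempotent alternative is shown below. `append_tags` is a hypothetical helper, not part of the commit; it treats the prompt as a comma-separated tag list and compares tags case-insensitively.

```python
def append_tags(prompt: str, extra: str) -> str:
    """Append the comma-separated tags in `extra` that `prompt` does not already contain."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    seen = {t.lower() for t in tags}
    for tag in (t.strip() for t in extra.split(",")):
        if tag and tag.lower() not in seen:
            tags.append(tag)
            seen.add(tag.lower())
    return ", ".join(tags)


once = append_tags("rock, drums", "solo, male vocals, powerful voice")
twice = append_tags(once, "solo, male vocals, powerful voice")
# `twice` is identical to `once`: the second call adds nothing new.
```

This matters wherever a prompt may pass through the enhancement step more than once, for example when a pre-enhanced prompt is handed to a wrapper that enhances again.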
+
+ def calculate_quality_score(audio_path: str) -> float:
+     """Compute a simple quality score (a real implementation would use more sophisticated metrics)."""
+     try:
+         y, sr = librosa.load(audio_path)
+
+         # Basic quality metrics
+         rms_energy = np.sqrt(np.mean(y**2))
+         spectral_centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
+         zero_crossing_rate = np.mean(librosa.feature.zero_crossing_rate(y))
+
+         # Normalized score (0-100)
+         energy_score = min(rms_energy * 1000, 40)  # 0-40 points
+         spectral_score = min(spectral_centroid / 100, 40)  # 0-40 points
+         clarity_score = min((1 - zero_crossing_rate) * 20, 20)  # 0-20 points
+
+         total_score = energy_score + spectral_score + clarity_score
+         return round(total_score, 1)
+     except Exception:
+         return 50.0  # default when the file cannot be analysed
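The scoring arithmetic is easier to follow with the audio analysis stripped away. The sketch below reproduces only the three capped terms from `calculate_quality_score`. `combine_scores` is an illustrative name, and the input values are made-up numbers rather than measurements.

```python
def combine_scores(rms_energy: float, spectral_centroid_hz: float, zero_crossing_rate: float) -> float:
    """Combine three audio statistics into a 0-100 score using the same caps as the commit."""
    energy_score = min(rms_energy * 1000, 40)                # saturates at RMS >= 0.04
    spectral_score = min(spectral_centroid_hz / 100, 40)     # saturates at >= 4000 Hz
    clarity_score = min((1 - zero_crossing_rate) * 20, 20)   # higher ZCR lowers the score
    return round(energy_score + spectral_score + clarity_score, 1)


# 0.02 * 1000 = 20, 2500 / 100 = 25, (1 - 0.1) * 20 = 18, total 63.0
mid_score = combine_scores(0.02, 2500.0, 0.1)
# Every term hits its cap: 40 + 40 + 20 = 100
max_score = combine_scores(0.1, 5000.0, 0.0)
```

Because the energy term caps at an RMS of 0.04 and the spectral term at 4000 Hz, loud or bright clips can reach the ceiling on those terms, and the score then separates candidates mainly through whichever term is still below its cap.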
+
+ def update_tags_from_preset(preset_name):
+     if preset_name == "Custom":
+         return ""
+     return GENRE_PRESETS.get(preset_name, "")
+
+ def update_quality_preset(preset_name):
+     """Apply a quality preset and return the values for the bound UI controls."""
+     if preset_name not in QUALITY_PRESETS:
+         return (100, 15.0, "euler", 10.0, True, True)
+
+     preset = QUALITY_PRESETS[preset_name]
+     return (
+         preset.get("infer_step", 100),
+         preset.get("guidance_scale", 15.0),
+         preset.get("scheduler_type", "euler"),
+         preset.get("omega_scale", 10.0),
+         preset.get("use_erg_diffusion", True),
+         preset.get("use_erg_tag", True)
+     )
+
+ def create_enhanced_process_func(original_func):
+     """Wrap the original function with prompt enhancement, caching, and multi-seed selection."""
+
+     def enhanced_func(
+         audio_duration, prompt, lyrics, infer_step, guidance_scale,
+         scheduler_type, cfg_type, omega_scale, manual_seeds,
+         guidance_interval, guidance_interval_decay, min_guidance_scale,
+         use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
+         guidance_scale_text, guidance_scale_lyric,
+         audio2audio_enable=False, ref_audio_strength=0.5, ref_audio_input=None,
+         lora_name_or_path="none", multi_seed_mode="Single",
+         enable_smart_enhancement=True, genre_preset="Custom", song_style="듀엣 (남녀 혼성)", **kwargs
+     ):
+         # Smart prompt expansion
+         if enable_smart_enhancement:
+             prompt = enhance_prompt_with_genre(prompt, genre_preset, song_style)
+
+         # Check the cache
+         cache_params = {
+             'prompt': prompt, 'lyrics': lyrics, 'audio_duration': audio_duration,
+             'infer_step': infer_step, 'guidance_scale': guidance_scale
+         }
+
+         cached_result = generation_cache.get_cached_result(cache_params)
+         if cached_result:
+             return cached_result
+
+         # Multi-seed generation
+         num_candidates = MULTI_SEED_OPTIONS.get(multi_seed_mode, 1)
+
+         if num_candidates == 1:
+             # Call the original function
+             result = original_func(
+                 audio_duration, prompt, lyrics, infer_step, guidance_scale,
+                 scheduler_type, cfg_type, omega_scale, manual_seeds,
+                 guidance_interval, guidance_interval_decay, min_guidance_scale,
+                 use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
+                 guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
+                 ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
+             )
+         else:
+             # Generate with several seeds and pick the best candidate
+             candidates = []
+
+             for i in range(num_candidates):
+                 seed = random.randint(1, 10000)
+
+                 try:
+                     result = original_func(
+                         audio_duration, prompt, lyrics, infer_step, guidance_scale,
+                         scheduler_type, cfg_type, omega_scale, str(seed),
+                         guidance_interval, guidance_interval_decay, min_guidance_scale,
+                         use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
+                         guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
+                         ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
+                     )
+
+                     if result and len(result) > 0:
+                         audio_path = result[0]  # the first item is the audio file path
+                         if audio_path and os.path.exists(audio_path):
+                             quality_score = calculate_quality_score(audio_path)
+                             candidates.append({
+                                 "result": result,
+                                 "quality_score": quality_score,
+                                 "seed": seed
+                             })
+                 except Exception as e:
+                     print(f"Generation {i+1} failed: {e}")
+                     continue
+
+             if candidates:
+                 # Pick the highest-scoring candidate
+                 best_candidate = max(candidates, key=lambda x: x["quality_score"])
+                 result = best_candidate["result"]
+
+                 # Attach quality information
+                 if len(result) > 1 and isinstance(result[1], dict):
+                     result[1]["quality_score"] = best_candidate["quality_score"]
+                     result[1]["selected_seed"] = best_candidate["seed"]
+                     result[1]["candidates_count"] = len(candidates)
+             else:
+                 # Every attempt failed: fall back to a single default generation
+                 result = original_func(
+                     audio_duration, prompt, lyrics, infer_step, guidance_scale,
+                     scheduler_type, cfg_type, omega_scale, manual_seeds,
+                     guidance_interval, guidance_interval_decay, min_guidance_scale,
+                     use_erg_tag, use_erg_lyric, use_erg_diffusion, oss_steps,
+                     guidance_scale_text, guidance_scale_lyric, audio2audio_enable,
+                     ref_audio_strength, ref_audio_input, lora_name_or_path, **kwargs
+                 )
+
+         # Cache the result
+         generation_cache.cache_result(cache_params, result)
+         return result
+
+     return enhanced_func
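The multi-seed branch above is a best-of-N selection: generate N candidates with random seeds, score each one, and keep the maximum. The sketch below isolates that pattern under stated assumptions. `best_of_n` is a hypothetical helper, and `generate` and `score` are stand-ins for the real generation function and `calculate_quality_score`.

```python
import random
from typing import Callable, Optional, Tuple


def best_of_n(
    generate: Callable[[int], str],
    score: Callable[[str], float],
    n: int,
    rng: random.Random,
) -> Optional[Tuple[str, float, int]]:
    """Run `generate` with n random seeds and return (item, score, seed) for the best one.

    Returns None when every attempt fails, so the caller can decide on a fallback.
    """
    candidates = []
    for _ in range(n):
        seed = rng.randint(1, 10000)
        try:
            item = generate(seed)
        except Exception as exc:
            # A failed attempt is skipped, mirroring the `continue` in the commit.
            print(f"seed {seed} failed: {exc}")
            continue
        candidates.append((item, score(item), seed))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])


# Toy usage: the "clip" is a string and its score is the seed itself.
used_seeds = []


def fake_generate(seed: int) -> str:
    used_seeds.append(seed)
    return f"clip-{seed}"


best = best_of_n(fake_generate, lambda clip: float(clip.split("-")[1]), 5, random.Random(0))
```

Passing the random generator in explicitly keeps the selection reproducible in tests, which the module-level `random.randint` call in the commit does not allow.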
+
+ def create_output_ui(task_name="Text2Music"):
+     # For many consumer-grade GPU devices, only one batch can be run
+     output_audio1 = gr.Audio(type="filepath", label=f"{task_name} Generated Audio 1")
+
+     with gr.Accordion(f"{task_name} Parameters & Quality Info", open=False):
+         input_params_json = gr.JSON(label=f"{task_name} Parameters")
+
+         # Quality information display
+         with gr.Row():
+             quality_score = gr.Number(label="Quality Score (0-100)", value=0, interactive=False)
+             generation_info = gr.Textbox(
+                 label="Generation Info",
+                 value="",
+                 interactive=False,
+                 max_lines=2
+             )
+
+     outputs = [output_audio1]
+     return outputs, input_params_json
+
+ def dump_func(*args):
+     print(args)
+     return []
+
+ def create_text2music_ui(
+     gr,
+     text2music_process_func,
+     sample_data_func=None,
+     load_data_func=None,
+ ):
+     # Create the enhanced process function
+     enhanced_process_func = create_enhanced_process_func(text2music_process_func)
+
+     with gr.Row():
+         with gr.Column():
+             # Quality and performance settings section
+             with gr.Group():
+                 gr.Markdown("### ⚡ Quality & Performance Settings")
+                 with gr.Row():
+                     quality_preset = gr.Dropdown(
+                         choices=list(QUALITY_PRESETS.keys()),
+                         value="Standard",
+                         label="Quality Preset",
+                         scale=2
+                     )
+                     multi_seed_mode = gr.Dropdown(
+                         choices=list(MULTI_SEED_OPTIONS.keys()),
+                         value="Single",
+                         label="Multi-Generation Mode",
+                         scale=2,
+                         info="Generate several times and keep the best result"
+                     )
+
+                 preset_description = gr.Textbox(
+                     value=QUALITY_PRESETS["Standard"]["description"],
+                     label="Description",
+                     interactive=False,
+                     max_lines=1
+                 )
+
+             with gr.Row(equal_height=True):
+                 # Tag and lyric examples come from the AI music generation community
+                 audio_duration = gr.Slider(
+                     -1,
+                     240.0,
+                     step=0.00001,
+                     value=-1,
+                     label="Audio Duration",
+                     interactive=True,
+                     info="-1 means random duration (30 ~ 240).",
+                     scale=7,
+                 )
+                 random_bnt = gr.Button("🎲 Random", variant="secondary", scale=1)
+                 preview_bnt = gr.Button("🎵 Preview", variant="secondary", scale=2)
+
+             # audio2audio
+             with gr.Row(equal_height=True):
+                 audio2audio_enable = gr.Checkbox(
+                     label="Enable Audio2Audio",
+                     value=False,
+                     info="Check to enable Audio-to-Audio generation using a reference audio.",
+                     elem_id="audio2audio_checkbox"
+                 )
+                 lora_name_or_path = gr.Dropdown(
+                     label="Lora Name or Path",
+                     choices=["ACE-Step/ACE-Step-v1-chinese-rap-LoRA", "none"],
+                     value="none",
+                     allow_custom_value=True,
+                 )
+
+             ref_audio_input = gr.Audio(
+                 type="filepath",
+                 label="Reference Audio (for Audio2Audio)",
+                 visible=False,
+                 elem_id="ref_audio_input",
+                 show_download_button=True
+             )
+             ref_audio_strength = gr.Slider(
+                 label="Reference audio strength",
+                 minimum=0.0,
+                 maximum=1.0,
+                 step=0.01,
+                 value=0.5,
+                 elem_id="ref_audio_strength",
+                 visible=False,
+                 interactive=True,
+             )
+
+             def toggle_ref_audio_visibility(is_checked):
+                 return (
+                     gr.update(visible=is_checked, elem_id="ref_audio_input"),
+                     gr.update(visible=is_checked, elem_id="ref_audio_strength"),
+                 )
+
+             audio2audio_enable.change(
+                 fn=toggle_ref_audio_visibility,
+                 inputs=[audio2audio_enable],
+                 outputs=[ref_audio_input, ref_audio_strength],
+             )
+
+             with gr.Column(scale=2):
+                 with gr.Group():
+                     gr.Markdown("""### 🎼 Smart Prompt System
+                     <center>Selecting a genre and style automatically adds optimized tags.</center>""")
+
+                     with gr.Row():
+                         genre_preset = gr.Dropdown(
+                             choices=["Custom"] + list(GENRE_PRESETS.keys()),
+                             value="Custom",
+                             label="Genre Preset",
+                             scale=1,
+                         )
+                         song_style = gr.Dropdown(
+                             choices=list(SONG_STYLES.keys()),
+                             value="듀엣 (남녀 혼성)",  # duet (male and female)
+                             label="Song Style",
+                             scale=1,
+                         )
+                         enable_smart_enhancement = gr.Checkbox(
+                             label="Smart Enhancement",
+                             value=True,
+                             info="Automatic tag optimization",
+                             scale=1
+                         )
+
+                     prompt = gr.Textbox(
+                         lines=2,
+                         label="Tags",
+                         max_lines=4,
+                         value=TAG_DEFAULT,
+                         placeholder="Comma-separated tags...",
+                     )
+
+                 with gr.Group():
+                     gr.Markdown("""### 📝 AI Lyric Writer
+                     <center>Enter a topic and click the 'AI Lyrics' button to generate lyrics automatically.</center>""")
+
+                     with gr.Row():
+                         lyric_prompt = gr.Textbox(
+                             label="Lyric Topic",
+                             placeholder="e.g. the thrill of first love, the pain of parting, a hopeful tomorrow...",
+                             scale=3
+                         )
+                         generate_lyrics_btn = gr.Button("🤖 AI Lyrics", variant="secondary", scale=1)
+
+                     lyrics = gr.Textbox(
+                         lines=9,
+                         label="Lyrics",
+                         max_lines=13,
+                         value=LYRIC_DEFAULT,
+                         placeholder="Enter lyrics. Structure tags such as [verse] and [chorus] are recommended."
+                     )
+
+             with gr.Accordion("Basic Settings", open=False):
+                 infer_step = gr.Slider(
+                     minimum=1,
+                     maximum=300,
+                     step=1,
+                     value=150,
+                     label="Infer Steps",
+                     interactive=True,
+                 )
+                 guidance_scale = gr.Slider(
+                     minimum=0.0,
+                     maximum=30.0,
+                     step=0.1,
+                     value=15.0,
+                     label="Guidance Scale",
+                     interactive=True,
+                     info="When guidance_scale_lyric > 1 and guidance_scale_text > 1, the guidance scale will not be applied.",
+                 )
+                 guidance_scale_text = gr.Slider(
+                     minimum=0.0,
+                     maximum=10.0,
+                     step=0.1,
+                     value=0.0,
+                     label="Guidance Scale Text",
+                     interactive=True,
+                     info="Guidance scale for the text condition. It only applies to cfg. Try guidance_scale_text=5.0, guidance_scale_lyric=1.5 as a starting point.",
+                 )
+                 guidance_scale_lyric = gr.Slider(
+                     minimum=0.0,
+                     maximum=10.0,
+                     step=0.1,
+                     value=0.0,
+                     label="Guidance Scale Lyric",
+                     interactive=True,
+                 )
+
+                 manual_seeds = gr.Textbox(
+                     label="manual seeds (default None)",
+                     placeholder="1,2,3,4",
+                     value=None,
+                     info="Seed for the generation",
+                 )
+
+             with gr.Accordion("Advanced Settings", open=False):
+                 scheduler_type = gr.Radio(
+                     ["euler", "heun"],
+                     value="euler",
+                     label="Scheduler Type",
+                     elem_id="scheduler_type",
+                     info="Scheduler type for the generation. euler is recommended. heun will take more time.",
+                 )
+                 cfg_type = gr.Radio(
+                     ["cfg", "apg", "cfg_star"],
+                     value="apg",
+                     label="CFG Type",
+                     elem_id="cfg_type",
+                     info="CFG type for the generation. apg is recommended. cfg and cfg_star are almost the same.",
+                 )
+                 use_erg_tag = gr.Checkbox(
+                     label="use ERG for tag",
+                     value=True,
+                     info="Use Entropy Rectifying Guidance for tags. It multiplies the attention by a temperature to weaken the tag condition and improve diversity.",
+                 )
+                 use_erg_lyric = gr.Checkbox(
+                     label="use ERG for lyric",
+                     value=False,
+                     info="The same, but applied to the lyric encoder's attention.",
+                 )
+                 use_erg_diffusion = gr.Checkbox(
+                     label="use ERG for diffusion",
+                     value=True,
+                     info="The same, but applied to the diffusion model's attention.",
+                 )
+
+                 omega_scale = gr.Slider(
+                     minimum=-100.0,
+                     maximum=100.0,
+                     step=0.1,
+                     value=10.0,
+                     label="Granularity Scale",
+                     interactive=True,
+                     info="Granularity scale for the generation. Higher values can reduce artifacts",
+                 )
+
+                 guidance_interval = gr.Slider(
+                     minimum=0.0,
+                     maximum=1.0,
+                     step=0.01,
+                     value=0.5,
+                     label="Guidance Interval",
+                     interactive=True,
+                     info="Guidance interval for the generation. 0.5 means guidance is applied only in the middle steps (0.25 * infer_steps to 0.75 * infer_steps)",
+                 )
+                 guidance_interval_decay = gr.Slider(
+                     minimum=0.0,
+                     maximum=1.0,
+                     step=0.01,
+                     value=0.0,
+                     label="Guidance Interval Decay",
+                     interactive=True,
+                     info="Guidance interval decay for the generation. Guidance scale will decay from guidance_scale to min_guidance_scale in the interval. 0.0 means no decay.",
+                 )
+                 min_guidance_scale = gr.Slider(
+                     minimum=0.0,
+                     maximum=200.0,
+                     step=0.1,
+                     value=3.0,
+                     label="Min Guidance Scale",
+                     interactive=True,
+                     info="Final guidance scale reached at the end of guidance interval decay",
+                 )
+                 oss_steps = gr.Textbox(
+                     label="OSS Steps",
+                     placeholder="16, 29, 52, 96, 129, 158, 172, 183, 189, 200",
+                     value=None,
+                     info="Optimal Steps for the generation. Not well tested yet",
+                 )
+
+             text2music_bnt = gr.Button("🎵 Generate Music", variant="primary", size="lg")
+
+             # AI lyrics button event
+             def generate_ai_lyrics(lyric_prompt, genre_preset, song_style):
+                 if not lyric_prompt:
+                     return "Please enter a lyric topic."
+                 return generate_lyrics_with_ai(lyric_prompt, genre_preset, song_style)
+
+             generate_lyrics_btn.click(
+                 fn=generate_ai_lyrics,
+                 inputs=[lyric_prompt, genre_preset, song_style],
+                 outputs=[lyrics]
+             )
+
+             # Random data generation
+             def generate_random_music_data(genre_preset, song_style):
+                 # Pick a random genre when none is selected
+                 if genre_preset == "Custom":
+                     genre = random.choice(list(GENRE_PRESETS.keys()))
+                 else:
+                     genre = genre_preset
+
+                 # Random topic list
+                 themes = [
+                     "night in the city", "memories of first love", "a summer day at the beach", "the mood of autumn",
+                     "a hopeful tomorrow", "a free spirit", "dancing under the starlight", "the passion of youth",
+                     "rainy-day feelings", "chasing a dream", "growth after a breakup", "a new beginning"
+                 ]
+
+                 # Random settings
+                 duration = random.choice([30, 60, 90, 120, 180])
+                 theme = random.choice(themes)
+
+                 # Generate lyrics with the LLM
+                 lyrics = generate_lyrics_with_ai(theme, genre, song_style)
+
+                 # Build the tags
+                 tags = GENRE_PRESETS.get(genre, "")
+                 if song_style in SONG_STYLES:
+                     tags = f"{tags}, {SONG_STYLES[song_style]}"
+
+                 # Return values for every bound control, in UI order
+                 return (
+                     duration,  # audio_duration
+                     tags,  # prompt
+                     lyrics,  # lyrics
+                     150,  # infer_step
+                     15.0,  # guidance_scale
+                     "euler",  # scheduler_type
+                     "apg",  # cfg_type
+                     10.0,  # omega_scale
+                     str(random.randint(1, 10000)),  # manual_seeds
+                     0.5,  # guidance_interval
+                     0.0,  # guidance_interval_decay
+                     3.0,  # min_guidance_scale
+                     True,  # use_erg_tag
+                     False,  # use_erg_lyric
+                     True,  # use_erg_diffusion
+                     None,  # oss_steps
+                     0.0,  # guidance_scale_text
+                     0.0,  # guidance_scale_lyric
+                     False,  # audio2audio_enable
+                     0.5,  # ref_audio_strength
+                     None,  # ref_audio_input
+                 )
754
+
755
+ # ๋ชจ๋“  UI ์š”์†Œ๊ฐ€ ์ •์˜๋œ ํ›„ ์ด๋ฒคํŠธ ํ•ธ๋“ค๋Ÿฌ ์„ค์ •
756
+ genre_preset.change(
757
+ fn=update_tags_from_preset,
758
+ inputs=[genre_preset],
759
+ outputs=[prompt]
760
+ )
761
+
762
+ quality_preset.change(
763
+ fn=lambda x: QUALITY_PRESETS.get(x, {}).get("description", ""),
764
+ inputs=[quality_preset],
765
+ outputs=[preset_description]
766
+ )
767
+
768
+ quality_preset.change(
769
+ fn=update_quality_preset,
770
+ inputs=[quality_preset],
771
+ outputs=[infer_step, guidance_scale, scheduler_type, omega_scale, use_erg_diffusion, use_erg_tag]
772
+ )
773
+
774
+ with gr.Column():
775
+ outputs, input_params_json = create_output_ui()
776
+
777
+ # ์‹ค์‹œ๊ฐ„ ํ”„๋ฆฌ๋ทฐ ๊ธฐ๋Šฅ
778
+ def generate_preview(prompt, lyrics, genre_preset, song_style):
779
+ """10์ดˆ ํ”„๋ฆฌ๋ทฐ ์ƒ์„ฑ"""
780
+ preview_params = {
781
+ "audio_duration": 10,
782
+ "infer_step": 50,
783
+ "guidance_scale": 12.0,
784
+ "scheduler_type": "euler",
785
+ "cfg_type": "apg",
786
+ "omega_scale": 5.0,
787
+ }
788
+
789
+ enhanced_prompt = enhance_prompt_with_genre(prompt, genre_preset, song_style)
790
+
791
+ try:
792
+ # ์‹ค์ œ ๊ตฌํ˜„์—์„œ๋Š” ๋น ๋ฅธ ์ƒ์„ฑ ๋ชจ๋“œ ์‚ฌ์šฉ
793
+ result = enhanced_process_func(
794
+ preview_params["audio_duration"],
795
+ enhanced_prompt,
796
+ lyrics[:200], # ๊ฐ€์‚ฌ ์ผ๋ถ€๋งŒ ์‚ฌ์šฉ
797
+ preview_params["infer_step"],
798
+ preview_params["guidance_scale"],
799
+ preview_params["scheduler_type"],
800
+ preview_params["cfg_type"],
801
+ preview_params["omega_scale"],
802
+ None, # manual_seeds
803
+ 0.5, # guidance_interval
804
+ 0.0, # guidance_interval_decay
805
+ 3.0, # min_guidance_scale
806
+ True, # use_erg_tag
807
+ False, # use_erg_lyric
808
+ True, # use_erg_diffusion
809
+ None, # oss_steps
810
+ 0.0, # guidance_scale_text
811
+ 0.0, # guidance_scale_lyric
812
+ multi_seed_mode="Single",
813
+ song_style=song_style
814
+ )
815
+ return result[0] if result else None
816
+ except Exception as e:
817
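+ raise gr.Error(f"Preview generation failed: {e}") from e # the output is an audio component, so surface the error instead of returning a string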
818
+
819
+ preview_bnt.click(
820
+ fn=generate_preview,
821
+ inputs=[prompt, lyrics, genre_preset, song_style],
822
+ outputs=[outputs[0]]
823
+ )
824
+
825
+ with gr.Tab("retake"):
826
+ retake_variance = gr.Slider(
827
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
828
+ )
829
+ retake_seeds = gr.Textbox(
830
+ label="retake seeds (default None)", placeholder="", value=None
831
+ )
832
+ retake_bnt = gr.Button("Retake", variant="primary")
833
+ retake_outputs, retake_input_params_json = create_output_ui("Retake")
834
+
835
+ def retake_process_func(json_data, retake_variance, retake_seeds):
836
+ return enhanced_process_func(
837
+ json_data.get("audio_duration", 30),
838
+ json_data.get("prompt", ""),
839
+ json_data.get("lyrics", ""),
840
+ json_data.get("infer_step", 100),
841
+ json_data.get("guidance_scale", 15.0),
842
+ json_data.get("scheduler_type", "euler"),
843
+ json_data.get("cfg_type", "apg"),
844
+ json_data.get("omega_scale", 10.0),
845
+ retake_seeds,
846
+ json_data.get("guidance_interval", 0.5),
847
+ json_data.get("guidance_interval_decay", 0.0),
848
+ json_data.get("min_guidance_scale", 3.0),
849
+ json_data.get("use_erg_tag", True),
850
+ json_data.get("use_erg_lyric", False),
851
+ json_data.get("use_erg_diffusion", True),
852
+ json_data.get("oss_steps", None),
853
+ json_data.get("guidance_scale_text", 0.0),
854
+ json_data.get("guidance_scale_lyric", 0.0),
855
+ audio2audio_enable=json_data.get("audio2audio_enable", False),
856
+ ref_audio_strength=json_data.get("ref_audio_strength", 0.5),
857
+ ref_audio_input=json_data.get("ref_audio_input", None),
858
+ lora_name_or_path=json_data.get("lora_name_or_path", "none"),
859
+ multi_seed_mode="Best of 3", # retake automatically generates multiple candidates
860
+ retake_variance=retake_variance,
861
+ task="retake"
862
+ )
863
+
864
+ retake_bnt.click(
865
+ fn=retake_process_func,
866
+ inputs=[
867
+ input_params_json,
868
+ retake_variance,
869
+ retake_seeds,
870
+ ],
871
+ outputs=retake_outputs + [retake_input_params_json],
872
+ )
873
+
874
+ with gr.Tab("repainting"):
875
+ retake_variance = gr.Slider(
876
+ minimum=0.0, maximum=1.0, step=0.01, value=0.2, label="variance"
877
+ )
878
+ retake_seeds = gr.Textbox(
879
+ label="repaint seeds (default None)", placeholder="", value=None
880
+ )
881
+ repaint_start = gr.Slider(
882
+ minimum=0.0,
883
+ maximum=240.0,
884
+ step=0.01,
885
+ value=0.0,
886
+ label="Repaint Start Time",
887
+ interactive=True,
888
+ )
889
+ repaint_end = gr.Slider(
890
+ minimum=0.0,
891
+ maximum=240.0,
892
+ step=0.01,
893
+ value=30.0,
894
+ label="Repaint End Time",
895
+ interactive=True,
896
+ )
897
+ repaint_source = gr.Radio(
898
+ ["text2music", "last_repaint", "upload"],
899
+ value="text2music",
900
+ label="Repaint Source",
901
+ elem_id="repaint_source",
902
+ )
903
+
904
+ repaint_source_audio_upload = gr.Audio(
905
+ label="Upload Audio",
906
+ type="filepath",
907
+ visible=False,
908
+ elem_id="repaint_source_audio_upload",
909
+ show_download_button=True,
910
+ )
911
+ repaint_source.change(
912
+ fn=lambda x: gr.update(
913
+ visible=x == "upload", elem_id="repaint_source_audio_upload"
914
+ ),
915
+ inputs=[repaint_source],
916
+ outputs=[repaint_source_audio_upload],
917
+ )
918
+
919
+ repaint_bnt = gr.Button("Repaint", variant="primary")
920
+ repaint_outputs, repaint_input_params_json = create_output_ui("Repaint")
921
+
922
+ def repaint_process_func(
923
+ text2music_json_data,
924
+ repaint_json_data,
925
+ retake_variance,
926
+ retake_seeds,
927
+ repaint_start,
928
+ repaint_end,
929
+ repaint_source,
930
+ repaint_source_audio_upload,
931
+ prompt,
932
+ lyrics,
933
+ infer_step,
934
+ guidance_scale,
935
+ scheduler_type,
936
+ cfg_type,
937
+ omega_scale,
938
+ manual_seeds,
939
+ guidance_interval,
940
+ guidance_interval_decay,
941
+ min_guidance_scale,
942
+ use_erg_tag,
943
+ use_erg_lyric,
944
+ use_erg_diffusion,
945
+ oss_steps,
946
+ guidance_scale_text,
947
+ guidance_scale_lyric,
948
+ ):
949
+ if repaint_source == "upload":
950
+ src_audio_path = repaint_source_audio_upload
951
+ audio_duration = librosa.get_duration(filename=src_audio_path)
952
+ json_data = {"audio_duration": audio_duration}
953
+ elif repaint_source == "text2music":
954
+ json_data = text2music_json_data
955
+ src_audio_path = json_data["audio_path"]
956
+ elif repaint_source == "last_repaint":
957
+ json_data = repaint_json_data
958
+ src_audio_path = json_data["audio_path"]
959
+
960
+ return enhanced_process_func(
961
+ json_data["audio_duration"],
962
+ prompt,
963
+ lyrics,
964
+ infer_step,
965
+ guidance_scale,
966
+ scheduler_type,
967
+ cfg_type,
968
+ omega_scale,
969
+ manual_seeds,
970
+ guidance_interval,
971
+ guidance_interval_decay,
972
+ min_guidance_scale,
973
+ use_erg_tag,
974
+ use_erg_lyric,
975
+ use_erg_diffusion,
976
+ oss_steps,
977
+ guidance_scale_text,
978
+ guidance_scale_lyric,
979
+ retake_seeds=retake_seeds,
980
+ retake_variance=retake_variance,
981
+ task="repaint",
982
+ repaint_start=repaint_start,
983
+ repaint_end=repaint_end,
984
+ src_audio_path=src_audio_path,
985
+ lora_name_or_path="none"
986
+ )
987
+
988
+ repaint_bnt.click(
989
+ fn=repaint_process_func,
990
+ inputs=[
991
+ input_params_json,
992
+ repaint_input_params_json,
993
+ retake_variance,
994
+ retake_seeds,
995
+ repaint_start,
996
+ repaint_end,
997
+ repaint_source,
998
+ repaint_source_audio_upload,
999
+ prompt,
1000
+ lyrics,
1001
+ infer_step,
1002
+ guidance_scale,
1003
+ scheduler_type,
1004
+ cfg_type,
1005
+ omega_scale,
1006
+ manual_seeds,
1007
+ guidance_interval,
1008
+ guidance_interval_decay,
1009
+ min_guidance_scale,
1010
+ use_erg_tag,
1011
+ use_erg_lyric,
1012
+ use_erg_diffusion,
1013
+ oss_steps,
1014
+ guidance_scale_text,
1015
+ guidance_scale_lyric,
1016
+ ],
1017
+ outputs=repaint_outputs + [repaint_input_params_json],
1018
+ )
1019
+
1020
+ with gr.Tab("edit"):
1021
+ edit_prompt = gr.Textbox(lines=2, label="Edit Tags", max_lines=4)
1022
+ edit_lyrics = gr.Textbox(lines=9, label="Edit Lyrics", max_lines=13)
1023
+ retake_seeds = gr.Textbox(
1024
+ label="edit seeds (default None)", placeholder="", value=None
1025
+ )
1026
+
1027
+ edit_type = gr.Radio(
1028
+ ["only_lyrics", "remix"],
1029
+ value="only_lyrics",
1030
+ label="Edit Type",
1031
+ elem_id="edit_type",
1032
+ info="`only_lyrics` keeps the whole song the same except for the lyric changes. Keep the difference small, e.g. a one-line lyric change.\n`remix` can change the song's melody and genre.",
1033
+ )
1034
+ edit_n_min = gr.Slider(
1035
+ minimum=0.0,
1036
+ maximum=1.0,
1037
+ step=0.01,
1038
+ value=0.6,
1039
+ label="edit_n_min",
1040
+ interactive=True,
1041
+ )
1042
+ edit_n_max = gr.Slider(
1043
+ minimum=0.0,
1044
+ maximum=1.0,
1045
+ step=0.01,
1046
+ value=1.0,
1047
+ label="edit_n_max",
1048
+ interactive=True,
1049
+ )
1050
+
1051
+ def edit_type_change_func(edit_type):
1052
+ if edit_type == "only_lyrics":
1053
+ n_min = 0.6
1054
+ n_max = 1.0
1055
+ elif edit_type == "remix":
1056
+ n_min = 0.2
1057
+ n_max = 0.4
1058
+ return n_min, n_max
1059
+
1060
+ edit_type.change(
1061
+ edit_type_change_func,
1062
+ inputs=[edit_type],
1063
+ outputs=[edit_n_min, edit_n_max],
1064
+ )
1065
+
1066
+ edit_source = gr.Radio(
1067
+ ["text2music", "last_edit", "upload"],
1068
+ value="text2music",
1069
+ label="Edit Source",
1070
+ elem_id="edit_source",
1071
+ )
1072
+ edit_source_audio_upload = gr.Audio(
1073
+ label="Upload Audio",
1074
+ type="filepath",
1075
+ visible=False,
1076
+ elem_id="edit_source_audio_upload",
1077
+ show_download_button=True,
1078
+ )
1079
+ edit_source.change(
1080
+ fn=lambda x: gr.update(
1081
+ visible=x == "upload", elem_id="edit_source_audio_upload"
1082
+ ),
1083
+ inputs=[edit_source],
1084
+ outputs=[edit_source_audio_upload],
1085
+ )
1086
+
1087
+ edit_bnt = gr.Button("Edit", variant="primary")
1088
+ edit_outputs, edit_input_params_json = create_output_ui("Edit")
1089
+
1090
+ def edit_process_func(
1091
+ text2music_json_data,
1092
+ edit_input_params_json,
1093
+ edit_source,
1094
+ edit_source_audio_upload,
1095
+ prompt,
1096
+ lyrics,
1097
+ edit_prompt,
1098
+ edit_lyrics,
1099
+ edit_n_min,
1100
+ edit_n_max,
1101
+ infer_step,
1102
+ guidance_scale,
1103
+ scheduler_type,
1104
+ cfg_type,
1105
+ omega_scale,
1106
+ manual_seeds,
1107
+ guidance_interval,
1108
+ guidance_interval_decay,
1109
+ min_guidance_scale,
1110
+ use_erg_tag,
1111
+ use_erg_lyric,
1112
+ use_erg_diffusion,
1113
+ oss_steps,
1114
+ guidance_scale_text,
1115
+ guidance_scale_lyric,
1116
+ retake_seeds,
1117
+ ):
1118
+ if edit_source == "upload":
1119
+ src_audio_path = edit_source_audio_upload
1120
+ audio_duration = librosa.get_duration(filename=src_audio_path)
1121
+ json_data = {"audio_duration": audio_duration}
1122
+ elif edit_source == "text2music":
1123
+ json_data = text2music_json_data
1124
+ src_audio_path = json_data["audio_path"]
1125
+ elif edit_source == "last_edit":
1126
+ json_data = edit_input_params_json
1127
+ src_audio_path = json_data["audio_path"]
1128
+
1129
+ if not edit_prompt:
1130
+ edit_prompt = prompt
1131
+ if not edit_lyrics:
1132
+ edit_lyrics = lyrics
1133
+
1134
+ return enhanced_process_func(
1135
+ json_data["audio_duration"],
1136
+ prompt,
1137
+ lyrics,
1138
+ infer_step,
1139
+ guidance_scale,
1140
+ scheduler_type,
1141
+ cfg_type,
1142
+ omega_scale,
1143
+ manual_seeds,
1144
+ guidance_interval,
1145
+ guidance_interval_decay,
1146
+ min_guidance_scale,
1147
+ use_erg_tag,
1148
+ use_erg_lyric,
1149
+ use_erg_diffusion,
1150
+ oss_steps,
1151
+ guidance_scale_text,
1152
+ guidance_scale_lyric,
1153
+ task="edit",
1154
+ src_audio_path=src_audio_path,
1155
+ edit_target_prompt=edit_prompt,
1156
+ edit_target_lyrics=edit_lyrics,
1157
+ edit_n_min=edit_n_min,
1158
+ edit_n_max=edit_n_max,
1159
+ retake_seeds=retake_seeds,
1160
+ lora_name_or_path="none"
1161
+ )
1162
+
1163
+ edit_bnt.click(
1164
+ fn=edit_process_func,
1165
+ inputs=[
1166
+ input_params_json,
1167
+ edit_input_params_json,
1168
+ edit_source,
1169
+ edit_source_audio_upload,
1170
+ prompt,
1171
+ lyrics,
1172
+ edit_prompt,
1173
+ edit_lyrics,
1174
+ edit_n_min,
1175
+ edit_n_max,
1176
+ infer_step,
1177
+ guidance_scale,
1178
+ scheduler_type,
1179
+ cfg_type,
1180
+ omega_scale,
1181
+ manual_seeds,
1182
+ guidance_interval,
1183
+ guidance_interval_decay,
1184
+ min_guidance_scale,
1185
+ use_erg_tag,
1186
+ use_erg_lyric,
1187
+ use_erg_diffusion,
1188
+ oss_steps,
1189
+ guidance_scale_text,
1190
+ guidance_scale_lyric,
1191
+ retake_seeds,
1192
+ ],
1193
+ outputs=edit_outputs + [edit_input_params_json],
1194
+ )
1195
+
1196
+ with gr.Tab("extend"):
1197
+ extend_seeds = gr.Textbox(
1198
+ label="extend seeds (default None)", placeholder="", value=None
1199
+ )
1200
+ left_extend_length = gr.Slider(
1201
+ minimum=0.0,
1202
+ maximum=240.0,
1203
+ step=0.01,
1204
+ value=0.0,
1205
+ label="Left Extend Length",
1206
+ interactive=True,
1207
+ )
1208
+ right_extend_length = gr.Slider(
1209
+ minimum=0.0,
1210
+ maximum=240.0,
1211
+ step=0.01,
1212
+ value=30.0,
1213
+ label="Right Extend Length",
1214
+ interactive=True,
1215
+ )
1216
+ extend_source = gr.Radio(
1217
+ ["text2music", "last_extend", "upload"],
1218
+ value="text2music",
1219
+ label="Extend Source",
1220
+ elem_id="extend_source",
1221
+ )
1222
+
1223
+ extend_source_audio_upload = gr.Audio(
1224
+ label="Upload Audio",
1225
+ type="filepath",
1226
+ visible=False,
1227
+ elem_id="extend_source_audio_upload",
1228
+ show_download_button=True,
1229
+ )
1230
+ extend_source.change(
1231
+ fn=lambda x: gr.update(
1232
+ visible=x == "upload", elem_id="extend_source_audio_upload"
1233
+ ),
1234
+ inputs=[extend_source],
1235
+ outputs=[extend_source_audio_upload],
1236
+ )
1237
+
1238
+ extend_bnt = gr.Button("Extend", variant="primary")
1239
+ extend_outputs, extend_input_params_json = create_output_ui("Extend")
1240
+
1241
+ def extend_process_func(
1242
+ text2music_json_data,
1243
+ extend_input_params_json,
1244
+ extend_seeds,
1245
+ left_extend_length,
1246
+ right_extend_length,
1247
+ extend_source,
1248
+ extend_source_audio_upload,
1249
+ prompt,
1250
+ lyrics,
1251
+ infer_step,
1252
+ guidance_scale,
1253
+ scheduler_type,
1254
+ cfg_type,
1255
+ omega_scale,
1256
+ manual_seeds,
1257
+ guidance_interval,
1258
+ guidance_interval_decay,
1259
+ min_guidance_scale,
1260
+ use_erg_tag,
1261
+ use_erg_lyric,
1262
+ use_erg_diffusion,
1263
+ oss_steps,
1264
+ guidance_scale_text,
1265
+ guidance_scale_lyric,
1266
+ ):
1267
+ if extend_source == "upload":
1268
+ src_audio_path = extend_source_audio_upload
1269
+ # get audio duration
1270
+ audio_duration = librosa.get_duration(filename=src_audio_path)
1271
+ json_data = {"audio_duration": audio_duration}
1272
+ elif extend_source == "text2music":
1273
+ json_data = text2music_json_data
1274
+ src_audio_path = json_data["audio_path"]
1275
+ elif extend_source == "last_extend":
1276
+ json_data = extend_input_params_json
1277
+ src_audio_path = json_data["audio_path"]
1278
+
1279
+ repaint_start = -left_extend_length
1280
+ repaint_end = json_data["audio_duration"] + right_extend_length
1281
+ return enhanced_process_func(
1282
+ json_data["audio_duration"],
1283
+ prompt,
1284
+ lyrics,
1285
+ infer_step,
1286
+ guidance_scale,
1287
+ scheduler_type,
1288
+ cfg_type,
1289
+ omega_scale,
1290
+ manual_seeds,
1291
+ guidance_interval,
1292
+ guidance_interval_decay,
1293
+ min_guidance_scale,
1294
+ use_erg_tag,
1295
+ use_erg_lyric,
1296
+ use_erg_diffusion,
1297
+ oss_steps,
1298
+ guidance_scale_text,
1299
+ guidance_scale_lyric,
1300
+ retake_seeds=extend_seeds,
1301
+ retake_variance=1.0,
1302
+ task="extend",
1303
+ repaint_start=repaint_start,
1304
+ repaint_end=repaint_end,
1305
+ src_audio_path=src_audio_path,
1306
+ lora_name_or_path="none"
1307
+ )
1308
+
1309
+ extend_bnt.click(
1310
+ fn=extend_process_func,
1311
+ inputs=[
1312
+ input_params_json,
1313
+ extend_input_params_json,
1314
+ extend_seeds,
1315
+ left_extend_length,
1316
+ right_extend_length,
1317
+ extend_source,
1318
+ extend_source_audio_upload,
1319
+ prompt,
1320
+ lyrics,
1321
+ infer_step,
1322
+ guidance_scale,
1323
+ scheduler_type,
1324
+ cfg_type,
1325
+ omega_scale,
1326
+ manual_seeds,
1327
+ guidance_interval,
1328
+ guidance_interval_decay,
1329
+ min_guidance_scale,
1330
+ use_erg_tag,
1331
+ use_erg_lyric,
1332
+ use_erg_diffusion,
1333
+ oss_steps,
1334
+ guidance_scale_text,
1335
+ guidance_scale_lyric,
1336
+ ],
1337
+ outputs=extend_outputs + [extend_input_params_json],
1338
+ )
1339
+
1340
+ # Random button event
1341
+ random_bnt.click(
1342
+ fn=generate_random_music_data,
1343
+ inputs=[genre_preset, song_style],
1344
+ outputs=[
1345
+ audio_duration,
1346
+ prompt,
1347
+ lyrics,
1348
+ infer_step,
1349
+ guidance_scale,
1350
+ scheduler_type,
1351
+ cfg_type,
1352
+ omega_scale,
1353
+ manual_seeds,
1354
+ guidance_interval,
1355
+ guidance_interval_decay,
1356
+ min_guidance_scale,
1357
+ use_erg_tag,
1358
+ use_erg_lyric,
1359
+ use_erg_diffusion,
1360
+ oss_steps,
1361
+ guidance_scale_text,
1362
+ guidance_scale_lyric,
1363
+ audio2audio_enable,
1364
+ ref_audio_strength,
1365
+ ref_audio_input,
1366
+ ],
1367
+ )
1368
+
1369
+ # Main generate button event (uses the enhanced function)
1370
+ text2music_bnt.click(
1371
+ fn=enhanced_process_func,
1372
+ inputs=[
1373
+ audio_duration,
1374
+ prompt,
1375
+ lyrics,
1376
+ infer_step,
1377
+ guidance_scale,
1378
+ scheduler_type,
1379
+ cfg_type,
1380
+ omega_scale,
1381
+ manual_seeds,
1382
+ guidance_interval,
1383
+ guidance_interval_decay,
1384
+ min_guidance_scale,
1385
+ use_erg_tag,
1386
+ use_erg_lyric,
1387
+ use_erg_diffusion,
1388
+ oss_steps,
1389
+ guidance_scale_text,
1390
+ guidance_scale_lyric,
1391
+ audio2audio_enable,
1392
+ ref_audio_strength,
1393
+ ref_audio_input,
1394
+ lora_name_or_path,
1395
+ multi_seed_mode,
1396
+ enable_smart_enhancement,
1397
+ genre_preset,
1398
+ song_style
1399
+ ],
1400
+ outputs=outputs + [input_params_json],
1401
+ )
1402
+
1403
+
1404
+ def create_main_demo_ui(
1405
+ text2music_process_func=dump_func,
1406
+ sample_data_func=dump_func,
1407
+ load_data_func=dump_func,
1408
+ ):
1409
+ with gr.Blocks(
1410
+ title="ACE-Step Model 1.0 DEMO - Enhanced",
1411
+ theme=gr.themes.Soft(),
1412
+ css="""
1413
+ /* Gradient background */
1414
+ .gradio-container {
1415
+ max-width: 1200px !important;
1416
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
1417
+ min-height: 100vh;
1418
+ }
1419
+
1420
+ /* Main container style */
1421
+ .main-container {
1422
+ background: rgba(255, 255, 255, 0.95);
1423
+ border-radius: 20px;
1424
+ padding: 30px;
1425
+ margin: 20px auto;
1426
+ box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
1427
+ }
1428
+
1429
+ /* Header style */
1430
+ .header-title {
1431
+ background: linear-gradient(45deg, #667eea, #764ba2);
1432
+ -webkit-background-clip: text;
1433
+ -webkit-text-fill-color: transparent;
1434
+ font-size: 3em;
1435
+ font-weight: bold;
1436
+ text-align: center;
1437
+ margin-bottom: 10px;
1438
+ }
1439
+
1440
+ /* Button style */
1441
+ .gr-button-primary {
1442
+ background: linear-gradient(45deg, #667eea, #764ba2) !important;
1443
+ border: none !important;
1444
+ color: white !important;
1445
+ font-weight: bold !important;
1446
+ transition: all 0.3s ease !important;
1447
+ }
1448
+
1449
+ .gr-button-primary:hover {
1450
+ transform: translateY(-2px);
1451
+ box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
1452
+ }
1453
+
1454
+ .gr-button-secondary {
1455
+ background: linear-gradient(45deg, #f093fb, #f5576c) !important;
1456
+ border: none !important;
1457
+ color: white !important;
1458
+ transition: all 0.3s ease !important;
1459
+ }
1460
+
1461
+ /* Group style */
1462
+ .gr-group {
1463
+ background: rgba(255, 255, 255, 0.8) !important;
1464
+ border: 1px solid rgba(102, 126, 234, 0.2) !important;
1465
+ border-radius: 15px !important;
1466
+ padding: 20px !important;
1467
+ margin: 10px 0 !important;
1468
+ backdrop-filter: blur(10px) !important;
1469
+ }
1470
+
1471
+ /* Tab style */
1472
+ .gr-tab {
1473
+ background: rgba(255, 255, 255, 0.9) !important;
1474
+ border-radius: 10px !important;
1475
+ padding: 15px !important;
1476
+ }
1477
+
1478
+ /* Input field style */
1479
+ .gr-textbox, .gr-dropdown, .gr-slider {
1480
+ border: 2px solid rgba(102, 126, 234, 0.3) !important;
1481
+ border-radius: 10px !important;
1482
+ transition: all 0.3s ease !important;
1483
+ }
1484
+
1485
+ .gr-textbox:focus, .gr-dropdown:focus {
1486
+ border-color: #667eea !important;
1487
+ box-shadow: 0 0 10px rgba(102, 126, 234, 0.2) !important;
1488
+ }
1489
+
1490
+ /* Quality info style */
1491
+ .quality-info {
1492
+ background: linear-gradient(135deg, #f093fb20, #f5576c20);
1493
+ padding: 15px;
1494
+ border-radius: 10px;
1495
+ margin: 10px 0;
1496
+ border: 1px solid rgba(240, 147, 251, 0.3);
1497
+ }
1498
+
1499
+ /* Animation */
1500
+ @keyframes fadeIn {
1501
+ from {
1502
+ opacity: 0;
1503
+ transform: translateY(20px);
1504
+ }
1505
+ to {
1506
+ opacity: 1;
1507
+ transform: translateY(0);
1508
+ }
1509
+ }
1510
+
1511
+ .gr-row, .gr-column {
1512
+ animation: fadeIn 0.5s ease-out;
1513
+ }
1514
+
1515
+ /* Scrollbar style */
1516
+ ::-webkit-scrollbar {
1517
+ width: 10px;
1518
+ }
1519
+
1520
+ ::-webkit-scrollbar-track {
1521
+ background: rgba(255, 255, 255, 0.1);
1522
+ border-radius: 10px;
1523
+ }
1524
+
1525
+ ::-webkit-scrollbar-thumb {
1526
+ background: linear-gradient(45deg, #667eea, #764ba2);
1527
+ border-radius: 10px;
1528
+ }
1529
+
1530
+ /* Markdown style */
1531
+ .gr-markdown {
1532
+ color: #4a5568 !important;
1533
+ }
1534
+
1535
+ .gr-markdown h3 {
1536
+ color: #667eea !important;
1537
+ font-weight: 600 !important;
1538
+ margin: 15px 0 !important;
1539
+ }
1540
+ """
1541
+ ) as demo:
1542
+ with gr.Column(elem_classes="main-container"):
1543
+ gr.HTML(
1544
+ """
1545
+ <h1 class="header-title">🎵 ACE-Step PRO</h1>
1546
+ <div style="text-align: center; margin: 20px;">
1547
+ <p style="font-size: 1.2em; color: #4a5568;"><strong>🚀 New features:</strong> AI Lyrics | Quality Presets | Multi-Generation | Smart Prompt | Real-Time Preview</p>
1548
+ <p style="margin-top: 10px;">
1549
+ <a href="https://ace-step.github.io/" target='_blank' style="color: #667eea; text-decoration: none; margin: 0 10px;">📄 Project</a> |
1550
+ <a href="https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B" style="color: #667eea; text-decoration: none; margin: 0 10px;">🤗 Checkpoints</a> |
1551
+ <a href="https://discord.gg/rjAZz2xBdG" target='_blank' style="color: #667eea; text-decoration: none; margin: 0 10px;">💬 Discord</a>
1552
+ </p>
1553
+ </div>
1554
+ """
1555
+ )
1556
+
1557
+ # Add a usage guide
1558
+ with gr.Accordion("📖 Usage Guide", open=False):
1559
+ gr.Markdown("""
1560
+ ### 🎯 Quick Start
1561
+ 1. **Choose genre & style**: Select the music genre and song style (duet, solo, etc.) you want
1562
+ 2. **AI Lyrics**: Enter a topic and generate lyrics automatically with the 'AI Lyrics' button
1563
+ 3. **Quality setting**: Choose one of Draft (fast) → Standard (recommended) → High Quality → Ultra
1564
+ 4. **Multi-generation**: Selecting "Best of 3/5/10" generates several takes and automatically picks the best one
1565
+ 5. **Preview**: Check the result quickly with a 10-second preview before running a full generation
1566
+
1567
+ ### 💡 Quality Tips
1568
+ - **High-quality generation**: The "High Quality" + "Best of 5" combination is recommended
1569
+ - **Quick tests**: Use "Draft" together with the "Preview" feature
1570
+ - **Genre specialization**: Select a genre preset, then check "Smart Enhancement"
1571
+ - **Lyric structure**: Make active use of the [verse], [chorus], and [bridge] tags
1572
+ - **Multilingual support**: Entering a topic in Korean produces Korean lyrics
1573
+ """)
1574
+
1575
+ with gr.Tab("🎵 Enhanced Text2Music", elem_classes="gr-tab"):
1576
+ create_text2music_ui(
1577
+ gr=gr,
1578
+ text2music_process_func=text2music_process_func,
1579
+ sample_data_func=sample_data_func,
1580
+ load_data_func=load_data_func,
1581
+ )
1582
+ return demo
1583
+
1584
+
1585
+ if __name__ == "__main__":
1586
+ demo = create_main_demo_ui()
1587
+ demo.launch(
1588
+ server_name="0.0.0.0",
1589
+ server_port=7860,
1590
+ share=True # create a public share link
1591
+ )
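
Reviewer sketch (not part of the commit): the repaint, edit, and extend handlers in this file each repeat the same three-way branch on the source radio (`upload`, `text2music`, `last_*`), and the extend handler then derives its repaint window from the left and right extension lengths. The block below is a minimal sketch of how that shared logic might be factored out, under stated assumptions. The names `resolve_source` and `extend_window` are hypothetical, and the `get_duration` callable is a stand-in for the `librosa.get_duration` call used in the diff, so the example runs without Gradio or librosa installed.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def resolve_source(
    source: str,
    upload_path: Optional[str],
    text2music_json: Optional[Dict[str, Any]],
    last_json: Optional[Dict[str, Any]],
    last_label: str,
    get_duration: Callable[[str], float],
) -> Tuple[Dict[str, Any], str]:
    """Return (json_data, src_audio_path) for the selected source.

    Hypothetical helper: mirrors the if/elif chains in the diff, but raises
    a clear error instead of leaving json_data undefined.
    """
    if source == "upload":
        if not upload_path:
            raise ValueError("source is 'upload' but no audio file was provided")
        # For uploads, only the duration is known; it is measured from the file.
        return {"audio_duration": get_duration(upload_path)}, upload_path
    if source == "text2music":
        json_data = text2music_json
    elif source == last_label:
        json_data = last_json
    else:
        raise ValueError(f"unknown source: {source!r}")
    if not json_data or "audio_path" not in json_data:
        raise ValueError(f"no previous result available for source {source!r}")
    return json_data, json_data["audio_path"]


def extend_window(audio_duration: float, left: float, right: float) -> Tuple[float, float]:
    """Same arithmetic as the extend handler: a negative start pads the left side."""
    return -left, audio_duration + right


# Usage: the lambda stands in for a real duration probe (assumption).
json_data, path = resolve_source(
    "upload", "song.wav", None, None, "last_extend", lambda _p: 42.0
)
print(extend_window(json_data["audio_duration"], 5.0, 30.0))  # (-5.0, 72.0)
```

With a helper along these lines, each handler would call `resolve_source(...)` once and pass the resulting duration and path on to `enhanced_process_func`, and selecting `last_repaint`, `last_edit`, or `last_extend` before any result exists would produce a readable error.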