heybaeheef committed on
Commit 3cc9d6f · verified · 1 parent: a7b8b45

Upload 9 files
Dockerfile ADDED
@@ -0,0 +1,23 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install system packages
+ RUN apt-get update && apt-get install -y \
+     libsndfile1 \
+     ffmpeg \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install Python packages
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the application code
+ COPY . .
+
+ # Hugging Face Spaces serve on port 7860
+ EXPOSE 7860
+
+ # Start the server
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,29 @@
  ---
- title: KU SW Academy
- emoji: 🌖
- colorFrom: red
- colorTo: yellow
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: DiffVox AI Vocal Effects Server
+ emoji: 🎤
+ colorFrom: purple
+ colorTo: pink
  sdk: docker
+ app_port: 7860
  pinned: false
  ---

+ # DiffVox AI Vocal Effects Server
+
+ AI-powered vocal effects server: a DiffVox LLM predicts effect parameters from a dry vocal and a text prompt, and the server applies them to the audio.
+
+ ## API Endpoints
+
+ - `GET /` - Server info
+ - `GET /health` - Health check
+ - `POST /predict` - Predict effect parameters only (JSON, no audio processing)
+ - `POST /process` - Process audio with AI-predicted parameters and return a WAV file
+ - `POST /process_with_params` - Process audio and return the parameters plus the audio as base64-encoded WAV
+
+ ## Usage
+
+ ```bash
+ curl -X POST "https://YOUR-SPACE.hf.space/process_with_params" \
+   -F "audio=@your_vocal.wav" \
+   -F "prompt=warm vintage sound"
+ ```
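`/process_with_params` does not return a WAV file directly. It returns the processed audio inside JSON, so a client has to decode it. Below is a minimal Python sketch that assumes the response body was saved to disk first (for example with curl's `-o response.json`). The field names `status`, `parameters` and `audio_base64` match what the handler in `main.py` returns. The helper name `save_processed_audio` is hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import Any, Dict


def save_processed_audio(response_body: str, out_path: Path) -> Dict[str, Any]:
    """Decode a /process_with_params JSON body, write the WAV to out_path, return the parameters.

    Hypothetical client helper; the field names follow the server's JSON response.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        # Error responses from FastAPI carry {"detail": ...} instead of a status field
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    # audio_base64 holds the processed WAV file, base64-encoded as text
    out_path.write_bytes(base64.b64decode(payload["audio_base64"]))
    return payload["parameters"]
```

For example, `save_processed_audio(Path("response.json").read_text(), Path("processed.wav"))` writes the WAV and returns the predicted effect parameters.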
audio_processing/__init__.py ADDED
File without changes
audio_processing/effect_chain.py ADDED
@@ -0,0 +1,255 @@
+ """
+ Audio Effect Chain
+ ==================
+ Processing chain that applies effects to the actual audio.
+
+ Uses the pedalboard library (Spotify's audio plugin library):
+ - High-quality, VST-grade effects
+ - Easy to use from Python
+ - Also suitable for real-time processing
+ """
+
+ import numpy as np
+ from pathlib import Path
+ from typing import Dict, Any, List
+ import soundfile as sf
+
+ # pedalboard: audio effects library
+ from pedalboard import (
+     Pedalboard,
+     Compressor,
+     Gain,
+     LowShelfFilter,
+     HighShelfFilter,
+     PeakFilter,
+     Delay,
+     Reverb,
+     Distortion,
+     Limiter,
+     HighpassFilter,
+     LowpassFilter
+ )
+ from pedalboard.io import AudioFile
+
+
+ class EffectChain:
+     """Audio effect processing chain"""
+
+     AVAILABLE_EFFECTS = [
+         "eq_lowshelf",
+         "eq_highshelf",
+         "eq_peak1",
+         "eq_peak2",
+         "compressor",
+         "distortion",
+         "delay",
+         "reverb",
+         "limiter"
+     ]
+
+     def __init__(self):
+         """Initialize the effect chain"""
+         pass
+
+     def get_available_effects(self) -> List[str]:
+         """Return the list of available effects"""
+         return self.AVAILABLE_EFFECTS.copy()
+
+     def process(
+         self,
+         input_path: str,
+         output_path: str,
+         parameters: Dict[str, float]
+     ) -> None:
+         """
+         Apply the effect chain to an audio file.
+
+         Args:
+             input_path: path to the input audio file
+             output_path: path to the output audio file
+             parameters: dictionary of effect parameters
+         """
+         # Read the audio file
+         audio, sample_rate = sf.read(input_path)
+
+         # Convert mono to stereo (some effects require stereo)
+         if len(audio.shape) == 1:
+             audio = np.column_stack([audio, audio])
+
+         # Convert to float32
+         audio = audio.astype(np.float32)
+
+         # Build the effect chain
+         board = self._build_pedalboard(parameters, sample_rate)
+
+         # Apply the effects
+         processed = board(audio, sample_rate)
+
+         # Apply the wet/dry mix
+         wet_mix = parameters.get("final_wet_mix", 0.5)
+         final_audio = (1 - wet_mix) * audio + wet_mix * processed
+
+         # Prevent clipping
+         final_audio = np.clip(final_audio, -1.0, 1.0)
+
+         # Save the output file
+         sf.write(output_path, final_audio, sample_rate)
+
+         print(f"[EffectChain] Processing complete: {output_path}")
+
+     def _build_pedalboard(
+         self,
+         params: Dict[str, float],
+         sample_rate: int
+     ) -> Pedalboard:
+         """
+         Build a pedalboard effect chain from the parameters.
+         """
+         effects = []
+
+         # === EQ Section ===
+
+         # Low Shelf EQ
+         if params.get("eq_lowshelf_gain", 0) != 0:
+             effects.append(
+                 LowShelfFilter(
+                     cutoff_frequency_hz=params.get("eq_lowshelf_freq", 200),
+                     gain_db=params.get("eq_lowshelf_gain", 0),
+                     q=0.707
+                 )
+             )
+
+         # High Shelf EQ
+         if params.get("eq_highshelf_gain", 0) != 0:
+             effects.append(
+                 HighShelfFilter(
+                     cutoff_frequency_hz=params.get("eq_highshelf_freq", 8000),
+                     gain_db=params.get("eq_highshelf_gain", 0),
+                     q=0.707
+                 )
+             )
+
+         # Peak EQ 1
+         if params.get("eq_peak1_gain", 0) != 0:
+             effects.append(
+                 PeakFilter(
+                     cutoff_frequency_hz=params.get("eq_peak1_freq", 1000),
+                     gain_db=params.get("eq_peak1_gain", 0),
+                     q=params.get("eq_peak1_q", 1.0)
+                 )
+             )
+
+         # Peak EQ 2
+         if params.get("eq_peak2_gain", 0) != 0:
+             effects.append(
+                 PeakFilter(
+                     cutoff_frequency_hz=params.get("eq_peak2_freq", 3000),
+                     gain_db=params.get("eq_peak2_gain", 0),
+                     q=params.get("eq_peak2_q", 1.0)
+                 )
+             )
+
+         # === Dynamics Section ===
+
+         # Compressor
+         threshold = params.get("compressor_threshold", -24)
+         ratio = params.get("compressor_ratio", 4.0)
+         if ratio > 1.0:
+             effects.append(
+                 Compressor(
+                     threshold_db=threshold,
+                     ratio=ratio,
+                     attack_ms=params.get("compressor_attack", 5),
+                     release_ms=params.get("compressor_release", 50)
+                 )
+             )
+
+         # Makeup Gain
+         makeup = params.get("compressor_makeup", 0)
+         if makeup != 0:
+             effects.append(Gain(gain_db=makeup))
+
+         # === Distortion Section ===
+
+         distortion_amount = params.get("distortion_amount", 0)
+         if distortion_amount > 0:
+             # pedalboard's Distortion takes its drive in dB
+             effects.append(
+                 Distortion(drive_db=distortion_amount * 40)  # 0-1 -> 0-40 dB
+             )
+
+             # Post-distortion tone control (Tone = LPF)
+             tone = params.get("distortion_tone", 0.5)
+             lpf_freq = 2000 + tone * 10000  # 2 kHz ~ 12 kHz
+             effects.append(
+                 LowpassFilter(cutoff_frequency_hz=lpf_freq)
+             )
+
+         # === Time-based Effects Section ===
+
+         # Delay
+         delay_mix = params.get("delay_mix", 0)
+         if delay_mix > 0:
+             delay_time_ms = params.get("delay_time", 250)
+             effects.append(
+                 Delay(
+                     delay_seconds=delay_time_ms / 1000,
+                     feedback=params.get("delay_feedback", 0.3),
+                     mix=delay_mix
+                 )
+             )
+
+         # Reverb
+         reverb_wet = params.get("reverb_wet_dry", 0)
+         if reverb_wet > 0:
+             effects.append(
+                 Reverb(
+                     room_size=params.get("reverb_room_size", 0.5),
+                     damping=params.get("reverb_damping", 0.5),
+                     wet_level=reverb_wet,
+                     dry_level=1 - reverb_wet,
+                     width=1.0
+                 )
+             )
+
+         # === Output Section ===
+
+         # Limiter (prevents clipping)
+         effects.append(
+             Limiter(
+                 threshold_db=-1.0,
+                 release_ms=100
+             )
+         )
+
+         return Pedalboard(effects)
+
+     def process_realtime(
+         self,
+         audio_chunk: np.ndarray,
+         sample_rate: int,
+         parameters: Dict[str, float]
+     ) -> np.ndarray:
+         """
+         Process a real-time audio chunk (for streaming).
+
+         Args:
+             audio_chunk: audio data array
+             sample_rate: sample rate
+             parameters: effect parameters
+
+         Returns:
+             the processed audio chunk
+         """
+         if len(audio_chunk.shape) == 1:
+             audio_chunk = np.column_stack([audio_chunk, audio_chunk])
+
+         audio_chunk = audio_chunk.astype(np.float32)
+
+         # A new board is built on every call, so reverb and delay tails
+         # do not carry over from one chunk to the next
+         board = self._build_pedalboard(parameters, sample_rate)
+         processed = board(audio_chunk, sample_rate)
+
+         wet_mix = parameters.get("final_wet_mix", 0.5)
+         final = (1 - wet_mix) * audio_chunk + wet_mix * processed
+
+         return np.clip(final, -1.0, 1.0)
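The last stage of `process` and `process_realtime` is plain NumPy arithmetic, so it can be checked without pedalboard. The sketch below reproduces the mono-to-stereo conversion and the wet/dry crossfade with clipping. The function names are hypothetical, and `wet` stands in for whatever the pedalboard chain returned. The dry term bypasses the -1 dB limiter, which is why the final `np.clip` is still needed.

```python
import numpy as np


def to_stereo_float32(audio: np.ndarray) -> np.ndarray:
    """Duplicate a mono (samples,) signal into (samples, 2) and cast to float32."""
    if audio.ndim == 1:
        audio = np.column_stack([audio, audio])
    return audio.astype(np.float32)


def wet_dry_mix(dry: np.ndarray, wet: np.ndarray, wet_mix: float) -> np.ndarray:
    """Linear crossfade between the dry and processed signals, then hard-clip to [-1, 1]."""
    mixed = (1.0 - wet_mix) * dry + wet_mix * wet
    return np.clip(mixed, -1.0, 1.0)


# Example: a two-sample mono vocal and a stand-in for the pedalboard output
dry = to_stereo_float32(np.array([0.2, -0.4]))
wet = np.array([[1.0, 1.0], [-1.0, -1.0]], dtype=np.float32)
print(wet_dry_mix(dry, wet, 0.5))
```

The default `final_wet_mix` is 0.5, so the dry and processed signals are each scaled by one half before summing. Every effect, reverb included, is therefore heard at half level over the untouched vocal.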
main.py ADDED
@@ -0,0 +1,275 @@
+ """
+ MagicPath AI Vocal Effects Server - DiffVox LLM integration
+ ===========================================================
+ A server that takes a dry vocal file, has the trained AI model predict effect
+ parameters for it, and returns the audio with those effects actually applied.
+ """
+
+ from fastapi import FastAPI, UploadFile, File, Form, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import FileResponse, JSONResponse
+ import tempfile
+ import os
+ import uuid
+ from pathlib import Path
+
+ # Internal modules
+ from models.ai_effector import AIEffector
+ from audio_processing.effect_chain import EffectChain
+
+ # ============================================
+ # Configuration
+ # ============================================
+
+ # Path to the trained model (Hugging Face repo or local path)
+ MODEL_PATH = os.environ.get("DIFFVOX_MODEL_PATH", "heybaeheef/KU_SW_Academy")
+ BASE_MODEL_NAME = os.environ.get("BASE_MODEL_NAME", "Qwen/Qwen3-8B")
+ AUDIO_FEATURE_DIM = int(os.environ.get("AUDIO_FEATURE_DIM", "64"))
+ USE_HUGGINGFACE = os.environ.get("USE_HUGGINGFACE", "true").lower() == "true"
+
+ # ============================================
+ # FastAPI app initialization
+ # ============================================
+
+ app = FastAPI(
+     title="MagicPath AI Vocal Effects",
+     description="AI-powered vocal effect processing server (DiffVox LLM integration)",
+     version="2.0.0"
+ )
+
+ # CORS settings
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Restrict to specific domains in production
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Initialize global objects
+ print("=" * 60)
+ print("MagicPath AI Vocal Effects Server v2.0")
+ print("=" * 60)
+ print(f"Model Path: {MODEL_PATH}")
+ print(f"Base Model: {BASE_MODEL_NAME}")
+ print(f"Audio Feature Dim: {AUDIO_FEATURE_DIM}")
+ print(f"Use Hugging Face: {USE_HUGGINGFACE}")
+ print("=" * 60)
+
+ ai_effector = AIEffector(
+     model_path=MODEL_PATH,
+     base_model_name=BASE_MODEL_NAME,
+     audio_feature_dim=AUDIO_FEATURE_DIM,
+     use_huggingface=USE_HUGGINGFACE
+ )
+ effect_chain = EffectChain()
+
+ # Directory for temporary files
+ TEMP_DIR = Path(tempfile.gettempdir()) / "magicpath"
+ TEMP_DIR.mkdir(exist_ok=True)
+
+
+ # ============================================
+ # API endpoints
+ # ============================================
+
+ @app.get("/")
+ async def root():
+     """Server info"""
+     return {
+         "status": "running",
+         "message": "MagicPath AI Vocal Effects Server v2.0 (DiffVox LLM)",
+         "ai_model_loaded": ai_effector.is_loaded(),
+         "endpoints": {
+             "POST /process": "Process an audio file and return it",
+             "POST /predict": "Predict parameters only (JSON)",
+             "GET /health": "Check server status"
+         }
+     }
+
+
+ @app.get("/health")
+ async def health_check():
+     """Check server and model status"""
+     return {
+         "status": "healthy",
+         "ai_model_loaded": ai_effector.is_loaded(),
+         "supported_effects": effect_chain.get_available_effects(),
+         "model_path": MODEL_PATH,
+         "base_model": BASE_MODEL_NAME
+     }
+
+
+ @app.post("/predict")
+ async def predict_parameters(
+     audio: UploadFile = File(..., description="Dry vocal audio file"),
+     prompt: str = Form("", description="Text prompt (e.g. 'warm', 'bright')")
+ ):
+     """
+     Predict effect parameters with the AI model (without processing the audio).
+
+     - audio: an audio file such as WAV or MP3
+     - prompt: description of the desired sound
+
+     Returns: the predicted effect parameters as JSON
+     """
+     try:
+         # Save to a temporary file
+         input_path = TEMP_DIR / f"{uuid.uuid4()}_{audio.filename}"
+         with open(input_path, "wb") as f:
+             content = await audio.read()
+             f.write(content)
+
+         # Predict parameters with the AI model
+         parameters = ai_effector.predict(
+             audio_path=str(input_path),
+             text_prompt=prompt
+         )
+
+         # Delete the temporary file
+         os.remove(input_path)
+
+         return JSONResponse(content={
+             "status": "success",
+             "prompt": prompt,
+             "ai_model_used": ai_effector.is_loaded(),
+             "parameters": parameters
+         })
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.post("/process")
+ async def process_audio(
+     audio: UploadFile = File(..., description="Dry vocal audio file"),
+     prompt: str = Form("", description="Text prompt (e.g. 'warm', 'bright')")
+ ):
+     """
+     Process the audio with AI-predicted parameters.
+
+     - audio: an audio file such as WAV or MP3
+     - prompt: description of the desired sound
+
+     Returns: the processed audio file (WAV)
+     """
+     input_path = None
+     output_path = None
+
+     try:
+         # Build temporary file paths
+         file_id = str(uuid.uuid4())
+         input_path = TEMP_DIR / f"{file_id}_input_{audio.filename}"
+         output_path = TEMP_DIR / f"{file_id}_output.wav"
+
+         # Save the input file
+         with open(input_path, "wb") as f:
+             content = await audio.read()
+             f.write(content)
+
+         print(f"[Process] Input file: {input_path}")
+         print(f"[Process] Prompt: {prompt}")
+
+         # Step 1: predict parameters with the AI model
+         parameters = ai_effector.predict(
+             audio_path=str(input_path),
+             text_prompt=prompt
+         )
+
+         print(f"[Process] Predicted parameters: {len(parameters)}")
+
+         # Step 2: process the audio through the effect chain
+         effect_chain.process(
+             input_path=str(input_path),
+             output_path=str(output_path),
+             parameters=parameters
+         )
+
+         # Delete the input file
+         os.remove(input_path)
+
+         # Return the processed audio; the background task deletes the
+         # temporary output file after the response has been sent
+         from starlette.background import BackgroundTask
+         return FileResponse(
+             path=str(output_path),
+             media_type="audio/wav",
+             filename=f"processed_{audio.filename.rsplit('.', 1)[0]}.wav",
+             background=BackgroundTask(os.remove, str(output_path))
+         )
+
+     except Exception as e:
+         # Clean up temporary files on error
+         if input_path and input_path.exists():
+             os.remove(input_path)
+         if output_path and output_path.exists():
+             os.remove(output_path)
+
+         print(f"[Process] ❌ Error: {e}")
+         import traceback
+         traceback.print_exc()
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.post("/process_with_params")
+ async def process_audio_with_params(
+     audio: UploadFile = File(..., description="Dry vocal audio file"),
+     prompt: str = Form("", description="Text prompt")
+ ):
+     """
+     Process the audio and also return the parameters that were used.
+
+     Returns: JSON (base64-encoded processed audio + parameters)
+     """
+     input_path = None
+     output_path = None
+
+     try:
+         file_id = str(uuid.uuid4())
+         input_path = TEMP_DIR / f"{file_id}_input_{audio.filename}"
+         output_path = TEMP_DIR / f"{file_id}_output.wav"
+
+         with open(input_path, "wb") as f:
+             content = await audio.read()
+             f.write(content)
+
+         # Predict parameters with the AI model
+         parameters = ai_effector.predict(
+             audio_path=str(input_path),
+             text_prompt=prompt
+         )
+
+         # Process the audio
+         effect_chain.process(
+             input_path=str(input_path),
+             output_path=str(output_path),
+             parameters=parameters
+         )
+
+         os.remove(input_path)
+
+         # Return the audio Base64-encoded (a URL would also work)
+         import base64
+         with open(output_path, "rb") as f:
+             audio_base64 = base64.b64encode(f.read()).decode('utf-8')
+
+         os.remove(output_path)
+
+         return JSONResponse(content={
+             "status": "success",
+             "prompt": prompt,
+             "ai_model_used": ai_effector.is_loaded(),
+             "parameters": parameters,
+             "audio_base64": audio_base64,
+             "audio_format": "wav"
+         })
+
+     except Exception as e:
+         if input_path and input_path.exists():
+             os.remove(input_path)
+         if output_path and output_path.exists():
+             os.remove(output_path)
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=8000)
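All three POST handlers create UUID-prefixed temporary paths and then delete them by hand, on both the success and the error path. One way to centralize that is a context manager that always removes both files on exit. This is a hypothetical refactor, not the server's current code. It also reduces the client-supplied filename to its last component with `Path(...).name`.

The pattern fits `/predict` and `/process_with_params`, which finish with their files before returning. `/process` is different: `FileResponse` streams the output after the handler returns, so that output file must instead be deleted after the response is sent (for example with Starlette's `BackgroundTask`).

```python
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional, Tuple


@contextmanager
def temp_audio_paths(
    temp_dir: Path, filename: Optional[str]
) -> Iterator[Tuple[Path, Path]]:
    """Yield (input_path, output_path) in temp_dir and delete both on exit.

    Hypothetical helper mirroring the handlers' naming scheme:
    "{uuid}_input_{filename}" and "{uuid}_output.wav".
    """
    file_id = str(uuid.uuid4())
    # Keep only the last path component of the client-supplied name
    safe_name = Path(filename or "audio").name
    input_path = temp_dir / f"{file_id}_input_{safe_name}"
    output_path = temp_dir / f"{file_id}_output.wav"
    try:
        yield input_path, output_path
    finally:
        # Runs on normal exit and when the body raises
        for path in (input_path, output_path):
            if path.exists():
                path.unlink()
```

Inside a handler this would read `with temp_audio_paths(TEMP_DIR, audio.filename) as (input_path, output_path):`. The `except` branch would then only need to turn errors into an `HTTPException`.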
models/__init__.py ADDED
File without changes
models/ai_effector.py ADDED
@@ -0,0 +1,404 @@
+ """
+ AI Effector Model - DiffVox LLM integration
+ ===========================================
+ Predicts effect parameters from audio using a CLAP encoder plus a trained LLM.
+
+ Automatically converts DiffVox LLM parameters → MagicPath web parameters.
+ """
+
+ import json
+ import re
+ import os
+ from pathlib import Path
+ from typing import Dict, Any, Optional
+ import torch
+
+ # AI model imports (must be installed)
+ try:
+     from transformers import AutoModelForCausalLM, AutoTokenizer
+     from peft import PeftModel
+     TRANSFORMERS_AVAILABLE = True
+ except ImportError:
+     TRANSFORMERS_AVAILABLE = False
+     print("[AIEffector] transformers/peft not installed - running in preset mode")
+
+ # CLAP encoder (separate file)
+ try:
+     from models.audio_encoder import AudioEncoder
+     AUDIO_ENCODER_AVAILABLE = True
+ except ImportError:
+     AUDIO_ENCODER_AVAILABLE = False
+     print("[AIEffector] AudioEncoder not available - running in preset mode")
+
+
+ class ParameterMapper:
+     """Convert between DiffVox LLM parameters and MagicPath web parameters"""
+
+     # DiffVox LLM → MagicPath web mapping
+     DIFFVOX_TO_WEB = {
+         # EQ Low Shelf
+         "eq_lowshelf.params.gain": "eq_lowshelf_gain",
+         "eq_lowshelf.params.parametrizations.freq.original": "eq_lowshelf_freq",
+         # EQ High Shelf
+         "eq_highshelf.params.gain": "eq_highshelf_gain",
+         "eq_highshelf.params.parametrizations.freq.original": "eq_highshelf_freq",
+         # EQ Peak 1
+         "eq_peak1.params.gain": "eq_peak1_gain",
+         "eq_peak1.params.parametrizations.freq.original": "eq_peak1_freq",
+         "eq_peak1.params.parametrizations.Q.original": "eq_peak1_q",
+         # EQ Peak 2
+         "eq_peak2.params.gain": "eq_peak2_gain",
+         "eq_peak2.params.parametrizations.freq.original": "eq_peak2_freq",
+         "eq_peak2.params.parametrizations.Q.original": "eq_peak2_q",
+         # Delay
+         "delay.delay_time": "delay_time",
+         "delay.feedback": "delay_feedback",
+         "delay.mix": "delay_mix",
+         # Distortion
+         "distortion_amount": "distortion_amount",
+         # Master
+         "final_wet_mix": "final_wet_mix",
+     }
+
+     # Reverse mapping
+     WEB_TO_DIFFVOX = {v: k for k, v in DIFFVOX_TO_WEB.items()}
+
+     # Value transforms (normalized value → actual value)
+     VALUE_TRANSFORMS = {
+         # EQ gain: -1~1 → -12~12 dB
+         "eq_lowshelf_gain": lambda x: x * 12,
+         "eq_highshelf_gain": lambda x: x * 12,
+         "eq_peak1_gain": lambda x: x * 12,
+         "eq_peak2_gain": lambda x: x * 12,
+         # EQ freq: normalized value → Hz (an inverse log-scale transform may be needed)
+         "eq_lowshelf_freq": lambda x: 20 * (20000/20) ** ((x + 1) / 2),  # -1~1 → 20~20000
+         "eq_highshelf_freq": lambda x: 20 * (20000/20) ** ((x + 1) / 2),
+         "eq_peak1_freq": lambda x: 20 * (20000/20) ** ((x + 1) / 2),
+         "eq_peak2_freq": lambda x: 20 * (20000/20) ** ((x + 1) / 2),
+         # Q: -1~1 → 0.1~10
+         "eq_peak1_q": lambda x: 0.1 * (10/0.1) ** ((x + 1) / 2),
+         "eq_peak2_q": lambda x: 0.1 * (10/0.1) ** ((x + 1) / 2),
+         # Delay time: -1~1 → 0~1000 ms
+         "delay_time": lambda x: (x + 1) / 2 * 1000,
+         # Delay feedback: -1~1 → 0~1
+         "delay_feedback": lambda x: (x + 1) / 2,
+         # Delay mix: -1~1 → 0~1
+         "delay_mix": lambda x: (x + 1) / 2,
+         # Distortion: -1~1 → 0~1
+         "distortion_amount": lambda x: (x + 1) / 2,
+         # Wet mix: -1~1 → 0~1
+         "final_wet_mix": lambda x: (x + 1) / 2,
+     }
+
+     @classmethod
+     def diffvox_to_web(cls, diffvox_params: Dict[str, float]) -> Dict[str, float]:
+         """DiffVox LLM output → MagicPath web parameters"""
+         web_params = {}
+
+         for diffvox_key, value in diffvox_params.items():
+             # Convert the key
+             if diffvox_key in cls.DIFFVOX_TO_WEB:
+                 web_key = cls.DIFFVOX_TO_WEB[diffvox_key]
+             else:
+                 # Skip keys that are not in the mapping
+                 continue
+
+             # Convert the value
+             if web_key in cls.VALUE_TRANSFORMS:
+                 try:
+                     web_params[web_key] = cls.VALUE_TRANSFORMS[web_key](value)
+                 except Exception:
+                     web_params[web_key] = value
+             else:
+                 web_params[web_key] = value
+
+         return web_params
+
+
+ class ParameterParser:
+     """Extract parameter JSON from LLM output"""
+
+     @staticmethod
+     def parse(llm_output: str) -> Optional[Dict]:
+         """Extract a parameter dictionary from LLM output"""
+
+         # Method 1: find a JSON block
+         json_patterns = [
+             r'\{[^{}]*\}',
+             r'\{(?:[^{}]|\{[^{}]*\})*\}',
+         ]
+
+         for pattern in json_patterns:
+             matches = re.findall(pattern, llm_output, re.DOTALL)
+             for match in matches:
+                 try:
+                     params = json.loads(match)
+                     if isinstance(params, dict) and len(params) > 0:
+                         return params
+                 except json.JSONDecodeError:
+                     continue
+
+         # Method 2: parse "key": value patterns
+         param_pattern = r'"([^"]+)":\s*([-\d.]+)'
+         matches = re.findall(param_pattern, llm_output)
+         if matches:
+             params = {}
+             for key, value in matches:
+                 try:
+                     params[key] = float(value)
+                 except ValueError:
+                     params[key] = value
+             if params:
+                 return params
+
+         return None
+
+
+ class AIEffector:
+     """AI-based effect parameter prediction model - DiffVox LLM integration"""
+
+     # Default parameters
+     DEFAULT_PARAMS = {
+         "eq_lowshelf_gain": 0.0,
+         "eq_lowshelf_freq": 200,
+         "eq_highshelf_gain": 0.0,
+         "eq_highshelf_freq": 8000,
+         "eq_peak1_gain": 0.0,
+         "eq_peak1_freq": 1000,
+         "eq_peak1_q": 1.0,
+         "eq_peak2_gain": 0.0,
+         "eq_peak2_freq": 3000,
+         "eq_peak2_q": 1.0,
+         "compressor_threshold": -24,
+         "compressor_ratio": 4.0,
+         "compressor_attack": 5,
+         "compressor_release": 50,
+         "compressor_makeup": 0.0,
+         "distortion_amount": 0.0,
+         "distortion_tone": 0.5,
+         "delay_time": 250,
+         "delay_feedback": 0.3,
+         "delay_mix": 0.0,
+         "reverb_room_size": 0.5,
+         "reverb_damping": 0.5,
+         "reverb_wet_dry": 0.0,
+         "final_wet_mix": 0.5
+     }
+
+     # Presets (used as fallback)
+     PRESETS = {
+         "warm": {
+             "eq_lowshelf_gain": 5.5,
+             "eq_lowshelf_freq": 200,
+             "eq_highshelf_gain": -1.5,
+             "eq_highshelf_freq": 8000,
+             "eq_peak1_gain": 2.0,
+             "eq_peak1_freq": 400,
+             "eq_peak1_q": 1.0,
+             "compressor_threshold": -18,
+             "compressor_ratio": 3.0,
+             "distortion_amount": 0.05,
+             "reverb_room_size": 0.4,
+             "reverb_wet_dry": 0.15,
+             "final_wet_mix": 0.5
+         },
+         "bright": {
+             "eq_lowshelf_gain": -2.0,
+             "eq_lowshelf_freq": 150,
+             "eq_highshelf_gain": 4.0,
+             "eq_highshelf_freq": 6000,
+             "eq_peak1_gain": 1.0,
+             "eq_peak1_freq": 3000,
+             "compressor_threshold": -20,
+             "compressor_ratio": 6.0,
+             "reverb_room_size": 0.3,
+             "reverb_wet_dry": 0.1,
+             "final_wet_mix": 0.5
+         },
+     }
+
+     def __init__(
+         self,
+         model_path: Optional[str] = None,
+         base_model_name: str = "Qwen/Qwen3-8B",
+         audio_feature_dim: int = 64,
+         use_huggingface: bool = True
+     ):
+         """
+         Initialize the AI model.
+
+         Args:
+             model_path: path to the trained LoRA model (local path or Hugging Face repo)
+             base_model_name: name of the base LLM
+             audio_feature_dim: audio feature dimension (CLAP output)
+             use_huggingface: if True, treat model_path as a Hugging Face repo
+         """
+         self.model = None
+         self.tokenizer = None
+         self.audio_encoder = None
+         self.model_loaded = False
+         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+         self.base_model_name = base_model_name
+         self.audio_feature_dim = audio_feature_dim
+         self.use_huggingface = use_huggingface
+
+         if model_path:
+             self._load_model(model_path)
+
+     def _load_model(self, model_path: str):
+         """Load the trained LoRA model (local or Hugging Face)"""
+         if not TRANSFORMERS_AVAILABLE:
+             print("[AIEffector] transformers/peft not installed")
+             return
+
+         # Check whether this is a local path or a Hugging Face repo
+         is_local = os.path.exists(model_path)
+
+         if not is_local and not self.use_huggingface:
+             print(f"[AIEffector] Local model path not found: {model_path}")
+             return
+
+         try:
+             if self.use_huggingface and not is_local:
+                 print(f"[AIEffector] Loading model from Hugging Face: {model_path}")
+             else:
+                 print(f"[AIEffector] Loading local model: {model_path}")
+
+             # Load the tokenizer
+             self.tokenizer = AutoTokenizer.from_pretrained(
+                 self.base_model_name,
+                 trust_remote_code=True
+             )
+             if self.tokenizer.pad_token is None:
+                 self.tokenizer.pad_token = self.tokenizer.eos_token
+
+             # Load the base model
+             base_model = AutoModelForCausalLM.from_pretrained(
+                 self.base_model_name,
+                 torch_dtype=torch.bfloat16,
+                 device_map="auto",
+                 trust_remote_code=True,
+             )
+
+             # Apply the LoRA adapter (Hugging Face repo or local path)
+             self.model = PeftModel.from_pretrained(
+                 base_model,
+                 model_path,  # Hugging Face repo name or local path
+                 is_trainable=False
+             )
+             self.model.eval()
+
+             # Load the audio encoder
+             if AUDIO_ENCODER_AVAILABLE:
+                 self.audio_encoder = AudioEncoder(
+                     output_dim=self.audio_feature_dim,
+                     reduction_method="pool"
+                 )
+                 print("[AIEffector] AudioEncoder loaded")
+
+             self.model_loaded = True
+             print("[AIEffector] ✅ Model loaded")
+
+         except Exception as e:
+             print(f"[AIEffector] ❌ Failed to load model: {e}")
+             import traceback
+             traceback.print_exc()
+             self.model_loaded = False
+
+     def is_loaded(self) -> bool:
+         """Return whether the AI model is loaded"""
+         return self.model_loaded
+
+     def predict(self, audio_path: str, text_prompt: str) -> Dict[str, float]:
+         """
+         Predict effect parameters from audio and a text prompt.
+
+         Args:
+             audio_path: path to the input audio file
+             text_prompt: the user's text prompt
+
+         Returns:
+             dictionary of effect parameters in MagicPath web format
+         """
+         if self.model_loaded and self.audio_encoder:
+             return self._predict_with_model(audio_path, text_prompt)
+         else:
+             return self._predict_with_preset(text_prompt)
+
+     def _predict_with_model(self, audio_path: str, text_prompt: str) -> Dict[str, float]:
+         """Run inference with the trained DiffVox LLM"""
+         try:
+             # 1. Extract audio features
+             audio_features = self.audio_encoder.get_audio_features(audio_path)
+             if not audio_features:
+                 print("[AIEffector] Audio feature extraction failed, using presets")
+                 return self._predict_with_preset(text_prompt)
+
+             # 2. Build the prompt (same format as train_model.py)
+             audio_state_str = json.dumps(audio_features)
+             prompt = f"""Task: Convert text to audio parameters.
+ Audio: {audio_state_str}
+ Text: {text_prompt}
+ Parameters:"""
+
+             # 3. LLM inference
+             inputs = self.tokenizer(
+                 prompt,
+                 return_tensors="pt",
+                 truncation=True,
+                 max_length=1500
+             ).to(self.device)
+
+             with torch.no_grad():
+                 outputs = self.model.generate(
+                     **inputs,
+                     max_new_tokens=500,
+                     temperature=0.1,
+                     do_sample=False,
+                     pad_token_id=self.tokenizer.eos_token_id,
+                 )
+
+             generated_text = self.tokenizer.decode(
+                 outputs[0][inputs['input_ids'].shape[1]:],
+                 skip_special_tokens=True
+             ).strip()
+
+             print(f"[AIEffector] LLM output: {generated_text[:200]}...")
+
+             # 4. Parse parameters
+             diffvox_params = ParameterParser.parse(generated_text)
+
+             if not diffvox_params:
+                 print("[AIEffector] Parameter parsing failed, using presets")
+                 return self._predict_with_preset(text_prompt)
+
+             # 5. Convert DiffVox → web parameters
+             web_params = ParameterMapper.diffvox_to_web(diffvox_params)
+
+             # 6. Merge with defaults
+             result = self.DEFAULT_PARAMS.copy()
+             result.update(web_params)
+
+             print(f"[AIEffector] ✅ AI parameters generated: {len(web_params)} parameters")
+             return result
+
+         except Exception as e:
+             print(f"[AIEffector] Inference error: {e}")
+             import traceback
+             traceback.print_exc()
+             return self._predict_with_preset(text_prompt)
+
392
+     def _predict_with_preset(self, text_prompt: str) -> Dict[str, float]:
+         """Return preset-based parameters (fallback)."""
+         prompt_lower = text_prompt.lower()
+
+         # The first preset whose name appears in the prompt wins
+         for preset_name, preset_params in self.PRESETS.items():
+             if preset_name in prompt_lower:
+                 print(f"[AIEffector] Preset matched: '{preset_name}'")
+                 result = self.DEFAULT_PARAMS.copy()
+                 result.update(preset_params)
+                 return result
+
+         print("[AIEffector] No preset matched, returning defaults")
+         return self.DEFAULT_PARAMS.copy()
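The prompt layout and the preset fallback in `AIEffector` can be exercised without loading the LLM. The sketch below is a minimal, standalone approximation. It assumes the prompt's continuation lines carry no leading indentation. It also uses hypothetical stand-ins for `DEFAULT_PARAMS` and `PRESETS`, including the parameter names `reverb_mix`, `eq_low_gain`, and `comp_ratio`, because the real values are not part of this diff.

```python
import json
from typing import Dict, List

# Hypothetical stand-ins: the real DEFAULT_PARAMS / PRESETS live on the
# AIEffector class and are not shown in this diff; the keys are invented.
DEFAULT_PARAMS: Dict[str, float] = {"reverb_mix": 0.1, "eq_low_gain": 0.0, "comp_ratio": 2.0}
PRESETS: Dict[str, Dict[str, float]] = {
    "warm": {"eq_low_gain": 3.0},
    "vintage": {"reverb_mix": 0.3, "comp_ratio": 4.0},
}


def build_prompt(audio_features: List[float], text_prompt: str) -> str:
    """Mirror the prompt layout above (assumes unindented continuation lines)."""
    audio_state_str = json.dumps(audio_features)
    return (
        "Task: Convert text to audio parameters.\n"
        f"Audio: {audio_state_str}\n"
        f"Text: {text_prompt}\n"
        "Parameters:"
    )


def predict_with_preset(text_prompt: str) -> Dict[str, float]:
    """Return the first preset whose name is a substring of the prompt, else defaults."""
    prompt_lower = text_prompt.lower()
    for preset_name, preset_params in PRESETS.items():
        if preset_name in prompt_lower:
            result = DEFAULT_PARAMS.copy()  # copy so the shared defaults are never mutated
            result.update(preset_params)
            return result
    return DEFAULT_PARAMS.copy()


if __name__ == "__main__":
    print(build_prompt([0.1, -0.2], "warm vintage sound"))
    print(predict_with_preset("Warm vintage sound"))
```

Because the lookup returns on the first substring hit in dictionary order, a prompt such as "warm vintage sound" picks up only the `warm` preset, and `vintage` is never merged in. Combining several matching presets would require an explicit change to this loop.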
models/audio_encoder.py ADDED
@@ -0,0 +1,189 @@
+ """
+ Audio Encoder for MagicPath Server
+ ===================================
+ Extracts feature vectors from audio files using a CLAP model.
+ Uses the same encoder as the DiffVox LLM.
+ """
+
+ import torch
+ import numpy as np
+ from typing import List, Optional
+ import warnings
+
+ warnings.filterwarnings("ignore")
+
+
+ class AudioEncoder:
+     """CLAP-based audio encoder."""
+
+     def __init__(
+         self,
+         output_dim: int = 64,
+         reduction_method: str = "pool",
+         model_name: str = "laion/larger_clap_general"
+     ):
+         """
+         Initialize the audio encoder.
+
+         Args:
+             output_dim: Output feature dimension (default 64).
+             reduction_method: Dimensionality reduction method ("pool" or "linear";
+                 any other value, including "pca", falls back to truncation).
+             model_name: CLAP model name.
+         """
+         self.output_dim = output_dim
+         self.reduction_method = reduction_method
+         self.model_name = model_name
+         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+         self.model = None
+         self.processor = None
+         self.projection = None
+
+         self._load_model()
+
+     def _load_model(self):
+         """Load the CLAP model."""
+         try:
+             from transformers import ClapModel, ClapProcessor
+
+             print(f"[AudioEncoder] Loading CLAP model: {self.model_name}")
+
+             self.processor = ClapProcessor.from_pretrained(self.model_name)
+             self.model = ClapModel.from_pretrained(self.model_name)
+             self.model = self.model.to(self.device)
+             self.model.eval()
+
+             # Check the CLAP output dimension (usually 512)
+             clap_dim = self.model.config.projection_dim
+             print(f"[AudioEncoder] CLAP output dimension: {clap_dim}")
+
+             # Projection layer for dimensionality reduction.
+             # Note: this layer is randomly initialized and never trained here,
+             # so its outputs change from one process start to the next.
+             if self.reduction_method == "linear" and clap_dim != self.output_dim:
+                 self.projection = torch.nn.Linear(clap_dim, self.output_dim)
+                 self.projection = self.projection.to(self.device)
+                 print(f"[AudioEncoder] Linear projection: {clap_dim} → {self.output_dim}")
+
+             print("[AudioEncoder] ✅ Model loaded")
+
+         except ImportError:
+             print("[AudioEncoder] ❌ transformers is not installed")
+             print(" pip install transformers")
+         except Exception as e:
+             print(f"[AudioEncoder] ❌ Failed to load model: {e}")
+
+     def get_audio_features(self, audio_path: str) -> List[float]:
+         """
+         Extract a feature vector from an audio file.
+
+         Args:
+             audio_path: Path to the audio file.
+
+         Returns:
+             Feature vector of length output_dim (an empty list on failure).
+         """
+         if self.model is None:
+             print("[AudioEncoder] Model is not loaded")
+             return []
+
+         try:
+             import librosa
+
+             # Load the audio as mono at 48 kHz, the sampling rate CLAP expects
+             audio, sr = librosa.load(audio_path, sr=48000, mono=True)
+
+             # Prepare the CLAP input
+             inputs = self.processor(
+                 audios=audio,
+                 sampling_rate=48000,
+                 return_tensors="pt"
+             )
+             inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             # Extract the features
+             with torch.no_grad():
+                 audio_features = self.model.get_audio_features(**inputs)
+
+             # Move to CPU and drop the batch dimension
+             features = audio_features.squeeze().cpu().numpy()
+
+             # Reduce the dimensionality
+             features = self._reduce_dimension(features)
+
+             return features.tolist()
+
+         except Exception as e:
+             print(f"[AudioEncoder] Feature extraction failed: {e}")
+             import traceback
+             traceback.print_exc()
+             return []
+
+     def _reduce_dimension(self, features: np.ndarray) -> np.ndarray:
+         """Reduce the feature vector to output_dim dimensions."""
+         current_dim = len(features)
+
+         if current_dim == self.output_dim:
+             return features
+
+         if self.reduction_method == "pool":
+             # Reduce the dimensionality by average pooling over contiguous chunks
+             if current_dim > self.output_dim:
+                 pool_size = current_dim // self.output_dim
+                 remainder = current_dim % self.output_dim
+
+                 pooled = []
+                 idx = 0
+                 for i in range(self.output_dim):
+                     # The first `remainder` chunks take one extra element
+                     size = pool_size + (1 if i < remainder else 0)
+                     pooled.append(np.mean(features[idx:idx+size]))
+                     idx += size
+
+                 return np.array(pooled)
+             else:
+                 # Zero-pad when the input is shorter than output_dim
+                 padded = np.zeros(self.output_dim)
+                 padded[:current_dim] = features
+                 return padded
+
+         elif self.reduction_method == "linear" and self.projection is not None:
+             # Linear projection
+             with torch.no_grad():
+                 features_tensor = torch.tensor(features, dtype=torch.float32).to(self.device)
+                 projected = self.projection(features_tensor)
+             return projected.cpu().numpy()
+
+         else:
+             # Default: keep the first output_dim values (truncate)
+             return features[:self.output_dim]
+
+     def get_text_features(self, text: str) -> List[float]:
+         """
+         Extract a feature vector from text (CLAP text encoder).
+
+         Args:
+             text: Input text.
+
+         Returns:
+             Feature vector (an empty list on failure).
+         """
+         if self.model is None:
+             return []
+
+         try:
+             inputs = self.processor(
+                 text=text,
+                 return_tensors="pt",
+                 padding=True
+             )
+             inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             with torch.no_grad():
+                 text_features = self.model.get_text_features(**inputs)
+
+             features = text_features.squeeze().cpu().numpy()
+             features = self._reduce_dimension(features)
+
+             return features.tolist()
+
+         except Exception as e:
+             print(f"[AudioEncoder] Text feature extraction failed: {e}")
+             return []
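The `pool` branch of `_reduce_dimension` splits the vector into `output_dim` contiguous chunks and gives the first `current_dim % output_dim` chunks one extra element. NumPy's `np.array_split` uses the same sizing rule, so the loop can be replaced with a vectorized version. The sketch below is illustrative only: the names `pool_reduce` and `pool_reduce_loop` are hypothetical, and the loop is copied from the method above so the two can be compared directly.

```python
import numpy as np


def pool_reduce(features: np.ndarray, output_dim: int) -> np.ndarray:
    """Average-pool a 1-D vector down to output_dim values, or zero-pad it up.

    np.array_split gives the first (len % output_dim) chunks one extra element,
    the same sizing as the pool_size/remainder loop in _reduce_dimension.
    """
    features = np.asarray(features, dtype=np.float64)
    current_dim = features.shape[0]
    if current_dim == output_dim:
        return features
    if current_dim < output_dim:
        padded = np.zeros(output_dim)
        padded[:current_dim] = features
        return padded
    return np.array([chunk.mean() for chunk in np.array_split(features, output_dim)])


def pool_reduce_loop(features: np.ndarray, output_dim: int) -> np.ndarray:
    """The pool branch of _reduce_dimension (current_dim > output_dim case)."""
    current_dim = len(features)
    pool_size = current_dim // output_dim
    remainder = current_dim % output_dim
    pooled, idx = [], 0
    for i in range(output_dim):
        size = pool_size + (1 if i < remainder else 0)
        pooled.append(np.mean(features[idx:idx + size]))
        idx += size
    return np.array(pooled)


if __name__ == "__main__":
    x = np.arange(512, dtype=np.float64)  # 512 is the typical CLAP projection_dim
    assert np.allclose(pool_reduce(x, 64), pool_reduce_loop(x, 64))
    print(pool_reduce(np.arange(10.0), 4))  # chunk sizes 3, 3, 2, 2
```

With CLAP's usual 512-dimensional output and the default `output_dim=64`, every chunk holds exactly 8 values. The uneven-chunk logic only matters for other dimension pairs.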
requirements.txt ADDED
@@ -0,0 +1,20 @@
+ # MagicPath Server - DiffVox LLM integrated version
+ # ==========================================
+
+ # Web server
+ fastapi>=0.104.0
+ uvicorn>=0.24.0
+ python-multipart>=0.0.6
+
+ # Audio processing
+ soundfile>=0.12.0
+ pedalboard>=0.8.0
+ librosa>=0.10.0
+ numpy>=1.24.0
+
+ # AI models
+ torch>=2.2.0
+ transformers>=4.36.0
+ peft>=0.7.0
+ huggingface_hub>=0.20.0
+ accelerate>=0.25.0