ACE-Step API Client Documentation

Language: English | 한국어 | 中文 | 日本語


This service provides an HTTP-based asynchronous music generation API.

Basic workflow:

  1. Call POST /release_task to submit a task and obtain a task_id.
  2. Call POST /query_result to batch-query the task status until it reaches 1 (succeeded) or 2 (failed).
  3. Download the audio file through the GET /v1/audio?path=... URL returned in the result.
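
The polling step of this workflow can be sketched as a small loop. This is a minimal sketch, not code from the ACE-Step project: `wait_for_tasks` and the `query` callable are hypothetical names, and `query` is assumed to POST the given IDs to /query_result and return the unwrapped `data` list. Only the status codes come from this document.

```python
import time
from typing import Callable, Dict, List

# Status codes as documented: 0 = queued/running, 1 = succeeded, 2 = failed.
PENDING, SUCCEEDED, FAILED = 0, 1, 2


def wait_for_tasks(
    query: Callable[[List[str]], List[dict]],
    task_ids: List[str],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Poll until every task reaches status 1 or 2, or the timeout expires.

    `query` is a hypothetical callable that wraps POST /query_result.
    """
    finished: Dict[str, dict] = {}
    deadline = time.monotonic() + timeout
    while True:
        remaining = [t for t in task_ids if t not in finished]
        if not remaining:
            return finished
        # One batch request covers every task that is still pending.
        for item in query(remaining):
            if item.get("status") in (SUCCEEDED, FAILED):
                finished[item["task_id"]] = item
        if all(t in finished for t in task_ids):
            return finished
        if time.monotonic() >= deadline:
            pending = sorted(set(task_ids) - set(finished))
            raise TimeoutError(f"tasks still pending: {pending}")
        sleep(interval)
```

Passing `sleep` and `query` as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running server.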

1. Authentication

The API optionally supports API key authentication. When it is enabled, every request must include a valid key.

Authentication Methods

Two authentication methods are supported:

Method A: ai_token in the request body

{
  "ai_token": "your-api-key",
  "prompt": "upbeat pop song",
  ...
}

Method B: Authorization header

curl -X POST http://localhost:8001/release_task \
  -H 'Authorization: Bearer your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "upbeat pop song"}'
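
Both methods can also be reproduced with the Python standard library. The sketch below only builds the request object and does not send it. `build_release_request` is a hypothetical helper name; the `ai_token` field, the `Authorization: Bearer` header and the /release_task path are taken from this document.

```python
import json
import urllib.request
from typing import Optional


def build_release_request(
    base_url: str,
    payload: dict,
    api_key: Optional[str] = None,
    key_in_body: bool = False,
) -> urllib.request.Request:
    """Build a POST /release_task request using Method A or Method B."""
    body = dict(payload)  # copy so the caller's dict is not modified
    headers = {"Content-Type": "application/json"}
    if api_key:
        if key_in_body:
            body["ai_token"] = api_key  # Method A
        else:
            headers["Authorization"] = f"Bearer {api_key}"  # Method B
    return urllib.request.Request(
        base_url.rstrip("/") + "/release_task",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once a server is reachable.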

2. Response Format

All API responses use a unified wrapper format:

{
  "data": { ... },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

| Field | Type | Description |
| --- | --- | --- |
| data | any | Actual response data |
| code | int | Status code (200 = success) |
| error | string | Error message (null on success) |
| timestamp | int | Response timestamp (milliseconds) |
| extra | any | Additional information (usually null) |
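
A client usually wants the `data` payload and an exception for everything else. The following is a minimal sketch under that assumption; `ApiError` and `unwrap` are hypothetical names, and treating any non-200 `code` or non-empty `error` as a failure is an interpretation of the table above rather than documented server behavior.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the wrapper reports a failure (hypothetical helper type)."""

    def __init__(self, code: Any, message: Any) -> None:
        super().__init__(f"API error {code}: {message}")
        self.code = code
        self.message = message


def unwrap(raw: str) -> Any:
    """Parse a response body and return its `data` field."""
    envelope = json.loads(raw)
    code = envelope.get("code")
    # Assumption: success means code == 200 and error is null/empty.
    if code != 200 or envelope.get("error"):
        raise ApiError(code, envelope.get("error"))
    return envelope.get("data")
```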

3. Task Status

The task status (status) is represented as an integer:

| Status code | Status name | Description |
| --- | --- | --- |
| 0 | queued/running | The task is waiting in the queue or in progress |
| 1 | succeeded | Generation succeeded and the result is ready |
| 2 | failed | Generation failed |

4. Create a Generation Task

4.1 API Definition

  • URL: /release_task
  • Method: POST
  • Content-Type: application/json, multipart/form-data, or application/x-www-form-urlencoded

4.2 Request Parameters

Parameter Naming Conventions

The API accepts both snake_case and camelCase names for most parameters. For example:

  • audio_duration / duration / audioDuration
  • key_scale / keyscale / keyScale
  • time_signature / timesignature / timeSignature
  • sample_query / sampleQuery / description / desc
  • use_format / useFormat / format

Metadata can also be passed as a nested object (metas, metadata, or user_metadata).
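
To make the alias rules concrete, here is an illustrative client-side sketch that maps the aliases listed above to one canonical name and flattens the nested metadata objects. It is not the server's implementation. `normalize_params` is a hypothetical name, and the precedence rule (top-level values win over nested ones) is an assumption, since this document does not specify it.

```python
from typing import Dict

# Canonical name -> accepted spellings, as listed in this section.
ALIASES = {
    "audio_duration": ("audio_duration", "duration", "audioDuration"),
    "key_scale": ("key_scale", "keyscale", "keyScale"),
    "time_signature": ("time_signature", "timesignature", "timeSignature"),
    "sample_query": ("sample_query", "sampleQuery", "description", "desc"),
    "use_format": ("use_format", "useFormat", "format"),
}
NESTED_KEYS = ("metas", "metadata", "user_metadata")
LOOKUP = {alias: name for name, aliases in ALIASES.items() for alias in aliases}


def normalize_params(params: Dict[str, object]) -> Dict[str, object]:
    """Return a flat dict that uses canonical snake_case names only."""
    top_level = {k: v for k, v in params.items() if k not in NESTED_KEYS}
    nested = [params[k] for k in NESTED_KEYS if isinstance(params.get(k), dict)]
    out: Dict[str, object] = {}
    # Assumption: top-level values take precedence over nested metadata.
    for source in [top_level, *nested]:
        for key, value in source.items():
            out.setdefault(LOOKUP.get(key, key), value)
    return out
```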

Method A: JSON request (application/json)

Suitable when passing only text parameters or when referencing audio file paths that already exist on the server.

Basic parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | "" | Music description prompt (alias: caption) |
| lyrics | string | "" | Lyrics content |
| thinking | bool | false | Whether to use the 5Hz LM to generate audio codes (lm-dit behavior) |
| vocal_language | string | "en" | Lyrics language (en, zh, ja, etc.) |
| audio_format | string | "mp3" | Output format (mp3, wav, flac) |

Sample/description mode parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sample_mode | bool | false | Enables random sample generation mode (caption, lyrics, and metadata are generated automatically by the LM) |
| sample_query | string | "" | Natural-language description for sample generation (e.g. "a soft Bengali love song for a quiet evening"). Aliases: description, desc |
| use_format | bool | false | Uses the LM to enhance and format the provided caption and lyrics. Alias: format |

Multi-model support:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | null | Selects the DiT model to use (e.g. "acestep-v15-turbo", "acestep-v15-turbo-shift3"). Use /v1/models to list the available models. If omitted, the default model is used. |

thinking semantics (important):

  • thinking=false:
    • The server does not use the 5Hz LM to generate audio_code_string.
    • DiT runs in text2music mode and ignores any provided audio_code_string.
  • thinking=true:
    • The server uses the 5Hz LM to generate audio_code_string (lm-dit behavior).
    • DiT runs on the LM-generated codes for improved music quality.

Metadata auto-completion (conditional):

When use_cot_caption=true or use_cot_language=true, or when metadata fields are missing, the server may call the 5Hz LM to fill in the missing fields based on the caption/lyrics:

  • bpm
  • key_scale
  • time_signature
  • audio_duration

User-provided values always take precedence; the LM fills only fields that are empty or missing.
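
The precedence rule can be illustrated with a short merge function. This is a sketch of the stated rule only, not the server's code: `merge_metadata` is a hypothetical name, and treating `None` and the empty string as "missing" is an assumption based on the parameter defaults listed in this document.

```python
from typing import Dict

# Fields the LM may complete, as listed above.
AUTO_FIELDS = ("bpm", "key_scale", "time_signature", "audio_duration")


def merge_metadata(user: Dict[str, object], lm_guess: Dict[str, object]) -> Dict[str, object]:
    """Keep every user value; take the LM value only where the user left a gap."""
    merged = dict(user)
    for field in AUTO_FIELDS:
        missing = merged.get(field) in (None, "")  # assumption: these mean "not set"
        if missing and field in lm_guess:
            merged[field] = lm_guess[field]
    return merged
```

For example, a request with bpm=90 and an empty key_scale keeps bpm=90 and receives only the key suggested by the LM.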

Music attribute parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bpm | int | null | Tempo in BPM, range 30-300 |
| key_scale | string | "" | Key/scale (e.g. "C Major", "Am"). Aliases: keyscale, keyScale |
| time_signature | string | "" | Time signature (2, 3, 4, 6 for 2/4, 3/4, 4/4, 6/8). Aliases: timesignature, timeSignature |
| audio_duration | float | null | Generation length in seconds, range 10-600. Aliases: duration, target_duration |

Audio codes (optional):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_code_string | string or string[] | "" | Audio semantic token (5Hz) string for llm_dit. Alias: audioCodeString |

Generation control parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| inference_steps | int | 8 | Number of inference steps. Turbo models: 1-20 (8 recommended). Base models: 1-200 (32-64 recommended). |
| guidance_scale | float | 7.0 | Prompt guidance coefficient. Effective for Base models only. |
| use_random_seed | bool | true | Whether to use a random seed |
| seed | int | -1 | Seed to use (when use_random_seed=false) |
| batch_size | int | 2 | Number of items generated per batch (maximum 8) |

Advanced DiT parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| shift | float | 3.0 | Timestep shift factor (range 1.0-5.0). Effective for Base models only, not Turbo models. |
| infer_method | string | "ode" | Diffusion inference method: "ode" (Euler, faster) or "sde" (stochastic). |
| timesteps | string | null | Comma-separated custom timesteps (e.g. "0.97,0.76,0.615,0.5,0.395,0.28,0.18,0.085,0"). Overrides inference_steps and shift. |
| use_adg | bool | false | Use ADG (Adaptive Dual Guidance). Base models only. |
| cfg_interval_start | float | 0.0 | Ratio at which CFG starts being applied (0.0-1.0) |
| cfg_interval_end | float | 1.0 | Ratio at which CFG stops being applied (0.0-1.0) |
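
Because timesteps is passed as a comma-separated string, a small formatter helps avoid malformed values. The sketch below is illustrative: `format_timesteps` is a hypothetical helper, and the two checks (values within 0-1, strictly decreasing) are assumptions inferred from the example value above, not documented constraints.

```python
from typing import Iterable


def format_timesteps(values: Iterable[float]) -> str:
    """Turn a sequence of numbers into the comma-separated `timesteps` string."""
    steps = [float(v) for v in values]
    if not steps:
        raise ValueError("at least one timestep is required")
    # Assumption: timesteps lie in [0, 1], as in the documented example.
    if any(not 0.0 <= t <= 1.0 for t in steps):
        raise ValueError("timesteps must lie between 0 and 1")
    # Assumption: the schedule runs from high to low, as in the documented example.
    if any(a <= b for a, b in zip(steps, steps[1:])):
        raise ValueError("timesteps must be strictly decreasing")
    return ",".join(format(t, "g") for t in steps)
```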

5Hz LM parameters (optional, server-side):

These parameters control the 5Hz LM sampling used for metadata auto-completion and, when thinking=true, for code generation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| lm_model_path | string | null | 5Hz LM checkpoint directory name (e.g. acestep-5Hz-lm-0.6B) |
| lm_backend | string | "vllm" | vllm or pt |
| lm_temperature | float | 0.85 | Sampling temperature |
| lm_cfg_scale | float | 2.5 | CFG scale (values >1 enable CFG) |
| lm_negative_prompt | string | "NO USER INPUT" | Negative prompt used for CFG |
| lm_top_k | int | null | Top-k (0/null disables it) |
| lm_top_p | float | 0.9 | Top-p (values >=1 disable it) |
| lm_repetition_penalty | float | 1.0 | Repetition penalty |

LM CoT (Chain-of-Thought) parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| use_cot_caption | bool | true | Lets the LM rewrite or enhance the input caption through CoT reasoning. Aliases: cot_caption, cot-caption |
| use_cot_language | bool | true | Lets the LM detect the vocal language through CoT. Aliases: cot_language, cot-language |
| constrained_decoding | bool | true | Enables FSM-based constrained decoding for structured LM output. Aliases: constrainedDecoding, constrained |
| constrained_decoding_debug | bool | false | Enables debug logging for constrained decoding |
| allow_lm_batch | bool | true | Allows LM batch processing for efficiency |

Edit/reference audio parameters (require an absolute path on the server):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| reference_audio_path | string | null | Reference audio path (style transfer) |
| src_audio_path | string | null | Source audio path (repainting/cover) |
| task_type | string | "text2music" | Task type: text2music, cover, repaint, lego, extract, complete |
| instruction | string | auto | Edit instruction (generated automatically from task_type if not provided) |
| repainting_start | float | 0.0 | Repainting start time (seconds) |
| repainting_end | float | null | Repainting end time (seconds); -1 means the end of the audio |
| audio_cover_strength | float | 1.0 | Audio cover strength (0.0-1.0). Use a low value (0.2) for style-transfer tasks. |
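
As an illustration of how the edit parameters fit together, the sketch below assembles a repaint request body. `build_repaint_payload` is a hypothetical helper; the field names and the -1 convention come from the table above, while the client-side validation rules are assumptions added for safety.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict


def is_absolute(path: str) -> bool:
    """Accept POSIX or Windows absolute paths, since the server OS is unknown here."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def build_repaint_payload(
    src_audio_path: str,
    start: float = 0.0,
    end: float = -1.0,
    prompt: str = "",
) -> Dict[str, object]:
    """Build a JSON body for task_type="repaint" (end=-1 means end of audio)."""
    if not is_absolute(src_audio_path):
        raise ValueError("src_audio_path must be an absolute path on the server")
    if start < 0:
        raise ValueError("repainting_start must not be negative")
    if end != -1 and end <= start:
        raise ValueError("repainting_end must be greater than repainting_start")
    return {
        "task_type": "repaint",
        "src_audio_path": src_audio_path,
        "repainting_start": float(start),
        "repainting_end": float(end),
        "prompt": prompt,
    }
```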

Method B: File upload (multipart/form-data)

Use this method when a local audio file must be uploaded as reference or source audio.

All of the fields above are supported as form fields, and the following file fields are supported as well:

  • reference_audio or ref_audio: (file) uploads the reference audio file
  • src_audio or ctx_audio: (file) uploads the source audio file

Note: when a file is uploaded, the corresponding _path parameter is ignored automatically and the system uses the temporary file path created after the upload.
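
The standard library has no built-in multipart encoder, so a minimal one is sketched below for clients that avoid third-party packages. `encode_multipart` is a hypothetical helper, and it assumes field names and file names contain no quotes or line breaks. How the server parses non-string form values (for example booleans) is not covered by this document, so values are simply converted with `str`.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, object],
    files: Dict[str, Tuple[str, bytes]],
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request.

    `files` maps a form field name such as "src_audio" to (filename, content).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, content) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type must be sent as the Content-Type header so that the server can find the boundary.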

4.3 Response Example

{
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "queue_position": 1
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

5. Batch Query of Task Results

5.1 API Definition

  • URL: /query_result
  • Method: POST
  • Content-Type: application/json or application/x-www-form-urlencoded

5.2 Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| task_id_list | string (JSON array) or array | List of task IDs to query |

5.3 Response Example

{
  "data": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": 1,
      "result": "[{\"file\": \"/v1/audio?path=...\", \"wave\": \"\", \"status\": 1, \"create_time\": 1700000000, \"env\": \"development\", \"prompt\": \"upbeat pop song\", \"lyrics\": \"Hello world\", \"metas\": {\"bpm\": 120, \"duration\": 30, \"genres\": \"\", \"keyscale\": \"C Major\", \"timesignature\": \"4\"}, \"generation_info\": \"...\", \"seed_value\": \"12345,67890\", \"lm_model\": \"acestep-5Hz-lm-0.6B\", \"dit_model\": \"acestep-v15-turbo\"}]"
    }
  ],
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

Result field description (result is a JSON string; after parsing it contains the following):

| Field | Type | Description |
| --- | --- | --- |
| file | string | Audio file URL (used with the /v1/audio endpoint) |
| wave | string | Waveform data (usually empty) |
| status | int | Status code (0 = in progress, 1 = succeeded, 2 = failed) |
| create_time | int | Creation time (Unix timestamp) |
| env | string | Environment identifier |
| prompt | string | Prompt that was used |
| lyrics | string | Lyrics that were used |
| metas | object | Metadata (bpm, duration, genres, keyscale, timesignature) |
| generation_info | string | Summary of generation information |
| seed_value | string | Seed values that were used (comma-separated) |
| lm_model | string | Name of the LM model that was used |
| dit_model | string | Name of the DiT model that was used |
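
Because result arrives as a JSON string inside the JSON response, it needs a second decoding step. The helpers below are a minimal sketch with hypothetical names (`parse_result`, `audio_files`, `seeds`); the field names are the ones listed in the table above.

```python
import json
from typing import List


def parse_result(item: dict) -> List[dict]:
    """Decode the `result` field of one /query_result entry into a list of dicts."""
    raw = item.get("result")
    if not raw:
        return []  # nothing produced yet, e.g. the task is still running
    return json.loads(raw) if isinstance(raw, str) else list(raw)


def audio_files(item: dict) -> List[str]:
    """Collect the `file` URLs of entries that finished with status 1."""
    return [
        entry["file"]
        for entry in parse_result(item)
        if entry.get("status") == 1 and entry.get("file")
    ]


def seeds(entry: dict) -> List[int]:
    """Split the comma-separated `seed_value` string into integers."""
    return [int(s) for s in entry.get("seed_value", "").split(",") if s.strip()]
```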

6. Format Input

6.1 API Definition

  • URL: /format_input
  • Method: POST

This endpoint uses the LLM to enhance and format the caption and lyrics provided by the user.

6.2 Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | "" | Music description prompt |
| lyrics | string | "" | Lyrics content |
| temperature | float | 0.85 | LM sampling temperature |
| param_obj | string (JSON) | "{}" | JSON object containing metadata (duration, bpm, key, time_signature, language) |
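
Note that param_obj is a JSON string, not a nested object, so the metadata has to be serialized before it is placed in the request. A minimal sketch follows; `build_format_input` is a hypothetical helper, and rejecting keys outside the five listed above is a client-side assumption.

```python
import json
from typing import Dict

# Metadata keys listed for param_obj in the table above.
ALLOWED_META = {"duration", "bpm", "key", "time_signature", "language"}


def build_format_input(
    prompt: str,
    lyrics: str,
    temperature: float = 0.85,
    **meta: object,
) -> Dict[str, object]:
    """Build a /format_input request body with param_obj encoded as a JSON string."""
    unknown = set(meta) - ALLOWED_META
    if unknown:
        raise ValueError(f"unsupported metadata keys: {sorted(unknown)}")
    return {
        "prompt": prompt,
        "lyrics": lyrics,
        "temperature": temperature,
        "param_obj": json.dumps(meta),  # a string, not a nested object
    }
```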

6.3 Response Example

{
  "data": {
    "caption": "Enhanced music description",
    "lyrics": "Formatted lyrics...",
    "bpm": 120,
    "key_scale": "C Major",
    "time_signature": "4",
    "duration": 180,
    "vocal_language": "en"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

7. Get a Random Sample

7.1 API Definition

  • URL: /create_random_sample
  • Method: POST

This endpoint returns random sample parameters from the preloaded example data, intended for filling in forms.

7.2 Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sample_type | string | "simple_mode" | Sample type: "simple_mode" or "custom_mode" |

8. List Available Models

8.1 API Definition

  • URL: /v1/models
  • Method: GET

Returns the list of available DiT models loaded on the server.

8.2 Response Example

{
  "data": {
    "models": [
      {
        "name": "acestep-v15-turbo",
        "is_default": true
      },
      {
        "name": "acestep-v15-turbo-shift3",
        "is_default": false
      }
    ],
    "default_model": "acestep-v15-turbo"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}

9. Server Statistics

9.1 API Definition

  • URL: /v1/stats
  • Method: GET

Returns server runtime statistics.


10. Download Audio Files

10.1 API Definition

  • URL: /v1/audio
  • Method: GET

Downloads a generated audio file by its path.

10.2 Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| path | string | URL-encoded path of the audio file |
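
The download URL can be built in two ways, sketched below with hypothetical helper names. `audio_url` encodes a raw file path itself, while `resolve_file_url` resolves the relative file value from a task result against the server address. The sketch assumes that the file value is already URL-encoded, which is suggested but not stated by the response example in section 5.3.

```python
from urllib.parse import quote, urljoin


def audio_url(base_url: str, path: str) -> str:
    """Build a /v1/audio URL from a raw (unencoded) file path."""
    return f"{base_url.rstrip('/')}/v1/audio?path={quote(path, safe='')}"


def resolve_file_url(base_url: str, file_field: str) -> str:
    """Resolve the relative `file` value of a result against the server address."""
    return urljoin(base_url.rstrip("/") + "/", file_field)
```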

11. Health Check

11.1 API Definition

  • URL: /health
  • Method: GET

Returns the service health status.


12. Environment Variables

The API server can be configured through environment variables:

Server configuration

| Variable | Default | Description |
| --- | --- | --- |
| ACESTEP_API_HOST | 127.0.0.1 | Server bind host |
| ACESTEP_API_PORT | 8001 | Server bind port |
| ACESTEP_API_KEY | (empty) | API authentication key (authentication is disabled when empty) |
| ACESTEP_API_WORKERS | 1 | Number of API worker threads |

Model configuration

| Variable | Default | Description |
| --- | --- | --- |
| ACESTEP_CONFIG_PATH | acestep-v15-turbo | Primary DiT model path |
| ACESTEP_DEVICE | auto | Device used for loading models |
| ACESTEP_OFFLOAD_TO_CPU | false | Offload models to the CPU when idle |

LM configuration

| Variable | Default | Description |
| --- | --- | --- |
| ACESTEP_INIT_LLM | auto | Whether to initialize the LM at startup (decided automatically based on the GPU) |
| ACESTEP_LM_MODEL_PATH | acestep-5Hz-lm-0.6B | Default 5Hz LM model |
| ACESTEP_LM_BACKEND | vllm | LM backend (vllm or pt) |
Error Handling

HTTP status codes:

  • 200: Success
  • 400: Bad request (invalid JSON, missing fields)
  • 401: Unauthorized (missing or invalid API key)
  • 429: Server busy (the queue is full)
  • 500: Internal server error
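
One way a client might react to these codes is to retry only when the queue is full. The function below is a sketch of that policy, not documented behavior: `retry_delay` is a hypothetical name, and both the exponential backoff and the choice to retry on 429 only are assumptions.

```python
from typing import Optional


def retry_delay(
    status: int,
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if retrying is pointless.

    Assumption: only 429 (queue full) is worth retrying; 400 and 401 need a
    corrected request, and 500 is surfaced to the caller.
    """
    if status != 429:
        return None
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * 2 ** attempt)
```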

Best Practices

  1. Use thinking=true to get the best-quality results from LM-enhanced generation.
  2. Use sample_query/description for quick generation from a natural-language description.
  3. Use use_format=true when you already have a caption and lyrics but want the LM to enhance them.
  4. Use the /query_result endpoint to batch-query the status of multiple tasks.
  5. Check /v1/stats to understand the server load and the average job time.
  6. Set ACESTEP_API_KEY to enable authentication for security.