ACE-Step 1.5 LoRA Training Tutorial

Hardware Requirements

| VRAM | Description |
|---|---|
| 16 GB (minimum) | Generally usable, but long songs may run out of memory |
| 20 GB or more (recommended) | Handles full-length songs. VRAM usage during training is typically around 17 GB |

Note: During the preprocessing stage before training, you may need to restart Gradio several times to free VRAM. The steps below indicate when to do so.

Disclaimer

This tutorial uses the album ナユタン星からの物体Y (13 tracks) by NayutalieN as a demo, trained for 500 epochs with a batch size of 1. The tutorial is intended solely for educational purposes, to help you understand LoRA fine-tuning. Please train LoRAs on your own original works.

As a developer, I am very fond of NayutalieN's work and chose one of their albums as an example. If you are a rights holder and believe this tutorial infringes your legitimate rights, please contact us immediately. We will remove the relevant content after receiving a valid notice.

Technology should be used reasonably and lawfully. Respect artists' creations, and do not do anything that damages or harms the reputation, rights, or interests of the original artists.


Data Preparation

Tip: If you are not comfortable with programming, you can hand this document to an AI coding tool such as Claude Code, Codex CLI, Cursor, or Copilot and let it carry out the work for you.

Overview

The training data for each song consists of the following:

  1. Audio file: .mp3, .wav, .flac, .ogg, and .opus are supported
  2. Lyrics: a .lyrics.txt file with the same name as the audio file (.txt is also supported for backward compatibility)
  3. Annotation data: metadata such as caption, bpm, keyscale, timesignature, and language

Annotation Data Format

If you already have complete annotation data, you can create a JSON file and place it in the same directory as the audio and lyrics. The file layout looks like this:

dataset/
├── song1.mp3               # audio
├── song1.lyrics.txt        # lyrics
├── song1.json              # annotations (optional)
├── song1.caption.txt       # caption (optional; can also go in the JSON)
├── song2.mp3
├── song2.lyrics.txt
├── song2.json
└── ...

JSON file structure (all fields are optional):

{
    "caption": "A high-energy J-pop track with synthesizer leads and fast tempo",
    "bpm": 190,
    "keyscale": "D major",
    "timesignature": "4",
    "language": "ja"
}

If you do not have annotation data, you can obtain it with the methods described in the sections below.
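For a handful of songs it can be easier to write the annotation files from a script than by hand. The following is a minimal sketch, not part of the project: `write_annotation` is a hypothetical helper, and the only things taken from this tutorial are the five field names and the rule that the JSON file shares the audio file's base name.

```python
import json
from pathlib import Path

# Field names as listed in this tutorial; every field is optional.
ALLOWED_FIELDS = {"caption", "bpm", "keyscale", "timesignature", "language"}


def write_annotation(audio_path, **fields):
    """Write <audio base name>.json next to the audio file and return its path.

    Hypothetical helper for illustration only.
    """
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected annotation fields: {sorted(unknown)}")
    audio = Path(audio_path)
    out_path = audio.with_name(audio.stem + ".json")
    # ensure_ascii=False keeps non-ASCII captions readable in the file.
    out_path.write_text(
        json.dumps(fields, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return out_path
```

For example, `write_annotation("dataset/song1.mp3", bpm=190, keyscale="D major", language="ja")` would create `dataset/song1.json` containing just those three fields.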


Lyrics

Save the lyrics as a .lyrics.txt file with the same name as the audio file, and place it in the same directory. Make sure the lyrics are accurate.

When scanning, lyrics files are looked up in this order of priority:

  1. {filename}.lyrics.txt (recommended)
  2. {filename}.txt (backward compatible)
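The lookup order above can be sketched as follows. This only illustrates the priority rule as described here and is not the scanner's actual code; `find_lyrics_file` is a hypothetical name.

```python
from pathlib import Path


def find_lyrics_file(audio_path):
    """Return the lyrics file that belongs to an audio file, or None.

    Illustrative only: tries <name>.lyrics.txt first, then <name>.txt.
    """
    audio = Path(audio_path)
    for suffix in (".lyrics.txt", ".txt"):  # recommended name first
        candidate = audio.with_name(audio.stem + suffix)
        if candidate.is_file():
            return candidate
    return None
```

If both `song1.lyrics.txt` and `song1.txt` exist, this sketch returns `song1.lyrics.txt`.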

Lyrics Transcription

If you do not have existing lyrics text, you can transcribe it with one of the following tools:

| Tool | Structure Tags | Accuracy | Difficulty of Use | Deployment |
|---|---|---|---|---|
| acestep-transcriber | No | May contain errors | High (requires deploying a model) | Self-hosted |
| Gemini | Yes | May contain errors | Low | Paid API |
| Whisper | No | May contain errors | Moderate | Self-hosted / Paid API |
| ElevenLabs | No | May contain errors | Moderate | Paid API (free credits available) |

This project provides the corresponding transcription scripts in scripts/lora_data_prepare/:

  • whisper_transcription.py: transcription through the OpenAI Whisper API
  • elevenlabs_transcription.py: transcription through the ElevenLabs Scribe API

Both scripts support batch processing of a folder through the process_folder() method.

Review and Cleaning (Required)

Transcribed lyrics may contain errors and must be reviewed and corrected by hand.

If you use lyrics in LRC format, you need to remove the timestamps. Here is a simple cleaning example:

import re

def clean_lrc_content(lines):
    """Clean LRC file content by removing timestamps."""
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Remove timestamps such as [mm:ss.x], [mm:ss.xx], [mm:ss.xxx],
        # then strip the whitespace that may be left around the text
        cleaned = re.sub(r"\[\d{2}:\d{2}\.\d{1,3}\]", "", line).strip()
        result.append(cleaned)

    # Remove trailing empty lines
    while result and not result[-1]:
        result.pop()

    return result
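To apply the same idea to a file on disk, a self-contained variation might look like the sketch below. It is an illustration rather than a project script: `lrc_to_lyrics_file` is a hypothetical name, and unlike the example above it also accepts timestamps without a fractional part (`[mm:ss]`) and drops lines that contained nothing but a timestamp. LRC metadata tags such as `[ar:...]` are left untouched, so check the output by hand.

```python
import re
from pathlib import Path

# Matches [mm:ss], [mm:ss.x], [mm:ss.xx], [mm:ss.xxx] (assumed timestamp shapes).
TIMESTAMP = re.compile(r"\[\d{1,3}:\d{2}(?:[.:]\d{1,3})?\]")


def lrc_to_lyrics_file(lrc_path):
    """Strip timestamps from an .lrc file and write <name>.lyrics.txt beside it."""
    lrc = Path(lrc_path)
    kept = []
    for raw in lrc.read_text(encoding="utf-8").splitlines():
        text = TIMESTAMP.sub("", raw).strip()
        if text:  # skip lines that held only a timestamp
            kept.append(text)
    out_path = lrc.with_name(lrc.stem + ".lyrics.txt")
    out_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return out_path
```

Because structure tags such as `[Verse]` contain no digits-and-colon pattern, they pass through this sketch unchanged.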

Structure Tags (Optional)

If the lyrics include structure tags ([Verse], [Chorus], and so on), the model can learn the song structure more effectively. Training also works normally without structure tags.

Tip: You can use Gemini to add structure tags to existing lyrics.

Example:

[Intro]
La la la...

[Verse 1]
Walking down the empty street
Echoes dancing at my feet

[Chorus]
We are the stars tonight
Shining through the endless sky

[Bridge]
Close your eyes and feel the sound
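When reviewing many lyrics files, it can help to list the tags each one contains. The sketch below assumes, as in the example above, that a structure tag sits alone on its own line inside square brackets; `list_structure_tags` is a hypothetical helper, not part of the project.

```python
import re

# A line that consists only of one [bracketed] label.
TAG_LINE = re.compile(r"^\[([^\[\]]+)\]$")


def list_structure_tags(lyrics):
    """Return the bracketed labels that appear alone on a line, in order."""
    tags = []
    for line in lyrics.splitlines():
        match = TAG_LINE.match(line.strip())
        if match:
            tags.append(match.group(1))
    return tags
```

A leftover timestamp-only line such as `[00:12.00]` would also show up in the result, which makes this a quick way to spot lyrics that still need cleaning.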

Automatic Annotation

1. Obtaining BPM and Key

Use Key-BPM-Finder to obtain BPM and key annotations online:

  1. Open the web page and click Browse my files to select the audio files to process (processing too many at once may cause it to hang, so it is better to process them in batches and merge the CSV files afterwards). Processing runs locally; files are not uploaded to a server.

  2. When processing finishes, click Export CSV to download the CSV file.

  3. Example CSV contents:

    File,Artist,Title,BPM,Key,Camelot
    song1.wav,,,190,D major,10B
    song2.wav,,,128,A minor,8A
    
  4. Place the CSV file in the dataset folder. To add caption data, add a new column after the Camelot column.
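If you processed the audio in several batches, the exported CSV files need to be merged before you place the result in the dataset folder. A minimal sketch using only the standard library follows; it assumes every export has the header shown above, and `merge_key_bpm_csvs` is a hypothetical helper name.

```python
import csv


def merge_key_bpm_csvs(csv_paths, out_path):
    """Merge several exports into one CSV and return the number of songs.

    If the same File appears more than once, the last row read wins.
    """
    rows_by_file = {}
    fieldnames = None
    for path in csv_paths:
        # utf-8-sig also reads files that start with a byte-order mark.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            for row in reader:
                rows_by_file[row["File"]] = row
    if fieldnames is None:
        raise ValueError("no CSV header found in the given files")
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows_by_file.values())
    return len(rows_by_file)
```

Usage would be along the lines of `merge_key_bpm_csvs(["batch1.csv", "batch2.csv"], "dataset/bpm_key.csv")`; the file names here are placeholders.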

2. Obtaining Captions

You can obtain captions for your songs in the following ways:

  • Use acestep-5Hz-lm (0.6B / 1.7B / 4B): invoked from the Auto Label feature in the Gradio UI (see the steps below)
  • Use the Gemini API: see the script scripts/lora_data_prepare/gemini_caption.py. It supports batch processing through process_folder() and generates the following for each audio file:
    • {filename}.lyrics.txt: lyrics
    • {filename}.caption.txt: caption description

Data Preprocessing

Once the data is ready, use the Gradio UI to review and preprocess it.

Important: If you use a startup script, you must modify its launch parameters to disable service pre-initialization:

  • Windows (start_gradio_ui.bat): change if not defined INIT_SERVICE set INIT_SERVICE=--init_service true to if not defined INIT_SERVICE set INIT_SERVICE=--init_service false
  • Linux/macOS (start_gradio_ui.sh): change : "${INIT_SERVICE:=--init_service true}" to : "${INIT_SERVICE:=--init_service false}"
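Both lines quoted above only assign a default when INIT_SERVICE is not already set (`if not defined` in the batch file, `:=` in the shell script). So setting the variable in the environment before launching may work as an alternative to editing the script. The sketch below builds such an environment; it assumes the scripts do not overwrite INIT_SERVICE anywhere else, which this tutorial does not confirm, and `gradio_launch_env` is a hypothetical helper.

```python
import os


def gradio_launch_env(base_env=None):
    """Return a copy of the environment with service pre-initialization disabled.

    Assumption: the startup script reads INIT_SERVICE only through the
    default-assignment line quoted in this tutorial.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["INIT_SERVICE"] = "--init_service false"
    return env


# Example launch on Linux/macOS (not executed here):
#   import subprocess
#   subprocess.run(["bash", "start_gradio_ui.sh"], env=gradio_launch_env())
```

If the UI still pre-initializes the service after launching this way, fall back to editing the script as described above.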

Start the Gradio UI (using a startup script, or by running acestep/acestep_v15_pipeline.py directly).

Step 1: Load the Model

  • If you need the LM to generate captions: select the LM model to use (acestep-5Hz-lm-0.6B / 1.7B / 4B) during initialization.

  • If you do not need the LM: do not select an LM model.

Step 2: Load the Data

Switch to the LoRA Training tab, enter the dataset directory path, and click Scan.

The scanner automatically recognizes the following files:

| File | Description |
|---|---|
| *.mp3 / *.wav / *.flac / ... | Audio files |
| {filename}.lyrics.txt (or {filename}.txt) | Lyrics |
| {filename}.caption.txt | Caption description |
| {filename}.json | Annotation metadata (caption / bpm / keyscale / timesignature / language) |
| *.csv | Batch BPM / Key annotations (exported from Key-BPM-Finder) |
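Before clicking Scan, you may want to check which companion files each song actually has. The following sketch reports that per audio file. It mirrors the naming rules in this tutorial but is not the scanner's implementation; `check_dataset` is a hypothetical helper, and the extension list is the one given in the Overview.

```python
from pathlib import Path

# Audio formats listed as supported in this tutorial.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".ogg", ".opus"}


def _sidecar_exists(audio, suffix):
    """True if <audio base name><suffix> exists next to the audio file."""
    return (audio.parent / (audio.stem + suffix)).is_file()


def check_dataset(folder):
    """Return one dict per audio file saying which companion files exist."""
    report = []
    for audio in sorted(Path(folder).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        report.append({
            "audio": audio.name,
            "lyrics": _sidecar_exists(audio, ".lyrics.txt")
            or _sidecar_exists(audio, ".txt"),
            "caption": _sidecar_exists(audio, ".caption.txt"),
            "json": _sidecar_exists(audio, ".json"),
        })
    return report
```

Songs reported with `"lyrics": False` need a lyrics file (unless the dataset is instrumental), and songs without a caption or JSON file are candidates for the Auto Label step.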

Step 3: Preview and Adjust the Dataset

  • Duration: read automatically from the audio file
  • Lyrics: requires a .lyrics.txt file with the same name (.txt is also supported)
  • Labeled: shows ✅ if a caption exists, ❌ if not
  • BPM / Key / Caption: loaded from the JSON or CSV file
  • If the dataset is not entirely instrumental, uncheck All Instrumental
  • The Format Lyrics and Transcribe Lyrics features are currently disabled (acestep-transcriber is not yet integrated, and using the LM directly can produce hallucinations)
  • Enter a Custom Trigger Tag (its effect is currently limited; any option other than Replace Caption is fine)
  • Genre Ratio controls the proportion of samples that use the genre instead of the caption. The genre descriptions the LM currently generates are weaker than its captions, so keep this at 0

Step 4: Auto Label Data

  • If you already have captions, you can skip this step
  • If your data has no captions, you can generate them through LM inference
  • If BPM / Key values are missing, obtain them with Key-BPM-Finder first. Generating them directly with the LM produces hallucinations

Step 5: Preview and Edit the Data

If needed, you can review and edit the data entry by entry. Be sure to click Save after editing each entry.

Step 6: Save the Dataset

Enter a save path and save the dataset as a JSON file.

Step 7: Preprocess to Generate Tensor Files

Caution: If you generated captions with the LM earlier and are short on VRAM, restart Gradio first to free VRAM. Do not select an LM model when restarting. After the restart, enter the path of the saved JSON file and load it.

Enter the save path for the tensor files, start preprocessing, and wait for it to finish.


Training

Caution: After generating the tensor files, it is also recommended to restart Gradio to free VRAM.

  1. Switch to the Train LoRA tab, enter the tensor file path, and load the dataset.
  2. If you are not familiar with the training parameters, you can use the defaults.

Parameter Reference

| Parameter | Description | Recommended Value |
|---|---|---|
| Max Epochs | Adjust to the dataset size | About 100 songs → 500 epochs; 10 to 20 songs → 800 epochs (for reference only) |
| Batch Size | Can be increased if VRAM allows | 1 (default); 2 or 4 if VRAM allows |
| Save Every N Epochs | Checkpoint save interval | Shorter when Max Epochs is small, longer when it is large |

The values above are for reference only. Adjust them to your actual situation.
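To get a feel for how Max Epochs and Batch Size interact, it can help to estimate the number of optimizer steps a run will take. The arithmetic below is a rough sketch under stated assumptions that this tutorial does not confirm: one training sample per song, the last partial batch of each epoch is kept, and no gradient accumulation. `total_training_steps` is a hypothetical helper.

```python
import math


def total_training_steps(num_samples, max_epochs, batch_size=1):
    """Estimate optimizer steps as ceil(samples / batch size) * epochs."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * max_epochs
```

Under these assumptions, the 13-track demo trained for 500 epochs at batch size 1 corresponds to 13 × 500 = 6500 steps, while 100 songs for 500 epochs at batch size 4 corresponds to 25 × 500 = 12500 steps.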

  3. Click Start Training and wait for training to finish.


Using the LoRA

  1. After training finishes, restart Gradio and reload the model (do not select an LM model).
  2. Once the model has finished initializing, load the trained LoRA weights.
  3. Start generating music.

Congratulations! You have completed the entire LoRA training process.


Advanced Training: Side-Step

If you want finer control over LoRA training (corrected timestep sampling, LoKR adapters, a CLI-based workflow, VRAM optimization, gradient sensitivity analysis, and more), the community-developed Side-Step toolkit offers an advanced alternative. Its documentation is included in this repository's docs/sidestep/ directory.

| Topic | Description |
|---|---|
| Getting Started | Installation, prerequisites, first-run setup |
| End-to-End Tutorial | Full walkthrough from raw audio to generation |
| Dataset Preparation | JSON schema, audio formats, metadata fields, custom tags |
| Training Guide | LoRA vs LoKR, corrected mode vs vanilla mode, hyperparameter guide |
| Using Your Adapter | Output directory layout, loading in Gradio, LoKR limitations |
| VRAM Optimization Guide | VRAM optimization strategies and settings per GPU tier |
| Estimation Guide | Gradient sensitivity analysis for targeted training |
| Shift and Timestep Sampling | How training timesteps work and how Side-Step differs |
| Preset Management | Built-in presets, save/load/import/export |
| The Settings Wizard | Complete reference for the wizard settings |
| Model Management | Checkpoint structure and support for fine-tuned models |
| Windows Notes | Windows-specific setup and workarounds |