Nathan9 committed on
Commit ab56ef6 · verified · 1 Parent(s): 97fcb0b

Upload 4 files

Files changed (4)
  1. README.md +331 -3
  2. config.json +29 -0
  3. generation_config.json +6 -0
  4. tokenizer.model +3 -0
README.md CHANGED
@@ -1,3 +1,331 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - music
+ - text-generation
+ - transformers
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # Stage 2 Model
+
+ # ScrapeGoatMusic Generation API
+
+ A music generation system powered by ScrapeGoatMusic, optimized for NVIDIA H100 GPUs with FastAPI integration.
+
+ ## System Requirements
+
+ - NVIDIA H100 GPU
+ - CUDA 12.0 or higher
+ - Python 3.8
+ - 32 GB+ RAM
+ - Ubuntu 22.04 LTS or later
+
+ ## Installation
+
+ 1. Create and activate a conda environment:
+ ```bash
+ conda create -n ScrapeGoatMusic python=3.8
+ conda activate ScrapeGoatMusic
+ ```
+
+ 2. Install PyTorch with CUDA support:
+ ```bash
+ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
+ ```
+
+ 3. Install dependencies:
+ ```bash
+ pip install descript-audio-codec
+ pip install npy_append_array soundfile
+ pip install fastapi uvicorn python-multipart
+ pip install flash-attn --no-build-isolation
+ ```
+
+ 4. Download the required model files (run from the repository root):
+ ```bash
+ # Download models from Hugging Face
+ git lfs install
+ cd inference
+ git clone https://huggingface.co/Nathan9/xcodec_mini_infer
+ ```
+
+ 5. Clone and install RepCodec inside the downloaded `xcodec_mini_infer` directory:
+ ```bash
+ cd xcodec_mini_infer
+ git clone https://github.com/mct10/RepCodec.git
+ cd RepCodec
+ pip install .
+ ```
+
+ ## API Setup
+
+ 1. Create a new file `api.py`:
+ ```python
+ from fastapi import FastAPI, UploadFile, File, Form, HTTPException
+ from fastapi.responses import FileResponse
+ import uvicorn
+ import torch
+ import os
+ import argparse
+ from pathlib import Path
+ import uuid
+ from typing import Optional
+
+ app = FastAPI(title="ScrapeGoatMusic Generation API")
+
+ # Initialize models and configurations
+ def init_models():
+     parser = argparse.ArgumentParser()
+     # Add all your existing arguments here
+     args = parser.parse_args([])
+     args.stage1_model = "scrapegoat/ScrapeGoat-Music-Stage1"
+     args.stage2_model = "scrapegoat/ScrapeGoat-Music-Stage1"  # NOTE: same ID as Stage 1; set your Stage 2 checkpoint here
+     args.max_new_tokens = 3000
+     args.run_n_segments = 2
+     args.stage2_batch_size = 4
+     args.output_dir = "./output"
+     args.cuda_idx = 0
+     # Add other default arguments
+     return args
+
+ @app.on_event("startup")
+ async def startup_event():
+     global args
+     args = init_models()
+     os.makedirs(args.output_dir, exist_ok=True)
+
+ @app.post("/generate")
+ async def generate_music(
+     genre_file: UploadFile = File(...),
+     lyrics_file: UploadFile = File(...),
+     audio_prompt: Optional[UploadFile] = File(None),
+     prompt_start_time: float = Form(0.0),
+     prompt_end_time: float = Form(30.0)
+ ):
+     # Create unique session ID
+     session_id = str(uuid.uuid4())
+     session_dir = Path(args.output_dir) / session_id
+     os.makedirs(session_dir, exist_ok=True)
+
+     # Save uploaded files
+     genre_path = session_dir / "genre.txt"
+     lyrics_path = session_dir / "lyrics.txt"
+
+     with open(genre_path, "wb") as f:
+         f.write(await genre_file.read())
+     with open(lyrics_path, "wb") as f:
+         f.write(await lyrics_file.read())
+
+     # Handle optional audio prompt
+     audio_prompt_path = None
+     if audio_prompt:
+         audio_prompt_path = session_dir / "audio_prompt.wav"
+         with open(audio_prompt_path, "wb") as f:
+             f.write(await audio_prompt.read())
+
+     # Run inference
+     try:
+         # Import your inference code here
+         from infer import run_inference
+         output_path = run_inference(
+             args,
+             str(genre_path),
+             str(lyrics_path),
+             str(audio_prompt_path) if audio_prompt_path else None,
+             prompt_start_time,
+             prompt_end_time
+         )
+
+         return FileResponse(
+             output_path,
+             media_type="audio/mpeg",
+             filename=f"generated_music_{session_id}.mp3"
+         )
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))  # report failures as HTTP 500, not as a 200 response
+
+ if __name__ == "__main__":
+     uvicorn.run(app, host="0.0.0.0", port=8000)
+ ```
+
+ 2. Create a new file `infer.py` with your existing inference code, modified so it can be imported as a module. `api.py` expects it to expose `run_inference(args, genre_path, lyrics_path, audio_prompt_path, prompt_start_time, prompt_end_time)` and to return the path of the generated audio file.
+
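The README leaves the contents of `infer.py` open. As a rough sketch only, the module could be shaped like this: the positional arguments mirror the call in `api.py`, while the `generate` hook, the `output.mp3` file name, and the validation rules are hypothetical placeholders rather than part of the actual pipeline.

```python
# infer.py: hypothetical skeleton. Only the run_inference call shape comes from api.py.
from pathlib import Path
from typing import Callable, Optional


def run_inference(
    args,
    genre_path: str,
    lyrics_path: str,
    audio_prompt_path: Optional[str] = None,
    prompt_start_time: float = 0.0,
    prompt_end_time: float = 30.0,
    generate: Optional[Callable[[str, str, Path], None]] = None,  # assumed hook
) -> str:
    """Validate the inputs, run the generator, and return the output file path."""
    genre = Path(genre_path).read_text(encoding="utf-8").strip()
    lyrics = Path(lyrics_path).read_text(encoding="utf-8").strip()
    if not genre or not lyrics:
        raise ValueError("genre and lyrics files must not be empty")
    if audio_prompt_path is not None and prompt_end_time <= prompt_start_time:
        raise ValueError("prompt_end_time must be greater than prompt_start_time")

    # Write next to the uploaded files so every request keeps its own directory.
    output_path = Path(genre_path).parent / "output.mp3"
    if generate is None:
        # Placeholder: the real Stage 1 / Stage 2 generation code belongs here.
        raise NotImplementedError("plug the generation pipeline in here")
    generate(genre, lyrics, output_path)
    return str(output_path)
```

Because `api.py` passes six positional arguments, the `generate` hook stays `None` in that call path; in a real module the body of the placeholder branch would call the existing inference code directly.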
+ ## Running the API
+
+ 1. Start the API server:
+ ```bash
+ python api.py
+ ```
+
+ 2. The server listens on all interfaces (`0.0.0.0`) on port 8000; locally, the API is available at `http://localhost:8000`.
+
+ ## API Endpoints
+
+ ### POST /generate
+ Generates music from the provided genre tags and lyrics and returns the result as an MP3 file (`audio/mpeg`).
+
+ **Parameters:**
+ - `genre_file`: Text file containing genre tags (Required)
+ - `lyrics_file`: Text file containing lyrics (Required)
+ - `audio_prompt`: Audio file for prompt (Optional)
+ - `prompt_start_time`: Start time for audio prompt (Default: 0.0)
+ - `prompt_end_time`: End time for audio prompt (Default: 30.0)
+
+ **Example using curl:**
+ ```bash
+ curl -X POST "http://localhost:8000/generate" \
+   -H "accept: audio/mpeg" \
+   -F "genre_file=@/path/to/genre.txt" \
+   -F "lyrics_file=@/path/to/lyrics.txt" \
+   -F "prompt_start_time=0.0" \
+   -F "prompt_end_time=30.0" \
+   --output generated_music.mp3
+ ```
+
+ **Example genre.txt format:**
+ ```
+ instrumental pop energetic female vocals
+ ```
+
+ **Example lyrics.txt format:**
+ ```
+ [verse]
+ Your lyrics here
+ [chorus]
+ Your chorus here
+ ```
+
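The `lyrics.txt` example above suggests a simple layout: a bracketed section tag on its own line, followed by the lines of that section. A small checker along those lines can catch malformed files before they are uploaded. This is only a sketch based on that example; the function name `split_lyrics` is made up here, and the model's own parser may accept or reject different inputs.

```python
import re
from typing import List, Tuple

# A section tag is a bracketed word on its own line, e.g. [verse] or [chorus].
SECTION_TAG = re.compile(r"^\[([\w-]+)\]$")


def split_lyrics(text: str) -> List[Tuple[str, str]]:
    """Split lyrics into (section_tag, section_text) pairs in file order."""
    sections: List[Tuple[str, str]] = []
    tag, lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION_TAG.match(line)
        if match:
            if tag is not None:
                sections.append((tag, "\n".join(lines)))
            tag, lines = match.group(1).lower(), []
        elif line:
            if tag is None:
                raise ValueError(f"lyrics line before any [section] tag: {line!r}")
            lines.append(line)
    if tag is not None:
        sections.append((tag, "\n".join(lines)))
    return sections


example = "[verse]\nYour lyrics here\n[chorus]\nYour chorus here\n"
print(split_lyrics(example))
# [('verse', 'Your lyrics here'), ('chorus', 'Your chorus here')]
```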
+ ## H100 Optimization
+
+ 1. Enable Flash Attention:
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     stage1_model,  # model ID or local path, e.g. args.stage1_model
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2"
+ )
+ ```
+
+ 2. Select the GPU and enable cuDNN autotuning:
+ ```python
+ # Add to your inference configuration
+ torch.cuda.set_device(0) # Use first H100
+ torch.backends.cudnn.benchmark = True  # lets cuDNN pick the fastest kernels; a speed setting, not a memory saving
+ ```
+
+ 3. On a multi-GPU machine, set `cuda_idx` in `init_models()` to choose which GPU the API uses.
+
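Passing `attn_implementation="flash_attention_2"` fails at load time if the `flash-attn` package is not importable. One way to keep the same code usable on machines without it is to choose the value at runtime. The helper below is a sketch, not part of the original setup; its name is made up, and the `"sdpa"` fallback is the PyTorch scaled-dot-product attention option accepted by recent `transformers` releases.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" when flash-attn is installed, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


# Pass the result as attn_implementation=... to from_pretrained.
print(pick_attn_implementation())
```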
+ ## Monitoring
+
+ FastAPI serves interactive Swagger documentation at `http://localhost:8000/docs`, which you can use to inspect and test the endpoints.
+
+ ## Troubleshooting
+
+ 1. CUDA Out of Memory:
+    - Reduce `stage2_batch_size`
+    - Lower `max_new_tokens`
+    - Use gradient checkpointing (fine-tuning only; it has no effect at inference time)
+
+ 2. Audio Quality Issues:
+    - Check input audio format (16 kHz, mono)
+    - Verify that the genre tags follow the `genre.txt` example (space-separated tags on one line)
+    - Ensure lyrics follow the `[verse]` / `[chorus]` structure shown in the `lyrics.txt` example
+
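The "16 kHz, mono" check in the list above can be automated for WAV prompts with Python's standard `wave` module. The sketch below assumes uncompressed PCM WAV input (the `wave` module raises `wave.Error` on other encodings), and the function name `check_wav` is illustrative.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file looks fine."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

Calling `check_wav("audio_prompt.wav")` before uploading gives a quick yes/no answer; resampling or down-mixing the file is left to an audio tool of your choice.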
+ ## Training
+
+ This model was created through a multi-stage training process optimized for music generation. You can fine-tune it further on your own data using the following steps:
+
+ ### Data Preparation
+
+ 1. Prepare your training data using the provided script:
+ ```bash
+ python prepare_training_data.py
+ ```
+
+ The script expects the following directory structure:
+ ```
+ training_data/
+ ├── audio_tracks/   # 16 kHz mono WAV files
+ ├── lyrics/         # Corresponding lyrics files
+ └── genres/         # Genre tag files
+ ```
+
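Before running the preparation script it can help to confirm that every sample has all three pieces. The sketch below assumes that the audio, lyrics, and genre files of one sample share the same base file name (for example `song01.wav`, `song01.txt`, `song01.txt`). The README does not state how files are matched, so treat that convention and the function name as assumptions.

```python
from pathlib import Path

SUBDIRS = ("audio_tracks", "lyrics", "genres")


def find_incomplete_samples(root):
    """Map each sample name to the sub-directories where its file is missing."""
    root = Path(root)
    stems = {
        name: {p.stem for p in (root / name).iterdir() if p.is_file()}
        for name in SUBDIRS
    }
    all_stems = set().union(*stems.values())
    report = {}
    for stem in sorted(all_stems):
        missing = [name for name in SUBDIRS if stem not in stems[name]]
        if missing:
            report[stem] = missing
    return report
```

An empty result means every base name appears in all three directories.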
+ ### Training Requirements
+
+ - NVIDIA H100 GPU (recommended)
+ - 32 GB+ GPU memory
+ - Training dataset with:
+   - High-quality audio files (16 kHz mono)
+   - Aligned lyrics in structured format
+   - Genre annotations
+   - At least 10,000 samples recommended
+
+ ### Fine-tuning Steps
+
+ 1. Install additional training dependencies:
+ ```bash
+ pip install accelerate datasets transformers
+ ```
+
+ 2. Prepare your configuration (choose one pair of exports; if you run both, the second pair overrides the first):
+ ```bash
+ # Option A: Stage 1 model (7B)
+ export MODEL_PATH="Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot"
+ export OUTPUT_DIR="./fine_tuned_model_s1"
+
+ # Option B: Stage 2 model (1B)
+ export MODEL_PATH="Nathan9/ScrapeGoatMusic-s2-1B-general"
+ export OUTPUT_DIR="./fine_tuned_model_s2"
+ ```
+
+ 3. Start training:
+ ```bash
+ python train.py \
+   --model_name_or_path "$MODEL_PATH" \
+   --output_dir "$OUTPUT_DIR" \
+   --num_train_epochs 3 \
+   --per_device_train_batch_size 4 \
+   --gradient_accumulation_steps 4 \
+   --learning_rate 1e-5 \
+   --warmup_steps 500 \
+   --logging_steps 100 \
+   --save_steps 1000 \
+   --evaluation_strategy steps \
+   --load_best_model_at_end \
+   --gradient_checkpointing true
+ ```
+
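It is worth checking how the step-based flags above relate to the length of the run. The arithmetic below is a sketch under stated assumptions: a single GPU, exactly 10,000 training samples (the recommended minimum), and a `train.py` that counts optimizer steps the way the Hugging Face `Trainer` does. None of these is confirmed by the README.

```python
import math


def training_schedule(num_samples, per_device_batch, grad_accum, num_gpus, epochs):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


effective, per_epoch, total = training_schedule(10_000, 4, 4, 1, 3)
print(effective, per_epoch, total)                  # 16 625 1875
print(f"warmup share: {500 / total:.0%}")           # warmup share: 27%
print(f"checkpoints from --save_steps 1000: {total // 1000}")  # 1
```

Under these assumptions the 500 warmup steps cover roughly a quarter of training and only one intermediate checkpoint is written, so smaller `--warmup_steps` and `--save_steps` values may suit a dataset of this size better.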
+ ### Training Tips
+
+ 1. Stage 1 Model:
+    - Use larger batch sizes (8-16) for better convergence
+    - Enable gradient checkpointing for memory efficiency
+    - Start with a low learning rate (1e-5)
+    - Train for at least 3 epochs
+
+ 2. Stage 2 Model:
+    - Use smaller batch sizes (4-8)
+    - A higher learning rate is possible (2e-5)
+    - Less training time is needed
+    - Focus on audio quality metrics
+
+ 3. Monitoring:
+    - Use Weights & Biases for training visualization
+    - Monitor loss curves for convergence
+    - Validate generation quality periodically
+    - Check for overfitting against the validation set
+
+ 4. Performance Optimization:
+    - Enable Flash Attention during training
+    - Use mixed precision training (bf16)
+    - Distribute training across multiple GPUs if available
+    - Apply gradient clipping
+
+ ## License
+
+ FULL ACCESS, ENJOY
+
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_name_or_path": "None",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 5504,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 16,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.0",
+   "use_cache": true,
+   "vocab_size": 83840
+ }
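The fields in this `config.json` are enough to estimate the model's size. The sketch below counts weights for the standard Llama layout in `transformers` (bias-free attention and MLP projections, two RMSNorm weights per layer, one final norm, and untied input/output embeddings). That the checkpoint follows this layout exactly is an assumption based on `"architectures": ["LlamaForCausalLM"]`.

```python
def llama_param_count(cfg):
    """Count weights of a bias-free Llama-style decoder described by cfg."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o plus k, v projections
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up, down projections
    norms = 2 * hidden                                     # two RMSNorm weights per layer
    layers = cfg["num_hidden_layers"] * (attention + mlp + norms)

    embeddings = cfg["vocab_size"] * hidden
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2                                    # separate lm_head matrix
    return layers + embeddings + hidden                    # plus the final RMSNorm


config = {
    "hidden_size": 2048,
    "intermediate_size": 5504,
    "num_attention_heads": 16,
    "num_hidden_layers": 32,
    "num_key_value_heads": 16,
    "tie_word_embeddings": False,
    "vocab_size": 83840,
}

total = llama_param_count(config)
print(f"{total:,} parameters")  # 1,962,543,104 parameters
print(f"{total * 2 / 2**30:.1f} GiB of weights in bfloat16")
```

Under these assumptions the configuration describes roughly 1.96 billion weights, which is useful when budgeting GPU memory for `stage2_batch_size`.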
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.42.0"
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee5c7cbf32da93989f14d9ba635e3e1d1ab2cc88a92908a5ed0f149375f6ee49
+ size 1761962
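The three lines above are a Git LFS pointer, not the tokenizer itself: the real `tokenizer.model` is identified by its SHA-256 digest and byte size. After downloading, the file can be checked against the pointer with the standard library. The function names below are illustrative and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Return (hash_algorithm, hex_digest, size_in_bytes) from pointer text."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return algorithm, digest, int(fields["size"])


def matches_pointer(path, pointer_text):
    """Check that the file at path has the size and SHA-256 named by the pointer."""
    algorithm, digest, size = parse_lfs_pointer(pointer_text)
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return Path(path).stat().st_size == size and hasher.hexdigest() == digest


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ee5c7cbf32da93989f14d9ba635e3e1d1ab2cc88a92908a5ed0f149375f6ee49
size 1761962
"""
print(parse_lfs_pointer(POINTER)[2])  # 1761962
```

`matches_pointer("tokenizer.model", POINTER)` returning `False` usually means the repository was cloned without `git lfs install`, leaving the pointer text in place of the real file.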