Upload 4 files

Files changed (4)

README.md +331 -3
config.json +29 -0
generation_config.json +6 -0
tokenizer.model +3 -0

README.md CHANGED

@@ -1,3 +1,331 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+tags:
+- music
+- text-generation
+- transformers
+pipeline_tag: text-generation
+library_name: transformers
+---
+# Stage 2 Model
+# ScrapeGoatMusic Generation API
+A music generation system powered by ScrapeGoatMusic, optimized for NVIDIA H100 GPUs with FastAPI integration.
+## System Requirements
+- NVIDIA H100 GPU
+- CUDA 12.0 or higher
+- Python 3.8
+- 32GB+ RAM
+- Ubuntu 22.04 LTS or later
+## Installation
+1. Create and activate a conda environment:
+```bash
+conda create -n ScrapeGoatMusic python=3.8
+conda activate ScrapeGoatMusic
+```
+2. Install PyTorch with CUDA support:
+```bash
+conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
+```
+3. Install dependencies:
+```bash
+pip install descript-audio-codec
+pip install npy_append_array soundfile
+pip install fastapi uvicorn python-multipart
+pip install flash-attn --no-build-isolation
+```
+4. Download required model files:
+```bash
+# Download models from Hugging Face (run from the repository root)
+git lfs install
+cd inference
+git clone https://huggingface.co/Nathan9/xcodec_mini_infer
+```
+5. Clone and install RepCodec inside the downloaded `xcodec_mini_infer` directory:
+```bash
+# Continues from the `inference` directory entered in step 4
+cd xcodec_mini_infer
+git clone https://github.com/mct10/RepCodec.git
+cd RepCodec
+pip install .
+```
+## API Setup
+1. Create a new file `api.py`:
+```python
+from fastapi import FastAPI, UploadFile, File, Form
+from fastapi.responses import FileResponse
+import uvicorn
+import torch
+import os
+import argparse
+from pathlib import Path
+import uuid
+from typing import Optional
+app = FastAPI(title="ScrapeGoatMusic Generation API")
+# Initialize models and configurations
+def init_models():
+    parser = argparse.ArgumentParser()
+    # Add all your existing arguments here
+    args = parser.parse_args([])
+    args.stage1_model = "scrapegoat/ScrapeGoat-Music-Stage1"
+    # NOTE: this repeats the Stage 1 ID; point it at the Stage 2 checkpoint you intend to use
+    args.stage2_model = "scrapegoat/ScrapeGoat-Music-Stage1"
+    args.max_new_tokens = 3000
+    args.run_n_segments = 2
+    args.stage2_batch_size = 4
+    args.output_dir = "./output"
+    args.cuda_idx = 0
+    # Add other default arguments
+    return args
+@app.on_event("startup")
+async def startup_event():
+    global args
+    args = init_models()
+    os.makedirs(args.output_dir, exist_ok=True)
+@app.post("/generate")
+async def generate_music(
+    genre_file: UploadFile = File(...),
+    lyrics_file: UploadFile = File(...),
+    audio_prompt: Optional[UploadFile] = File(None),
+    prompt_start_time: float = Form(0.0),
+    prompt_end_time: float = Form(30.0)
+):
+    # Create unique session ID
+    session_id = str(uuid.uuid4())
+    session_dir = Path(args.output_dir) / session_id
+    os.makedirs(session_dir, exist_ok=True)
+    # Save uploaded files
+    genre_path = session_dir / "genre.txt"
+    lyrics_path = session_dir / "lyrics.txt"
+    with open(genre_path, "wb") as f:
+        f.write(await genre_file.read())
+    with open(lyrics_path, "wb") as f:
+        f.write(await lyrics_file.read())
+    # Handle optional audio prompt
+    audio_prompt_path = None
+    if audio_prompt:
+        audio_prompt_path = session_dir / "audio_prompt.wav"
+        with open(audio_prompt_path, "wb") as f:
+            f.write(await audio_prompt.read())
+    # Run inference
+    try:
+        # Import your inference code here
+        from infer import run_inference
+        output_path = run_inference(
+            args,
+            str(genre_path),
+            str(lyrics_path),
+            str(audio_prompt_path) if audio_prompt_path else None,
+            prompt_start_time,
+            prompt_end_time
+        )
+        return FileResponse(
+            output_path,
+            media_type="audio/mpeg",
+            filename=f"generated_music_{session_id}.mp3"
+        )
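+    except Exception as e:
+        # Report failures with an HTTP 500 status instead of a 200 response
+        from fastapi import HTTPException
+        raise HTTPException(status_code=500, detail=str(e))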
+if __name__ == "__main__":
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+```
+2. Create a new file `infer.py` with your existing inference code, modified to be imported as a module.
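The call in `api.py` fixes the interface that `infer.py` has to expose. The following is a minimal sketch of that module boundary, assuming the positional argument order used in the `/generate` handler. The body is a placeholder, and the input checks are illustrative rather than part of the original inference code.

```python
from pathlib import Path
from typing import Optional


def run_inference(
    args,
    genre_path: str,
    lyrics_path: str,
    audio_prompt_path: Optional[str] = None,
    prompt_start_time: float = 0.0,
    prompt_end_time: float = 30.0,
) -> str:
    """Return the path of the generated audio file (placeholder body)."""
    # Fail early on missing inputs so the API can report a clear error.
    for path in (genre_path, lyrics_path, audio_prompt_path):
        if path is not None and not Path(path).is_file():
            raise FileNotFoundError(path)
    if audio_prompt_path is not None and prompt_end_time <= prompt_start_time:
        raise ValueError("prompt_end_time must be greater than prompt_start_time")
    # Assumption: the existing Stage 1 / Stage 2 pipeline goes here and
    # writes its result somewhere under args.output_dir.
    raise NotImplementedError("plug the existing inference code in here")
```

Keeping model loading out of module import time (for example behind a lazily initialized helper) lets `api.py` import this module without immediately allocating GPU memory.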
+## Running the API
+1. Start the API server:
+```bash
+python api.py
+```
+2. The API will be available at `http://localhost:8000`
+## API Endpoints
+### POST /generate
+Generates music based on provided genre and lyrics.
+**Parameters:**
+- `genre_file`: Text file containing genre tags (Required)
+- `lyrics_file`: Text file containing lyrics (Required)
+- `audio_prompt`: Audio file for prompt (Optional)
+- `prompt_start_time`: Start time for audio prompt (Default: 0.0)
+- `prompt_end_time`: End time for audio prompt (Default: 30.0)
+**Example using curl:**
+```bash
+curl -X POST "http://localhost:8000/generate" \
+  -H "accept: application/json" \
+  -H "Content-Type: multipart/form-data" \
+  -F "genre_file=@/path/to/genre.txt" \
+  -F "lyrics_file=@/path/to/lyrics.txt" \
+  -F "prompt_start_time=0.0" \
+  -F "prompt_end_time=30.0"
+```
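The same request can be sent from Python. This is a sketch using only the standard library, assuming the server from `api.py` is listening on `localhost:8000`; `build_multipart` and `generate` are hypothetical helper names, not part of the project.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]
) -> Tuple[bytes, str]:
    """Encode form fields and files as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    for name, (filename, content) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def generate(genre: bytes, lyrics: bytes,
             url: str = "http://localhost:8000/generate") -> bytes:
    """POST genre and lyrics to the API and return the raw response body."""
    body, content_type = build_multipart(
        {"prompt_start_time": "0.0", "prompt_end_time": "30.0"},
        {"genre_file": ("genre.txt", genre), "lyrics_file": ("lyrics.txt", lyrics)},
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

On success the returned bytes are the generated audio, which can be written straight to a file.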
+**Example genre.txt format:**
+```
+instrumental pop energetic female vocals
+```
+**Example lyrics.txt format:**
+```
+[verse]
+Your lyrics here
+[chorus]
+Your chorus here
+```
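A lyrics file in this shape can be checked before upload. The sketch below assumes that each section tag sits alone on its own line in square brackets, as in the example above; `parse_lyrics` is a hypothetical helper, and the real inference code may accept a wider format.

```python
import re
from typing import List, Tuple

# A section tag such as [verse] or [chorus], alone on its line.
SECTION_TAG = re.compile(r"^\[(\w[\w\s-]*)\]$")


def parse_lyrics(text: str) -> List[Tuple[str, List[str]]]:
    """Split lyrics into (section name, lines) pairs in file order."""
    sections: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content
        match = SECTION_TAG.match(line)
        if match:
            sections.append((match.group(1).lower(), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            raise ValueError(f"lyric line before any section tag: {line!r}")
    return sections
```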
+## H100 Optimization
+1. Enable Flash Attention:
+```python
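+import torch
+from transformers import AutoModelForCausalLM
+# stage1_model: the model ID or local path set as args.stage1_model
+model = AutoModelForCausalLM.from_pretrained(
+    stage1_model,
+    torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2"
+)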
+```
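+2. Select the GPU and enable cuDNN autotuning:
+```python
+import torch
+# Add to your inference configuration
+torch.cuda.set_device(0)  # Use first H100
+# Let cuDNN pick the fastest kernels for the shapes it sees (a speed setting, not a memory saving)
+torch.backends.cudnn.benchmark = True
+```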
+3. For multi-GPU setup, modify `cuda_idx` in the API configuration.
+## Monitoring
+The API serves interactive Swagger documentation at `http://localhost:8000/docs`, which you can use to explore and test the endpoints.
+## Troubleshooting
+1. CUDA Out of Memory:
+- Reduce `stage2_batch_size`
+- Adjust `max_new_tokens`
+- Use gradient checkpointing when fine-tuning (it does not reduce memory use during inference)
+2. Audio Quality Issues:
+- Check input audio format (16kHz, mono)
+- Verify genre tags format
+- Ensure lyrics follow the correct structure
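The audio format check can be automated for WAV prompts. This is a small sketch using the standard-library `wave` module, which reads uncompressed PCM WAV files only; `check_wav` is a hypothetical helper name.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000,
              expected_channels: int = 1) -> List[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```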
+## Training
+This model was created through a multi-stage training process optimized for music generation. You can further fine-tune the model on your own data using the following steps:
+### Data Preparation
+1. Prepare your training data using the provided script:
+```bash
+python prepare_training_data.py
+```
+The script expects the following directory structure:
+```
+training_data/
+├── audio_tracks/      # 16kHz mono WAV files
+├── lyrics/           # Corresponding lyrics files
+└── genres/          # Genre tag files
+```
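Before running the preparation script, it can help to confirm that every track has its companion files. The sketch below assumes that files are matched by stem (`song1.wav`, `song1.txt`) and that lyrics and genre files use a `.txt` extension; neither convention is stated in the README, so adjust it to whatever `prepare_training_data.py` actually expects.

```python
from pathlib import Path
from typing import Dict, List


def find_incomplete_samples(root: str) -> Dict[str, List[str]]:
    """Map each audio stem to the folders where its companion file is missing."""
    base = Path(root)
    missing: Dict[str, List[str]] = {}
    for audio in sorted((base / "audio_tracks").glob("*.wav")):
        absent = [
            folder
            for folder in ("lyrics", "genres")
            # Assumed naming: same stem, .txt extension.
            if not (base / folder / f"{audio.stem}.txt").is_file()
        ]
        if absent:
            missing[audio.stem] = absent
    return missing
```

An empty result means every WAV file in `audio_tracks/` has both a lyrics file and a genre file.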
+### Training Requirements
+- NVIDIA H100 GPU (recommended)
+- 32GB+ GPU memory
+- Training dataset with:
+  - High-quality audio files (16kHz mono)
+  - Aligned lyrics in structured format
+  - Genre annotations
+  - At least 10,000 samples recommended
+### Fine-tuning Steps
+1. Install additional training dependencies:
+```bash
+pip install accelerate datasets transformers
+```
+2. Prepare your configuration:
+```bash
+# Set ONE of the two pairs below; exporting both leaves only the Stage 2 values in effect.
+# Option A: Stage 1 model (7B)
+export MODEL_PATH="Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot"
+export OUTPUT_DIR="./fine_tuned_model_s1"
+# Option B: Stage 2 model (1B)
+export MODEL_PATH="Nathan9/ScrapeGoatMusic-s2-1B-general"
+export OUTPUT_DIR="./fine_tuned_model_s2"
+```
+3. Start training:
+```bash
+python train.py \
+    --model_name_or_path $MODEL_PATH \
+    --output_dir $OUTPUT_DIR \
+    --num_train_epochs 3 \
+    --per_device_train_batch_size 4 \
+    --gradient_accumulation_steps 4 \
+    --learning_rate 1e-5 \
+    --warmup_steps 500 \
+    --logging_steps 100 \
+    --save_steps 1000 \
+    --evaluation_strategy steps \
+    --load_best_model_at_end \
+    --gradient_checkpointing true
+```
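The flags above interact, so it is worth working out the schedule they imply. The sketch below assumes a single GPU, the 10,000-sample dataset size recommended earlier, and that `train.py` counts one optimizer step per accumulated batch (as the Hugging Face `Trainer` does); `training_schedule` is a hypothetical helper.

```python
import math
from typing import Tuple


def training_schedule(num_samples: int, per_device_batch: int, grad_accum: int,
                      num_gpus: int, epochs: int) -> Tuple[int, int, int]:
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


effective, per_epoch, total = training_schedule(
    num_samples=10_000, per_device_batch=4, grad_accum=4, num_gpus=1, epochs=3
)
print(effective, per_epoch, total)  # 16 625 1875
```

Under these assumptions, `--warmup_steps 500` covers roughly a quarter of the 1,875 total steps, and `--save_steps 1000` is reached only once during the run, so both values may need scaling for smaller datasets.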
+### Training Tips
+1. Stage 1 Model:
+- Use larger batch sizes (8-16) for better convergence
+- Enable gradient checkpointing for memory efficiency
+- Start with a lower learning rate (1e-5)
+- Train for at least 3 epochs
+2. Stage 2 Model:
+- Use smaller batch sizes (4-8)
+- Higher learning rate possible (2e-5)
+- Shorter training time needed
+- Focus on audio quality metrics
+3. Monitoring:
+- Use Weights & Biases for training visualization
+- Monitor loss curves for convergence
+- Validate generation quality periodically
+- Check for overfitting against the validation set
+4. Performance Optimization:
+- Enable Flash Attention during training
+- Use mixed precision training (bf16)
+- Distribute training across multiple GPUs if available
+- Implement proper gradient clipping
+## License
+FULL ACCESS, ENJOY

config.json ADDED

@@ -0,0 +1,29 @@

+{
+  "_name_or_path": "None",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 5504,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 16,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.42.0",
+  "use_cache": true,
+  "vocab_size": 83840
+}
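The model size implied by this config can be worked out directly. The sketch below assumes the standard Llama layout (q/k/v/o projections, a gated three-matrix MLP, two RMSNorm weights per layer, one final norm) with no bias terms, which matches `attention_bias: false` and `mlp_bias: false` above; `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(cfg: dict) -> int:
    """Count weights for a Llama-style decoder described by a config dict."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up, down
    norms = 2 * hidden                                     # two RMSNorms per layer
    layers = cfg["num_hidden_layers"] * (attention + mlp + norms)
    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    return embeddings + layers + hidden + lm_head          # + final RMSNorm


config = {
    "hidden_size": 2048,
    "intermediate_size": 5504,
    "num_attention_heads": 16,
    "num_hidden_layers": 32,
    "num_key_value_heads": 16,
    "tie_word_embeddings": False,
    "vocab_size": 83840,
}
print(llama_param_count(config))  # 1962543104
```

By this count the checkpoint holds about 1.96 billion weights, or roughly 3.9 GB at two bytes per weight in `bfloat16`.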

generation_config.json ADDED

@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "transformers_version": "4.42.0"
+}

tokenizer.model ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee5c7cbf32da93989f14d9ba635e3e1d1ab2cc88a92908a5ed0f149375f6ee49
+size 1761962
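The three lines above are a Git LFS pointer, not the tokenizer itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded `tokenizer.model` could be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check that a file's size and digest agree with the pointer."""
    data = Path(path).read_bytes()
    digest = hashlib.new(pointer["algorithm"], data).hexdigest()
    return len(data) == pointer["size"] and digest == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ee5c7cbf32da93989f14d9ba635e3e1d1ab2cc88a92908a5ed0f149375f6ee49\n"
    "size 1761962\n"
)
```

If `tokenizer.model` on disk is only about 130 bytes, the clone fetched the pointer rather than the file, which usually means `git lfs install` was not run before cloning.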