diff --git a/.gitattributes b/.gitattributes
index a6344aac8c09253b3b630fb776ae94478aa0275b..9829846e08697549a46bdb02ed7ea19a4cdd15c9 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -33,3 +33,14 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/intro.jpg filter=lfs diff=lfs merge=lfs -text
+examples/muse_outputs/main_0.6b_0.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/muse_outputs/main_0.6b_1.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/muse_outputs/main_1.7b_2.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/muse_outputs/main_8b_3.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/train_inputs/suno_cn_0.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/train_inputs/suno_cn_1.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/train_inputs/suno_en_2.mp3 filter=lfs diff=lfs merge=lfs -text
+examples/train_inputs/suno_en_3.mp3 filter=lfs diff=lfs merge=lfs -text
+train/train_demo.jsonl filter=lfs diff=lfs merge=lfs -text
+train/val.jsonl filter=lfs diff=lfs merge=lfs -text
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..75b23f04bca5a888e0b9c5dc7e7a33c3954c85ab
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 yuhui1038
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index 64a9d9aa85fd41244ff16f40e3e5ed10ef93e792..4983acdd85771259f624423d2e1b7a0605e85d20 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,80 @@
----
-title: Muse
-emoji: 🐢
-colorFrom: red
-colorTo: pink
-sdk: gradio
-sdk_version: 6.3.0
-app_file: app.py
-pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
+
+
+
+This is the official repository for "Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control". It provides the Muse model, training and inference scripts, pretrained checkpoints, and evaluation pipelines.
+
+## News and Updates
+
+* **2026.01.11 🔥**: We are excited to announce that all datasets and models are now fully open-sourced! 🎶 The complete training dataset (116k songs), pretrained model weights, training and evaluation code, and data pipeline are publicly available.
+
+## Installation
+
+**Requirements**: Python 3.10.
+
+To set up the environment for Muse:
+
+- **For training**: Install the training framework:
+ ```bash
+ pip install ms-swift -U
+ ```
+- **For inference**: Install vLLM:
+ ```bash
+ pip install vllm
+ ```
+- **For audio encoding/decoding**: Some dependencies (e.g., `av`) require system-level packages. On Ubuntu/Debian, install FFmpeg 4.4+ first:
+ ```bash
+ sudo apt-get update
+ sudo apt-get install -y software-properties-common
+ sudo add-apt-repository ppa:savoury1/ffmpeg4 -y
+ sudo apt-get update
+ sudo apt-get install -y pkg-config ffmpeg libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
+ ```
+ We recommend creating a new conda environment with Python 3.10. **Note**: Since `omegaconf==2.0.6` is required and has compatibility issues with pip 24.1+, you need to downgrade pip first:
+ ```bash
+ pip install "pip<24.1"
+ ```
+ Then install dependencies:
+ ```bash
+ pip install --default-timeout=1000 -r requirements_mucodec.txt
+ ```
+ For more details, please refer to the [MuCodec](https://github.com/tencent-ailab/MuCodec) official repository.
+
+- **For data pipeline and evaluation**: If you need to run data processing scripts (lyrics generation, metadata processing) or evaluation scripts, install additional dependencies:
+ ```bash
+ pip install -r requirements_data_eval.txt
+ ```
+
+## Repository Structure
+
+This repository contains the following main directories:
+
+- **`train/`**: Training scripts and utilities for fine-tuning the Muse model. See [`train/README.md`](train/README.md) for details.
+- **`infer/`**: Inference scripts for generating music with the Muse model. See [`infer/README.md`](infer/README.md) for details.
+- **`eval_pipeline/`**: Evaluation scripts for assessing model performance (Mulan-T, PER, AudioBox, SongEval, etc.).
+- **`data_pipeline/`**: Scripts for building and processing training data, including lyrics generation, metadata processing, and music generation utilities.
+
+## Model Architecture
+
+
+
+
+
+## Acknowledgments
+
+We thank [Qwen3](https://github.com/QwenLM/Qwen3) for providing the base language model, [ms-swift](https://github.com/modelscope/ms-swift) for the training framework, and [MuCodec](https://github.com/tencent-ailab/MuCodec) for discrete audio tokenization.
+
+## Citation
+
+If you find our work useful, please cite our paper:
+
+```bibtex
+@article{jiang2026muse,
+ title={Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control},
+ author={Jiang, Changhao and Chen, Jiahao and Xiang, Zhenghao and Yang, Zhixiong and Wang, Hanchen and Zhuang, Jiabao and Che, Xinmeng and Sun, Jiajun and Li, Hui and Cao, Yifei and others},
+ journal={arXiv preprint arXiv:2601.03973},
+ year={2026}
+}
+```
diff --git a/assets/intro.jpg b/assets/intro.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bb1c0865bcda69969332a55be75355880f16a3d3
--- /dev/null
+++ b/assets/intro.jpg
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8dccf43f652372948e48c47afabb0d979823788ae00a4019b8d8d215821284eb
+size 165507
diff --git a/baseline_generate/ace_step/convert.py b/baseline_generate/ace_step/convert.py
new file mode 100644
index 0000000000000000000000000000000000000000..6279cc59f00259d83a6d096bad1f80ff3f1da051
--- /dev/null
+++ b/baseline_generate/ace_step/convert.py
@@ -0,0 +1,160 @@
+"""
+Convert data to ACE-STEP acceptable format
+"""
+
+import re
+import json
+import random
+from tqdm import tqdm
+
+random.seed(42)
+
+def load_jsonl(path:str) -> list[dict]:
+ data = []
+ with open(path, 'r') as file:
+ for line in tqdm(file, desc=f"Loading {path}"):
+ data.append(json.loads(line))
+ return data
+
+def save_jsonl(data:list, path:str):
+ with open(path, 'w', encoding='utf-8') as file:
+ for ele in tqdm(data, desc=f"Saving {path}"):
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+
+START_STR = "Please generate a song in the following style:"
+END_STR = "\nNext, I will tell you the requirements and lyrics"
+
+def process_tag(content:str) -> str:
+ """Process segment label"""
+ # Extract label
+ end = content.find("[desc:")
+ tag = content[1:end-1]
+ # Lowercase & remove numbers & remove parentheses
+ tag = tag.lower()
+ tag = re.sub(r'\d+', '', tag)
+ tag = re.sub(r'\([^)]*\)', '', tag).strip()
+ if tag == "pre-chorus":
+ tag = "chorus"
+ return f"[{tag}]"
+
+def process_lyrics(content:str) -> str:
+ """Process segment lyrics"""
+ # Extract lyrics
+ start = content.find("[lyrics:\n")
+ if start == -1:
+ return ""
+ end = content.find("][phoneme:")
+ lyric = content[start+len("[lyrics:\n"):end]
+
+ # Punctuation conversion
+ pattern = r'[,。",:;&—‘\'.\]\[()?\n-]'
+ lyric = re.sub(pattern, '\n', lyric)
+ while lyric.find("\n\n") != -1:
+ lyric = lyric.replace("\n\n", "\n")
+ if lyric.endswith('\n'):
+ lyric = lyric[:-1]
+ return lyric
+
+def has_chinese(text) -> bool:
+ for char in text:
+ if '\u4e00' <= char <= '\u9fff': # Basic Chinese characters
+ return True
+ return False
+
+def process_duration(lyrics:str):
+ if has_chinese(lyrics):
+ lyrics = lyrics.replace("\n", "")
+ length = len(lyrics)
+ else:
+ lyrics = lyrics.replace("\n", " ")
+ length = len(lyrics.split())
+ duration = random.randint(int(length * 0.4), int(length * 0.7))
+ return duration
+
+def process_one(messages:list[dict]):
+ """Process a conversation messages into input format, return gt_lyric and descriptions"""
+ # Overall style
+ style:str = messages[0]['content']
+ start = style.find(START_STR)
+ end = style.find(END_STR)
+ descriptions = style[start+len(START_STR):end]
+
+ # Line-by-line lyrics
+ all_lyrics = "[intro]\n\n"
+ pure_lyrics = ""
+ for message in messages[1:]:
+ if message['role'] == "assistant":
+ continue
+ content = message['content']
+ # Segment label
+ tag = process_tag(content)
+ # Segment lyrics
+ lyric = process_lyrics(content)
+ all_lyrics += f"{tag}\n{lyric}\n\n"
+ pure_lyrics += lyric
+ all_lyrics = all_lyrics[:-2]
+
+ # Duration
+ duration = process_duration(pure_lyrics)
+
+ obj = {
+ "prompt": descriptions,
+ "lyrics": all_lyrics,
+ "audio_duration": duration,
+ "infer_step": 60,
+ "guidance_scale": 15,
+ "scheduler_type": "euler",
+ "cfg_type": "apg",
+ "omega_scale": 10,
+ "guidance_interval": 0.5,
+ "guidance_interval_decay": 0,
+ "min_guidance_scale": 3,
+ "use_erg_tag": True,
+ "use_erg_lyric": True,
+ "use_erg_diffusion": True,
+ "oss_steps": [],
+ "actual_seeds": [
+ 3299954530
+ ]
+ }
+ return obj
+
+def main():
+ path = "xxx/ACE-Step/data/inputs/messages.jsonl"
+ dataset = load_jsonl(path)
+
+    for idx, ele in tqdm(enumerate(dataset), desc="Processing"):
+        messages = ele['messages']
+        data = process_one(messages)
+        out_path = f"./data/inputs/test_{idx}.jsonl"  # avoid clobbering the input path
+        with open(out_path, 'w', encoding='utf-8') as file:
+            json.dump(data, file, ensure_ascii=False, indent=4)
+
+
+if __name__ == "__main__":
+ # main()
+ dataset = load_jsonl("./data/outputs/lyrics_params.jsonl")
+ for ele in dataset:
+ path = ele['audio_path']
+ ele['extra'] = int(path[len("./data/outputs/test_"):-len(".wav")])
+ sorted_data = sorted(dataset, key=lambda x: x['extra'])
+
+ save_path = "./data/outputs/lyrics_params_.jsonl"
+ with open(save_path, 'w', encoding='utf-8') as file:
+ for ele in sorted_data:
+ del ele['extra']
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
diff --git a/baseline_generate/ace_step/infer.py b/baseline_generate/ace_step/infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..092f0ccbb1f57b85616216751111805b290e5d88
--- /dev/null
+++ b/baseline_generate/ace_step/infer.py
@@ -0,0 +1,122 @@
+import click
+import os
+import json
+from acestep.pipeline_ace_step import ACEStepPipeline
+from acestep.data_sampler import DataSampler
+
+
+def sample_data(json_data):
+ return (
+ json_data["audio_duration"],
+ json_data["prompt"],
+ json_data["lyrics"],
+ json_data["infer_step"],
+ json_data["guidance_scale"],
+ json_data["scheduler_type"],
+ json_data["cfg_type"],
+ json_data["omega_scale"],
+ ", ".join(map(str, json_data["actual_seeds"])),
+ json_data["guidance_interval"],
+ json_data["guidance_interval_decay"],
+ json_data["min_guidance_scale"],
+ json_data["use_erg_tag"],
+ json_data["use_erg_lyric"],
+ json_data["use_erg_diffusion"],
+ ", ".join(map(str, json_data["oss_steps"])),
+ json_data["guidance_scale_text"] if "guidance_scale_text" in json_data else 0.0,
+ (
+ json_data["guidance_scale_lyric"]
+ if "guidance_scale_lyric" in json_data
+ else 0.0
+ ),
+ )
+
+
+@click.command()
+@click.option(
+ "--checkpoint_path", type=str, default="", help="Path to the checkpoint directory"
+)
+@click.option("--bf16", type=bool, default=True, help="Whether to use bfloat16")
+@click.option(
+ "--torch_compile", type=bool, default=False, help="Whether to use torch compile"
+)
+@click.option(
+ "--cpu_offload", type=bool, default=False, help="Whether to use CPU offloading (only load current stage's model to GPU)"
+)
+@click.option(
+ "--overlapped_decode", type=bool, default=False, help="Whether to use overlapped decoding (run dcae and vocoder using sliding windows)"
+)
+@click.option("--device_id", type=int, default=0, help="Device ID to use")
+@click.option("--output_path", type=str, default=None, help="Path to save the output")
+def main(checkpoint_path, bf16, torch_compile, cpu_offload, overlapped_decode, device_id, output_path):
+ # os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
+
+ model_demo = ACEStepPipeline(
+ checkpoint_dir=checkpoint_path,
+ dtype="bfloat16" if bf16 else "float32",
+ torch_compile=torch_compile,
+ cpu_offload=cpu_offload,
+ overlapped_decode=overlapped_decode
+ )
+ print(model_demo)
+
+ data_sampler = DataSampler()
+
+ inputs_dir = "./data/inputs"
+ for id, name in enumerate(os.listdir(inputs_dir)):
+ if not name.startswith("test"):
+ continue
+ path = os.path.join(inputs_dir, name)
+ with open(path, 'r') as file:
+ json_data = json.load(file)
+ json_data = sample_data(json_data)
+
+ pure_name = os.path.splitext(name)[0]
+ output_path = f"./data/outputs/{pure_name}.wav"
+ if os.path.exists(output_path):
+ continue
+ (
+ audio_duration,
+ prompt,
+ lyrics,
+ infer_step,
+ guidance_scale,
+ scheduler_type,
+ cfg_type,
+ omega_scale,
+ manual_seeds,
+ guidance_interval,
+ guidance_interval_decay,
+ min_guidance_scale,
+ use_erg_tag,
+ use_erg_lyric,
+ use_erg_diffusion,
+ oss_steps,
+ guidance_scale_text,
+ guidance_scale_lyric,
+ ) = json_data
+
+ model_demo(
+ audio_duration=audio_duration,
+ prompt=prompt,
+ lyrics=lyrics,
+ infer_step=infer_step,
+ guidance_scale=guidance_scale,
+ scheduler_type=scheduler_type,
+ cfg_type=cfg_type,
+ omega_scale=omega_scale,
+ manual_seeds=manual_seeds,
+ guidance_interval=guidance_interval,
+ guidance_interval_decay=guidance_interval_decay,
+ min_guidance_scale=min_guidance_scale,
+ use_erg_tag=use_erg_tag,
+ use_erg_lyric=use_erg_lyric,
+ use_erg_diffusion=use_erg_diffusion,
+ oss_steps=oss_steps,
+ guidance_scale_text=guidance_scale_text,
+ guidance_scale_lyric=guidance_scale_lyric,
+ save_path=output_path,
+ )
+
+if __name__ == "__main__":
+ main()
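One fragile spot in the script above is that `sample_data` returns an 18-element positional tuple that must stay in exact sync with the unpack in `main`. A sketch of a dict-based alternative (the helper name `sample_data_dict` is ours, not part of ACE-Step) builds the same fields by name so the call site cannot drift out of order:

```python
# Hypothetical dict-based alternative to sample_data: pulls the same fields
# by name so the call site cannot drift out of sync with a positional tuple.
DEFAULTS = {"guidance_scale_text": 0.0, "guidance_scale_lyric": 0.0}

def sample_data_dict(json_data: dict) -> dict:
    fields = [
        "audio_duration", "prompt", "lyrics", "infer_step", "guidance_scale",
        "scheduler_type", "cfg_type", "omega_scale", "guidance_interval",
        "guidance_interval_decay", "min_guidance_scale", "use_erg_tag",
        "use_erg_lyric", "use_erg_diffusion",
    ]
    out = {k: json_data[k] for k in fields}
    out["manual_seeds"] = ", ".join(map(str, json_data["actual_seeds"]))
    out["oss_steps"] = ", ".join(map(str, json_data["oss_steps"]))
    for k, v in DEFAULTS.items():
        out[k] = json_data.get(k, v)  # optional fields fall back to 0.0
    return out

cfg = sample_data_dict({
    "audio_duration": 120, "prompt": "p", "lyrics": "l", "infer_step": 60,
    "guidance_scale": 15, "scheduler_type": "euler", "cfg_type": "apg",
    "omega_scale": 10, "guidance_interval": 0.5, "guidance_interval_decay": 0,
    "min_guidance_scale": 3, "use_erg_tag": True, "use_erg_lyric": True,
    "use_erg_diffusion": True, "oss_steps": [], "actual_seeds": [3299954530],
})
print(cfg["manual_seeds"])  # 3299954530
```

The pipeline call would then become `model_demo(**cfg, save_path=output_path)`, forwarding the same keyword arguments the loop above passes explicitly.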
diff --git a/baseline_generate/diffrhythm2/batch_inference.sh b/baseline_generate/diffrhythm2/batch_inference.sh
new file mode 100644
index 0000000000000000000000000000000000000000..e3362e6e22c1bd752c2be62b93970ce7b0894c40
--- /dev/null
+++ b/baseline_generate/diffrhythm2/batch_inference.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+set -euo pipefail
+
+# Navigate to script directory to ensure relative paths are consistent
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+cd "$SCRIPT_DIR"
+
+SONG_DIR="$SCRIPT_DIR/example/zh_songs"
+
+if [ ! -d "$SONG_DIR" ]; then
+ echo "Song directory not found: $SONG_DIR"
+ exit 1
+fi
+
+# Collect all song_*.jsonl files
+shopt -s nullglob
+SONG_FILES=("$SONG_DIR"/song_*.jsonl)
+shopt -u nullglob
+
+if [ ${#SONG_FILES[@]} -eq 0 ]; then
+ echo "No song_*.jsonl files in song directory"
+ exit 0
+fi
+
+export PYTHONPATH="${PYTHONPATH:-}:${SCRIPT_DIR}"
+
+espeak-ng --version
+
+# Reproducibility settings:
+# - Fixed random seed SEED
+# - DO_SAMPLE=0 tries to follow deterministic path (including fixed style prompt cropping start)
+SEED="${SEED:-42}"
+DO_SAMPLE="${DO_SAMPLE:-0}"
+
+# Further reduce cuBLAS non-determinism (enable when needed; comment out if causes errors)
+export CUBLAS_WORKSPACE_CONFIG="${CUBLAS_WORKSPACE_CONFIG:-:4096:8}"
+
+for SONG_FILE in "${SONG_FILES[@]}"; do
+ SONG_NAME="$(basename "$SONG_FILE")"
+ INPUT_PATH="./example/zh_songs/${SONG_NAME}"
+ echo "=============================="
+ echo "Starting generation: ${SONG_NAME}"
+ CMD=(python inference.py
+ --repo-id ASLP-lab/DiffRhythm2
+ --output-dir ./results/zh
+ --input-jsonl "$INPUT_PATH"
+ --cfg-strength 3.0
+ --max-secs 285.0
+ --seed "$SEED"
+ )
+ if [ "$DO_SAMPLE" -eq 1 ]; then
+ CMD+=(--do-sample)
+ fi
+ "${CMD[@]}"
+done
+
+echo "All songs generation complete, processed ${#SONG_FILES[@]} songs."
diff --git a/baseline_generate/diffrhythm2/batch_inference_en.sh b/baseline_generate/diffrhythm2/batch_inference_en.sh
new file mode 100644
index 0000000000000000000000000000000000000000..9ff57ba5268496e1b00e6de2f4cf9e0e0f88f5ca
--- /dev/null
+++ b/baseline_generate/diffrhythm2/batch_inference_en.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+set -euo pipefail
+
+# Navigate to script directory to ensure relative paths are consistent
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+cd "$SCRIPT_DIR"
+
+SONG_DIR="$SCRIPT_DIR/example/en_songs"
+
+if [ ! -d "$SONG_DIR" ]; then
+ echo "Song directory not found: $SONG_DIR"
+ exit 1
+fi
+
+# Collect all song_*.jsonl files
+shopt -s nullglob
+SONG_FILES=("$SONG_DIR"/song_*.jsonl)
+shopt -u nullglob
+
+if [ ${#SONG_FILES[@]} -eq 0 ]; then
+ echo "No song_*.jsonl files in song directory"
+ exit 0
+fi
+
+export PYTHONPATH="${PYTHONPATH:-}:${SCRIPT_DIR}"
+
+espeak-ng --version
+
+# Reproducibility settings:
+# - Fixed random seed SEED
+# - DO_SAMPLE=0 tries to follow deterministic path (including fixed style prompt cropping start)
+SEED="${SEED:-42}"
+DO_SAMPLE="${DO_SAMPLE:-0}"
+
+# Further reduce cuBLAS non-determinism (enable when needed; comment out if causes errors)
+export CUBLAS_WORKSPACE_CONFIG="${CUBLAS_WORKSPACE_CONFIG:-:4096:8}"
+
+for SONG_FILE in "${SONG_FILES[@]}"; do
+ SONG_NAME="$(basename "$SONG_FILE")"
+ INPUT_PATH="./example/en_songs/${SONG_NAME}"
+ echo "=============================="
+ echo "Starting generation: ${SONG_NAME}"
+ CMD=(python inference.py
+ --repo-id ASLP-lab/DiffRhythm2
+ --output-dir ./results/en
+ --input-jsonl "$INPUT_PATH"
+ --cfg-strength 3.0
+ --max-secs 285.0
+ --seed "$SEED"
+ )
+ if [ "$DO_SAMPLE" -eq 1 ]; then
+ CMD+=(--do-sample)
+ fi
+ "${CMD[@]}"
+done
+
+echo "All songs generation complete, processed ${#SONG_FILES[@]} songs."
diff --git a/baseline_generate/diffrhythm2/inference.py b/baseline_generate/diffrhythm2/inference.py
new file mode 100644
index 0000000000000000000000000000000000000000..73ca04143029ba55f8d952c3c8350dc33a682c70
--- /dev/null
+++ b/baseline_generate/diffrhythm2/inference.py
@@ -0,0 +1,294 @@
+# Copyright 2025 ASLP Lab and Xiaomi Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import torch
+import torchaudio
+import argparse
+import json
+import os
+from tqdm import tqdm
+import random
+import pedalboard
+import numpy as np
+
+from muq import MuQMuLan
+from diffrhythm2.cfm import CFM
+from diffrhythm2.backbones.dit import DiT
+from bigvgan.model import Generator
+from huggingface_hub import hf_hub_download
+
+
+STRUCT_INFO = {
+ "[start]": 500,
+ "[end]": 501,
+ "[intro]": 502,
+ "[verse]": 503,
+ "[chorus]": 504,
+ "[outro]": 505,
+ "[inst]": 506,
+ "[solo]": 507,
+ "[bridge]": 508,
+ "[hook]": 509,
+ "[break]": 510,
+ "[stop]": 511,
+ "[space]": 512
+}
+
+lrc_tokenizer = None
+
+
+def set_seed(seed: int, deterministic: bool = True):
+ random.seed(seed)
+ np.random.seed(seed)
+ torch.manual_seed(seed)
+ if torch.cuda.is_available():
+ torch.cuda.manual_seed_all(seed)
+
+ if deterministic:
+ # best-effort deterministic behavior; some ops may still be nondeterministic on certain GPUs/kernels
+ torch.backends.cudnn.deterministic = True
+ torch.backends.cudnn.benchmark = False
+ try:
+ torch.use_deterministic_algorithms(True, warn_only=True)
+ except Exception:
+ pass
+
+class CNENTokenizer():
+ def __init__(self):
+ curr_path = os.path.abspath(__file__)
+ vocab_path = os.path.join(os.path.dirname(curr_path), "g2p/g2p/vocab.json")
+ with open(vocab_path, 'r') as file:
+ self.phone2id:dict = json.load(file)['vocab']
+ self.id2phone = {v:k for (k, v) in self.phone2id.items()}
+ from g2p.g2p_generation import chn_eng_g2p
+ self.tokenizer = chn_eng_g2p
+ def encode(self, text):
+ phone, token = self.tokenizer(text)
+ token = [x+1 for x in token]
+ return token
+ def decode(self, token):
+ return "|".join([self.id2phone[x-1] for x in token])
+
+
+def prepare_model(repo_id, device):
+ diffrhythm2_ckpt_path = hf_hub_download(
+ repo_id=repo_id,
+ filename="model.safetensors",
+ local_dir="./ckpt",
+ local_files_only=False,
+ )
+ diffrhythm2_config_path = hf_hub_download(
+ repo_id=repo_id,
+ filename="config.json",
+ local_dir="./ckpt",
+ local_files_only=False,
+ )
+ with open(diffrhythm2_config_path) as f:
+ model_config = json.load(f)
+
+ model_config['use_flex_attn'] = False
+ diffrhythm2 = CFM(
+ transformer=DiT(
+ **model_config
+ ),
+ num_channels=model_config['mel_dim'],
+ block_size=model_config['block_size'],
+ )
+
+ total_params = sum(p.numel() for p in diffrhythm2.parameters())
+
+ diffrhythm2 = diffrhythm2.to(device)
+ if diffrhythm2_ckpt_path.endswith('.safetensors'):
+ from safetensors.torch import load_file
+ ckpt = load_file(diffrhythm2_ckpt_path)
+ else:
+ ckpt = torch.load(diffrhythm2_ckpt_path, map_location='cpu')
+ diffrhythm2.load_state_dict(ckpt)
+ print(f"Total params: {total_params:,}")
+
+ # load Mulan
+ mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large", cache_dir="./ckpt").to(device)
+
+ # load frontend
+ lrc_tokenizer = CNENTokenizer()
+
+ # load decoder
+ decoder_ckpt_path = hf_hub_download(
+ repo_id=repo_id,
+ filename="decoder.bin",
+ local_dir="./ckpt",
+ local_files_only=False,
+ )
+ decoder_config_path = hf_hub_download(
+ repo_id=repo_id,
+ filename="decoder.json",
+ local_dir="./ckpt",
+ local_files_only=False,
+ )
+ decoder = Generator(decoder_config_path, decoder_ckpt_path)
+ decoder = decoder.to(device)
+ return diffrhythm2, mulan, lrc_tokenizer, decoder
+
+
+def parse_lyrics(lyrics: str):
+ lyrics_with_time = []
+ lyrics = lyrics.split("\n")
+ for line in lyrics:
+ struct_idx = STRUCT_INFO.get(line, None)
+ if struct_idx is not None:
+ lyrics_with_time.append([struct_idx, STRUCT_INFO['[stop]']])
+ else:
+ tokens = lrc_tokenizer.encode(line.strip())
+ tokens = tokens + [STRUCT_INFO['[stop]']]
+ lyrics_with_time.append(tokens)
+ return lyrics_with_time
+
+
+def make_fake_stereo(audio, sampling_rate):
+ left_channel = audio
+ right_channel = audio.copy()
+ right_channel = right_channel * 0.8
+ delay_samples = int(0.01 * sampling_rate)
+    right_channel = np.roll(right_channel, delay_samples, axis=-1)  # roll along the time axis, not the flattened array
+ right_channel[:,:delay_samples] = 0
+ stereo_audio = np.concatenate([left_channel, right_channel], axis=0)
+
+ return stereo_audio
+
+
+def inference(
+ model,
+ decoder,
+ text,
+ style_prompt,
+ duration,
+ output_dir,
+ song_name,
+ cfg_strength,
+ sample_steps=32,
+ process_bar=True,
+ fake_stereo=True,
+ ):
+ with torch.inference_mode():
+ latent = model.sample_block_cache(
+ text=text.unsqueeze(0),
+ duration=int(duration * 5),
+ style_prompt=style_prompt.unsqueeze(0),
+ steps=sample_steps,
+ cfg_strength=cfg_strength,
+ process_bar=process_bar,
+ )
+ latent = latent.transpose(1, 2)
+ audio = decoder.decode_audio(latent, overlap=5, chunk_size=20)
+
+ basename = f"{song_name}.mp3"
+ output_path = os.path.join(output_dir, basename)
+
+ num_channels = 1
+ audio = audio.float().cpu().numpy().squeeze()[None, :]
+ if fake_stereo:
+ audio = make_fake_stereo(audio, decoder.h.sampling_rate)
+ num_channels = 2
+
+ with pedalboard.io.AudioFile(output_path, "w", decoder.h.sampling_rate, num_channels) as f:
+ f.write(audio)
+
+
+if __name__ == "__main__":
+
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument('--repo-id', type=str, default=None)
+ parser.add_argument('--output-dir', type=str, default=None)
+ parser.add_argument('--input-jsonl', type=str, default=None)
+ parser.add_argument('--cfg-strength', type=float, default=2.0)
+ parser.add_argument('--max-secs', type=float, default=210.0)
+ parser.add_argument('--steps', type=int, default=16)
+ parser.add_argument('--fake-stereo', type=bool, default=True)
+ parser.add_argument('--seed', type=int, default=42)
+ parser.add_argument('--do-sample', action='store_true', default=False)
+
+ args = parser.parse_args()
+
+ output_dir = args.output_dir
+ input_jsonl = args.input_jsonl
+ cfg_strength = args.cfg_strength
+ max_secs = args.max_secs
+ device = torch.device('cuda:7' if torch.cuda.is_available() else 'cpu')
+ dtype = torch.float16
+
+ # reproducibility
+ set_seed(args.seed, deterministic=(not args.do_sample))
+
+ # load diffrhythm2
+ diffrhythm2, mulan, lrc_tokenizer, decoder = prepare_model(args.repo_id, device)
+
+ output_dir = args.output_dir
+ os.makedirs(output_dir, exist_ok=True)
+
+ with open(input_jsonl, 'r') as f:
+ input_info = [json.loads(i.strip()) for i in f.readlines()]
+
+ for i in tqdm(range(len(input_info))):
+ info = input_info[i]
+ song_name = info.get('song_name', f"{i:04d}")
+ lyrics = info.get('lyrics', None)
+ style_prompt = info.get('style_prompt', None)
+ if lyrics is None or style_prompt is None:
+ print(f"lyrics or style_prompt is None, skip {song_name}")
+ continue
+
+ # preprocess lyrics
+ with open(lyrics, 'r') as f:
+ lyrics = f.read()
+ lyrics_token = parse_lyrics(lyrics)
+ lyrics_token = torch.tensor(sum(lyrics_token, []), dtype=torch.long, device=device)
+
+ # preprocess style prompt
+ if os.path.isfile(style_prompt):
+ prompt_wav, sr = torchaudio.load(style_prompt)
+ prompt_wav = torchaudio.functional.resample(prompt_wav.to(device), sr, 24000)
+ if prompt_wav.shape[1] > 24000 * 10:
+ if args.do_sample:
+ start = random.randint(0, prompt_wav.shape[1] - 24000 * 10)
+ else:
+ start = 0
+ prompt_wav = prompt_wav[:, start:start+24000*10]
+ prompt_wav = prompt_wav.mean(dim=0, keepdim=True)
+ with torch.no_grad():
+ style_prompt_embed = mulan(wavs = prompt_wav)
+ else:
+ with torch.no_grad():
+ style_prompt_embed = mulan(texts = [style_prompt])
+ style_prompt_embed = style_prompt_embed.to(device).squeeze(0)
+
+ if device.type != 'cpu':
+ diffrhythm2 = diffrhythm2.half()
+ decoder = decoder.half()
+ style_prompt_embed = style_prompt_embed.half()
+
+ inference(
+ model=diffrhythm2,
+ decoder=decoder,
+ text=lyrics_token,
+ style_prompt=style_prompt_embed,
+ duration=max_secs,
+ output_dir=output_dir,
+ song_name=song_name,
+ sample_steps=args.steps,
+ cfg_strength=cfg_strength,
+ fake_stereo=args.fake_stereo,
+ )
+
+
diff --git a/baseline_generate/diffrhythm2/inference.sh b/baseline_generate/diffrhythm2/inference.sh
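The `make_fake_stereo` helper above widens a mono render by scaling a copy to 0.8 and delaying it by 10 ms into the right channel. A self-contained NumPy sketch of the same idea (with the roll axis made explicit):

```python
import numpy as np

# Same idea as make_fake_stereo above: right channel = left channel scaled
# to 0.8 and delayed by 10 ms, giving a cheap pseudo-stereo image.
def fake_stereo(audio: np.ndarray, sampling_rate: int) -> np.ndarray:
    right = audio.copy() * 0.8
    delay = int(0.01 * sampling_rate)              # 10 ms in samples
    right = np.roll(right, delay, axis=-1)         # shift along the time axis
    right[:, :delay] = 0                           # silence the wrapped-around head
    return np.concatenate([audio, right], axis=0)  # (2, T) stereo

mono = np.ones((1, 2400))  # 0.1 s of constant signal at 24 kHz
stereo = fake_stereo(mono, 24000)
print(stereo.shape)           # (2, 2400)
print(float(stereo[1, 0]))    # 0.0 -- delayed head is silent
print(float(stereo[1, 240]))  # 0.8
```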
new file mode 100644
index 0000000000000000000000000000000000000000..682a5e639c0f94633e837817056aa8c97a117193
--- /dev/null
+++ b/baseline_generate/diffrhythm2/inference.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -euo pipefail
+
+export PYTHONPATH="${PYTHONPATH:-}:$PWD"
+espeak-ng --version
+
+python inference.py \
+    --repo-id ASLP-lab/DiffRhythm2 \
+    --output-dir ./results/test \
+    --input-jsonl ./example/song_1.jsonl \
+    --cfg-strength 3.0 \
+    --max-secs 285.0
diff --git a/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt.cpython-311.pyc b/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d6ab680cb2b9bc570513facebeb7bac38931933a
Binary files /dev/null and b/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt.cpython-311.pyc differ
diff --git a/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt_justtag.cpython-311.pyc b/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt_justtag.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5f9c0914153b871557be33913597b4b5ec114999
Binary files /dev/null and b/baseline_generate/diffrhythm2/scripts/__pycache__/proce_song_enprompt_justtag.cpython-311.pyc differ
diff --git a/baseline_generate/diffrhythm2/scripts/proce_song.py b/baseline_generate/diffrhythm2/scripts/proce_song.py
new file mode 100644
index 0000000000000000000000000000000000000000..bb5b2308a25e323ac9866e6d4e1f5bf4ec984813
--- /dev/null
+++ b/baseline_generate/diffrhythm2/scripts/proce_song.py
@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""
+Process songs.jsonl to generate corresponding lrc files and jsonl files.
+"""
+
+import json
+import os
+import re
+from pathlib import Path
+from typing import List
+
+INPUT_JSONL = Path("xxx/diffrhythm2/example/final_zh_test.jsonl")
+OUTPUT_SONG_DIR = Path("xxx/diffrhythm2/example/zh_songs")
+OUTPUT_LRC_DIR = Path("xxx/diffrhythm2/example/zh_lrc")
+
+TIMESTAMP_PATTERN = re.compile(r"\[\d{2}:\d{2}(?:\.\d+)?\]")
+STRUCTURE_PATTERN = re.compile(r"^\[[^\]]+\]$")
+
+
+def normalize_structure(tag: str) -> str:
+ """Convert structure tag to target format."""
+ tag_lower = tag.lower()
+ if tag_lower.startswith("verse"):
+ return "[verse]"
+ if "chorus" in tag_lower:
+ return "[chorus]"
+ if "bridge" in tag_lower:
+ return "[bridge]"
+ return f"[{tag_lower}]"
+
+
+def transform_lyrics(raw_lyrics: str) -> List[str]:
+ """Convert lyrics to LRC line list according to requirements."""
+ lines = ["[start]", "[intro]"]
+ for raw_line in raw_lyrics.splitlines():
+ line = raw_line.strip()
+ if not line:
+ continue
+
+ # Process structure tags separately
+ if STRUCTURE_PATTERN.match(line) and not TIMESTAMP_PATTERN.match(line):
+ tag_content = line[1:-1].strip()
+ lines.append(normalize_structure(tag_content))
+ continue
+
+ # Remove timestamps
+ text = TIMESTAMP_PATTERN.sub("", line).strip()
+ if not text:
+ continue
+ lines.append(text)
+
+ lines.append("[end]")
+ return lines
+
+
+def ensure_dirs() -> None:
+ OUTPUT_SONG_DIR.mkdir(parents=True, exist_ok=True)
+ OUTPUT_LRC_DIR.mkdir(parents=True, exist_ok=True)
+
+
+def process_songs() -> None:
+ ensure_dirs()
+ with INPUT_JSONL.open("r", encoding="utf-8") as infile:
+ for idx, line in enumerate(infile, start=1):
+ line = line.strip()
+ if not line:
+ continue
+ data = json.loads(line)
+ description = data.get("description", "")
+ lyrics_raw = data.get("lyrics", "")
+
+ lrc_lines = transform_lyrics(lyrics_raw)
+ lrc_filename = f"song_{idx}.lrc"
+ lrc_path = OUTPUT_LRC_DIR / lrc_filename
+ lrc_path.write_text("\n".join(lrc_lines), encoding="utf-8")
+
+ song_base = f"song_{idx}"
+ song_filename = f"{song_base}.jsonl"
+ song_json_path = OUTPUT_SONG_DIR / song_filename
+ song_entry = {
+ "song_name": song_base,
+ "style_prompt": description,
+ "lyrics": f"example/zh_lrc/{lrc_filename}",
+ }
+ song_json_path.write_text(json.dumps(song_entry, ensure_ascii=False) + "\n", encoding="utf-8")
+ print(f"Processed song {idx}: {song_filename}")
+
+
+if __name__ == "__main__":
+ process_songs()
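The `transform_lyrics` conversion above can be exercised standalone. This sketch reproduces its normalization rules (verse/chorus/bridge folding, timestamp stripping, `[start]`/`[intro]`/`[end]` framing) on a short two-segment example of our own:

```python
import re

TIMESTAMP = re.compile(r"\[\d{2}:\d{2}(?:\.\d+)?\]")
STRUCTURE = re.compile(r"^\[[^\]]+\]$")

# Mirrors transform_lyrics above: wrap the song in [start]/[intro] ... [end],
# normalize structure tags, and strip LRC timestamps from sung lines.
def to_lrc(raw: str) -> list[str]:
    out = ["[start]", "[intro]"]
    for line in (l.strip() for l in raw.splitlines()):
        if not line:
            continue
        if STRUCTURE.match(line) and not TIMESTAMP.match(line):
            tag = line[1:-1].strip().lower()
            if tag.startswith("verse"):
                tag = "verse"
            elif "chorus" in tag:
                tag = "chorus"
            elif "bridge" in tag:
                tag = "bridge"
            out.append(f"[{tag}]")
        else:
            text = TIMESTAMP.sub("", line).strip()
            if text:
                out.append(text)
    out.append("[end]")
    return out

print(to_lrc("[Verse 1]\n[00:12.50]hello\n\n[Chorus]\nworld"))
# ['[start]', '[intro]', '[verse]', 'hello', '[chorus]', 'world', '[end]']
```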
diff --git a/baseline_generate/diffrhythm2/scripts/proce_song_enprompt.py b/baseline_generate/diffrhythm2/scripts/proce_song_enprompt.py
new file mode 100644
index 0000000000000000000000000000000000000000..6652e33be30e4dbc8bb2bc7985f18b6fd86e1dca
--- /dev/null
+++ b/baseline_generate/diffrhythm2/scripts/proce_song_enprompt.py
@@ -0,0 +1,145 @@
+import json
+import os
+
+def extract_user_prompt(messages):
+ """
+ Extract the content of every user-role message and concatenate the pieces.
+ Before appending each piece, check whether the combined prompt would exceed
+ 1600 characters; if it would, drop that piece and every subsequent one so
+ that no paragraph is split mid-way.
+
+ Args:
+ messages: List of dictionaries, each containing role and content fields
+
+ Returns:
+ Concatenated prompt string
+ """
+ # Collect all user message content, but need to check length limit
+ user_contents = []
+ current_length = 0 # Current concatenated length
+
+ for msg in messages:
+ if msg.get("role") == "user":
+ content = msg.get("content", "")
+ if content:
+ # Compute the total length if this piece were appended
+ # (a joining '\n' is needed once there is existing content)
+ if user_contents:
+ # Content already exists, need to add newline and current content
+ new_length = current_length + 1 + len(content) # 1 is newline length
+ else:
+ # First content, no newline needed
+ new_length = len(content)
+
+ # If adding this content doesn't exceed 1600, add it
+ if new_length <= 1600:
+ user_contents.append(content)
+ current_length = new_length
+ else:
+ # Exceeds 1600, don't add this content and stop processing subsequent segments
+ break
+
+ # Concatenate all content with newlines
+ if user_contents:
+ prompt = "\n".join(user_contents)
+ return prompt
+
+ # If no user message found, return empty string
+ return ""
+
+def update_song_file(file_path, new_prompt):
+ """
+ Update style_prompt field in song file
+
+ Args:
+ file_path: Path to song file
+ new_prompt: New prompt content
+ """
+ # Read file content
+ with open(file_path, 'r', encoding='utf-8') as f:
+ lines = [line.strip() for line in f if line.strip()]
+
+ if not lines:
+ print(f" Warning: File {file_path} is empty, skipping")
+ return
+
+ # Read first JSON data
+ try:
+ data = json.loads(lines[0])
+ # Update style_prompt field
+ data['style_prompt'] = new_prompt
+
+ # Write back to file
+ with open(file_path, 'w', encoding='utf-8') as f:
+ f.write(json.dumps(data, ensure_ascii=False) + '\n')
+ # Preserve any additional JSON records beyond the first
+ # (blank lines were already stripped on read, so lines[1:] holds real data)
+ for extra in lines[1:]:
+ f.write(extra + '\n')
+
+ print(f" ✓ Updated {file_path}")
+ except json.JSONDecodeError as e:
+ print(f" Error: JSON parsing failed {file_path}: {e}")
+ except Exception as e:
+ print(f" Error: Failed to update file {file_path}: {e}")
+
+def main():
+ # File paths
+ input_file = "xxx/diffrhythm2/scripts/test_messages.jsonl"
+ zh_songs_dir = "xxx/diffrhythm2/example/zh_songs"
+ en_songs_dir = "xxx/diffrhythm2/example/en_songs"
+
+ print(f"Reading file: {input_file}")
+
+ # Read all data
+ with open(input_file, 'r', encoding='utf-8') as f:
+ lines = [line.strip() for line in f if line.strip()]
+
+ print(f"Read {len(lines)} entries")
+
+ # Process each entry
+ for idx, line in enumerate(lines, 1):
+ try:
+ data = json.loads(line)
+ messages = data.get("messages", [])
+
+ # Extract prompt
+ prompt = extract_user_prompt(messages)
+
+ if not prompt:
+ print(f"Processing entry {idx}: No user content found, skipping")
+ continue
+
+ # Determine if Chinese or English
+ if idx <= 50:
+ # First 50 entries: Chinese songs
+ song_num = idx
+ target_dir = zh_songs_dir
+ lang = "Chinese"
+ else:
+ # Entries 51-100: English songs
+ song_num = idx - 50 # 51->1, 52->2, ..., 100->50
+ target_dir = en_songs_dir
+ lang = "English"
+
+ # Build file path
+ song_file = os.path.join(target_dir, f"song_{song_num}.jsonl")
+
+ print(f"Processing entry {idx} ({lang}, song_{song_num})...")
+ print(f" Prompt length: {len(prompt)} characters")
+
+ # Update file
+ update_song_file(song_file, prompt)
+
+ except json.JSONDecodeError as e:
+ print(f"JSON parsing failed for entry {idx}: {e}")
+ continue
+ except Exception as e:
+ print(f"Error processing entry {idx}: {e}")
+ import traceback
+ traceback.print_exc()
+ continue
+
+ print(f"\nProcessing complete! Processed {len(lines)} entries")
+
+if __name__ == "__main__":
+ main()
+
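The length-capped concatenation inside `extract_user_prompt` can be distilled into a small standalone helper. `concat_with_limit` is illustrative only (not part of the script); it applies the same greedy rule: keep whole segments while the joined length stays within the limit, then stop.

```python
def concat_with_limit(contents, limit=1600):
    """Greedily join strings with '\n', keeping only whole segments whose
    joined length stays within `limit` (mirrors extract_user_prompt)."""
    picked, length = [], 0
    for c in contents:
        # +1 accounts for the joining '\n' once something is already picked
        new_length = length + len(c) + (1 if picked else 0)
        if new_length > limit:
            break  # drop this segment and everything after it
        picked.append(c)
        length = new_length
    return "\n".join(picked)
```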
diff --git a/baseline_generate/diffrhythm2/scripts/proce_song_enprompt_justtag.py b/baseline_generate/diffrhythm2/scripts/proce_song_enprompt_justtag.py
new file mode 100644
index 0000000000000000000000000000000000000000..e8fe12d9277e90775a5902d7619c1024fbfef32c
--- /dev/null
+++ b/baseline_generate/diffrhythm2/scripts/proce_song_enprompt_justtag.py
@@ -0,0 +1,143 @@
+import json
+import os
+
+def extract_user_prompt(messages):
+ """
+ Extract the style string from the first user-role message in messages.
+
+ The content usually has the form:
+ "Please generate a song in the following style: (style description)\n"
+ Only the style string after the colon is kept.
+
+ Args:
+ messages: List of dictionaries, each containing role and content fields
+
+ Returns:
+ Style string
+ """
+ # Find first message with role user
+ for msg in messages:
+ if msg.get("role") == "user":
+ content = msg.get("content", "")
+ if content:
+ # Find position of "Please generate a song in the following style:"
+ style_prefix = "Please generate a song in the following style:"
+ style_index = content.find(style_prefix)
+
+ if style_index != -1:
+ # Find start position of content after colon
+ start_index = style_index + len(style_prefix)
+ # Find position of newline
+ newline_index = content.find("\n", start_index)
+
+ if newline_index != -1:
+ # Extract content from after colon to before newline
+ style_text = content[start_index:newline_index].strip()
+ else:
+ # If no newline, extract to end of string
+ style_text = content[start_index:].strip()
+
+ return style_text
+ else:
+ # If standard format not found, return empty string
+ return ""
+
+ # If no user message found, return empty string
+ return ""
+
+def update_song_file(file_path, new_prompt):
+ """
+ Update style_prompt field in song file
+
+ Args:
+ file_path: Path to song file
+ new_prompt: New prompt content
+ """
+ # Read file content
+ with open(file_path, 'r', encoding='utf-8') as f:
+ lines = [line.strip() for line in f if line.strip()]
+
+ if not lines:
+ print(f" Warning: File {file_path} is empty, skipping")
+ return
+
+ # Read first JSON data
+ try:
+ data = json.loads(lines[0])
+ # Update style_prompt field
+ data['style_prompt'] = new_prompt
+
+ # Write back to file
+ with open(file_path, 'w', encoding='utf-8') as f:
+ f.write(json.dumps(data, ensure_ascii=False) + '\n')
+ # Preserve any additional JSON records beyond the first
+ # (blank lines were already stripped on read, so lines[1:] holds real data)
+ for extra in lines[1:]:
+ f.write(extra + '\n')
+
+ print(f" ✓ Updated {file_path}")
+ except json.JSONDecodeError as e:
+ print(f" Error: JSON parsing failed {file_path}: {e}")
+ except Exception as e:
+ print(f" Error: Failed to update file {file_path}: {e}")
+
+def main():
+ # File paths
+ input_file = "xxx/diffrhythm2/scripts/test_messages.jsonl"
+ zh_songs_dir = "xxx/diffrhythm2/example/zh_songs"
+ en_songs_dir = "xxx/diffrhythm2/example/en_songs"
+
+ print(f"Reading file: {input_file}")
+
+ # Read all data
+ with open(input_file, 'r', encoding='utf-8') as f:
+ lines = [line.strip() for line in f if line.strip()]
+
+ print(f"Read {len(lines)} entries")
+
+ # Process each entry
+ for idx, line in enumerate(lines, 1):
+ try:
+ data = json.loads(line)
+ messages = data.get("messages", [])
+
+ # Extract prompt
+ prompt = extract_user_prompt(messages)
+
+ if not prompt:
+ print(f"Processing entry {idx}: no style prompt found, skipping")
+ continue
+
+ # Determine if Chinese or English
+ if idx <= 50:
+ # First 50 entries: Chinese songs
+ song_num = idx
+ target_dir = zh_songs_dir
+ lang = "Chinese"
+ else:
+ # Entries 51-100: English songs
+ song_num = idx - 50 # 51->1, 52->2, ..., 100->50
+ target_dir = en_songs_dir
+ lang = "English"
+
+ # Build file path
+ song_file = os.path.join(target_dir, f"song_{song_num}.jsonl")
+
+ print(f"Processing entry {idx} ({lang}, song_{song_num})...")
+ print(f" Prompt length: {len(prompt)} characters")
+
+ # Update file
+ update_song_file(song_file, prompt)
+
+ except json.JSONDecodeError as e:
+ print(f"JSON parsing failed for entry {idx}: {e}")
+ continue
+ except Exception as e:
+ print(f"Error processing entry {idx}: {e}")
+ import traceback
+ traceback.print_exc()
+ continue
+
+ print(f"\nProcessing complete! Processed {len(lines)} entries")
+
+if __name__ == "__main__":
+ main()
+
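The prefix-to-newline slicing used by the `justtag` variant above generalizes to a small helper. The prefix string in the test is taken from the script; `extract_between` itself is an illustrative sketch, not the script's API.

```python
def extract_between(content, prefix, stop="\n"):
    """Return the text after `prefix` up to the next `stop` (or the end of
    the string), stripped; return "" if the prefix is absent."""
    i = content.find(prefix)
    if i == -1:
        return ""
    start = i + len(prefix)
    end = content.find(stop, start)
    return (content[start:] if end == -1 else content[start:end]).strip()
```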
diff --git a/baseline_generate/levo/__pycache__/generate.cpython-311.pyc b/baseline_generate/levo/__pycache__/generate.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..a05cc44f1a7ab4db09818536543d56d2315588e3
Binary files /dev/null and b/baseline_generate/levo/__pycache__/generate.cpython-311.pyc differ
diff --git a/baseline_generate/levo/convert.py b/baseline_generate/levo/convert.py
new file mode 100644
index 0000000000000000000000000000000000000000..58c3a0b809d7172518a671fcea07cc6f9e0d7379
--- /dev/null
+++ b/baseline_generate/levo/convert.py
@@ -0,0 +1,110 @@
+"""
+Convert data to the SongGeneration (LeVo) input format
+"""
+
+import re
+import json
+import random
+from tqdm import tqdm
+
+random.seed(42)
+
+def load_jsonl(path:str) -> list[dict]:
+ data = []
+ with open(path, 'r') as file:
+ for line in tqdm(file, desc=f"Loading {path}"):
+ data.append(json.loads(line))
+ return data
+
+def save_jsonl(data:list, path:str):
+ with open(path, 'w', encoding='utf-8') as file:
+ for ele in tqdm(data, desc=f"Saving {path}"):
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+
+START_STR = "Please generate a song in the following style:"
+END_STR = "\nNext, I will tell you the requirements and lyrics"
+
+def process_tag(content:str) -> str:
+ """Process segment label"""
+ # Extract label
+ end = content.find("[desc:")
+ tag = content[1:end-1]
+ # Lowercase & remove numbers & remove parentheses
+ tag = tag.lower()
+ tag = re.sub(r'\d+', '', tag)
+ tag = re.sub(r'\([^)]*\)', '', tag).strip()
+ if tag == "pre-chorus":
+ tag = "chorus"
+ return f"[{tag}]"
+
+def process_lyrics(content:str) -> str:
+ """Process segment lyrics"""
+ # Extract lyrics
+ start = content.find("[lyrics:\n")
+ if start == -1:
+ return ""
+ end = content.find("][phoneme:")
+ lyric = content[start+len("[lyrics:\n"):end]
+
+ # Punctuation conversion
+ pattern = r'[,。",:;&—‘\'.\]\[()?\n-]'
+ lyric = re.sub(pattern, '.', lyric)
+ while lyric.find("..") != -1:
+ lyric = lyric.replace("..", ".")
+ if lyric.endswith('.'):
+ lyric = lyric[:-1]
+ return lyric
+
+def random_size() -> str:
+ # Intro/outro length
+ sizes = ['short', 'medium', 'long']
+ return random.choice(sizes)
+
+def process_one(messages:list[dict]):
+ """Process one conversation's messages into the input format; return (descriptions, gt_lyric)"""
+ # Overall style
+ style:str = messages[0]['content']
+ start = style.find(START_STR)
+ end = style.find(END_STR)
+ descriptions = style[start+len(START_STR):end]
+
+ # Line-by-line lyrics
+ start_tag = "intro-" + random_size()
+ end_tag = "outro-" + random_size()
+ gt_lyric = f"[{start_tag}] ;"
+ for message in messages[1:]:
+ if message['role'] == "assistant":
+ continue
+ content = message['content']
+ # Segment label
+ tag = process_tag(content)
+ # Segment lyrics
+ lyric = process_lyrics(content)
+ if lyric == "" or tag.startswith("[outro"):
+ gt_lyric += f" [{end_tag}]"
+ break
+ gt_lyric += f" {tag} {lyric} ;"
+ if not gt_lyric.endswith(f" [{end_tag}]"):
+ gt_lyric += f" [{end_tag}]"
+ return descriptions, gt_lyric
+
+def main():
+ path = "xxx/SongGeneration/data/inputs/test_messages.jsonl"
+ dataset = load_jsonl(path)
+ save_path = "xxx/SongGeneration/data/inputs/lyrics.jsonl"
+
+ with open(save_path, 'w', encoding='utf-8') as file:
+ for idx, ele in tqdm(enumerate(dataset), desc="Processing"):
+ messages = ele['messages']
+ descriptions, gt_lyric = process_one(messages)
+ data = {
+ "idx": f"test_{idx}",
+ "descriptions": descriptions,
+ "gt_lyric": gt_lyric
+ }
+ json.dump(data, file, ensure_ascii=False)
+ file.write("\n")
+
+if __name__ == "__main__":
+ main()
\ No newline at end of file
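The punctuation normalization in `process_lyrics` can be tested on its own. The character class below copies the script's `pattern`; collapsing dot runs with a single regex replaces the script's `while` loop but yields the same result.

```python
import re

# Character class copied from process_lyrics above
PUNCT = r'[,。",:;&—‘\'.\]\[()?\n-]'

def normalize_lyric(lyric: str) -> str:
    """Map punctuation and newlines to '.', collapse runs of dots,
    and drop a trailing dot."""
    lyric = re.sub(PUNCT, '.', lyric)
    lyric = re.sub(r'\.{2,}', '.', lyric)
    return lyric[:-1] if lyric.endswith('.') else lyric
```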
diff --git a/baseline_generate/levo/generate.py b/baseline_generate/levo/generate.py
new file mode 100644
index 0000000000000000000000000000000000000000..4404c9312142bd71839cce4b6060bc7fd48fddeb
--- /dev/null
+++ b/baseline_generate/levo/generate.py
@@ -0,0 +1,591 @@
+import sys
+import os
+import argparse
+
+import time
+import json
+import torch
+import torchaudio
+import numpy as np
+from omegaconf import OmegaConf
+from codeclm.models import builders
+import gc
+from codeclm.trainer.codec_song_pl import CodecLM_PL
+from codeclm.models import CodecLM
+from third_party.demucs.models.pretrained import get_model_from_yaml
+import re
+import subprocess
+
+auto_prompt_type = ['Pop', 'R&B', 'Dance', 'Jazz', 'Folk', 'Rock', 'Chinese Style', 'Chinese Tradition', 'Metal', 'Reggae', 'Chinese Opera', 'Auto']
+
+def get_free_gpu() -> int:
+ """Return the ID of the GPU with the most free memory"""
+ cmd = "nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits"
+ result = subprocess.check_output(cmd.split()).decode().strip().split("\n")
+
+ free_list = []
+ for line in result:
+ idx, free_mem = line.split(",")
+ free_list.append((int(idx), int(free_mem))) # (GPU id, free memory MiB)
+
+ # Sort by remaining memory
+ free_list.sort(key=lambda x: x[1], reverse=True)
+ return free_list[0][0]
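The CSV parsing inside `get_free_gpu` can be isolated for testing without a GPU. `pick_freest_gpu` is a hypothetical helper operating on the same `index, memory.free` lines that `nvidia-smi --format=csv,noheader,nounits` emits.

```python
def pick_freest_gpu(smi_lines):
    """Given 'index, free_mem_mib' CSV lines from nvidia-smi, return the
    index of the GPU with the most free memory."""
    pairs = [(int(i), int(m)) for i, m in (line.split(",") for line in smi_lines)]
    return max(pairs, key=lambda p: p[1])[0]
```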
+
+class Separator:
+ def __init__(self, dm_model_path='third_party/demucs/ckpt/htdemucs.pth', dm_config_path='third_party/demucs/ckpt/htdemucs.yaml', gpu_id=0) -> None:
+ # The gpu_id argument is ignored; pick the GPU with the most free memory
+ gpu_id = get_free_gpu()
+ self.device = f"cuda:{gpu_id}"
+ print(f"Using {self.device}")
+
+
+ self.demucs_model = self.init_demucs_model(dm_model_path, dm_config_path)
+
+ def init_demucs_model(self, model_path, config_path):
+ model = get_model_from_yaml(config_path, model_path)
+ model.to(self.device)
+ model.eval()
+ return model
+
+ def load_audio(self, f):
+ a, fs = torchaudio.load(f)
+ if fs != 48000:
+ a = torchaudio.functional.resample(a, fs, 48000)
+ # Keep at most the first 10 seconds (48 kHz * 10 s of samples)
+ return a[..., :48000*10]
+
+ def run(self, audio_path, output_dir='tmp', ext=".flac"):
+ os.makedirs(output_dir, exist_ok=True)
+ name, _ = os.path.splitext(os.path.split(audio_path)[-1])
+ output_paths = []
+
+ for stem in self.demucs_model.sources:
+ output_path = os.path.join(output_dir, f"{name}_{stem}{ext}")
+ if os.path.exists(output_path):
+ output_paths.append(output_path)
+ if len(output_paths) == 1: # only the vocal stem remains from an earlier run
+ vocal_path = output_paths[0]
+ else:
+ drums_path, bass_path, other_path, vocal_path = self.demucs_model.separate(audio_path, output_dir, device=self.device)
+ for path in [drums_path, bass_path, other_path]:
+ os.remove(path)
+ full_audio = self.load_audio(audio_path)
+ vocal_audio = self.load_audio(vocal_path)
+ bgm_audio = full_audio - vocal_audio
+ return full_audio, vocal_audio, bgm_audio
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description='Song Generation Script')
+
+ # Required parameters
+ parser.add_argument('--ckpt_path', type=str, required=True,
+ help='Path to the checkpoint directory containing config.yaml and model.pt')
+ parser.add_argument('--input_jsonl', type=str, required=True,
+ help='Path to input JSONL file containing generation tasks')
+ parser.add_argument('--save_dir', type=str, required=True,
+ help='Directory to save generated audio files and results')
+ # Optional parameters
+ parser.add_argument('--generate_type', type=str, default='mixed',
+ help='Type of generation: "vocal" or "bgm" or "separate" or "mixed" (default: "mixed")')
+ parser.add_argument('--use_flash_attn', action='store_true',
+ help='Whether to use flash attention (default: False)')
+ parser.add_argument('--low_mem', action='store_true',
+ help='Whether to use low memory mode (default: False)')
+ return parser.parse_args()
+
+def generate(args):
+ torch.set_num_threads(1)
+ ckpt_path = args.ckpt_path
+ input_jsonl = args.input_jsonl
+ save_dir = args.save_dir
+ cfg_path = os.path.join(ckpt_path, 'config.yaml')
+ ckpt_path = os.path.join(ckpt_path, 'model.pt')
+ cfg = OmegaConf.load(cfg_path)
+ cfg.lm.use_flash_attn_2 = args.use_flash_attn
+ print(f"use_flash_attn: {args.use_flash_attn}")
+ cfg.mode = 'inference'
+ max_duration = cfg.max_dur
+ gen_type = args.generate_type
+
+
+ separator = Separator()
+ auto_prompt = torch.load('tools/new_prompt.pt')
+ audio_tokenizer = builders.get_audio_tokenizer_model(cfg.audio_tokenizer_checkpoint, cfg)
+ audio_tokenizer = audio_tokenizer.eval().cuda()
+ with open(input_jsonl, "r") as fp:
+ lines = fp.readlines()
+
+
+ new_items = []
+ for line in lines:
+ item = json.loads(line)
+ target_wav_name = f"{save_dir}/audios/{item['idx']}.flac"
+ # get prompt audio
+ if "prompt_audio_path" in item:
+ assert os.path.exists(item['prompt_audio_path']), f"prompt_audio_path {item['prompt_audio_path']} not found"
+ assert 'auto_prompt_audio_type' not in item, f"auto_prompt_audio_type and prompt_audio_path cannot be used together"
+ with torch.no_grad():
+ pmt_wav, vocal_wav, bgm_wav = separator.run(item['prompt_audio_path'])
+ item['raw_pmt_wav'] = pmt_wav
+ item['raw_vocal_wav'] = vocal_wav
+ item['raw_bgm_wav'] = bgm_wav
+ if pmt_wav.dim() == 2:
+ pmt_wav = pmt_wav[None]
+ if pmt_wav.dim() != 3:
+ raise ValueError("Melody wavs should have a shape [B, C, T].")
+ pmt_wav = list(pmt_wav)
+ if vocal_wav.dim() == 2:
+ vocal_wav = vocal_wav[None]
+ if vocal_wav.dim() != 3:
+ raise ValueError("Vocal wavs should have a shape [B, C, T].")
+ vocal_wav = list(vocal_wav)
+ if bgm_wav.dim() == 2:
+ bgm_wav = bgm_wav[None]
+ if bgm_wav.dim() != 3:
+ raise ValueError("BGM wavs should have a shape [B, C, T].")
+ bgm_wav = list(bgm_wav)
+ if type(pmt_wav) == list:
+ pmt_wav = torch.stack(pmt_wav, dim=0)
+ if type(vocal_wav) == list:
+ vocal_wav = torch.stack(vocal_wav, dim=0)
+ if type(bgm_wav) == list:
+ bgm_wav = torch.stack(bgm_wav, dim=0)
+ with torch.no_grad():
+ pmt_wav, _ = audio_tokenizer.encode(pmt_wav.cuda())
+ melody_is_wav = False
+ elif "auto_prompt_audio_type" in item:
+ assert item["auto_prompt_audio_type"] in auto_prompt_type, f"auto_prompt_audio_type {item['auto_prompt_audio_type']} not found"
+ prompt_token = auto_prompt[item["auto_prompt_audio_type"]][np.random.randint(0, len(auto_prompt[item["auto_prompt_audio_type"]]))]
+ pmt_wav = prompt_token[:,[0],:]
+ vocal_wav = prompt_token[:,[1],:]
+ bgm_wav = prompt_token[:,[2],:]
+ melody_is_wav = False
+ else:
+ pmt_wav = None
+ vocal_wav = None
+ bgm_wav = None
+ melody_is_wav = True
+ item['pmt_wav'] = pmt_wav
+ item['vocal_wav'] = vocal_wav
+ item['bgm_wav'] = bgm_wav
+ item['melody_is_wav'] = melody_is_wav
+ item["idx"] = f"{item['idx']}"
+ item["wav_path"] = target_wav_name
+ new_items.append(item)
+
+ del audio_tokenizer
+ del separator
+
+ torch.cuda.empty_cache()
+
+ if "audio_tokenizer_checkpoint_sep" in cfg.keys():
+ seperate_tokenizer = builders.get_audio_tokenizer_model(cfg.audio_tokenizer_checkpoint_sep, cfg)
+ else:
+ seperate_tokenizer = None
+
+ if seperate_tokenizer is not None:
+ seperate_tokenizer = seperate_tokenizer.eval().cuda()
+
+ for item in new_items:
+ if "prompt_audio_path" in item:
+ with torch.no_grad():
+ vocal_wav, bgm_wav = seperate_tokenizer.encode(item['vocal_wav'].cuda(), item['bgm_wav'].cuda())
+ item['vocal_wav'] = vocal_wav
+ item['bgm_wav'] = bgm_wav
+
+ torch.cuda.empty_cache()
+ audiolm = builders.get_lm_model(cfg)
+ checkpoint = torch.load(ckpt_path, map_location='cpu')
+ audiolm_state_dict = {k.replace('audiolm.', ''): v for k, v in checkpoint.items() if k.startswith('audiolm')}
+ audiolm.load_state_dict(audiolm_state_dict, strict=False)
+ audiolm = audiolm.eval()
+ audiolm = audiolm.cuda().to(torch.float16)
+
+ model = CodecLM(name = "tmp",
+ lm = audiolm,
+ audiotokenizer = None,
+ max_duration = max_duration,
+ seperate_tokenizer = seperate_tokenizer,
+ )
+
+ cfg_coef = 1.5 #25
+ temp = 0.9
+ top_k = 50
+ top_p = 0.0
+ record_tokens = True
+ record_window = 50
+
+ model.set_generation_params(duration=max_duration, extend_stride=5, temperature=temp, cfg_coef=cfg_coef,
+ top_k=top_k, top_p=top_p, record_tokens=record_tokens, record_window=record_window)
+ os.makedirs(save_dir, exist_ok=True)
+ os.makedirs(save_dir + "/audios", exist_ok=True)
+ os.makedirs(save_dir + "/jsonl", exist_ok=True)
+
+ for item in new_items:
+ lyric = item["gt_lyric"]
+ descriptions = item["descriptions"] if "descriptions" in item else None
+ pmt_wav = item['pmt_wav']
+ vocal_wav = item['vocal_wav']
+ bgm_wav = item['bgm_wav']
+ melody_is_wav = item['melody_is_wav']
+ target_wav_name = f"{save_dir}/audios/{item['idx']}.flac"
+
+
+ generate_inp = {
+ 'lyrics': [lyric.replace(" ", " ")],
+ 'descriptions': [descriptions],
+ 'melody_wavs': pmt_wav,
+ 'vocal_wavs': vocal_wav,
+ 'bgm_wavs': bgm_wav,
+ 'melody_is_wav': melody_is_wav,
+ }
+ start_time = time.time()
+ with torch.autocast(device_type="cuda", dtype=torch.float16):
+ with torch.no_grad():
+ tokens = model.generate(**generate_inp, return_tokens=True)
+ mid_time = time.time()
+
+ with torch.no_grad():
+ if 'raw_pmt_wav' in item:
+ if gen_type == 'separate':
+ wav_seperate = model.generate_audio(tokens, item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'], chunked=True, gen_type='mixed')
+ wav_vocal = model.generate_audio(tokens, item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'], chunked=True, gen_type='vocal')
+ wav_bgm = model.generate_audio(tokens, item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'], chunked=True, gen_type='bgm')
+ elif gen_type == 'mixed':
+ wav_seperate = model.generate_audio(tokens, item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'],chunked=True, gen_type=gen_type)
+ else:
+ wav_seperate = model.generate_audio(tokens,chunked=True, gen_type=gen_type)
+ del item['raw_pmt_wav']
+ del item['raw_vocal_wav']
+ del item['raw_bgm_wav']
+ else:
+ if gen_type == 'separate':
+ wav_vocal = model.generate_audio(tokens, chunked=True, gen_type='vocal')
+ wav_bgm = model.generate_audio(tokens, chunked=True, gen_type='bgm')
+ wav_seperate = model.generate_audio(tokens, chunked=True, gen_type='mixed')
+ else:
+ wav_seperate = model.generate_audio(tokens, chunked=True, gen_type=gen_type)
+ del item['pmt_wav']
+ del item['vocal_wav']
+ del item['bgm_wav']
+ del item['melody_is_wav']
+ end_time = time.time()
+ if gen_type == 'separate':
+ torchaudio.save(target_wav_name.replace('.flac', '_vocal.flac'), wav_vocal[0].cpu().float(), cfg.sample_rate)
+ torchaudio.save(target_wav_name.replace('.flac', '_bgm.flac'), wav_bgm[0].cpu().float(), cfg.sample_rate)
+ torchaudio.save(target_wav_name, wav_seperate[0].cpu().float(), cfg.sample_rate)
+ else:
+ torchaudio.save(target_wav_name, wav_seperate[0].cpu().float(), cfg.sample_rate)
+
+ print(f"Processed {item['idx']}: LM cost {mid_time - start_time:.2f}s, diffusion cost {end_time - mid_time:.2f}s")
+ item["idx"] = f"{item['idx']}"
+ item["wav_path"] = target_wav_name
+
+ src_jsonl_name = os.path.split(input_jsonl)[-1]
+ with open(f"{save_dir}/jsonl/{src_jsonl_name}.jsonl", "w", encoding='utf-8') as fw:
+ for item in new_items:
+ fw.writelines(json.dumps(item, ensure_ascii=False)+"\n")
+
+def generate_lowmem(args):
+ torch.set_num_threads(1)
+ ckpt_path = args.ckpt_path
+ input_jsonl = args.input_jsonl
+ save_dir = args.save_dir
+ cfg_path = os.path.join(ckpt_path, 'config.yaml')
+ ckpt_path = os.path.join(ckpt_path, 'model.pt')
+ cfg = OmegaConf.load(cfg_path)
+ cfg.lm.use_flash_attn_2 = args.use_flash_attn
+ print(f"use_flash_attn: {args.use_flash_attn}")
+ cfg.mode = 'inference'
+ max_duration = cfg.max_dur
+ gen_type = args.generate_type
+ chunk_size = 128
+ use_audio_tokenizer = False
+ with open(input_jsonl, "r") as fp:
+ lines = fp.readlines()
+ for line in lines:
+ item = json.loads(line)
+ if "prompt_audio_path" in item:
+ use_audio_tokenizer = True
+ break
+ if use_audio_tokenizer:
+ separator = Separator()
+ audio_tokenizer = builders.get_audio_tokenizer_model(cfg.audio_tokenizer_checkpoint, cfg)
+ audio_tokenizer = audio_tokenizer.eval().cuda()
+ auto_prompt = torch.load('tools/new_prompt.pt')
+ new_items = []
+ for line in lines:
+ item = json.loads(line)
+ target_wav_name = f"{save_dir}/audios/{item['idx']}.flac"
+ # get prompt audio
+ if "prompt_audio_path" in item:
+ assert os.path.exists(item['prompt_audio_path']), f"prompt_audio_path {item['prompt_audio_path']} not found"
+ assert 'auto_prompt_audio_type' not in item, f"auto_prompt_audio_type and prompt_audio_path cannot be used together"
+ with torch.no_grad():
+ pmt_wav, vocal_wav, bgm_wav = separator.run(item['prompt_audio_path'])
+ item['raw_pmt_wav'] = pmt_wav
+ item['raw_vocal_wav'] = vocal_wav
+ item['raw_bgm_wav'] = bgm_wav
+ if pmt_wav.dim() == 2:
+ pmt_wav = pmt_wav[None]
+ if pmt_wav.dim() != 3:
+ raise ValueError("Melody wavs should have a shape [B, C, T].")
+ pmt_wav = list(pmt_wav)
+ if vocal_wav.dim() == 2:
+ vocal_wav = vocal_wav[None]
+ if vocal_wav.dim() != 3:
+ raise ValueError("Vocal wavs should have a shape [B, C, T].")
+ vocal_wav = list(vocal_wav)
+ if bgm_wav.dim() == 2:
+ bgm_wav = bgm_wav[None]
+ if bgm_wav.dim() != 3:
+ raise ValueError("BGM wavs should have a shape [B, C, T].")
+ bgm_wav = list(bgm_wav)
+ if type(pmt_wav) == list:
+ pmt_wav = torch.stack(pmt_wav, dim=0)
+ if type(vocal_wav) == list:
+ vocal_wav = torch.stack(vocal_wav, dim=0)
+ if type(bgm_wav) == list:
+ bgm_wav = torch.stack(bgm_wav, dim=0)
+ with torch.no_grad():
+ pmt_wav, _ = audio_tokenizer.encode(pmt_wav.cuda())
+ melody_is_wav = False
+ elif "auto_prompt_audio_type" in item:
+ assert item["auto_prompt_audio_type"] in auto_prompt_type, f"auto_prompt_audio_type {item['auto_prompt_audio_type']} not found"
+ prompt_token = auto_prompt[item["auto_prompt_audio_type"]][np.random.randint(0, len(auto_prompt[item["auto_prompt_audio_type"]]))]
+ pmt_wav = prompt_token[:,[0],:]
+ vocal_wav = prompt_token[:,[1],:]
+ bgm_wav = prompt_token[:,[2],:]
+ melody_is_wav = False
+ else:
+ pmt_wav = None
+ vocal_wav = None
+ bgm_wav = None
+ melody_is_wav = True
+ item['pmt_wav'] = pmt_wav
+ item['vocal_wav'] = vocal_wav
+ item['bgm_wav'] = bgm_wav
+ item['melody_is_wav'] = melody_is_wav
+ item["idx"] = f"{item['idx']}"
+ item["wav_path"] = target_wav_name
+ new_items.append(item)
+
+ if use_audio_tokenizer:
+ del audio_tokenizer
+ del separator
+
+ torch.cuda.empty_cache()
+
+ if "audio_tokenizer_checkpoint_sep" in cfg.keys() and use_audio_tokenizer:
+ seperate_tokenizer = builders.get_audio_tokenizer_model(cfg.audio_tokenizer_checkpoint_sep, cfg)
+ else:
+ seperate_tokenizer = None
+
+ if seperate_tokenizer is not None:
+ seperate_tokenizer = seperate_tokenizer.eval().cuda()
+
+ for item in new_items:
+ if "prompt_audio_path" in item:
+ with torch.no_grad():
+ vocal_wav, bgm_wav = seperate_tokenizer.encode(item['vocal_wav'].cuda(), item['bgm_wav'].cuda())
+ item['vocal_wav'] = vocal_wav
+ item['bgm_wav'] = bgm_wav
+
+ if use_audio_tokenizer:
+ del seperate_tokenizer
+
+ torch.cuda.empty_cache()
+
+ # Define model or load pretrained model
+ audiolm = builders.get_lm_model(cfg)
+ checkpoint = torch.load(ckpt_path, map_location='cpu')
+ audiolm_state_dict = {k.replace('audiolm.', ''): v for k, v in checkpoint.items() if k.startswith('audiolm')}
+ audiolm.load_state_dict(audiolm_state_dict, strict=False)
+ audiolm = audiolm.eval()
+
+ offload_audiolm = 'offload' in cfg.keys() and 'audiolm' in cfg.offload
+ if offload_audiolm:
+ audiolm_offload_param = OffloadParamParse.parse_config(audiolm, cfg.offload.audiolm)
+ audiolm_offload_param.show()
+ offload_profiler = OffloadProfiler(device_index=0, **(audiolm_offload_param.init_param_dict()))
+ offload_profiler.offload_layer(**(audiolm_offload_param.offload_layer_param_dict()))
+ offload_profiler.clean_cache_wrapper(**(audiolm_offload_param.clean_cache_param_dict()))
+ else:
+ audiolm = audiolm.cuda().to(torch.float16)
+
+ model = CodecLM(name = "tmp",
+ lm = audiolm,
+ audiotokenizer = None,
+ max_duration = max_duration,
+ seperate_tokenizer = None,
+ )
+
+ cfg_coef = 1.5 #25
+ temp = 0.9
+ top_k = 50
+ top_p = 0.0
+ record_tokens = True
+ record_window = 50
+
+
+ model.set_generation_params(duration=max_duration, extend_stride=5, temperature=temp, cfg_coef=cfg_coef,
+ top_k=top_k, top_p=top_p, record_tokens=record_tokens, record_window=record_window)
+ os.makedirs(save_dir, exist_ok=True)
+ os.makedirs(save_dir + "/audios", exist_ok=True)
+ os.makedirs(save_dir + "/jsonl", exist_ok=True)
+
+
+ for item in new_items:
+ lyric = item["gt_lyric"]
+ descriptions = item["descriptions"] if "descriptions" in item else None
+ pmt_wav = item['pmt_wav']
+ vocal_wav = item['vocal_wav']
+ bgm_wav = item['bgm_wav']
+ melody_is_wav = item['melody_is_wav']
+
+ generate_inp = {
+ 'lyrics': [lyric.replace(" ", " ")],
+ 'descriptions': [descriptions],
+ 'melody_wavs': pmt_wav,
+ 'vocal_wavs': vocal_wav,
+ 'bgm_wavs': bgm_wav,
+ 'melody_is_wav': melody_is_wav,
+ }
+ with torch.autocast(device_type="cuda", dtype=torch.float16):
+ with torch.no_grad():
+ tokens = model.generate(**generate_inp, return_tokens=True)
+ if offload_audiolm:
+ offload_profiler.reset_empty_cache_mem_line()
+ item['tokens'] = tokens
+ if offload_audiolm:
+ offload_profiler.stop()
+ del offload_profiler
+ del audiolm_offload_param
+ del model
+ audiolm = audiolm.cpu()
+ del audiolm
+ del checkpoint
+ gc.collect()
+ torch.cuda.empty_cache()
+
+ seperate_tokenizer = builders.get_audio_tokenizer_model_cpu(cfg.audio_tokenizer_checkpoint_sep, cfg)
+ device = "cuda:0"
+ seperate_tokenizer.model.device = device
+ seperate_tokenizer.model.vae = seperate_tokenizer.model.vae.to(device)
+ seperate_tokenizer.model.model.device = torch.device(device)
+ seperate_tokenizer = seperate_tokenizer.eval()
+
+ # offload_wav_tokenizer_diffusion = True if 'offload' in cfg.keys() and 'wav_tokenizer_diffusion' in cfg.offload else False
+ offload_wav_tokenizer_diffusion = False
+ if offload_wav_tokenizer_diffusion:
+ sep_offload_param = OffloadParamParse.parse_config(seperate_tokenizer, cfg.offload.wav_tokenizer_diffusion)
+ sep_offload_param.show()
+ sep_offload_profiler = OffloadProfiler(device_index=0, **(sep_offload_param.init_param_dict()))
+ sep_offload_profiler.offload_layer(**(sep_offload_param.offload_layer_param_dict()))
+ sep_offload_profiler.clean_cache_wrapper(**(sep_offload_param.clean_cache_param_dict()))
+ else:
+ seperate_tokenizer.model.model = seperate_tokenizer.model.model.to(device)
+
+ model = CodecLM(name = "tmp",
+ lm = None,
+ audiotokenizer = None,
+ max_duration = max_duration,
+ seperate_tokenizer = seperate_tokenizer,
+ )
+
+ for item in new_items:
+ with torch.no_grad():
+ if 'raw_pmt_wav' in item:
+ if gen_type == 'separate':
+ wav_seperate = model.generate_audio(item['tokens'], item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'],chunked=True, gen_type='mixed')
+ wav_vocal = model.generate_audio(item['tokens'],chunked=True, gen_type='vocal')
+ wav_bgm = model.generate_audio(item['tokens'], chunked=True, gen_type='bgm')
+ elif gen_type == 'mixed':
+ wav_seperate = model.generate_audio(item['tokens'], item['raw_pmt_wav'], item['raw_vocal_wav'], item['raw_bgm_wav'],chunked=True, gen_type=gen_type)
+ else:
+ wav_seperate = model.generate_audio(item['tokens'], chunked=True, gen_type=gen_type)
+ del item['raw_pmt_wav']
+ del item['raw_vocal_wav']
+ del item['raw_bgm_wav']
+ else:
+ if gen_type == 'separate':
+ wav_vocal = model.generate_audio(item['tokens'], chunked=True, gen_type='vocal')
+ wav_bgm = model.generate_audio(item['tokens'], chunked=True, gen_type='bgm')
+ wav_seperate = model.generate_audio(item['tokens'], chunked=True, gen_type='mixed')
+ else:
+ wav_seperate = model.generate_audio(item['tokens'], chunked=True, gen_type=gen_type)
+ if gen_type == 'separate':
+ torchaudio.save(item['wav_path'].replace('.flac', '_vocal.flac'), wav_vocal[0].cpu().float(), cfg.sample_rate)
+ torchaudio.save(item['wav_path'].replace('.flac', '_bgm.flac'), wav_bgm[0].cpu().float(), cfg.sample_rate)
+ torchaudio.save(item['wav_path'], wav_seperate[0].cpu().float(), cfg.sample_rate)
+ else:
+ torchaudio.save(item['wav_path'], wav_seperate[0].cpu().float(), cfg.sample_rate)
+ del item['tokens']
+ del item['pmt_wav']
+ del item['vocal_wav']
+ del item['bgm_wav']
+ del item['melody_is_wav']
+ if offload_wav_tokenizer_diffusion:
+ sep_offload_profiler.reset_empty_cache_mem_line()
+
+ if offload_wav_tokenizer_diffusion:
+ sep_offload_profiler.stop()
+ torch.cuda.empty_cache()
+ src_jsonl_name = os.path.split(input_jsonl)[-1]
+ with open(f"{save_dir}/jsonl/{src_jsonl_name}.jsonl", "w", encoding='utf-8') as fw:
+ for item in new_items:
+            fw.write(json.dumps(item, ensure_ascii=False) + "\n")
+
+
+if __name__ == "__main__":
+ torch.backends.cudnn.enabled = False
+ OmegaConf.register_new_resolver("eval", lambda x: eval(x))
+ OmegaConf.register_new_resolver("concat", lambda *x: [xxx for xx in x for xxx in xx])
+ OmegaConf.register_new_resolver("get_fname", lambda: os.path.splitext(os.path.basename(sys.argv[1]))[0])
+ OmegaConf.register_new_resolver("load_yaml", lambda x: list(OmegaConf.load(x)))
+ np.random.seed(int(time.time()))
+ # Parse command line arguments
+ args = parse_args()
+ if torch.cuda.is_available():
+ device = torch.cuda.current_device()
+ reserved = torch.cuda.memory_reserved(device)
+ total = torch.cuda.get_device_properties(device).total_memory
+        res_mem = (total - reserved) / 1024 / 1024 / 1024
+        print(f"available GPU memory: {res_mem:.2f} GB")
+
+    model_name = args.ckpt_path.split("/")[-1].lower().replace('-', '_')
+    supported_models = ['songgeneration_base', 'songgeneration_base_new', 'songgeneration_base_full', 'songgeneration_large']
+    assert model_name in supported_models, f"{model_name} is not supported; currently only {', '.join(supported_models)} are supported. Please download the correct files and rename the folder to the corresponding version name."
+    if model_name in ('songgeneration_base', 'songgeneration_base_new', 'songgeneration_base_full'):
+ if res_mem > 24 and not args.low_mem:
+ print("use generate")
+ generate(args)
+ else:
+ from codeclm.utils.offload_profiler import OffloadProfiler, OffloadParamParse
+ print("use generate_lowmem")
+ generate_lowmem(args)
+ elif model_name == 'songgeneration_large':
+ if res_mem > 36 and not args.low_mem:
+ print("use generate")
+ generate(args)
+ else:
+ print("use generate_lowmem")
+ from codeclm.utils.offload_profiler import OffloadProfiler, OffloadParamParse
+ generate_lowmem(args)
+
+ else:
+ print("CUDA is not available")
+ exit()
+
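The `__main__` block above dispatches between `generate` and `generate_lowmem` based on the estimated free GPU memory. A minimal sketch of that rule as a pure function (thresholds taken from the script; the name `pick_mode` is illustrative, not part of the repo):

```python
def pick_mode(model_name: str, free_gb: float, low_mem: bool = False) -> str:
    """Mirror the dispatch rule in the entry point above: the base variants
    need more than 24 GiB free for the full-memory path, the large variant
    needs more than 36 GiB; --low_mem always forces the offloading path."""
    threshold = 36 if model_name == 'songgeneration_large' else 24
    return 'generate' if free_gb > threshold and not low_mem else 'generate_lowmem'

print(pick_mode('songgeneration_base', 30.0))   # generate
print(pick_mode('songgeneration_large', 30.0))  # generate_lowmem
```

Factoring the rule out like this also makes the thresholds easy to unit-test without a GPU present.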
diff --git a/baseline_generate/mureka_o2/__pycache__/generate.cpython-311.pyc b/baseline_generate/mureka_o2/__pycache__/generate.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..366a6d653a91d2ec594e2189141796bea5a0a0d2
Binary files /dev/null and b/baseline_generate/mureka_o2/__pycache__/generate.cpython-311.pyc differ
diff --git a/baseline_generate/mureka_o2/generate.py b/baseline_generate/mureka_o2/generate.py
new file mode 100644
index 0000000000000000000000000000000000000000..11346cef28ea594070c975b2b93fc07e5770f20d
--- /dev/null
+++ b/baseline_generate/mureka_o2/generate.py
@@ -0,0 +1,390 @@
+#!/usr/bin/env python3
+"""
+Script for batch generating songs using Mureka API
+Processes the first 100 songs in the cleaned_data_no_desc.json file.
+"""
+
+import json
+import os
+import time
+import requests
+from typing import Dict, List, Optional
+from pathlib import Path
+
+# API Configuration
+API_URL = "https://api.mureka.cn/v1/song/generate"
+QUERY_API_URL = "https://api.mureka.cn/v1/song/query"
+API_KEY_ENV = "MUREKA_API_KEY"
+MODEL = "mureka-o2"
+
+# Configuration Parameters
+MAX_SONGS = 100
+RETRY_TIMES = 3
+RETRY_DELAY = 2 # seconds
+REQUEST_DELAY = 60 # Delay between requests (seconds) - set to 60 seconds (1 minute) to avoid rate limiting
+RATE_LIMIT_DELAY = 60 # Wait time when encountering 429 error (seconds)
+QUERY_INTERVAL = 10 # Interval for querying task status (seconds)
+MAX_QUERY_TIME = 3600 # Maximum query time (seconds), 1 hour
+
+def load_songs(json_file: str, max_count: int = MAX_SONGS) -> List[Dict]:
+ """Load song data from JSON file"""
+ with open(json_file, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+
+ # Only take first max_count songs
+ return data[:max_count]
+
+def is_song_processed(output_file: Path) -> bool:
+ """
+ Check if song has been processed (including completed tasks)
+
+ Args:
+ output_file: Output file path
+
+ Returns:
+ True if file exists and contains valid API response
+ """
+ if not output_file.exists():
+ return False
+
+ try:
+ with open(output_file, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+ # Check if contains api_response field
+ if 'api_response' in data and data['api_response']:
+ status = data['api_response'].get('status', '')
+ # If status is succeeded, failed, timeouted or cancelled, consider as processed
+ if status in ['succeeded', 'failed', 'timeouted', 'cancelled']:
+ return True
+ # If status is preparing, queued, running, streaming, reviewing, also consider as processed (task created)
+ if status in ['preparing', 'queued', 'running', 'streaming', 'reviewing']:
+ return True
+ except (json.JSONDecodeError, KeyError, IOError):
+ # File corrupted or format incorrect, consider as not processed
+ return False
+
+ return False
+
+def load_processed_song(output_file: Path) -> Optional[Dict]:
+ """
+ Load processed song results from existing file
+
+ Args:
+ output_file: Output file path
+
+ Returns:
+ Processed song data, returns None if loading fails
+ """
+ try:
+ with open(output_file, 'r', encoding='utf-8') as f:
+ data = json.load(f)
+ return data.get('api_response')
+ except (json.JSONDecodeError, KeyError, IOError):
+ return None
+
+def query_task_status(task_id: str, api_key: str) -> Optional[Dict]:
+ """
+ Query task status
+
+ Args:
+ task_id: Task ID
+ api_key: API key
+
+ Returns:
+ Task status data, returns None on failure
+ """
+ headers = {
+ "Authorization": f"Bearer {api_key}"
+ }
+
+ url = f"{QUERY_API_URL}/{task_id}"
+
+ try:
+ response = requests.get(url, headers=headers, timeout=30)
+ response.raise_for_status()
+ return response.json()
+ except requests.exceptions.RequestException as e:
+ print(f" Failed to query task status: {str(e)}")
+ return None
+
+def wait_for_task_completion(task_id: str, api_key: str) -> Optional[Dict]:
+ """
+ Wait for task completion and return final result
+
+ Args:
+ task_id: Task ID
+ api_key: API key
+
+ Returns:
+ Complete data after task completion, returns None on failure
+ """
+ start_time = time.time()
+ last_status = None
+
+ print(f" Waiting for task completion (Task ID: {task_id})...")
+
+ while time.time() - start_time < MAX_QUERY_TIME:
+ result = query_task_status(task_id, api_key)
+
+ if not result:
+ time.sleep(QUERY_INTERVAL)
+ continue
+
+ status = result.get('status', '')
+
+ # If status changed, print new status
+ if status != last_status:
+ print(f" Status: {status}")
+ last_status = status
+
+ # Task completed (success or failure)
+ if status in ['succeeded', 'failed', 'timeouted', 'cancelled']:
+ if status == 'succeeded':
+ print(f" ✓ Task completed!")
+ if 'choices' in result and result['choices']:
+ print(f" Found {len(result['choices'])} generated songs")
+ else:
+ print(f" ✗ Task failed: {status}")
+ if 'failed_reason' in result:
+ print(f" Failure reason: {result['failed_reason']}")
+ return result
+
+ # Task still processing, continue waiting
+ time.sleep(QUERY_INTERVAL)
+
+ print(f" ⚠ Query timeout (exceeded {MAX_QUERY_TIME} seconds)")
+ return None
+
+def generate_song(lyrics: str, prompt: str, api_key: str) -> Optional[Dict]:
+ """
+ Call API to generate a single song (serial processing, ensuring concurrency = 1)
+
+ Args:
+ lyrics: Lyrics content
+ prompt: Prompt (corresponds to description)
+ api_key: API key
+
+ Returns:
+ API response data, returns None on failure
+ """
+ headers = {
+ "Authorization": f"Bearer {api_key}",
+ "Content-Type": "application/json"
+ }
+
+ payload = {
+ "lyrics": lyrics,
+ "model": MODEL,
+ "prompt": prompt
+ }
+
+ for attempt in range(RETRY_TIMES):
+ try:
+ response = requests.post(API_URL, headers=headers, json=payload, timeout=300)
+
+ # Check if it's a 429 error (rate limit)
+ if response.status_code == 429:
+ # Try to get Retry-After time from response header
+ retry_after = response.headers.get('Retry-After')
+ if retry_after:
+ wait_time = int(retry_after)
+ else:
+ wait_time = RATE_LIMIT_DELAY
+
+ print(f" Attempt {attempt + 1}/{RETRY_TIMES} failed: 429 Too Many Requests")
+ print(f" Waiting {wait_time} seconds before retry...")
+ if attempt < RETRY_TIMES - 1:
+ time.sleep(wait_time)
+ continue
+ else:
+ print(f" All retries failed")
+ return None
+
+ response.raise_for_status()
+ return response.json()
+
+ except requests.exceptions.HTTPError as e:
+            # A 429 never reaches here: it is intercepted above before raise_for_status
+ print(f" Attempt {attempt + 1}/{RETRY_TIMES} failed: {str(e)}")
+ if attempt < RETRY_TIMES - 1:
+ time.sleep(RETRY_DELAY)
+ else:
+ print(f" All retries failed")
+ return None
+ except requests.exceptions.RequestException as e:
+ print(f" Attempt {attempt + 1}/{RETRY_TIMES} failed: {str(e)}")
+ if attempt < RETRY_TIMES - 1:
+ time.sleep(RETRY_DELAY)
+ else:
+ print(f" All retries failed")
+ return None
+
+ return None
+
+def main():
+ """Main function"""
+ # Check API key
+ api_key = os.getenv(API_KEY_ENV)
+ if not api_key:
+ print(f"Error: Please set environment variable {API_KEY_ENV}")
+ print(f"Example: export {API_KEY_ENV}=your_api_key")
+ return
+
+ # Load song data
+ json_file = "cleaned_data_no_desc.json"
+ if not os.path.exists(json_file):
+ print(f"Error: File not found {json_file}")
+ return
+
+ print(f"Loading song data from {json_file}...")
+ songs = load_songs(json_file, MAX_SONGS)
+ print(f"Loaded {len(songs)} songs")
+
+ # Create output directory
+ output_dir = Path("generated_songs")
+ output_dir.mkdir(exist_ok=True)
+
+ # List to save results
+ results = []
+
+ # Process each song (serial processing, ensuring concurrency = 1)
+ for idx, song in enumerate(songs, 1):
+ print(f"\n[{idx}/{len(songs)}] Processing song...")
+ print(f" Description: {song.get('description', 'N/A')[:50]}...")
+
+ # Check output file path
+ output_file = output_dir / f"song_{idx:03d}.json"
+
+ # Check if already processed
+        # Check if already processed
+        if is_song_processed(output_file):
+            print(f"  ⊙ Already processed, skipping")
+            results.append({
+                "index": idx,
+                "status": "already_processed",
+                "output_file": str(output_file),
+                "has_result": load_processed_song(output_file) is not None
+            })
+            # Already processed songs don't need a delay
+            continue
+
+ lyrics = song.get('lyrics', '')
+ description = song.get('description', '')
+
+ if not lyrics or not description:
+ print(f" Skipping: missing lyrics or description")
+ results.append({
+ "index": idx,
+ "status": "skipped",
+ "reason": "missing data"
+ })
+ continue
+
+ # Call API (serial execution, ensuring concurrency = 1)
+ result = generate_song(lyrics, description, api_key)
+
+ if result:
+ task_id = result.get('id')
+ initial_status = result.get('status', '')
+ print(f" ✓ Task created (ID: {task_id}, Status: {initial_status})")
+
+ # If task status is not final, wait for task completion
+ if initial_status not in ['succeeded', 'failed', 'timeouted', 'cancelled']:
+ final_result = wait_for_task_completion(task_id, api_key)
+ if final_result:
+ result = final_result
+ else:
+ # Query timeout, use initial result
+ print(f" ⚠ Using initial result (query timeout)")
+
+ # Save single result (including final status and choices)
+ with open(output_file, 'w', encoding='utf-8') as f:
+ json.dump({
+ "index": idx,
+ "original_data": song,
+ "api_response": result,
+ "task_id": task_id
+ }, f, ensure_ascii=False, indent=2)
+
+ # Check if successfully completed
+ final_status = result.get('status', '')
+ if final_status == 'succeeded':
+ results.append({
+ "index": idx,
+ "status": "success",
+ "output_file": str(output_file),
+ "task_id": task_id,
+ "has_audio": 'choices' in result and len(result.get('choices', [])) > 0
+ })
+ elif final_status in ['failed', 'timeouted', 'cancelled']:
+ results.append({
+ "index": idx,
+ "status": "failed",
+ "output_file": str(output_file),
+ "task_id": task_id,
+ "failed_reason": result.get('failed_reason', final_status)
+ })
+ else:
+ # Task still processing
+ results.append({
+ "index": idx,
+ "status": "processing",
+ "output_file": str(output_file),
+ "task_id": task_id,
+ "current_status": final_status
+ })
+ else:
+ print(f" ✗ Generation failed")
+ # Save failure information, including original data
+ error_file = output_dir / f"song_{idx:03d}_error.json"
+ with open(error_file, 'w', encoding='utf-8') as f:
+ json.dump({
+ "index": idx,
+ "original_data": song,
+ "error": "API call failed, generate_song returned None",
+ "timestamp": time.time()
+ }, f, ensure_ascii=False, indent=2)
+
+ results.append({
+ "index": idx,
+ "status": "failed",
+ "error_file": str(error_file),
+ "reason": "API call failed"
+ })
+
+ # Delay between requests to avoid rate limiting (ensuring concurrency = 1)
+ if idx < len(songs):
+ print(f" Waiting {REQUEST_DELAY} seconds before processing next song...")
+ time.sleep(REQUEST_DELAY)
+
+ # Save summary results
+ summary_file = output_dir / "summary.json"
+ with open(summary_file, 'w', encoding='utf-8') as f:
+ json.dump({
+ "total": len(songs),
+ "success": sum(1 for r in results if r.get("status") == "success"),
+ "processing": sum(1 for r in results if r.get("status") == "processing"),
+ "already_processed": sum(1 for r in results if r.get("status") == "already_processed"),
+ "failed": sum(1 for r in results if r.get("status") == "failed"),
+ "skipped": sum(1 for r in results if r.get("status") == "skipped"),
+ "results": results
+ }, f, ensure_ascii=False, indent=2)
+
+ # Print statistics
+ print(f"\n{'='*50}")
+ print(f"Processing complete!")
+ print(f"Total: {len(songs)} songs")
+ print(f"Successfully completed: {sum(1 for r in results if r.get('status') == 'success')} songs")
+ print(f"Processing: {sum(1 for r in results if r.get('status') == 'processing')} songs")
+ print(f"Already processed: {sum(1 for r in results if r.get('status') == 'already_processed')} songs")
+ print(f"Failed: {sum(1 for r in results if r.get('status') == 'failed')} songs")
+ print(f"Skipped: {sum(1 for r in results if r.get('status') == 'skipped')} songs")
+ print(f"Results saved in: {output_dir}/")
+ print(f"Summary file: {summary_file}")
+ print(f"\nTip: If tasks are still processing, you can rerun the script later to check status")
+
+if __name__ == "__main__":
+ main()
+
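`wait_for_task_completion` above is an instance of a poll-until-terminal pattern: query the task, tolerate transient query failures, and stop on a terminal status or when the time budget runs out. A generic sketch under those assumptions (`poll_until_terminal` and `fetch_status` are illustrative names, not part of the Mureka client):

```python
import time

# Terminal statuses as used by the script above
TERMINAL = {'succeeded', 'failed', 'timeouted', 'cancelled'}

def poll_until_terminal(fetch_status, interval, max_time):
    """Call fetch_status() until it reports a terminal status or the time
    budget runs out; fetch_status may return None on transient errors."""
    deadline = time.time() + max_time
    while time.time() < deadline:
        result = fetch_status()
        if result and result.get('status') in TERMINAL:
            return result
        time.sleep(interval)
    return None

# Simulated task: one transient failure, one 'running' poll, then success.
states = iter([None, {'status': 'running'}, {'status': 'succeeded'}])
print(poll_until_terminal(lambda: next(states), interval=0.01, max_time=1.0))
# {'status': 'succeeded'}
```

Returning `None` on timeout matches the script's behavior of falling back to the initial submission result.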
diff --git a/baseline_generate/suno/__pycache__/suno_4_5.cpython-311.pyc b/baseline_generate/suno/__pycache__/suno_4_5.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f6029b98f4b768a98dcfe2b925975f5b8413dd92
Binary files /dev/null and b/baseline_generate/suno/__pycache__/suno_4_5.cpython-311.pyc differ
diff --git a/baseline_generate/suno/__pycache__/suno_5.cpython-311.pyc b/baseline_generate/suno/__pycache__/suno_5.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..446fe2e51f8dd6117b138e593b58e61394a93784
Binary files /dev/null and b/baseline_generate/suno/__pycache__/suno_5.cpython-311.pyc differ
diff --git a/baseline_generate/suno/config.py b/baseline_generate/suno/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..6d622c71d7897f6452fa7a03e5003a69696d1c4a
--- /dev/null
+++ b/baseline_generate/suno/config.py
@@ -0,0 +1,70 @@
+"""
+Configuration for the Suno API scripts
+Fill in your actual values below (in particular SUNO_API_KEY)
+"""
+
+# ============== API Configuration ==============
+
+# Suno API key (obtain from https://sunoapi.org)
+SUNO_API_KEY = ""
+
+# API base URL (usually no need to modify)
+SUNO_API_BASE_URL = "https://api.sunoapi.org"
+
+# ============== Generation Configuration ==============
+
+# Default model version
+DEFAULT_MODEL_VERSION = "V5" # Options: V3_5, V4, V4_5, V4_5PLUS, V5
+
+# Whether to enable custom mode
+DEFAULT_CUSTOM_MODE = True
+
+# Whether to generate instrumental by default
+DEFAULT_INSTRUMENTAL = False
+
+# ============== Task Configuration ==============
+
+# Maximum wait time (seconds)
+MAX_WAIT_TIME = 300
+
+# Check interval (seconds)
+CHECK_INTERVAL = 10
+
+# Retry count
+MAX_RETRIES = 3
+
+# ============== File Configuration ==============
+
+# Music file save directory
+OUTPUT_DIRECTORY = "./generated_music"
+
+# Audio format
+AUDIO_FORMAT = "mp3" # Options: mp3, wav
+
+# ============== Batch Generation Configuration ==============
+
+# Concurrency for batch generation
+BATCH_CONCURRENCY = 5
+
+# Batch generation delay (seconds, to avoid rate limiting)
+BATCH_DELAY = 2
+
+# ============== Logging Configuration ==============
+
+# Log level
+LOG_LEVEL = "INFO" # Options: DEBUG, INFO, WARNING, ERROR
+
+# Log file path
+LOG_FILE = "./suno_api.log"
+
+# Whether to output to console
+LOG_TO_CONSOLE = True
+
+# ============== Webhook Configuration ==============
+
+# Webhook callback URL (optional)
+WEBHOOK_URL = None # Example: "https://your-domain.com/webhook"
+
+# Webhook secret (for verifying callback requests)
+WEBHOOK_SECRET = None
+
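Since `SUNO_API_KEY` defaults to an empty string in `config.py`, one way to keep the secret out of the committed file is an environment-variable override. A minimal sketch; `load_setting` and the `SUNO_*` variable names are hypothetical, not part of this repo:

```python
import os

def load_setting(name, default, cast=str):
    """Read a setting from the environment, falling back to the config.py default."""
    raw = os.environ.get(name)
    return cast(raw) if raw is not None else default

os.environ['SUNO_MAX_WAIT_TIME'] = '600'  # simulate a deployment override
print(load_setting('SUNO_MAX_WAIT_TIME', 300, cast=int))  # 600
print(load_setting('SUNO_CHECK_INTERVAL', 10, cast=int))  # 10 (no override set)
```

The `cast` parameter matters because environment values are always strings, while the config defaults are typed.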
diff --git a/baseline_generate/suno/suno_4_5.py b/baseline_generate/suno/suno_4_5.py
new file mode 100644
index 0000000000000000000000000000000000000000..c9490b6caee905fa5ab38f12cdee0b01e803f723
--- /dev/null
+++ b/baseline_generate/suno/suno_4_5.py
@@ -0,0 +1,766 @@
+# -*- coding: utf-8 -*-
+"""
+Suno API Batch Generation - V4.5 Special Edition
+Supported models: V4_5 (default), V4_5PLUS, V4_5ALL
+"""
+import json
+import time
+import requests
+import os
+import logging
+import csv
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from collections import deque
+from threading import Lock, Semaphore
+from tqdm import tqdm
+import sys
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from config import SUNO_API_KEY
+
+
+# Configure logging
+def setup_logging(output_dir):
+ log_file = os.path.join(output_dir, f"run_log_v4_5_{time.strftime('%Y%m%d_%H%M%S')}.txt")
+
+ # Create logger
+ logger = logging.getLogger('SunoBatchV4_5')
+ logger.setLevel(logging.INFO)
+
+ # Clear old handlers
+ if logger.hasHandlers():
+ logger.handlers.clear()
+
+ # File Handler
+ file_handler = logging.FileHandler(log_file, encoding='utf-8')
+ file_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(file_handler)
+
+ # Console Handler
+ console_handler = logging.StreamHandler()
+ console_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(console_handler)
+
+ return logger, log_file
+
+# Global logger
+logger = logging.getLogger('SunoBatchV4_5')
+
+# Replace print with logger.info
+def print_log(msg):
+ logger.info(msg)
+
+
+class SunoAPI:
+ """Simplified Suno API client"""
+
+ def __init__(self, api_key):
+ self.api_key = api_key
+ self.base_url = 'https://api.sunoapi.org/api/v1'
+ self.headers = {
+ 'Authorization': f'Bearer {api_key}',
+ 'Content-Type': 'application/json'
+ }
+
+ # Configure retry strategy
+ self.session = requests.Session()
+ retry_strategy = Retry(
+ total=5, # Maximum retry count
+ backoff_factor=1, # Retry interval (1s, 2s, 4s, 8s...)
+ status_forcelist=[500, 502, 503, 504], # Status codes that need retry
+ allowed_methods=["HEAD", "GET", "POST", "OPTIONS"] # Allowed retry methods
+ )
+ adapter = HTTPAdapter(max_retries=retry_strategy)
+ self.session.mount("https://", adapter)
+ self.session.mount("http://", adapter)
+
+ def generate_music(self, prompt, model='V4_5', vocalGender=None, **options):
+ """Generate music"""
+ payload = {
+ 'prompt': prompt,
+ 'model': model,
+ 'callBackUrl': 'https://example.com/callback',
+ **options
+ }
+
+ if vocalGender:
+ payload['vocalGender'] = vocalGender
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+
+ # Check HTTP errors
+ response.raise_for_status()
+
+ # Try to parse JSON
+ try:
+ result = response.json()
+ except json.JSONDecodeError:
+ raise Exception(f"API returned non-JSON response: {response.text[:200]}")
+
+ if result.get('code') != 200:
+ raise Exception(f"Generation failed: {result.get('msg', result)}")
+
+ return result['data']['taskId']
+
+ except requests.exceptions.RequestException as e:
+ raise Exception(f"Request exception: {str(e)}")
+
+ def get_task_status(self, task_id):
+ """Get task status"""
+ try:
+ response = self.session.get(
+ f'{self.base_url}/generate/record-info?taskId={task_id}',
+ headers={'Authorization': f'Bearer {self.api_key}'},
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json().get('data', {})
+ except Exception as e:
+            # Re-raise so callers (e.g. wait_for_completion) can decide whether to retry
+ raise e
+
+ def get_timestamped_lyrics(self, task_id, audio_id):
+ """Get timestamped lyrics"""
+ payload = {
+ 'taskId': task_id,
+ 'audioId': audio_id
+ }
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate/get-timestamped-lyrics',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json()
+ except Exception:
+ return {} # Lyrics retrieval failure is non-fatal error
+
+ def wait_for_completion(self, task_id, max_wait_time=600, check_interval=5):
+ """Wait for task completion, return result and polling statistics"""
+ start_time = time.time()
+ poll_count = 0
+ total_poll_time = 0
+
+        while time.time() - start_time < max_wait_time:
+            try:
+                poll_start = time.time()
+                status = self.get_task_status(task_id)
+                poll_count += 1
+                total_poll_time += time.time() - poll_start
+            except Exception:
+                # Transient query error: keep polling until the overall timeout
+                if time.time() - start_time >= max_wait_time:
+                    raise
+                time.sleep(check_interval)
+                continue
+
+            current_status = status.get('status')
+
+            if current_status == 'SUCCESS':
+                return {
+                    'result': status.get('response'),
+                    'wait_time': time.time() - start_time,
+                    'poll_count': poll_count,
+                    'avg_poll_time': total_poll_time / poll_count if poll_count > 0 else 0
+                }
+            elif current_status == 'FAILED':
+                # Raised outside the try block so a definitive failure is not swallowed and retried
+                raise Exception(f"Task failed: {status.get('errorMessage')}")
+
+            time.sleep(check_interval)
+
+ raise Exception('Task timeout')
+
+ def download_file(self, url, save_path):
+ """Download file to local, return download statistics"""
+ try:
+ start_time = time.time()
+ downloaded_bytes = 0
+
+ # Use session to download
+            with self.session.get(url, stream=True, timeout=60) as r:
+                r.raise_for_status()
+                with open(save_path, 'wb') as f:
+                    for chunk in r.iter_content(chunk_size=8192):
+                        f.write(chunk)
+                        downloaded_bytes += len(chunk)
+
+ download_time = time.time() - start_time
+ return {
+ 'success': True,
+ 'bytes': downloaded_bytes,
+ 'time': download_time,
+ 'speed': downloaded_bytes / download_time if download_time > 0 else 0
+ }
+ except Exception as e:
+ print_log(f"Download failed {url}: {e}")
+ return {'success': False, 'error': str(e)}
+
+
+# Result record lock
+result_lock = Lock()
+
+def save_result_record(output_dir, record):
+ """Save single result to CSV in real-time"""
+ file_path = os.path.join(output_dir, "generation_results.csv")
+ file_exists = os.path.isfile(file_path)
+
+ # Only record key information
+ row = {
+ 'song_id': record.get('song_id'),
+ 'task_id': record.get('task_id'),
+ 'status': 'SUCCESS' if record.get('success') else 'FAILED',
+ 'error': record.get('error', ''),
+ 'submit_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(record.get('submit_time', 0))),
+ 'total_time': f"{record.get('total_time', 0):.1f}",
+ 'tracks_count': record.get('tracks_count', 0)
+ }
+
+ with result_lock:
+ with open(file_path, 'a', newline='', encoding='utf-8') as f:
+ writer = csv.DictWriter(f, fieldnames=['song_id', 'task_id', 'status', 'error', 'submit_time', 'total_time', 'tracks_count'])
+ if not file_exists:
+ writer.writeheader()
+ writer.writerow(row)
+
+
+class ImprovedRateLimiter:
+ """Improved rate limiter (with statistics)
+
+    Sliding-window control: allows at most max_requests requests in any
+    time_window-second window (instantiated below as 5 requests per 10 seconds)
+ """
+
+ def __init__(self, max_requests=5, time_window=10):
+ self.max_requests = max_requests
+ self.time_window = time_window
+ self.request_times = deque()
+ self.lock = Lock()
+ self.semaphore = Semaphore(max_requests)
+
+ # Statistics
+ self.total_wait_time = 0
+ self.wait_count = 0
+ self.total_requests = 0
+
+ def acquire(self):
+ """Acquire request permission"""
+ with self.lock:
+ now = time.time()
+
+ # Clean expired request records
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # If limit reached, calculate wait time needed
+ wait_time = 0
+ if len(self.request_times) >= self.max_requests:
+ oldest_request = self.request_times[0]
+ wait_time = self.time_window - (now - oldest_request) + 0.05 # Add buffer
+
+ if wait_time > 0:
+ print_log(f" [Rate Limit] Waiting {wait_time:.2f} seconds...")
+ time.sleep(wait_time)
+
+ # Record wait time
+ self.total_wait_time += wait_time
+ self.wait_count += 1
+
+ # Re-clean
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # Record this request time
+ self.request_times.append(time.time())
+ self.total_requests += 1
+
+ def get_current_rate(self):
+ """Get current rate (number of requests in last 10 seconds)"""
+ with self.lock:
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+ return len(self.request_times)
+
+ def get_stats(self):
+ """Get statistics"""
+ with self.lock:
+ return {
+ 'total_requests': self.total_requests,
+ 'total_wait_time': self.total_wait_time,
+ 'wait_count': self.wait_count,
+ 'avg_wait_time': self.total_wait_time / self.wait_count if self.wait_count > 0 else 0
+ }
+
+
+# Global rate limiter (5 requests per 10 seconds)
+rate_limiter = ImprovedRateLimiter(max_requests=5, time_window=10)
+
+
+def submit_generation_task(api, song_index, data):
+ """Phase 1: Submit generation task (rate limited)"""
+ # Use sunov4_5_000001 format
+ song_id = data.get("id", f"sunov4_5_{song_index:06d}")
+
+ try:
+ description = data.get("description", "")
+ lyrics = data.get("lyrics", "")
+ vocal_gender = data.get("vocalGender")
+
+ print_log(f"[Song {song_id}] Submitting task... (current rate: {rate_limiter.get_current_rate()}/5)")
+
+ # Record request start time
+ request_start = time.time()
+
+ # Rate limiting
+ rate_limiter.acquire()
+
+ # Submit task
+ submit_start = time.time()
+ task_id = api.generate_music(
+ prompt=lyrics,
+ style=description,
+ title=f"Song_{song_id}",
+ model='V4_5', # Explicitly specify V4.5 model
+ customMode=True,
+ instrumental=False,
+ vocalGender=vocal_gender
+ )
+ request_time = time.time() - submit_start
+
+ print_log(f"[Song {song_id}] ✓ Task submitted, ID: {task_id}")
+
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'data': data,
+ 'submit_time': time.time(),
+ 'request_time': request_time,
+ 'success': True
+ }
+
+ except Exception as e:
+ print_log(f"[Song {song_id}] ✗ Submission failed: {e}")
+ # If submission fails, also record it (even though not at download stage yet)
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'success': False,
+ 'error': str(e)
+ }
+
+
+def wait_and_download_result(api, task_info, output_dir):
+ """Phase 2: Wait for result and download (not rate limited)"""
+ if not task_info['success']:
+ return task_info
+
+ song_id = task_info['song_id']
+ song_index = task_info['song_index']
+ task_id = task_info['task_id']
+ data = task_info['data']
+ start_time = task_info['submit_time']
+
+ try:
+ original_lyrics = data.get("original_lyrics", data.get("lyrics", ""))
+ lyrics = data.get("lyrics", "")
+ description = data.get("description", "")
+
+ print_log(f"[Song {song_id}] Waiting for generation to complete...")
+
+ # Wait for completion (returns detailed statistics)
+ wait_result = api.wait_for_completion(task_id, max_wait_time=600, check_interval=8)
+ result = wait_result['result']
+
+ # Process returned result
+ tracks = []
+ if isinstance(result, dict):
+ if 'data' in result:
+ tracks = result['data']
+ elif 'sunoData' in result:
+ tracks = result['sunoData']
+ else:
+ for key, value in result.items():
+ if isinstance(value, list) and len(value) > 0 and 'audioUrl' in value[0]:
+ tracks = value
+ break
+
+ if not tracks:
+ raise Exception("Audio track data not found")
+
+ # Download phase statistics
+ download_start = time.time()
+ downloaded_files = []
+ total_download_bytes = 0
+ download_count = 0
+
+ # Process each track
+ for track_idx, track in enumerate(tracks):
+ audio_url = track.get('audioUrl') or track.get('audio_url')
+ audio_id = track.get('id')
+
+ base_filename = f"{song_id}_{track_idx}"
+ audio_path = os.path.join(output_dir, f"{base_filename}.mp3")
+ lyrics_path = os.path.join(output_dir, f"{base_filename}_lyrics.json")
+
+ # Download audio
+ if audio_url:
+ download_result = api.download_file(audio_url, audio_path)
+ if download_result['success']:
+ downloaded_files.append(audio_path)
+ total_download_bytes += download_result['bytes']
+ download_count += 1
+
+ # Get timestamped lyrics
+ timestamped_lyrics_data = None
+ if audio_id:
+ try:
+ lyrics_response = api.get_timestamped_lyrics(task_id, audio_id)
+ if lyrics_response.get('code') == 200:
+ timestamped_lyrics_data = lyrics_response.get('data')
+ except Exception as e:
+ print_log(f"[Song {song_id}] Track {track_idx+1}: Failed to get lyrics: {e}")
+
+ # Save lyrics and metadata
+ lyrics_content = {
+ "song_id": song_id,
+ "song_index": song_index,
+ "track_index": track_idx,
+ "original_lyrics": original_lyrics,
+ "cleaned_lyrics": lyrics,
+ "timestamped_lyrics": timestamped_lyrics_data,
+ "style": description,
+ "full_track_data": track
+ }
+
+ with open(lyrics_path, 'w', encoding='utf-8') as f:
+ json.dump(lyrics_content, f, ensure_ascii=False, indent=2)
+ downloaded_files.append(lyrics_path)
+
+ download_time = time.time() - download_start
+ total_time = time.time() - start_time
+
+ print_log(f"[Song {song_id}] ✓ Complete! {len(tracks)} tracks, took {total_time:.1f} seconds")
+
+ final_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': True,
+ 'tracks_count': len(tracks),
+ 'files': downloaded_files,
+ 'total_time': total_time,
+ 'submit_time': start_time,
+ 'wait_time': wait_result['wait_time'],
+ 'poll_count': wait_result['poll_count'],
+ 'avg_poll_time': wait_result['avg_poll_time'],
+ 'download_time': download_time,
+ 'download_bytes': total_download_bytes,
+ 'download_count': download_count,
+ 'avg_download_speed': total_download_bytes / download_time if download_time > 0 else 0
+ }
+
+ # Save result in real-time
+ save_result_record(output_dir, final_result)
+ return final_result
+
+ except Exception as e:
+ total_time = time.time() - start_time
+ print_log(f"[Song {song_id}] ✗ Processing failed: {e} (took {total_time:.1f} seconds)")
+
+ error_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': False,
+ 'error': str(e),
+ 'total_time': total_time,
+ 'submit_time': start_time
+ }
+
+ # Save result in real-time
+ save_result_record(output_dir, error_result)
+ return error_result
+
+
+def format_bytes(bytes_size):
+ """Format byte size"""
+ for unit in ['B', 'KB', 'MB', 'GB']:
+ if bytes_size < 1024.0:
+ return f"{bytes_size:.2f} {unit}"
+ bytes_size /= 1024.0
+ return f"{bytes_size:.2f} TB"
+
+
+def format_speed(bytes_per_sec):
+ """Format speed"""
+ return f"{format_bytes(bytes_per_sec)}/s"
+
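The byte and speed formatters above step through binary (1024-based) units. A quick standalone check, restating the two helpers exactly as they appear in this diff, might look like:

```python
def format_bytes(bytes_size):
    """Format a byte count with binary (1024-based) units."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if bytes_size < 1024.0:
            return f"{bytes_size:.2f} {unit}"
        bytes_size /= 1024.0
    return f"{bytes_size:.2f} TB"

def format_speed(bytes_per_sec):
    """Format a transfer rate as <size>/s."""
    return f"{format_bytes(bytes_per_sec)}/s"

print(format_bytes(512))          # 512.00 B
print(format_bytes(1536))         # 1.50 KB
print(format_bytes(3 * 1024**2))  # 3.00 MB
print(format_speed(1024**3))      # 1.00 GB/s
```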
+
+def main():
+ """Main program - two-phase concurrent processing"""
+ input_file = "cleaned_data_truncated.json"
+ output_dir = "sunov4_5_truncated"
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Initialize logging
+ global logger
+ logger, log_file = setup_logging(output_dir)
+
+ print_log("=" * 70)
+ print_log("Suno API Batch Generation - V4.5 Special Edition")
+ print_log("Strategy: Fast submission (5 requests/10s) + Parallel waiting + Detailed performance analysis")
+ print_log(f"Log file: {log_file}")
+ print_log("=" * 70)
+
+ # Read input file
+ try:
+ all_data = []
+ if input_file.endswith('.jsonl'):
+ try:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ # Try reading first line to determine format
+ first_line = f.readline().strip()
+ if first_line.startswith('['):
+ # Looks like regular JSON array
+ f.seek(0)
+ all_data = json.load(f)
+ else:
+ # Try reading line by line
+ f.seek(0)
+ for line in f:
+ line = line.strip()
+ if line:
+ all_data.append(json.loads(line))
+ except json.JSONDecodeError:
+ # If above parsing fails, try one final read as regular JSON
+ print_log(f"Note: Failed to parse {input_file} as JSONL format, trying as regular JSON...")
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+ else:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+
+ except FileNotFoundError:
+ print_log(f"File {input_file} not found.")
+ return
+ except json.JSONDecodeError as e:
+ print_log(f"JSON parsing error: {e}")
+ return
+
+ # Initialize API
+ api = SunoAPI(SUNO_API_KEY)
+
+ print_log(f"\nPreparing to generate {len(all_data)} songs...")
+ print_log(f"Start time: {time.strftime('%H:%M:%S')}\n")
+
+ overall_start_time = time.time()
+
+ # ===== Phase 1: Batch Submission =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 1: Batch Submission")
+ print_log("=" * 70 + "\n")
+
+ submit_start_time = time.time()
+ submitted_tasks = []
+ total_request_time = 0
+
+ # Adjust rate limit: maximum 5 requests per 10 seconds
+ rate_limiter.max_requests = 5
+ rate_limiter.time_window = 10
+ rate_limiter.request_times.clear()
+ print_log(f"Rate limit: {rate_limiter.max_requests} requests / {rate_limiter.time_window} seconds")
+
+ # Build the list of (index, data) tasks to submit; indices start at 1
+ tasks_to_run = list(enumerate(all_data, 1))
+
+ print_log(f"Number of tasks to submit: {len(tasks_to_run)}")
+
+ # Use thread pool for submission
+ # Submission concurrency is controlled by rate_limiter, can be set to 5
+ with ThreadPoolExecutor(max_workers=5) as executor:
+ submit_futures = {
+ executor.submit(submit_generation_task, api, idx, data): idx
+ for idx, data in tasks_to_run
+ }
+
+ with tqdm(total=len(tasks_to_run), desc="Submitting tasks", unit="song") as pbar:
+ for future in as_completed(submit_futures):
+ result = future.result()
+ submitted_tasks.append(result)
+ if result.get('success') and 'request_time' in result:
+ total_request_time += result['request_time']
+ pbar.update(1)
+
+ submit_phase_time = time.time() - submit_start_time
+ success_submits = sum(1 for t in submitted_tasks if t['success'])
+
+ # Get rate limit statistics
+ rate_limit_stats = rate_limiter.get_stats()
+
+ print_log(f"\nSubmission phase complete: {success_submits}/{len(tasks_to_run)} successful")
+ print_log(f" Total time: {submit_phase_time:.1f} seconds")
+ print_log(f" Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds ({rate_limit_stats['wait_count']} times)")
+ if rate_limit_stats['wait_count'] > 0:
+ print_log(f" Average wait time: {rate_limit_stats['avg_wait_time']:.2f} seconds per wait")
+
+ # ===== Phase 2: Parallel Waiting and Download =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 2: Wait for Generation and Download")
+ print_log("=" * 70 + "\n")
+
+ wait_start_time = time.time()
+ final_results = []
+
+ # Use more threads for parallel waiting (not rate limited)
+ with ThreadPoolExecutor(max_workers=20) as executor:
+ download_futures = {
+ executor.submit(wait_and_download_result, api, task, output_dir): task
+ for task in submitted_tasks if task['success']
+ }
+
+ # Add failed submission tasks to results
+ for task in submitted_tasks:
+ if not task['success']:
+ final_results.append(task)
+
+ with tqdm(total=len(download_futures), desc="Downloading results", unit="song") as pbar:
+ for future in as_completed(download_futures):
+ result = future.result()
+ final_results.append(result)
+ pbar.update(1)
+
+ wait_phase_time = time.time() - wait_start_time
+
+ # ===== Detailed Statistics and Report =====
+ overall_time = time.time() - overall_start_time
+
+ print_log("\n" + "=" * 70)
+ print_log("Batch Generation Complete - Detailed Performance Report")
+ print_log("=" * 70)
+
+ success_count = sum(1 for r in final_results if r.get('success'))
+ fail_count = len(final_results) - success_count
+ total_tracks = sum(r.get('tracks_count', 0) for r in final_results if r.get('success'))
+
+ successful_results = [r for r in final_results if r.get('success')]
+
+ # Basic Statistics
+ print_log(f"\n[Basic Statistics]")
+ print_log(f" Total songs: {len(all_data)}")
+ print_log(f" Successful: {success_count}")
+ print_log(f" Failed: {fail_count}")
+ print_log(f" Total tracks: {total_tracks}")
+ if success_count > 0:
+ avg_tracks = total_tracks / success_count
+ print_log(f" Average tracks per song: {avg_tracks:.2f}")
+
+ # Time Statistics
+ print_log(f"\n[Time Statistics]")
+ print_log(f" ├── Submission phase: {submit_phase_time:.1f} seconds")
+ print_log(f" │ ├── Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" │ └── Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds")
+ print_log(f" ├── Generation waiting phase: {wait_phase_time:.1f} seconds")
+
+ if successful_results:
+ wait_times = [r.get('wait_time', 0) for r in successful_results if 'wait_time' in r]
+ download_times = [r.get('download_time', 0) for r in successful_results if 'download_time' in r]
+
+ if wait_times:
+ avg_wait = sum(wait_times) / len(wait_times)
+ min_wait = min(wait_times)
+ max_wait = max(wait_times)
+ print_log(f" │ ├── Average wait time: {avg_wait:.1f} seconds/song")
+ print_log(f" │ ├── Fastest: {min_wait:.1f} seconds")
+ print_log(f" │ └── Slowest: {max_wait:.1f} seconds")
+
+ if download_times:
+ total_download_time = sum(download_times)
+ avg_download = total_download_time / len(download_times)
+ print_log(f" ├── Download phase: {total_download_time:.1f} seconds")
+ print_log(f" │ └── Average download time: {avg_download:.2f} seconds/song")
+
+ print_log(f" └── Total time: {overall_time:.1f} seconds ({overall_time/60:.1f} minutes)")
+
+ # Single Song Generation Statistics
+ if successful_results:
+ total_times = [r.get('total_time', 0) for r in successful_results if 'total_time' in r]
+ if total_times:
+ print_log(f"\n[Single Song Generation Statistics]")
+ avg_time = sum(total_times) / len(total_times)
+ min_time = min(total_times)
+ max_time = max(total_times)
+ print_log(f" Average total time per song: {avg_time:.1f} seconds")
+ print_log(f" Fastest generation: {min_time:.1f} seconds")
+ print_log(f" Slowest generation: {max_time:.1f} seconds")
+
+ # Download Statistics
+ total_download_bytes = sum(r.get('download_bytes', 0) for r in successful_results)
+ total_download_count = sum(r.get('download_count', 0) for r in successful_results)
+
+ if total_download_bytes > 0:
+ print_log(f"\n[Download Statistics]")
+ print_log(f" Total download: {format_bytes(total_download_bytes)}")
+ print_log(f" Number of files: {total_download_count}")
+ print_log(f" Average file size: {format_bytes(total_download_bytes / total_download_count)}")
+
+ download_speeds = [r.get('avg_download_speed', 0) for r in successful_results if r.get('avg_download_speed', 0) > 0]
+ if download_speeds:
+ avg_speed = sum(download_speeds) / len(download_speeds)
+ print_log(f" Average download speed: {format_speed(avg_speed)}")
+
+ # Polling Statistics
+ poll_counts = [r.get('poll_count', 0) for r in successful_results if 'poll_count' in r]
+ if poll_counts:
+ total_polls = sum(poll_counts)
+ avg_polls = total_polls / len(poll_counts)
+ print_log(f"\n[Polling Statistics]")
+ print_log(f" Total polling count: {total_polls}")
+ print_log(f" Average polls per song: {avg_polls:.1f}")
+
+ # Efficiency Analysis
+ print_log(f"\n[Efficiency Analysis]")
+ if success_count > 0:
+ throughput = success_count / (overall_time / 60)
+ print_log(f" Actual throughput: {throughput:.2f} songs/minute")
+
+ # Theoretical fastest time (assuming no rate limit)
+ if wait_times:
+ ideal_time = submit_phase_time - rate_limit_stats['total_wait_time'] + max(wait_times)
+ efficiency = (ideal_time / overall_time) * 100
+ print_log(f" Theoretical fastest time: {ideal_time:.1f} seconds")
+ print_log(f" Concurrency efficiency: {efficiency:.1f}%")
+
+ # Show failed songs
+ if fail_count > 0:
+ print_log("\n" + "=" * 70)
+ print_log("Failed Songs List")
+ print_log("=" * 70)
+ for r in sorted(final_results, key=lambda x: x.get('song_index', 0)):
+ if not r.get('success'):
+ song_id = r.get('song_id', r.get('song_index', 'Unknown'))
+ print_log(f" [{song_id}] {r.get('error', 'Unknown error')}")
+
+ print_log("\n" + "=" * 70)
+ print_log(f"All files saved to: {os.path.abspath(output_dir)}")
+ print_log("=" * 70)
+
+
+if __name__ == '__main__':
+ main()
+
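The two-phase pattern used by `main()` above (a small pool for rate-limited submission, then a larger pool for waiting and downloading) can be sketched in isolation. `fake_submit` and `fake_wait` below are hypothetical stand-ins for `submit_generation_task` and `wait_and_download_result`, not the real API calls:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_submit(idx):
    # Stand-in for submit_generation_task: returns a task record
    return {'song_index': idx, 'task_id': f'task_{idx}', 'success': True}

def fake_wait(task):
    # Stand-in for wait_and_download_result: annotates the record
    return {**task, 'tracks_count': 2}

items = list(range(1, 6))

# Phase 1: submit with a small pool (the real script adds rate limiting here)
with ThreadPoolExecutor(max_workers=5) as ex:
    submitted = [f.result() for f in as_completed([ex.submit(fake_submit, i) for i in items])]

# Phase 2: wait/download with a larger pool, skipping failed submissions
with ThreadPoolExecutor(max_workers=20) as ex:
    futures = [ex.submit(fake_wait, t) for t in submitted if t['success']]
    results = [f.result() for f in as_completed(futures)]

print(len(results))  # 5
```

Keeping the two pools separate is what lets the rate limiter throttle only submissions, while status polling and downloads run at full concurrency.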
diff --git a/baseline_generate/suno/suno_5.py b/baseline_generate/suno/suno_5.py
new file mode 100644
index 0000000000000000000000000000000000000000..7b4f65a9ed64b972f8210f612a340bf7e56fdd3f
--- /dev/null
+++ b/baseline_generate/suno/suno_5.py
@@ -0,0 +1,768 @@
+# -*- coding: utf-8 -*-
+"""
+Suno API Batch Generation - V5 Version (5 requests per 10 seconds)
+Changes:
+1. Rate control: 5 requests within 10 seconds
+2. ID format: sunov5_000001
+"""
+import json
+import time
+import requests
+import os
+import logging
+import csv
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from collections import deque
+from threading import Lock, Semaphore
+from tqdm import tqdm
+import sys
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from config import SUNO_API_KEY
+
+
+# Configure logging
+def setup_logging(output_dir):
+ log_file = os.path.join(output_dir, f"run_log_{time.strftime('%Y%m%d_%H%M%S')}.txt")
+
+ # Create logger
+ logger = logging.getLogger('SunoBatchV5')
+ logger.setLevel(logging.INFO)
+
+ # Clear old handlers
+ if logger.hasHandlers():
+ logger.handlers.clear()
+
+ # File Handler
+ file_handler = logging.FileHandler(log_file, encoding='utf-8')
+ file_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(file_handler)
+
+ # Console Handler
+ console_handler = logging.StreamHandler()
+ console_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(console_handler)
+
+ return logger, log_file
+
+# Global logger
+logger = logging.getLogger('SunoBatchV5')
+
+# Replace print with logger.info
+def print_log(msg):
+ logger.info(msg)
+
+
+class SunoAPI:
+ """Simplified Suno API client"""
+
+ def __init__(self, api_key):
+ self.api_key = api_key
+ self.base_url = 'https://api.sunoapi.org/api/v1'
+ self.headers = {
+ 'Authorization': f'Bearer {api_key}',
+ 'Content-Type': 'application/json'
+ }
+
+ # Configure retry strategy
+ self.session = requests.Session()
+ retry_strategy = Retry(
+ total=5, # Maximum retry count
+ backoff_factor=1, # Retry interval (1s, 2s, 4s, 8s...)
+ status_forcelist=[500, 502, 503, 504], # Status codes that need retry
+ allowed_methods=["HEAD", "GET", "POST", "OPTIONS"] # Allowed retry methods
+ )
+ adapter = HTTPAdapter(max_retries=retry_strategy)
+ self.session.mount("https://", adapter)
+ self.session.mount("http://", adapter)
+
+ def generate_music(self, prompt, model='V5', vocalGender=None, **options):
+ """Generate music"""
+ payload = {
+ 'prompt': prompt,
+ 'model': model,
+ 'callBackUrl': 'https://example.com/callback',
+ **options
+ }
+
+ if vocalGender:
+ payload['vocalGender'] = vocalGender
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+
+ # Check HTTP errors
+ response.raise_for_status()
+
+ # Try to parse JSON
+ try:
+ result = response.json()
+ except json.JSONDecodeError:
+ raise Exception(f"API returned non-JSON response: {response.text[:200]}")
+
+ if result.get('code') != 200:
+ raise Exception(f"Generation failed: {result.get('msg', result)}")
+
+ return result['data']['taskId']
+
+ except requests.exceptions.RequestException as e:
+ raise Exception(f"Request exception: {str(e)}")
+
+ def get_task_status(self, task_id):
+ """Get task status"""
+ try:
+ response = self.session.get(
+ f'{self.base_url}/generate/record-info?taskId={task_id}',
+ headers={'Authorization': f'Bearer {self.api_key}'},
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json().get('data', {})
+ except Exception as e:
+ # Re-raise so the caller's polling loop (wait_for_completion) can retry;
+ # a single failed status query should not crash the whole batch
+ raise
+
+ def get_timestamped_lyrics(self, task_id, audio_id):
+ """Get timestamped lyrics"""
+ payload = {
+ 'taskId': task_id,
+ 'audioId': audio_id
+ }
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate/get-timestamped-lyrics',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json()
+ except Exception:
+ return {} # Lyrics retrieval failure is non-fatal error
+
+ def wait_for_completion(self, task_id, max_wait_time=600, check_interval=5):
+ """Wait for task completion, return result and polling statistics"""
+ start_time = time.time()
+ poll_count = 0
+ total_poll_time = 0
+
+ while time.time() - start_time < max_wait_time:
+ try:
+ poll_start = time.time()
+ status = self.get_task_status(task_id)
+ poll_time = time.time() - poll_start
+ poll_count += 1
+ total_poll_time += poll_time
+
+ current_status = status.get('status')
+
+ if current_status == 'SUCCESS':
+ return {
+ 'result': status.get('response'),
+ 'wait_time': time.time() - start_time,
+ 'poll_count': poll_count,
+ 'avg_poll_time': total_poll_time / poll_count if poll_count > 0 else 0
+ }
+ elif current_status == 'FAILED':
+ # Permanent failure: fail fast instead of retrying until timeout
+ raise RuntimeError(f"Task failed: {status.get('errorMessage')}")
+
+ time.sleep(check_interval)
+ except RuntimeError:
+ raise
+ except Exception:
+ # Transient error (network, parsing): retry until max_wait_time
+ if time.time() - start_time >= max_wait_time:
+ raise
+ time.sleep(check_interval)
+
+ raise Exception('Task timeout')
+
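The polling loop in `wait_for_completion` (query status, sleep, repeat until SUCCESS, FAILED, or timeout) can be sketched with an injected status source, so the control flow is testable without the Suno API. `poll_until_done` is a hypothetical simplification, not the method itself:

```python
import time

def poll_until_done(get_status, max_wait_time=10.0, check_interval=0.0):
    """Poll get_status() until 'SUCCESS' or 'FAILED', or until time runs out."""
    start = time.time()
    polls = 0
    while time.time() - start < max_wait_time:
        status = get_status()
        polls += 1
        if status == 'SUCCESS':
            return {'status': status, 'poll_count': polls}
        if status == 'FAILED':
            raise RuntimeError('task failed')
        time.sleep(check_interval)
    raise TimeoutError('task timeout')

# A status source that reports PENDING twice, then SUCCESS
statuses = iter(['PENDING', 'PENDING', 'SUCCESS'])
result = poll_until_done(lambda: next(statuses))
print(result)  # {'status': 'SUCCESS', 'poll_count': 3}
```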
+ def download_file(self, url, save_path):
+ """Download file to local, return download statistics"""
+ try:
+ start_time = time.time()
+ downloaded_bytes = 0
+
+ # Use session to download
+ with self.session.get(url, stream=True, timeout=60) as r:
+ r.raise_for_status()
+ with open(save_path, 'wb') as f:
+ for chunk in r.iter_content(chunk_size=8192):
+ f.write(chunk)
+ downloaded_bytes += len(chunk)
+
+ download_time = time.time() - start_time
+ return {
+ 'success': True,
+ 'bytes': downloaded_bytes,
+ 'time': download_time,
+ 'speed': downloaded_bytes / download_time if download_time > 0 else 0
+ }
+ except Exception as e:
+ print_log(f"Download failed {url}: {e}")
+ return {'success': False, 'error': str(e)}
+
+
+# Result record lock
+result_lock = Lock()
+
+def save_result_record(output_dir, record):
+ """Save single result to CSV in real-time"""
+ file_path = os.path.join(output_dir, "generation_results.csv")
+ file_exists = os.path.isfile(file_path)
+
+ # Only record key information
+ row = {
+ 'song_id': record.get('song_id'),
+ 'task_id': record.get('task_id'),
+ 'status': 'SUCCESS' if record.get('success') else 'FAILED',
+ 'error': record.get('error', ''),
+ 'submit_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(record.get('submit_time', 0))),
+ 'total_time': f"{record.get('total_time', 0):.1f}",
+ 'tracks_count': record.get('tracks_count', 0)
+ }
+
+ with result_lock:
+ with open(file_path, 'a', newline='', encoding='utf-8') as f:
+ writer = csv.DictWriter(f, fieldnames=['song_id', 'task_id', 'status', 'error', 'submit_time', 'total_time', 'tracks_count'])
+ if not file_exists:
+ writer.writeheader()
+ writer.writerow(row)
+
+
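The append-with-header CSV pattern in `save_result_record` (write the header only if the file does not exist yet, then append one row per result) can be exercised standalone. This sketch uses a temporary directory and a trimmed field list rather than the real output directory and schema:

```python
import csv
import os
import tempfile

FIELDS = ['song_id', 'status']

def save_row(path, row):
    # Write the header only on the first append, as save_result_record does
    exists = os.path.isfile(path)
    with open(path, 'a', newline='', encoding='utf-8') as f:
        w = csv.DictWriter(f, fieldnames=FIELDS)
        if not exists:
            w.writeheader()
        w.writerow(row)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, 'generation_results.csv')
    save_row(p, {'song_id': 'sunov5_000001', 'status': 'SUCCESS'})
    save_row(p, {'song_id': 'sunov5_000002', 'status': 'FAILED'})
    with open(p, encoding='utf-8') as f:
        lines = f.read().splitlines()

print(lines)  # ['song_id,status', 'sunov5_000001,SUCCESS', 'sunov5_000002,FAILED']
```

In the real script a `Lock` additionally serializes the appends, since many worker threads share one CSV file.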
+class ImprovedRateLimiter:
+ """Improved rate limiter (with statistics)
+
+ Precise control: maximum 5 requests per 10 seconds (per the defaults below)
+ Uses a sliding-window algorithm to ensure no more than max_requests requests in any time_window-second window
+ """
+
+ def __init__(self, max_requests=5, time_window=10):
+ self.max_requests = max_requests
+ self.time_window = time_window
+ self.request_times = deque()
+ self.lock = Lock()
+
+ # Statistics
+ self.total_wait_time = 0
+ self.wait_count = 0
+ self.total_requests = 0
+
+ def acquire(self):
+ """Acquire request permission"""
+ with self.lock:
+ now = time.time()
+
+ # Clean expired request records
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # If limit reached, calculate wait time needed
+ wait_time = 0
+ if len(self.request_times) >= self.max_requests:
+ oldest_request = self.request_times[0]
+ wait_time = self.time_window - (now - oldest_request) + 0.05 # Add buffer
+
+ if wait_time > 0:
+ print_log(f" [Rate Limit] Waiting {wait_time:.2f} seconds...")
+ time.sleep(wait_time)
+
+ # Record wait time
+ self.total_wait_time += wait_time
+ self.wait_count += 1
+
+ # Re-clean
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # Record this request time
+ self.request_times.append(time.time())
+ self.total_requests += 1
+
+ def get_current_rate(self):
+ """Get current rate (number of requests in last 10 seconds)"""
+ with self.lock:
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+ return len(self.request_times)
+
+ def get_stats(self):
+ """Get statistics"""
+ with self.lock:
+ return {
+ 'total_requests': self.total_requests,
+ 'total_wait_time': self.total_wait_time,
+ 'wait_count': self.wait_count,
+ 'avg_wait_time': self.total_wait_time / self.wait_count if self.wait_count > 0 else 0
+ }
+
+
+# Global rate limiter (5 requests per 10 seconds)
+rate_limiter = ImprovedRateLimiter(max_requests=5, time_window=10)
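The core of `acquire()` above is a pure sliding-window computation: drop timestamps older than the window, and if the window is still full, wait until the oldest one expires. That arithmetic can be isolated and checked deterministically; `required_wait` is a simplified sketch of the logic, not the class itself:

```python
from collections import deque

def required_wait(request_times, now, max_requests=5, time_window=10, buffer=0.05):
    """How long a new request must wait under the sliding-window rule."""
    # Keep only timestamps still inside the window
    times = deque(t for t in request_times if now - t < time_window)
    if len(times) < max_requests:
        return 0.0
    # Window is full: wait until the oldest request leaves it (+ small buffer)
    return time_window - (now - times[0]) + buffer

# Fewer than 5 requests in the last 10 s: no wait
print(required_wait([0, 1, 2], now=3))                    # 0.0
# 5 requests at t=0..4, new request at t=6: oldest (t=0) expires at t=10
print(required_wait([0, 1, 2, 3, 4], now=6, buffer=0.0))  # 4.0
```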
+
+
+def submit_generation_task(api, song_index, data):
+ """Phase 1: Submit generation task (rate limited)"""
+ # Use sunov5_000001 format
+ song_id = data.get("id", f"sunov5_{song_index:06d}")
+
+ try:
+ description = data.get("description", "")
+ lyrics = data.get("lyrics", "")
+ vocal_gender = data.get("vocalGender")
+
+ print_log(f"[Song {song_id}] Submitting task... (current rate: {rate_limiter.get_current_rate()}/{rate_limiter.max_requests})")
+
+ # Rate limiting (may block until a submission slot is free)
+ rate_limiter.acquire()
+
+ # Submit task
+ submit_start = time.time()
+ task_id = api.generate_music(
+ prompt=lyrics,
+ style=description,
+ title=f"Song_{song_id}",
+ model='V5',
+ customMode=True,
+ instrumental=False,
+ vocalGender=vocal_gender
+ )
+ request_time = time.time() - submit_start
+
+ print_log(f"[Song {song_id}] ✓ Task submitted, ID: {task_id}")
+
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'data': data,
+ 'submit_time': time.time(),
+ 'request_time': request_time,
+ 'success': True
+ }
+
+ except Exception as e:
+ print_log(f"[Song {song_id}] ✗ Submission failed: {e}")
+ # Return a failure record so the song is still counted in the final results
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'success': False,
+ 'error': str(e)
+ }
+
+
+def wait_and_download_result(api, task_info, output_dir):
+ """Phase 2: Wait for result and download (not rate limited)"""
+ if not task_info['success']:
+ return task_info
+
+ song_id = task_info['song_id']
+ song_index = task_info['song_index']
+ task_id = task_info['task_id']
+ data = task_info['data']
+ start_time = task_info['submit_time']
+
+ try:
+ original_lyrics = data.get("original_lyrics", data.get("lyrics", ""))
+ lyrics = data.get("lyrics", "")
+ description = data.get("description", "")
+
+ print_log(f"[Song {song_id}] Waiting for generation to complete...")
+
+ # Wait for completion (returns detailed statistics)
+ wait_result = api.wait_for_completion(task_id, max_wait_time=600, check_interval=8)
+ result = wait_result['result']
+
+ # Process returned result
+ tracks = []
+ if isinstance(result, dict):
+ if 'data' in result:
+ tracks = result['data']
+ elif 'sunoData' in result:
+ tracks = result['sunoData']
+ else:
+ for key, value in result.items():
+ if isinstance(value, list) and len(value) > 0 and 'audioUrl' in value[0]:
+ tracks = value
+ break
+
+ if not tracks:
+ raise Exception("Audio track data not found")
+
+ # Download phase statistics
+ download_start = time.time()
+ downloaded_files = []
+ total_download_bytes = 0
+ download_count = 0
+
+ # Process each track
+ for track_idx, track in enumerate(tracks):
+ audio_url = track.get('audioUrl') or track.get('audio_url')
+ audio_id = track.get('id')
+
+ base_filename = f"{song_id}_{track_idx}"
+ audio_path = os.path.join(output_dir, f"{base_filename}.mp3")
+ lyrics_path = os.path.join(output_dir, f"{base_filename}_lyrics.json")
+
+ # Download audio
+ if audio_url:
+ download_result = api.download_file(audio_url, audio_path)
+ if download_result['success']:
+ downloaded_files.append(audio_path)
+ total_download_bytes += download_result['bytes']
+ download_count += 1
+
+ # Get timestamped lyrics
+ timestamped_lyrics_data = None
+ if audio_id:
+ try:
+ lyrics_response = api.get_timestamped_lyrics(task_id, audio_id)
+ if lyrics_response.get('code') == 200:
+ timestamped_lyrics_data = lyrics_response.get('data')
+ except Exception as e:
+ print_log(f"[Song {song_id}] Track {track_idx+1}: Failed to get lyrics: {e}")
+
+ # Save lyrics and metadata
+ lyrics_content = {
+ "song_id": song_id,
+ "song_index": song_index,
+ "track_index": track_idx,
+ "original_lyrics": original_lyrics,
+ "cleaned_lyrics": lyrics,
+ "timestamped_lyrics": timestamped_lyrics_data,
+ "style": description,
+ "full_track_data": track
+ }
+
+ with open(lyrics_path, 'w', encoding='utf-8') as f:
+ json.dump(lyrics_content, f, ensure_ascii=False, indent=2)
+ downloaded_files.append(lyrics_path)
+
+ download_time = time.time() - download_start
+ total_time = time.time() - start_time
+
+ print_log(f"[Song {song_id}] ✓ Complete! {len(tracks)} tracks, took {total_time:.1f} seconds")
+
+ final_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': True,
+ 'tracks_count': len(tracks),
+ 'files': downloaded_files,
+ 'total_time': total_time,
+ 'submit_time': start_time,
+ 'wait_time': wait_result['wait_time'],
+ 'poll_count': wait_result['poll_count'],
+ 'avg_poll_time': wait_result['avg_poll_time'],
+ 'download_time': download_time,
+ 'download_bytes': total_download_bytes,
+ 'download_count': download_count,
+ 'avg_download_speed': total_download_bytes / download_time if download_time > 0 else 0
+ }
+
+ # Save result in real-time
+ save_result_record(output_dir, final_result)
+ return final_result
+
+ except Exception as e:
+ total_time = time.time() - start_time
+ print_log(f"[Song {song_id}] ✗ Processing failed: {e} (took {total_time:.1f} seconds)")
+
+ error_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': False,
+ 'error': str(e),
+ 'total_time': total_time,
+ 'submit_time': start_time
+ }
+
+ # Save result in real-time
+ save_result_record(output_dir, error_result)
+ return error_result
+
+
+def format_bytes(bytes_size):
+ """Format byte size"""
+ for unit in ['B', 'KB', 'MB', 'GB']:
+ if bytes_size < 1024.0:
+ return f"{bytes_size:.2f} {unit}"
+ bytes_size /= 1024.0
+ return f"{bytes_size:.2f} TB"
+
+
+def format_speed(bytes_per_sec):
+ """Format speed"""
+ return f"{format_bytes(bytes_per_sec)}/s"
+
+
+def main():
+ """Main program - two-phase concurrent processing"""
+ input_file = "cleaned_data_truncated.json"
+ output_dir = "sunov5_truncated"
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Initialize logging
+ global logger
+ logger, log_file = setup_logging(output_dir)
+
+ print_log("=" * 70)
+ print_log("Suno API Batch Generation - V5 Special Edition")
+ print_log("Strategy: Fast submission (5 requests/10s) + Parallel waiting + Detailed performance analysis")
+ print_log(f"Log file: {log_file}")
+ print_log("=" * 70)
+
+ # Read input file
+ try:
+ all_data = []
+ if input_file.endswith('.jsonl'):
+ try:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ # Try reading first line to determine format
+ first_line = f.readline().strip()
+ if first_line.startswith('['):
+ # Looks like regular JSON array
+ f.seek(0)
+ all_data = json.load(f)
+ else:
+ # Try reading line by line
+ f.seek(0)
+ for line in f:
+ line = line.strip()
+ if line:
+ all_data.append(json.loads(line))
+ except json.JSONDecodeError:
+ # If above parsing fails, try one final read as regular JSON
+ print_log(f"Note: Failed to parse {input_file} as JSONL format, trying as regular JSON...")
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+ else:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+
+ except FileNotFoundError:
+ print_log(f"File {input_file} not found.")
+ return
+ except json.JSONDecodeError as e:
+ print_log(f"JSON parsing error: {e}")
+ return
+
+ # Initialize API
+ api = SunoAPI(SUNO_API_KEY)
+
+ print_log(f"\nPreparing to generate {len(all_data)} songs...")
+ print_log(f"Start time: {time.strftime('%H:%M:%S')}\n")
+
+ overall_start_time = time.time()
+
+ # ===== Phase 1: Batch Submission =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 1: Batch Submission")
+ print_log("=" * 70 + "\n")
+
+ submit_start_time = time.time()
+ submitted_tasks = []
+ total_request_time = 0
+
+ # Adjust rate limit: maximum 5 requests per 10 seconds
+ rate_limiter.max_requests = 5
+ rate_limiter.time_window = 10
+ rate_limiter.request_times.clear()
+ print_log(f"Rate limit: {rate_limiter.max_requests} requests / {rate_limiter.time_window} seconds")
+
+ # Build the list of (index, data) tasks to submit; indices start at 1
+ tasks_to_run = list(enumerate(all_data, 1))
+
+ print_log(f"Number of tasks to submit: {len(tasks_to_run)}")
+
+ # Use thread pool for submission
+ # Submission concurrency is controlled by rate_limiter, can be set to 5
+ with ThreadPoolExecutor(max_workers=5) as executor:
+ submit_futures = {
+ executor.submit(submit_generation_task, api, idx, data): idx
+ for idx, data in tasks_to_run
+ }
+
+ with tqdm(total=len(tasks_to_run), desc="Submitting tasks", unit="song") as pbar:
+ for future in as_completed(submit_futures):
+ result = future.result()
+ submitted_tasks.append(result)
+ if result.get('success') and 'request_time' in result:
+ total_request_time += result['request_time']
+ pbar.update(1)
+
+ submit_phase_time = time.time() - submit_start_time
+ success_submits = sum(1 for t in submitted_tasks if t['success'])
+
+ # Get rate limit statistics
+ rate_limit_stats = rate_limiter.get_stats()
+
+ print_log(f"\nSubmission phase complete: {success_submits}/{len(tasks_to_run)} successful")
+ print_log(f" Total time: {submit_phase_time:.1f} seconds")
+ print_log(f" Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds ({rate_limit_stats['wait_count']} times)")
+ if rate_limit_stats['wait_count'] > 0:
+ print_log(f" Average wait time: {rate_limit_stats['avg_wait_time']:.2f} seconds per wait")
+
+ # ===== Phase 2: Parallel Waiting and Download =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 2: Wait for Generation and Download")
+ print_log("=" * 70 + "\n")
+
+ wait_start_time = time.time()
+ final_results = []
+
+ # Use more threads for parallel waiting (not rate limited)
+ with ThreadPoolExecutor(max_workers=20) as executor:
+ download_futures = {
+ executor.submit(wait_and_download_result, api, task, output_dir): task
+ for task in submitted_tasks if task['success']
+ }
+
+ # Add failed submission tasks to results
+ for task in submitted_tasks:
+ if not task['success']:
+ final_results.append(task)
+
+ with tqdm(total=len(download_futures), desc="Downloading results", unit="song") as pbar:
+ for future in as_completed(download_futures):
+ result = future.result()
+ final_results.append(result)
+ pbar.update(1)
+
+ wait_phase_time = time.time() - wait_start_time
+
+ # ===== Detailed Statistics and Report =====
+ overall_time = time.time() - overall_start_time
+
+ print_log("\n" + "=" * 70)
+ print_log("Batch Generation Complete - Detailed Performance Report")
+ print_log("=" * 70)
+
+ success_count = sum(1 for r in final_results if r.get('success'))
+ fail_count = len(final_results) - success_count
+ total_tracks = sum(r.get('tracks_count', 0) for r in final_results if r.get('success'))
+
+ successful_results = [r for r in final_results if r.get('success')]
+
+ # Basic Statistics
+ print_log(f"\n[Basic Statistics]")
+ print_log(f" Total songs: {len(all_data)}")
+ print_log(f" Successful: {success_count}")
+ print_log(f" Failed: {fail_count}")
+ print_log(f" Total tracks: {total_tracks}")
+ if success_count > 0:
+ avg_tracks = total_tracks / success_count
+ print_log(f" Average tracks per song: {avg_tracks:.2f}")
+
+ # Time Statistics
+ print_log(f"\n[Time Statistics]")
+ print_log(f" ├── Submission phase: {submit_phase_time:.1f} seconds")
+ print_log(f" │ ├── Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" │ └── Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds")
+ print_log(f" ├── Generation waiting phase: {wait_phase_time:.1f} seconds")
+
+ if successful_results:
+ wait_times = [r.get('wait_time', 0) for r in successful_results if 'wait_time' in r]
+ download_times = [r.get('download_time', 0) for r in successful_results if 'download_time' in r]
+
+ if wait_times:
+ avg_wait = sum(wait_times) / len(wait_times)
+ min_wait = min(wait_times)
+ max_wait = max(wait_times)
+ print_log(f" │ ├── Average wait time: {avg_wait:.1f} seconds/song")
+ print_log(f" │ ├── Fastest: {min_wait:.1f} seconds")
+ print_log(f" │ └── Slowest: {max_wait:.1f} seconds")
+
+ if download_times:
+ total_download_time = sum(download_times)
+ avg_download = total_download_time / len(download_times)
+ print_log(f" ├── Download phase: {total_download_time:.1f} seconds")
+ print_log(f" │ └── Average download time: {avg_download:.2f} seconds/song")
+
+ print_log(f" └── Total time: {overall_time:.1f} seconds ({overall_time/60:.1f} minutes)")
+
+ # Single Song Generation Statistics
+ if successful_results:
+ total_times = [r.get('total_time', 0) for r in successful_results if 'total_time' in r]
+ if total_times:
+ print_log(f"\n[Single Song Generation Statistics]")
+ avg_time = sum(total_times) / len(total_times)
+ min_time = min(total_times)
+ max_time = max(total_times)
+ print_log(f" Average total time per song: {avg_time:.1f} seconds")
+ print_log(f" Fastest generation: {min_time:.1f} seconds")
+ print_log(f" Slowest generation: {max_time:.1f} seconds")
+
+ # Download Statistics
+ total_download_bytes = sum(r.get('download_bytes', 0) for r in successful_results)
+ total_download_count = sum(r.get('download_count', 0) for r in successful_results)
+
+ if total_download_bytes > 0:
+ print_log(f"\n[Download Statistics]")
+ print_log(f" Total download: {format_bytes(total_download_bytes)}")
+ print_log(f" Number of files: {total_download_count}")
+ print_log(f" Average file size: {format_bytes(total_download_bytes / total_download_count)}")
+
+ download_speeds = [r.get('avg_download_speed', 0) for r in successful_results if r.get('avg_download_speed', 0) > 0]
+ if download_speeds:
+ avg_speed = sum(download_speeds) / len(download_speeds)
+ print_log(f" Average download speed: {format_speed(avg_speed)}")
+
+ # Polling Statistics
+ poll_counts = [r.get('poll_count', 0) for r in successful_results if 'poll_count' in r]
+ if poll_counts:
+ total_polls = sum(poll_counts)
+ avg_polls = total_polls / len(poll_counts)
+ print_log(f"\n[Polling Statistics]")
+ print_log(f" Total polling count: {total_polls}")
+ print_log(f" Average polling per song: {avg_polls:.1f}")
+
+ # Efficiency Analysis
+ print_log(f"\n[Efficiency Analysis]")
+ if success_count > 0:
+ throughput = success_count / (overall_time / 60)
+ print_log(f" Actual throughput: {throughput:.2f} songs/minute")
+
+ # Theoretical fastest time (assuming no rate limit)
+ if wait_times:
+ ideal_time = submit_phase_time - rate_limit_stats['total_wait_time'] + max(wait_times)
+ efficiency = (ideal_time / overall_time) * 100
+ print_log(f" Theoretical fastest time: {ideal_time:.1f} seconds")
+ print_log(f" Concurrency efficiency: {efficiency:.1f}%")
+
+ # Show failed songs
+ if fail_count > 0:
+ print_log("\n" + "=" * 70)
+ print_log("Failed Songs List")
+ print_log("=" * 70)
+ for r in sorted(final_results, key=lambda x: x.get('song_index', 0)):
+ if not r.get('success'):
+ song_id = r.get('song_id', r.get('song_index', 'Unknown'))
+ print_log(f" [{song_id}] {r.get('error', 'Unknown error')}")
+
+ print_log("\n" + "=" * 70)
+ print_log(f"All files saved to: {os.path.abspath(output_dir)}")
+ print_log("=" * 70)
+
+
+if __name__ == '__main__':
+ main()
+
diff --git a/baseline_generate/yue/__pycache__/infer_batch.cpython-311.pyc b/baseline_generate/yue/__pycache__/infer_batch.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..fb85e3c162bfe4898604b6e876a6a5cc45e25c0e
Binary files /dev/null and b/baseline_generate/yue/__pycache__/infer_batch.cpython-311.pyc differ
diff --git a/baseline_generate/yue/batch.sh b/baseline_generate/yue/batch.sh
new file mode 100644
index 0000000000000000000000000000000000000000..345040c82b596366e8c961d31024b2acf316ff8c
--- /dev/null
+++ b/baseline_generate/yue/batch.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+# Batch music generation example script
+# Requires the YuE source repository; the main code change is infer_batch.py replacing infer.py
+
+# Get absolute path of script directory (to avoid filesystem mount issues)
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd 2>/dev/null || echo "xxx/YuE/inference")"
+
+# Change to script directory (if possible, otherwise use absolute path)
+cd "$SCRIPT_DIR" 2>/dev/null || true
+
+
+# Set HuggingFace mirror
+export HF_ENDPOINT=https://hf-mirror.com
+
+# Set PyTorch CUDA memory management optimization
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+
+export CUDA_VISIBLE_DEVICES=1
+
+# Set JSONL file path
+JSONL_PATH=""
+
+# Set output directory
+OUTPUT_DIR=""
+
+# Set processing range (optional)
+# Example: only process first 5 songs
+START_IDX=0
+END_IDX=-1
+
+# Set generation parameters
+MAX_NEW_TOKENS=3500
+REPETITION_PENALTY=1.1
+RUN_N_SEGMENTS=24
+STAGE2_BATCH_SIZE=16
+CUDA_IDX=0
+SEED=42
+NO_SAMPLE=0
+
+# Run batch generation (using absolute path)
+python "$SCRIPT_DIR/infer_batch.py" \
+ --jsonl_path "$JSONL_PATH" \
+ --output_dir "$OUTPUT_DIR" \
+ --start_idx $START_IDX \
+ --end_idx $END_IDX \
+ --max_new_tokens $MAX_NEW_TOKENS \
+ --repetition_penalty $REPETITION_PENALTY \
+ --run_n_segments $RUN_N_SEGMENTS \
+ --stage2_batch_size $STAGE2_BATCH_SIZE \
+ --cuda_idx $CUDA_IDX \
+ --seed $SEED \
+ --rescale \
+ $( [ "$NO_SAMPLE" -eq 1 ] && echo "--no_sample" )
+
diff --git a/baseline_generate/yue/codecmanipulator.py b/baseline_generate/yue/codecmanipulator.py
new file mode 100644
index 0000000000000000000000000000000000000000..e617aa4013bff3349b8651a29d2adce09006a9fb
--- /dev/null
+++ b/baseline_generate/yue/codecmanipulator.py
@@ -0,0 +1,204 @@
+import json
+import numpy as np
+import einops
+
+
+class CodecManipulator(object):
+ r"""
+ **mm tokenizer v0.1**
+ see codeclm/hf/mm_tokenizer_v0.1_hf/id2vocab.json
+
+ text tokens:
+ llama tokenizer 0~31999
+
+ special tokens: "32000": "", "32001": "", "32002": "", "32003": "", "32004": "", "32005": "", "32006": "", "32007": "", "32008": "", "32009": "", "32010": "", "32011": "", "32012": "", "32013": "", "32014": "", "32015": "", "32016": "", "32017": "", "32018": "", "32019": "", "32020": "", "32021": ""
+
+ mm tokens:
+ dac_16k: 4 codebook, 1024 vocab, 32022 - 36117
+ dac_44k: 9 codebook, 1024 vocab, 36118 - 45333
+ xcodec: 12 codebook, 1024 vocab, 45334 - 57621
+ semantic mert: 1024, 57622 - 58645
+ semantic hubert: 512, 58646 - 59157
+ visual: 64000, not included in v0.1
+ semanticodec 100tps 16384: semantic=16384, 59158 - 75541, acoustic=8192, 75542 - 83733
+ """
+ def __init__(self, codec_type, quantizer_begin=None, n_quantizer=None, teacher_forcing=False, data_feature="codec"):
+ self.codec_type = codec_type
+ self.mm_v0_2_cfg = {
+ "dac16k": {"codebook_size": 1024, "num_codebooks": 4, "global_offset": 32022, "sep": [""], "fps": 50},
+ "dac44k": {"codebook_size": 1024, "num_codebooks": 9, "global_offset": 36118, "sep": [""]},
+ "xcodec": {"codebook_size": 1024, "num_codebooks": 12, "global_offset": 45334, "sep": [""], "fps": 50},
+ "mert": {"codebook_size": 1024, "global_offset": 57622, "sep": [""]},
+ "hubert": {"codebook_size": 512, "global_offset": 58646, "sep": [""]},
+ "semantic/s": {"codebook_size": 16384, "num_codebooks": 1, "global_offset": 59158, "sep": ["", ""]},
+ "semantic/a": {"codebook_size": 8192, "num_codebooks": 1, "global_offset": 75542, "sep": ["", ""]},
+ "semanticodec": {"codebook_size": [16384, 8192], "num_codebooks": 2, "global_offset": 59158, "sep": [""], "fps": 50},
+ "special_tokens": {
+ '': 32000, '': 32001, '': 32002, '': 32003, '': 32004, '': 32005, '': 32006, '': 32007, '': 32008, '': 32009, '': 32010, '': 32011, '': 32012, '': 32013, '': 32014, '': 32015, '': 32016, '': 32017, '': 32018, '': 32019, '': 32020, '': 32021
+ },
+ "metadata": {
+ "len": 83734,
+ "text_range": [0, 31999],
+ "special_range": [32000, 32021],
+ "mm_range": [32022, 83733]
+ },
+ "codec_range": {
+ "dac16k": [32022, 36117],
+ "dac44k": [36118, 45333],
+ "xcodec": [45334, 57621],
+ # "hifi16k": [53526, 57621],
+ "mert": [57622, 58645],
+ "hubert": [58646, 59157],
+ "semantic/s": [59158, 75541],
+ "semantic/a": [75542, 83733],
+ "semanticodec": [59158, 83733]
+ }
+ }
+ self.sep = self.mm_v0_2_cfg[self.codec_type]["sep"]
+ self.sep_ids = [self.mm_v0_2_cfg["special_tokens"][s] for s in self.sep]
+ self.codebook_size = self.mm_v0_2_cfg[self.codec_type]["codebook_size"]
+ self.num_codebooks = self.mm_v0_2_cfg[self.codec_type]["num_codebooks"]
+ self.global_offset = self.mm_v0_2_cfg[self.codec_type]["global_offset"]
+ self.fps = self.mm_v0_2_cfg[self.codec_type]["fps"] if "fps" in self.mm_v0_2_cfg[self.codec_type] else None
+
+ self.quantizer_begin = quantizer_begin if quantizer_begin is not None else 0
+ self.n_quantizer = n_quantizer if n_quantizer is not None else self.num_codebooks
+ self.teacher_forcing = teacher_forcing
+ self.data_feature = data_feature
+
+
+ def offset_tok_ids(self, x, global_offset=0, codebook_size=2048, num_codebooks=4):
+ """
+ x: (K, T)
+ """
+ if isinstance(codebook_size, int):
+ assert x.max() < codebook_size, f"max(x)={x.max()}, codebook_size={codebook_size}"
+ elif isinstance(codebook_size, list):
+ for i, cs in enumerate(codebook_size):
+ assert x[i].max() < cs, f"max(x)={x[i].max()}, codebook_size={cs}, layer_id={i}"
+ else:
+ raise ValueError(f"codebook_size={codebook_size}")
+ assert x.min() >= 0, f"min(x)={x.min()}"
+ assert x.shape[0] == num_codebooks or x.shape[0] == self.n_quantizer, \
+ f"x.shape[0]={x.shape[0]}, num_codebooks={num_codebooks}, n_quantizer={self.n_quantizer}"
+
+ _x = x.copy()
+ _x = _x.astype(np.uint32)
+ cum_offset = 0
+ quantizer_begin = self.quantizer_begin
+ quantizer_end = quantizer_begin+self.n_quantizer
+ for k in range(self.quantizer_begin, quantizer_end): # k: quantizer_begin to quantizer_end - 1
+ if isinstance(codebook_size, int):
+ _x[k] += global_offset + k * codebook_size
+ elif isinstance(codebook_size, list):
+ _x[k] += global_offset + cum_offset
+ cum_offset += codebook_size[k]
+ else:
+ raise ValueError(f"codebook_size={codebook_size}")
+ return _x[quantizer_begin:quantizer_end]
+
+ def unoffset_tok_ids(self, x, global_offset=0, codebook_size=2048, num_codebooks=4):
+ """
+ x: (K, T)
+ """
+ if isinstance(codebook_size, int):
+ assert x.max() < global_offset + codebook_size * num_codebooks, f"max(x)={x.max()}, codebook_size={codebook_size}"
+ elif isinstance(codebook_size, list):
+ assert x.max() < global_offset + sum(codebook_size), f"max(x)={x.max()}, codebook_size={codebook_size}"
+ assert x.min() >= global_offset, f"min(x)={x.min()}, global_offset={global_offset}"
+ assert x.shape[0] == num_codebooks or x.shape[0] == self.n_quantizer, \
+ f"x.shape[0]={x.shape[0]}, num_codebooks={num_codebooks}, n_quantizer={self.n_quantizer}"
+
+ _x = x.copy()
+ _x = _x.astype(np.uint32)
+ cum_offset = 0
+ quantizer_begin = self.quantizer_begin
+ quantizer_end = quantizer_begin+self.n_quantizer
+ for k in range(quantizer_begin, quantizer_end):
+ if isinstance(codebook_size, int):
+ _x[k-quantizer_begin] -= global_offset + k * codebook_size
+ elif isinstance(codebook_size, list):
+ _x[k-quantizer_begin] -= global_offset + cum_offset
+ cum_offset += codebook_size[k]
+ else:
+ raise ValueError(f"codebook_size={codebook_size}")
+ return _x
+
+ def flatten(self, x):
+ if len(x.shape) > 2:
+ x = x.squeeze()
+ assert x.shape[0] == self.num_codebooks or x.shape[0] == self.n_quantizer, \
+ f"x.shape[0]={x.shape[0]}, num_codebooks={self.num_codebooks}, n_quantizer={self.n_quantizer}"
+ return einops.rearrange(x, 'K T -> (T K)')
+
+ def unflatten(self, x, n_quantizer=None):
+ if x.ndim > 1 and x.shape[0] == 1:
+ x = x.squeeze(0)
+ assert len(x.shape) == 1
+ assert x.shape[0] % self.num_codebooks == 0 or x.shape[0] % self.n_quantizer == 0, \
+ f"x.shape[0]={x.shape[0]}, num_codebooks={self.num_codebooks}, n_quantizer={self.n_quantizer}"
+ # n_quantizer defaults to the configured codebook count when not given
+ if n_quantizer is not None and n_quantizer != self.num_codebooks:
+ return einops.rearrange(x, '(T K) -> K T', K=n_quantizer)
+ return einops.rearrange(x, '(T K) -> K T', K=self.num_codebooks)
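The offset/flatten pair above can be illustrated with a small numpy-only sketch. This is not the class itself: the function name is illustrative, and it assumes a single integer `codebook_size` (the int branch of `offset_tok_ids`).

```python
import numpy as np

# Minimal sketch of the offset + flatten scheme in CodecManipulator, assuming
# a single integer codebook_size (the function name here is illustrative).
def offset_and_flatten(x, global_offset, codebook_size):
    # x: (K, T) codes in [0, codebook_size); codebook k maps into the range
    # [global_offset + k*codebook_size, global_offset + (k+1)*codebook_size)
    K, _ = x.shape
    offsets = global_offset + np.arange(K)[:, None] * codebook_size
    # '(T K)' flattening interleaves codebooks frame by frame
    return (x + offsets).T.reshape(-1)

x = np.array([[0, 1], [2, 3]])  # K=2 codebooks, T=2 frames
flat = offset_and_flatten(x, global_offset=100, codebook_size=10)
# frame 0 -> [100, 112], then frame 1 -> [101, 113]
```

The interleaved `(T K)` layout is what lets Stage 2 later slice the flat stream in fixed-size frame windows.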
+
+ # def check_codec_type_from_path(self, path):
+ # if self.codec_type == "hifi16k":
+ # assert "academicodec_hifi_16k_320d_large_uni" in path
+
+ def get_codec_type_from_range(self, ids):
+ ids_range = [ids.min(), ids.max()]
+ codec_range = self.mm_v0_2_cfg["codec_range"]
+ for codec_type, r in codec_range.items():
+ if ids_range[0] >= r[0] and ids_range[1] <= r[1]:
+ return codec_type
+ raise ValueError(f"ids_range={ids_range}, codec_range={codec_range}")
+
+ def npy2ids(self, npy):
+ if isinstance(npy, str):
+ data = np.load(npy)
+ elif isinstance(npy, np.ndarray):
+ data = npy
+ else:
+ raise ValueError(f"not supported type: {type(npy)}")
+ # data = data.squeeze()
+
+ assert len(data.shape)==2, f'data shape: {data.shape} is not (n_codebook, seq_len)'
+ data = self.offset_tok_ids(
+ data,
+ global_offset=self.global_offset,
+ codebook_size=self.codebook_size,
+ num_codebooks=self.num_codebooks,
+ )
+ data = self.flatten(data)
+ codec_range = self.get_codec_type_from_range(data)
+ assert codec_range == self.codec_type, f"get_codec_type_from_range(data)={codec_range}, self.codec_type={self.codec_type}"
+ data = data.tolist()
+ return data
+
+ def ids2npy(self, token_ids):
+ # make sure token_ids starts with codebook 0
+ if isinstance(self.codebook_size, int):
+ codebook_0_range = (self.global_offset + self.quantizer_begin*self.codebook_size, self.global_offset + (self.quantizer_begin+1)*self.codebook_size)
+ elif isinstance(self.codebook_size, list):
+ codebook_0_range = (self.global_offset, self.global_offset + self.codebook_size[0])
+ assert token_ids[0] >= codebook_0_range[0] \
+ and token_ids[0] < codebook_0_range[1], f"token_ids[0]={token_ids[0]}, codebook_0_range={codebook_0_range}"
+ data = np.array(token_ids)
+ data = self.unflatten(data, n_quantizer=self.n_quantizer)
+ data = self.unoffset_tok_ids(
+ data,
+ global_offset=self.global_offset,
+ codebook_size=self.codebook_size,
+ num_codebooks=self.num_codebooks,
+ )
+ return data
+
+ def npy_to_json_str(self, npy_path):
+ data = self.npy2ids(npy_path)
+ return json.dumps({"text": data, "src": npy_path, "codec": self.codec_type})
+
+ # Renamed from sep/sep_ids: __init__ assigns data attributes with those
+ # names, which would shadow same-named methods and make them uncallable.
+ def get_sep(self):
+ return ''.join(self.sep)
+
+ def get_sep_ids(self):
+ return self.sep_ids
diff --git a/baseline_generate/yue/infer_batch.py b/baseline_generate/yue/infer_batch.py
new file mode 100644
index 0000000000000000000000000000000000000000..df760db648ff7961796b47b2505a975253ba2b38
--- /dev/null
+++ b/baseline_generate/yue/infer_batch.py
@@ -0,0 +1,904 @@
+import os
+import sys
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'xcodec_mini_infer'))
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'xcodec_mini_infer', 'descriptaudiocodec'))
+import re
+import random
+import uuid
+import copy
+import json
+from tqdm import tqdm
+from collections import Counter
+import argparse
+import numpy as np
+import torch
+import torchaudio
+from torchaudio.transforms import Resample
+import soundfile as sf
+from einops import rearrange
+from transformers import AutoTokenizer, AutoModelForCausalLM, LogitsProcessor, LogitsProcessorList
+from omegaconf import OmegaConf
+from codecmanipulator import CodecManipulator
+from mmtokenizer import _MMSentencePieceTokenizer
+from models.soundstream_hubert_new import SoundStream
+from vocoder import build_codec_model, process_audio
+from post_process_audio import replace_low_freq_with_energy_matched
+
+os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
+
+parser = argparse.ArgumentParser()
+# Model Configuration:
+parser.add_argument("--stage1_model", type=str, default="m-a-p/YuE-s1-7B-anneal-en-cot", help="The model checkpoint path or identifier for the Stage 1 model.")
+parser.add_argument("--stage2_model", type=str, default="m-a-p/YuE-s2-1B-general", help="The model checkpoint path or identifier for the Stage 2 model.")
+parser.add_argument("--max_new_tokens", type=int, default=3000, help="The maximum number of new tokens to generate in one pass during text generation.")
+parser.add_argument("--repetition_penalty", type=float, default=1.1, help="repetition_penalty ranges from 1.0 to 2.0 (or higher in some cases). It controls the diversity and coherence of the audio tokens generated. The higher the value, the greater the discouragement of repetition. Setting value to 1.0 means no penalty.")
+parser.add_argument("--run_n_segments", type=int, default=2, help="The number of segments to process during generation. Each segment is ~30s (with default max_new_tokens=3000). For example: 2=~1min, 6=~3min, 8=~4min.")
+parser.add_argument("--stage2_batch_size", type=int, default=4, help="The batch size used in Stage 2 inference.")
+parser.add_argument(
+ "--no_sample",
+ action="store_true",
+ help="If set, disable sampling in Stage 1 generation (i.e., use deterministic decoding). When enabled, top_p/temperature will be ignored.",
+)
+# Prompt - Batch processing parameters
+parser.add_argument("--jsonl_path", type=str, required=True, help="The file path to a JSONL file containing genre and lyrics for batch processing.")
+parser.add_argument("--start_idx", type=int, default=0, help="Start index in the JSONL file for batch processing.")
+parser.add_argument("--end_idx", type=int, default=-1, help="End index in the JSONL file for batch processing. -1 means process all.")
+parser.add_argument("--use_audio_prompt", action="store_true", help="If set, the model will use an audio file as a prompt during generation. The audio file should be specified using --audio_prompt_path.")
+parser.add_argument("--audio_prompt_path", type=str, default="", help="The file path to an audio file to use as a reference prompt when --use_audio_prompt is enabled.")
+parser.add_argument("--prompt_start_time", type=float, default=0.0, help="The start time in seconds to extract the audio prompt from the given audio file.")
+parser.add_argument("--prompt_end_time", type=float, default=30.0, help="The end time in seconds to extract the audio prompt from the given audio file.")
+parser.add_argument("--use_dual_tracks_prompt", action="store_true", help="If set, the model will use dual tracks as a prompt during generation. The vocal and instrumental files should be specified using --vocal_track_prompt_path and --instrumental_track_prompt_path.")
+parser.add_argument("--vocal_track_prompt_path", type=str, default="", help="The file path to a vocal track file to use as a reference prompt when --use_dual_tracks_prompt is enabled.")
+parser.add_argument("--instrumental_track_prompt_path", type=str, default="", help="The file path to an instrumental track file to use as a reference prompt when --use_dual_tracks_prompt is enabled.")
+# Output
+parser.add_argument("--output_dir", type=str, default="./output", help="The directory where generated outputs will be saved.")
+parser.add_argument("--keep_intermediate", action="store_true", help="If set, intermediate outputs will be saved during processing.")
+parser.add_argument("--disable_offload_model", action="store_true", help="If set, the model will not be offloaded from the GPU to CPU after Stage 1 inference.")
+parser.add_argument("--cuda_idx", type=int, default=0)
+parser.add_argument("--seed", type=int, default=42, help="An integer value to reproduce generation.")
+# Config for xcodec and upsampler
+parser.add_argument('--basic_model_config', default='./xcodec_mini_infer/final_ckpt/config.yaml', help='YAML files for xcodec configurations.')
+parser.add_argument('--resume_path', default='./xcodec_mini_infer/final_ckpt/ckpt_00360000.pth', help='Path to the xcodec checkpoint.')
+parser.add_argument('--config_path', type=str, default='./xcodec_mini_infer/decoders/config.yaml', help='Path to Vocos config file.')
+parser.add_argument('--vocal_decoder_path', type=str, default='./xcodec_mini_infer/decoders/decoder_131000.pth', help='Path to Vocos decoder weights.')
+parser.add_argument('--inst_decoder_path', type=str, default='./xcodec_mini_infer/decoders/decoder_151000.pth', help='Path to Vocos decoder weights.')
+parser.add_argument('-r', '--rescale', action='store_true', help='Rescale output to avoid clipping.')
+
+
+args = parser.parse_args()
+if args.use_audio_prompt and not args.audio_prompt_path:
+ raise FileNotFoundError("Please provide an audio prompt filepath via '--audio_prompt_path' when '--use_audio_prompt' is enabled!")
+if args.use_dual_tracks_prompt and not (args.vocal_track_prompt_path and args.instrumental_track_prompt_path):
+ raise FileNotFoundError("Please provide both '--vocal_track_prompt_path' and '--instrumental_track_prompt_path' when '--use_dual_tracks_prompt' is enabled!")
+
+stage1_model = args.stage1_model
+stage2_model = args.stage2_model
+cuda_idx = args.cuda_idx
+max_new_tokens = args.max_new_tokens
+do_sample_stage1 = (not args.no_sample)
+
+def seed_everything(seed=42):
+ random.seed(seed)
+ np.random.seed(seed)
+ torch.manual_seed(seed)
+ torch.cuda.manual_seed_all(seed)
+ torch.backends.cudnn.deterministic = True
+ torch.backends.cudnn.benchmark = False
+
+seed_everything(args.seed)
+
+# Read JSONL file
+print(f"Reading JSONL file: {args.jsonl_path}")
+music_data_list = []
+with open(args.jsonl_path, 'r', encoding='utf-8') as f:
+ for line in f:
+ if line.strip():
+ music_data_list.append(json.loads(line))
+
+# Determine processing range
+start_idx = args.start_idx
+end_idx = len(music_data_list) if args.end_idx == -1 else min(args.end_idx, len(music_data_list))
+music_data_list = music_data_list[start_idx:end_idx]
+print(f"Total {len(music_data_list)} songs to generate (indices {start_idx} to {end_idx-1})")
+
+# Detect processed songs - check completion status of each stage
+def check_song_status(song_idx, output_dir):
+ """
+ Check song processing status
+ Returns: (stage1_done, stage2_done, stage3_done, song_dir, stage1_output_set, stage2_output_dir)
+ """
+ if not os.path.exists(output_dir):
+ return False, False, False, None, None, None
+
+ # Find song directory (may have multiple, take the latest or first)
+ song_dirs = []
+ for item in os.listdir(output_dir):
+ if item.startswith('song_') and os.path.isdir(os.path.join(output_dir, item)):
+ try:
+ idx = int(item.split('_')[1])
+ if idx == song_idx:
+ song_dirs.append(os.path.join(output_dir, item))
+ except (ValueError, IndexError):
+ continue
+
+ if not song_dirs:
+ return False, False, False, None, None, None
+
+ # Use the latest directory (sorted by modification time)
+ song_dir = max(song_dirs, key=lambda x: os.path.getmtime(x))
+
+ # Check Stage 1: whether stage1 directory has vtrack and itrack .npy files
+ stage1_dir = os.path.join(song_dir, "stage1")
+ stage1_done = False
+ stage1_output_set = []
+ if os.path.exists(stage1_dir):
+ stage1_files = [f for f in os.listdir(stage1_dir) if f.endswith('.npy')]
+ vtrack_files = [f for f in stage1_files if '_vtrack' in f]
+ itrack_files = [f for f in stage1_files if '_itrack' in f]
+ if vtrack_files and itrack_files:
+ stage1_done = True
+ # Build stage1_output_set
+ for f in vtrack_files + itrack_files:
+ stage1_output_set.append(os.path.join(stage1_dir, f))
+
+ # Check Stage 2: whether stage2 directory has corresponding .npy files
+ stage2_dir = os.path.join(song_dir, "stage2")
+ stage2_done = False
+ if stage1_done and os.path.exists(stage2_dir):
+ stage2_files = [f for f in os.listdir(stage2_dir) if f.endswith('.npy')]
+ # Check if all stage1 files have corresponding stage2 files
+ if stage1_output_set:
+ stage1_basenames = {os.path.basename(f) for f in stage1_output_set}
+ stage2_basenames = set(stage2_files)
+ if stage1_basenames.issubset(stage2_basenames):
+ stage2_done = True
+
+ # Check Stage 3: whether there is a final mixed file (in song_dir root directory)
+ stage3_done = False
+ for root, dirs, files in os.walk(song_dir):
+ if any(f.endswith('_mixed.mp3') for f in files):
+ stage3_done = True
+ break
+
+ return stage1_done, stage2_done, stage3_done, song_dir, stage1_output_set, stage2_dir
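The Stage 2 completeness rule in `check_song_status` (every Stage 1 `.npy` basename must reappear under `stage2/`) can be exercised in isolation. A minimal sketch, with hypothetical paths and filenames:

```python
import os
import tempfile

# Isolated sketch of the stage-2 completeness test used by check_song_status:
# every Stage 1 .npy basename must also exist in the stage2 directory.
def stage2_complete(stage1_files, stage2_dir):
    stage1_basenames = {os.path.basename(f) for f in stage1_files}
    return stage1_basenames.issubset(set(os.listdir(stage2_dir)))

with tempfile.TemporaryDirectory() as d:
    # only the vocal-track output exists under stage2/
    open(os.path.join(d, 'a_vtrack.npy'), 'w').close()
    # the itrack output is missing, so the stage is not complete
    print(stage2_complete(['/x/stage1/a_vtrack.npy', '/x/stage1/a_itrack.npy'], d))
```

The subset test makes resume safe: a partially written `stage2/` directory never counts as done.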
+
+# Detect processing status of all songs
+song_status_map = {} # {song_idx: (stage1_done, stage2_done, stage3_done, song_dir, stage1_output_set, stage2_output_dir)}
+if os.path.exists(args.output_dir):
+ print(f"\nDetecting processed songs...")
+ for list_idx in range(len(music_data_list)):
+ song_idx = start_idx + list_idx
+ stage1_done, stage2_done, stage3_done, song_dir, stage1_output_set, stage2_output_dir = check_song_status(song_idx, args.output_dir)
+ if stage1_done or stage2_done or stage3_done:
+ song_status_map[song_idx] = (stage1_done, stage2_done, stage3_done, song_dir, stage1_output_set, stage2_output_dir)
+
+ if song_status_map:
+ fully_completed = [idx for idx, (s1, s2, s3, _, _, _) in song_status_map.items() if s3]
+ partial_completed = [idx for idx, (s1, s2, s3, _, _, _) in song_status_map.items() if not s3]
+ print(f"✓ Found {len(fully_completed)} fully completed songs: {sorted(fully_completed)}")
+ if partial_completed:
+ print(f"✓ Found {len(partial_completed)} partially completed songs: {sorted(partial_completed)}")
+ for idx in sorted(partial_completed):
+ s1, s2, s3, _, _, _ = song_status_map[idx]
+ status_parts = []
+ if s1: status_parts.append("Stage1")
+ if s2: status_parts.append("Stage2")
+ if s3: status_parts.append("Stage3")
+ print(f" Index {idx}: Completed {', '.join(status_parts)}")
+ remaining_count = len(music_data_list) - len(fully_completed)
+ print(f"✓ Will skip fully completed songs, {remaining_count} songs remaining to process")
+ else:
+ print(f"✓ No processed songs found, will start from the beginning")
+else:
+ print(f"✓ Output directory does not exist, will start from the beginning")
+
+# Load tokenizer and model
+device = torch.device(f"cuda:{cuda_idx}" if torch.cuda.is_available() else "cpu")
+mmtokenizer = _MMSentencePieceTokenizer("./mm_tokenizer_v0.2_hf/tokenizer.model")
+print("Loading Stage 1 model...")
+model = AutoModelForCausalLM.from_pretrained(
+ stage1_model,
+ torch_dtype=torch.bfloat16,
+ attn_implementation="flash_attention_2", # Using flash_attention_2 for better performance
+ # device_map="auto",
+ )
+# to device, if gpu is available
+model.to(device)
+model.eval()
+
+if torch.__version__ >= "2.0.0":
+ try:
+ model = torch.compile(model)
+ except Exception as e:
+ print(f"Warning: torch.compile not available: {e}")
+
+codectool = CodecManipulator("xcodec", 0, 1)
+codectool_stage2 = CodecManipulator("xcodec", 0, 8)
+model_config = OmegaConf.load(args.basic_model_config)
+codec_model = eval(model_config.generator.name)(**model_config.generator.config).to(device)
+# Load checkpoint with weights_only=False to allow OmegaConf types
+# Note: Only use this if you trust the checkpoint source
+parameter_dict = torch.load(args.resume_path, map_location='cpu', weights_only=False)
+codec_model.load_state_dict(parameter_dict['codec_model'])
+codec_model.to(device)
+codec_model.eval()
+
+class BlockTokenRangeProcessor(LogitsProcessor):
+ def __init__(self, start_id, end_id):
+ self.blocked_token_ids = list(range(start_id, end_id))
+
+ def __call__(self, input_ids, scores):
+ scores[:, self.blocked_token_ids] = -float("inf")
+ return scores
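`BlockTokenRangeProcessor` simply masks a contiguous id range in the logits. A numpy sketch of the same masking, outside the transformers API (numpy stands in for torch tensors here):

```python
import numpy as np

# Sketch of what BlockTokenRangeProcessor does to a logits row: banned ids get
# -inf so softmax assigns them zero probability.
def block_range(scores, start_id, end_id):
    scores = scores.copy()
    scores[:, start_id:end_id] = -np.inf
    return scores

scores = np.zeros((1, 6))
masked = block_range(scores, 2, 4)  # ids 2 and 3 can no longer be sampled
```

In the script this keeps Stage 1/2 generation inside the xcodec token range, regardless of what the language model would otherwise prefer.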
+
+def load_audio_mono(filepath, sampling_rate=16000):
+ audio, sr = torchaudio.load(filepath)
+ # Convert to mono
+ audio = torch.mean(audio, dim=0, keepdim=True)
+ # Resample if needed
+ if sr != sampling_rate:
+ resampler = Resample(orig_freq=sr, new_freq=sampling_rate)
+ audio = resampler(audio)
+ return audio
+
+def encode_audio(codec_model, audio_prompt, device, target_bw=0.5):
+ if len(audio_prompt.shape) < 3:
+ audio_prompt.unsqueeze_(0)
+ with torch.no_grad():
+ raw_codes = codec_model.encode(audio_prompt.to(device), target_bw=target_bw)
+ raw_codes = raw_codes.transpose(0, 1)
+ raw_codes = raw_codes.cpu().numpy().astype(np.int16)
+ return raw_codes
+
+def split_lyrics(lyrics):
+ """
+ Split lyrics by segments, following YuE official best practices:
+
+ Official requirements:
+ 1. Lyrics should be segmented using structure tags: [verse], [chorus], [bridge], [outro], etc.
+ 2. Each segment is separated by two newlines "\n\n"
+ 3. Each segment is about 30 seconds (when --max_new_tokens 3000), don't put too many words
+ 4. Avoid using [intro] tag (not very stable), recommend starting with [verse] or [chorus]
+ 5. Supports multiple languages: English, Chinese, Cantonese, Japanese, Korean, etc.
+
+ Args:
+ lyrics: Raw lyrics string
+
+ Returns:
+ Structured lyrics segment list, each segment in [tag]\ncontent\n\n format
+ """
+ # Regular expression: match [any tag] and its following content
+ # Supports: [Verse 1], [Pre-Chorus], [Chorus (Outro)] and other complex tags
+ pattern = r"\[([^\]]+)\](.*?)(?=\[|\Z)"
+ segments = re.findall(pattern, lyrics, re.DOTALL)
+ structured_lyrics = [f"[{seg[0]}]\n{seg[1].strip()}\n\n" for seg in segments]
+ return structured_lyrics
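As a quick check of the behaviour documented above, the regex can be exercised standalone (the pattern is copied from the function; the sample lyric is made up):

```python
import re

# Standalone copy of the split_lyrics regex for illustration.
def split_lyrics(lyrics):
    # match [any tag] and the content that follows, up to the next tag or EOF
    pattern = r"\[([^\]]+)\](.*?)(?=\[|\Z)"
    segments = re.findall(pattern, lyrics, re.DOTALL)
    return [f"[{seg[0]}]\n{seg[1].strip()}\n\n" for seg in segments]

raw = "[verse]\nline one\nline two\n\n[chorus]\nhook line"
segs = split_lyrics(raw)
# segs[0] == "[verse]\nline one\nline two\n\n"
# segs[1] == "[chorus]\nhook line\n\n"
```

Note that the lookahead `(?=\[|\Z)` leaves the next tag unconsumed, so every tagged segment is captured even with no blank line between them.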
+
+def save_audio(wav: torch.Tensor, path, sample_rate: int, rescale: bool = False):
+ folder_path = os.path.dirname(path)
+ if folder_path:
+ os.makedirs(folder_path, exist_ok=True)
+ limit = 0.99
+ max_val = wav.abs().max()
+ wav = wav * min(limit / max_val, 1) if rescale else wav.clamp(-limit, limit)
+ torchaudio.save(str(path), wav, sample_rate=sample_rate, encoding='PCM_S', bits_per_sample=16)
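The two peak strategies in `save_audio` differ in how they handle out-of-range samples. A sketch with numpy standing in for torch tensors (the 0.99 limit mirrors the source):

```python
import numpy as np

# Sketch of save_audio's peak handling: rescale the whole signal, or hard-clip
# only the samples beyond the limit.
def prepare_wav(wav, rescale, limit=0.99):
    max_val = np.abs(wav).max()
    if rescale:
        return wav * min(limit / max_val, 1)  # scale the whole signal down
    return np.clip(wav, -limit, limit)        # flatten only out-of-range peaks

wav = np.array([0.5, -2.0])
# rescale preserves the waveform's relative shape; clipping distorts peaks
```

Rescaling avoids clipping distortion at the cost of overall loudness, which is why it is exposed as the `--rescale` flag.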
+
+def stage2_generate(model, prompt, batch_size=16):
+ codec_ids = codectool.unflatten(prompt, n_quantizer=1)
+ codec_ids = codectool.offset_tok_ids(
+ codec_ids,
+ global_offset=codectool.global_offset,
+ codebook_size=codectool.codebook_size,
+ num_codebooks=codectool.num_codebooks,
+ ).astype(np.int32)
+
+ # Prepare prompt_ids based on batch size or single input
+ if batch_size > 1:
+ codec_list = []
+ for i in range(batch_size):
+ idx_begin = i * 300
+ idx_end = (i + 1) * 300
+ codec_list.append(codec_ids[:, idx_begin:idx_end])
+
+ codec_ids = np.concatenate(codec_list, axis=0)
+ prompt_ids = np.concatenate(
+ [
+ np.tile([mmtokenizer.soa, mmtokenizer.stage_1], (batch_size, 1)),
+ codec_ids,
+ np.tile([mmtokenizer.stage_2], (batch_size, 1)),
+ ],
+ axis=1
+ )
+ else:
+ prompt_ids = np.concatenate([
+ np.array([mmtokenizer.soa, mmtokenizer.stage_1]),
+ codec_ids.flatten(), # Flatten the 2D array to 1D
+ np.array([mmtokenizer.stage_2])
+ ]).astype(np.int32)
+ prompt_ids = prompt_ids[np.newaxis, ...]
+
+ codec_ids = torch.as_tensor(codec_ids).to(device)
+ prompt_ids = torch.as_tensor(prompt_ids).to(device)
+ len_prompt = prompt_ids.shape[-1]
+
+ block_list = LogitsProcessorList([BlockTokenRangeProcessor(0, 46358), BlockTokenRangeProcessor(53526, mmtokenizer.vocab_size)])
+
+ # Teacher forcing generate loop
+ for frames_idx in range(codec_ids.shape[1]):
+ cb0 = codec_ids[:, frames_idx:frames_idx+1]
+ prompt_ids = torch.cat([prompt_ids, cb0], dim=1)
+ input_ids = prompt_ids
+
+ with torch.no_grad():
+ stage2_output = model.generate(input_ids=input_ids,
+ min_new_tokens=7,
+ max_new_tokens=7,
+ eos_token_id=mmtokenizer.eoa,
+ pad_token_id=mmtokenizer.eoa,
+ logits_processor=block_list,
+ )
+
+ assert stage2_output.shape[1] - prompt_ids.shape[1] == 7, f"output new tokens={stage2_output.shape[1]-prompt_ids.shape[1]}"
+ prompt_ids = stage2_output
+
+ # Return output based on batch size
+ if batch_size > 1:
+ output = prompt_ids.cpu().numpy()[:, len_prompt:]
+ output_list = [output[i] for i in range(batch_size)]
+ output = np.concatenate(output_list, axis=0)
+ else:
+ output = prompt_ids[0].cpu().numpy()[len_prompt:]
+
+ return output
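The token bookkeeping of the teacher-forcing loop above can be sanity-checked with plain arithmetic (a sketch, no model involved):

```python
# Token bookkeeping of the Stage 2 teacher-forcing loop: per frame, one known
# codebook-0 token is appended and exactly 7 new tokens are generated, i.e.
# 8 tokens per frame for the 8 codebooks decoded by codectool_stage2.
def stage2_output_tokens(num_frames):
    tokens = 0
    for _ in range(num_frames):
        tokens += 1  # teacher-forced codebook-0 token
        tokens += 7  # generated tokens for codebooks 1..7
    return tokens

# one 6 s window of 300 frames -> 2400 output tokens
```

This is why the loop asserts exactly 7 new tokens per step: any deviation would desynchronize the flat stream from the 8-codebook frame layout.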
+
+def sanitize_genres_for_filename(genres, max_length=80):
+ """
+ Clean and truncate genres string for filename generation
+ Ensure filename is not too long (Linux filename limit is 255 bytes)
+
+ Args:
+ genres: Raw genres string
+ max_length: Maximum length of genres part (default 80, leaving space for other parameters)
+
+ Returns:
+ Cleaned genres string
+ """
+ if not genres:
+ return "Unknown"
+
+ # Clean unsafe characters
+ genres_clean = re.sub(r'[<>:"/\\|?*\x00-\x1f]', '_', genres)
+ genres_clean = genres_clean.strip('_').strip()
+
+ # If contains comma-separated tags, try to keep first few tags
+ if ',' in genres_clean:
+ tags = [tag.strip() for tag in genres_clean.split(',')]
+ # Try to keep first few tags until reaching length limit
+ result_tags = []
+ current_length = 0
+ for tag in tags:
+ if current_length + len(tag) + 1 <= max_length: # +1 for comma
+ result_tags.append(tag)
+ current_length += len(tag) + 1
+ else:
+ break
+ if result_tags:
+ genres_clean = ','.join(result_tags)
+ else:
+ # If first tag is too long, directly truncate
+ genres_clean = tags[0][:max_length] if tags else genres_clean[:max_length]
+
+ # If still too long, directly truncate
+ if len(genres_clean) > max_length:
+ genres_clean = genres_clean[:max_length]
+
+ # Replace spaces with hyphens (for consistency)
+ genres_clean = genres_clean.replace(' ', '-')
+
+ return genres_clean
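The tag-keeping logic above can be condensed into a short usage sketch (a simplified illustration of the same rules, with a made-up input; the full function above also handles an over-long first tag separately):

```python
import re

# Simplified sketch of sanitize_genres_for_filename: strip unsafe characters,
# keep leading comma-separated tags up to max_length, hyphenate spaces.
def sanitize(genres, max_length=80):
    clean = re.sub(r'[<>:"/\\|?*\x00-\x1f]', '_', genres).strip('_').strip()
    if ',' in clean:
        kept, length = [], 0
        for tag in (t.strip() for t in clean.split(',')):
            if length + len(tag) + 1 > max_length:  # +1 for the comma
                break
            kept.append(tag)
            length += len(tag) + 1
        clean = ','.join(kept) if kept else clean[:max_length]
    return clean[:max_length].replace(' ', '-')

print(sanitize('pop, synth wave, 80s/retro', max_length=15))  # pop,synth-wave
```

Truncating at tag boundaries keeps the filename meaningful instead of cutting a genre word in half.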
+
+def stage2_inference(model, stage1_output_set, stage2_output_dir, batch_size=4):
+ stage2_result = []
+ for i in tqdm(range(len(stage1_output_set)), desc="Stage 2 inference"):
+ output_filename = os.path.join(stage2_output_dir, os.path.basename(stage1_output_set[i]))
+
+ if os.path.exists(output_filename):
+ print(f'{output_filename} stage2 already done.')
+ stage2_result.append(output_filename)
+ continue
+
+ # Load the prompt
+ prompt = np.load(stage1_output_set[i]).astype(np.int32)
+
+ # Only accept 6s segments
+ output_duration = prompt.shape[-1] // 50 // 6 * 6
+ num_batch = output_duration // 6
+
+ if num_batch <= batch_size:
+ # If num_batch is less than or equal to batch_size, we can infer the entire prompt at once
+ output = stage2_generate(model, prompt[:, :output_duration*50], batch_size=num_batch)
+ else:
+ # If num_batch is greater than batch_size, process in chunks of batch_size
+ segments = []
+ num_segments = (num_batch // batch_size) + (1 if num_batch % batch_size != 0 else 0)
+
+ for seg in range(num_segments):
+ start_idx = seg * batch_size * 300
+ # Ensure the end_idx does not exceed the available length
+ end_idx = min((seg + 1) * batch_size * 300, output_duration*50) # Adjust the last segment
+ current_batch_size = batch_size if seg != num_segments-1 or num_batch % batch_size == 0 else num_batch % batch_size
+ segment = stage2_generate(
+ model,
+ prompt[:, start_idx:end_idx],
+ batch_size=current_batch_size
+ )
+ segments.append(segment)
+
+ # Concatenate all the segments
+ output = np.concatenate(segments, axis=0)
+
+ # Process the ending part of the prompt
+ if output_duration*50 != prompt.shape[-1]:
+ ending = stage2_generate(model, prompt[:, output_duration*50:], batch_size=1)
+ output = np.concatenate([output, ending], axis=0)
+ output = codectool_stage2.ids2npy(output)
+
+ # Fix invalid codes (a dirty solution, which may harm the quality of audio)
+ # We are trying to find better one
+ fixed_output = copy.deepcopy(output)
+ for r, line in enumerate(output):
+ for c, element in enumerate(line):
+ if element < 0 or element > 1023:
+ counter = Counter(line)
+ most_frequent = counter.most_common(1)[0][0]
+ fixed_output[r, c] = most_frequent
+ # save output
+ np.save(output_filename, fixed_output)
+ stage2_result.append(output_filename)
+ return stage2_result
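
The out-of-range-token repair loop above can be sketched standalone. This is my own condensed version (function name `fix_invalid_codes` is not in the diff), assuming valid xcodec ids lie in [0, 1023]; invalid entries are replaced with the most frequent id in the same row.

```python
import numpy as np
from collections import Counter

def fix_invalid_codes(codes: np.ndarray, low: int = 0, high: int = 1023) -> np.ndarray:
    """Replace out-of-range codec ids with the row's most frequent id,
    mirroring the "dirty fix" loop in stage2_inference above."""
    fixed = codes.copy()
    for r, row in enumerate(codes):
        for c, v in enumerate(row):
            if v < low or v > high:
                # Invalid ids are rare, so the row mode is almost always valid.
                fixed[r, c] = Counter(row.tolist()).most_common(1)[0][0]
    return fixed

arr = np.array([[1, 1, 2000], [5, -3, 5]])
print(fix_invalid_codes(arr))
```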
+
+def process_one_song(music_data, song_idx, total_songs):
+ """Process Stage 1 for a single song"""
+
+ # Compatible with genre and description fields
+ genres = music_data.get('genre') or music_data.get('description', '')
+ lyrics_raw = music_data['lyrics']
+ description = music_data.get('description', '')
+
+ print(f"Description: {description[:100]}...")
+ print(f"Genre tags: {genres}")
+
+ # ===== Print original lyrics =====
+ print("\n" + "="*60)
+ print("【Original Lyrics (lyrics_raw)】")
+ print("="*60)
+ print(lyrics_raw)
+ print("="*60 + "\n")
+
+ lyrics = split_lyrics(lyrics_raw)
+
+ # Validate lyrics format and give warnings (following official best practices)
+ print(f"Lyrics analysis: Identified {len(lyrics)} segments")
+
+ # ===== Print segmented lyrics =====
+ print("\n" + "="*60)
+ print("【Segmented Lyrics (lyrics)】")
+ print("="*60)
+ for i, seg in enumerate(lyrics):
+ tag = seg.split('\n')[0].strip()
+ # Check if unstable [intro] tag is used
+ if 'intro' in tag.lower():
+ print(f" ⚠️ Warning: Segment {i+1} uses {tag} tag, official recommendation is to avoid [intro], use [verse] or [chorus] instead")
+ else:
+ print(f" Segment {i+1}. {tag}")
+ # Print each segment's content (limit length)
+ content = seg.strip()
+ if len(content) > 150:
+ print(f" Content preview: {content[:150]}...")
+ else:
+ print(f" Content: {content}")
+ print()
+ print("="*60 + "\n")
+
+ # Create output directory for this song
+ random_id = uuid.uuid4()
+ song_output_dir = os.path.join(args.output_dir, f"song_{song_idx:04d}_{random_id}")
+ stage1_output_dir = os.path.join(song_output_dir, "stage1")
+ stage2_output_dir = os.path.join(song_output_dir, "stage2")
+ os.makedirs(stage1_output_dir, exist_ok=True)
+ os.makedirs(stage2_output_dir, exist_ok=True)
+
+ # Stage 1: Generate audio tokens
+ print("--- Stage 1: Generate audio tokens ---")
+ stage1_output_set = []
+ full_lyrics = "\n".join(lyrics)
+ prompt_texts = [f"Generate music from the given lyrics segment by segment.\n[Genre] {genres}\n{full_lyrics}"]
+ prompt_texts += lyrics
+
+ # ===== Print prompt texts passed to model =====
+ print("\n" + "="*60)
+ print("【Prompt Texts Passed to Model (prompt_texts)】")
+ print("="*60)
+ print(f"Total {len(prompt_texts)} prompts (first is full prompt, subsequent are segments)\n")
+ for i, pt in enumerate(prompt_texts):
+ if i == 0:
+ print(f"Prompt {i} [Full prompt header]:")
+ if len(pt) > 300:
+ print(f"{pt[:300]}...")
+ else:
+ print(pt)
+ else:
+ print(f"\nPrompt {i} [Segment {i}]:")
+ if len(pt) > 200:
+ print(f"{pt[:200]}...")
+ else:
+ print(pt)
+ print("="*60 + "\n")
+
+ output_seq = None
+ # Here is suggested decoding config
+ top_p = 0.93
+ temperature = 1.0
+ repetition_penalty = args.repetition_penalty
+ if not do_sample_stage1:
+ print("Note: --no_sample is enabled, Stage 1 will use deterministic decoding; top_p/temperature will be ignored.")
+ # special tokens
+ start_of_segment = mmtokenizer.tokenize('[start_of_segment]')
+ end_of_segment = mmtokenizer.tokenize('[end_of_segment]')
+ # Format text prompt
+ # +1 because prompt_texts[0] is the full prompt which will be skipped, so need len(lyrics)+1 to process all segments
+ run_n_segments = min(args.run_n_segments+1, len(lyrics)+1)
+
+ for i, p in enumerate(tqdm(prompt_texts[:run_n_segments], desc="Stage1 inference")):
+ section_text = p.replace('[start_of_segment]', '').replace('[end_of_segment]', '')
+ guidance_scale = 1.5 if i <=1 else 1.2
+
+ # ===== Print currently processing segment =====
+ if i == 0:
+ print(f"\n[Segment {i}] Skipped (full prompt header)")
+ else:
+ print(f"\n" + "-"*60)
+ print(f"[Processing segment {i}/{len(prompt_texts[:run_n_segments])-1}]")
+ print("-"*60)
+ tag_line = section_text.split('\n')[0] if '\n' in section_text else section_text[:50]
+ print(f"Segment tag: {tag_line}")
+ print(f"Segment content length: {len(section_text)} characters")
+ if len(section_text) > 200:
+ print(f"Segment content preview: {section_text[:200]}...")
+ else:
+ print(f"Segment content: {section_text}")
+ print("-"*60)
+
+ if i==0:
+ continue
+ if i==1:
+ if args.use_dual_tracks_prompt or args.use_audio_prompt:
+ if args.use_dual_tracks_prompt:
+ vocals_ids = load_audio_mono(args.vocal_track_prompt_path)
+ instrumental_ids = load_audio_mono(args.instrumental_track_prompt_path)
+ vocals_ids = encode_audio(codec_model, vocals_ids, device, target_bw=0.5)
+ instrumental_ids = encode_audio(codec_model, instrumental_ids, device, target_bw=0.5)
+ vocals_ids = codectool.npy2ids(vocals_ids[0])
+ instrumental_ids = codectool.npy2ids(instrumental_ids[0])
+ ids_segment_interleaved = rearrange([np.array(vocals_ids), np.array(instrumental_ids)], 'b n -> (n b)')
+ audio_prompt_codec = ids_segment_interleaved[int(args.prompt_start_time*50*2): int(args.prompt_end_time*50*2)]
+ audio_prompt_codec = audio_prompt_codec.tolist()
+ elif args.use_audio_prompt:
+ audio_prompt = load_audio_mono(args.audio_prompt_path)
+ raw_codes = encode_audio(codec_model, audio_prompt, device, target_bw=0.5)
+ # Format audio prompt
+ code_ids = codectool.npy2ids(raw_codes[0])
+ audio_prompt_codec = code_ids[int(args.prompt_start_time *50): int(args.prompt_end_time *50)] # 50 is tps of xcodec
+ audio_prompt_codec_ids = [mmtokenizer.soa] + codectool.sep_ids + audio_prompt_codec + [mmtokenizer.eoa]
+ sentence_ids = mmtokenizer.tokenize("[start_of_reference]") + audio_prompt_codec_ids + mmtokenizer.tokenize("[end_of_reference]")
+ head_id = mmtokenizer.tokenize(prompt_texts[0]) + sentence_ids
+ else:
+ head_id = mmtokenizer.tokenize(prompt_texts[0])
+ prompt_ids = head_id + start_of_segment + mmtokenizer.tokenize(section_text) + [mmtokenizer.soa] + codectool.sep_ids
+ else:
+ prompt_ids = end_of_segment + start_of_segment + mmtokenizer.tokenize(section_text) + [mmtokenizer.soa] + codectool.sep_ids
+
+ prompt_ids = torch.as_tensor(prompt_ids).unsqueeze(0).to(device)
+ input_ids = torch.cat([raw_output, prompt_ids], dim=1) if i > 1 else prompt_ids
+ # Use window slicing in case output sequence exceeds the context of model
+ max_context = 16384-max_new_tokens-1
+ if input_ids.shape[-1] > max_context:
+ print(f'Section {i}: output length {input_ids.shape[-1]} exceeding context length {max_context}, now using the last {max_context} tokens.')
+ input_ids = input_ids[:, -(max_context):]
+ with torch.no_grad():
+ output_seq = model.generate(
+ input_ids=input_ids,
+ max_new_tokens=max_new_tokens,
+ min_new_tokens=100,
+ do_sample=do_sample_stage1,
+ top_p=top_p,
+ temperature=temperature,
+ repetition_penalty=repetition_penalty,
+ eos_token_id=mmtokenizer.eoa,
+ pad_token_id=mmtokenizer.eoa,
+ logits_processor=LogitsProcessorList([BlockTokenRangeProcessor(0, 32002), BlockTokenRangeProcessor(32016, 32016)]),
+ guidance_scale=guidance_scale,
+ )
+ if output_seq[0][-1].item() != mmtokenizer.eoa:
+ tensor_eoa = torch.as_tensor([[mmtokenizer.eoa]]).to(model.device)
+ output_seq = torch.cat((output_seq, tensor_eoa), dim=1)
+ if i > 1:
+ raw_output = torch.cat([raw_output, prompt_ids, output_seq[:, input_ids.shape[-1]:]], dim=1)
+ else:
+ raw_output = output_seq
+
+ # save raw output and check sanity
+ ids = raw_output[0].cpu().numpy()
+ soa_idx = np.where(ids == mmtokenizer.soa)[0].tolist()
+ eoa_idx = np.where(ids == mmtokenizer.eoa)[0].tolist()
+ if len(soa_idx)!=len(eoa_idx):
+ raise ValueError(f'invalid pairs of soa and eoa, Num of soa: {len(soa_idx)}, Num of eoa: {len(eoa_idx)}')
+
+ vocals = []
+ instrumentals = []
+ range_begin = 1 if args.use_audio_prompt or args.use_dual_tracks_prompt else 0
+ for i in range(range_begin, len(soa_idx)):
+ codec_ids = ids[soa_idx[i]+1:eoa_idx[i]]
+ if codec_ids[0] == 32016:
+ codec_ids = codec_ids[1:]
+ codec_ids = codec_ids[:2 * (codec_ids.shape[0] // 2)]
+ vocals_ids = codectool.ids2npy(rearrange(codec_ids,"(n b) -> b n", b=2)[0])
+ vocals.append(vocals_ids)
+ instrumentals_ids = codectool.ids2npy(rearrange(codec_ids,"(n b) -> b n", b=2)[1])
+ instrumentals.append(instrumentals_ids)
+ vocals = np.concatenate(vocals, axis=1)
+ instrumentals = np.concatenate(instrumentals, axis=1)
+ # Clean genres string to avoid filename being too long
+ genres_clean = sanitize_genres_for_filename(genres, max_length=80)
+ vocal_save_path = os.path.join(stage1_output_dir, f"{genres_clean}_tp{top_p}_T{temperature}_rp{repetition_penalty}_maxtk{max_new_tokens}_{random_id}_vtrack".replace('.', '@')+'.npy')
+ inst_save_path = os.path.join(stage1_output_dir, f"{genres_clean}_tp{top_p}_T{temperature}_rp{repetition_penalty}_maxtk{max_new_tokens}_{random_id}_itrack".replace('.', '@')+'.npy')
+ np.save(vocal_save_path, vocals)
+ np.save(inst_save_path, instrumentals)
+ stage1_output_set.append(vocal_save_path)
+ stage1_output_set.append(inst_save_path)
+
+ return stage1_output_set, stage2_output_dir, song_output_dir
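
The Stage 1 prompt layout built in process_one_song can be shown in a few lines. The helper below is my own sketch (the name `build_prompt_texts` is not in the diff): index 0 carries the full-song header with the genre tag and all lyrics, and indices 1..N are the per-segment prompts that drive segment-by-segment generation.

```python
def build_prompt_texts(genres: str, lyrics_segments: list[str]) -> list[str]:
    """Sketch of the prompt_texts list assembled in process_one_song."""
    full_lyrics = "\n".join(lyrics_segments)
    header = (
        "Generate music from the given lyrics segment by segment.\n"
        f"[Genre] {genres}\n{full_lyrics}"
    )
    # Header first; the generation loop skips it (i == 0) and iterates segments.
    return [header] + lyrics_segments

prompts = build_prompt_texts("pop", ["[verse]\nhello", "[chorus]\nworld"])
print(len(prompts))
```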
+
+# Load Stage 2 model and vocoder (load only once)
+print("\n" + "="*60)
+print("Loading Stage 2 model...")
+print("="*60)
+model_stage2 = AutoModelForCausalLM.from_pretrained(
+ stage2_model,
+ torch_dtype=torch.bfloat16,
+ attn_implementation="flash_attention_2", # Using flash_attention_2 for better performance
+ # device_map="auto",
+ )
+model_stage2.to(device)
+model_stage2.eval()
+
+if torch.__version__ >= "2.0.0":
+ try:
+ model_stage2 = torch.compile(model_stage2)
+ except Exception as e:
+ print(f"Warning: torch.compile not available: {e}")
+
+print("Loading vocoder...")
+vocal_decoder, inst_decoder = build_codec_model(args.config_path, args.vocal_decoder_path, args.inst_decoder_path)
+
+# Batch process all songs - process each song completely before continuing to next
+all_results = []
+skipped_count = 0
+for list_idx, music_data in enumerate(music_data_list):
+ # Calculate actual song index (considering start_idx offset)
+ song_idx = start_idx + list_idx
+
+ try:
+ # Compatible with genre and description fields
+ genres = music_data.get('genre') or music_data.get('description', '')
+
+ # Check processing status
+ stage1_done = False
+ stage2_done = False
+ stage3_done = False
+ song_output_dir = None
+ stage1_output_set = None
+ stage2_output_dir = None
+
+ if song_idx in song_status_map:
+ stage1_done, stage2_done, stage3_done, song_output_dir, stage1_output_set, stage2_output_dir = song_status_map[song_idx]
+
+ # If all completed, skip
+ if stage3_done:
+ print(f"\n{'='*60}")
+ print(f"⏭️ Skipping song {list_idx+1}/{len(music_data_list)} (index {song_idx}, fully completed)")
+ print(f"{'='*60}")
+ skipped_count += 1
+ continue
+
+ # Decide which stage to start from based on completion status
+ print(f"\n{'='*60}")
+ print(f"Starting to process song {list_idx+1}/{len(music_data_list)} (index {song_idx})")
+ if stage1_done:
+ print(f" ✓ Stage 1 completed, will start from Stage 2")
+ if stage2_done:
+ print(f" ✓ Stage 2 completed, will start from Stage 3")
+ print(f"{'='*60}")
+
+ # Stage 1: Generate audio tokens (if not completed)
+ if not stage1_done:
+ stage1_output_set, stage2_output_dir, song_output_dir = process_one_song(music_data, song_idx, len(music_data_list))
+ print(f"✓ Stage 1 completed, generated {len(stage1_output_set)} files")
+ for f in stage1_output_set:
+ print(f" - {os.path.basename(f)}")
+ else:
+ print(f"⏭️ Skipping Stage 1 (completed)")
+ print(f" Using existing Stage 1 outputs:")
+ for f in stage1_output_set:
+ print(f" - {os.path.basename(f)}")
+
+ # Note: Do not unload Stage 1 model here, as subsequent songs still need it
+ # Stage 1 model will be unloaded uniformly after all songs are processed
+
+ # Stage 2: Process audio tokens (if not completed)
+ if not stage2_done:
+ print(f"\n--- Stage 2: Processing song {list_idx+1} (index {song_idx}) ---")
+ stage2_result = stage2_inference(model_stage2, stage1_output_set, stage2_output_dir, batch_size=args.stage2_batch_size)
+ print(f"✓ Stage 2 completed, generated {len(stage2_result)} files")
+ for f in stage2_result:
+ print(f" - {os.path.basename(f)}")
+ else:
+ print(f"\n⏭️ Skipping Stage 2 (completed)")
+ # Get existing stage2 results
+ stage2_result = []
+ if os.path.exists(stage2_output_dir):
+ for f in stage1_output_set:
+ basename = os.path.basename(f)
+ stage2_file = os.path.join(stage2_output_dir, basename)
+ if os.path.exists(stage2_file):
+ stage2_result.append(stage2_file)
+ print(f" Using existing Stage 2 outputs:")
+ for f in stage2_result:
+ print(f" - {os.path.basename(f)}")
+
+ # Stage 3: Reconstruct audio and mix (if not completed)
+ final_output = None
+ if not stage3_done:
+ print(f"\n--- Stage 3: Reconstructing audio for song {list_idx+1} (index {song_idx}) ---")
+
+ # reconstruct tracks
+ recons_output_dir = os.path.join(song_output_dir, "recons")
+ recons_mix_dir = os.path.join(recons_output_dir, 'mix')
+ os.makedirs(recons_mix_dir, exist_ok=True)
+ tracks = []
+ for npy in stage2_result:
+ codec_result = np.load(npy)
+ decodec_rlt=[]
+ with torch.no_grad():
+ decoded_waveform = codec_model.decode(torch.as_tensor(codec_result.astype(np.int16), dtype=torch.long).unsqueeze(0).permute(1, 0, 2).to(device))
+ decoded_waveform = decoded_waveform.cpu().squeeze(0)
+ decodec_rlt.append(torch.as_tensor(decoded_waveform))
+ decodec_rlt = torch.cat(decodec_rlt, dim=-1)
+ save_path = os.path.join(recons_output_dir, os.path.splitext(os.path.basename(npy))[0] + ".mp3")
+ tracks.append(save_path)
+ save_audio(decodec_rlt, save_path, 16000)
+
+ # mix tracks
+ recons_mix = None
+ for inst_path in tracks:
+ try:
+ if (inst_path.endswith('.wav') or inst_path.endswith('.mp3')) \
+ and '_itrack' in inst_path:
+ # find pair
+ vocal_path = inst_path.replace('_itrack', '_vtrack')
+ if not os.path.exists(vocal_path):
+ continue
+ # mix
+ recons_mix = os.path.join(recons_mix_dir, os.path.basename(inst_path).replace('_itrack', '_mixed'))
+ vocal_stem, sr = sf.read(vocal_path)
+ instrumental_stem, _ = sf.read(inst_path)
+ mix_stem = vocal_stem + instrumental_stem
+ sf.write(recons_mix, mix_stem, sr)
+ except Exception as e:
+ print(e)
+
+ # vocoder to upsample audios
+ vocoder_output_dir = os.path.join(song_output_dir, 'vocoder')
+ vocoder_stems_dir = os.path.join(vocoder_output_dir, 'stems')
+ vocoder_mix_dir = os.path.join(vocoder_output_dir, 'mix')
+ os.makedirs(vocoder_mix_dir, exist_ok=True)
+ os.makedirs(vocoder_stems_dir, exist_ok=True)
+
+ for npy in stage2_result:
+ if '_itrack' in npy:
+ # Process instrumental
+ instrumental_output = process_audio(
+ npy,
+ os.path.join(vocoder_stems_dir, 'itrack.mp3'),
+ args.rescale,
+ args,
+ inst_decoder,
+ codec_model
+ )
+ else:
+ # Process vocal
+ vocal_output = process_audio(
+ npy,
+ os.path.join(vocoder_stems_dir, 'vtrack.mp3'),
+ args.rescale,
+ args,
+ vocal_decoder,
+ codec_model
+ )
+
+ # mix tracks
+ vocoder_mix = None
+ try:
+ mix_output = instrumental_output + vocal_output
+ vocoder_mix = os.path.join(vocoder_mix_dir, os.path.basename(recons_mix))
+ save_audio(mix_output, vocoder_mix, 44100, args.rescale)
+ print(f"Created mix: {vocoder_mix}")
+ except Exception as e:
+ # Catch Exception, not just RuntimeError: a NameError here means one of
+ # the stems above was never produced by the vocoder loop.
+ print(f"Mix failed: {e}")
+
+ # Post process
+ if recons_mix and vocoder_mix:
+ final_output = os.path.join(song_output_dir, os.path.basename(recons_mix))
+ replace_low_freq_with_energy_matched(
+ a_file=recons_mix, # 16kHz
+ b_file=vocoder_mix, # 44.1kHz
+ c_file=final_output,
+ cutoff_freq=5500.0
+ )
+ print(f"✓ Song {list_idx+1} (index {song_idx}) completed! Output: {final_output}")
+ else:
+ print(f"\n⏭️ Skipping Stage 3 (completed)")
+ # Find final output file (usually in song_dir root directory)
+ # First check root directory
+ root_files = [f for f in os.listdir(song_output_dir) if f.endswith('_mixed.mp3')]
+ if root_files:
+ final_output = os.path.join(song_output_dir, root_files[0])
+ else:
+ # If root directory doesn't have it, traverse subdirectories to find
+ for root, dirs, files in os.walk(song_output_dir):
+ for f in files:
+ if f.endswith('_mixed.mp3'):
+ final_output = os.path.join(root, f)
+ break
+ if final_output:
+ break
+ if final_output:
+ print(f" Final output: {final_output}")
+
+ all_results.append({
+ 'song_idx': song_idx,
+ 'genres': genres,
+ 'output_path': final_output
+ })
+
+ except Exception as e:
+ print(f"✗ Error processing song {list_idx+1} (index {song_idx}): {e}")
+ import traceback
+ traceback.print_exc()
+ continue
+
+# After all songs are processed, unload models to free memory
+if not args.disable_offload_model:
+ print("\nCleaning up models to free memory...")
+ if 'model' in locals():
+ model.cpu()
+ del model
+ if 'model_stage2' in locals():
+ model_stage2.cpu()
+ del model_stage2
+ torch.cuda.empty_cache()
+ print("Models unloaded")
+
+print("\n" + "="*60)
+print("Batch generation complete!")
+newly_processed = len([r for r in all_results if r.get('output_path')])
+print(f"✓ Newly processed: {newly_processed} songs")
+if skipped_count > 0:
+ print(f"⏭️ Skipped (already completed): {skipped_count} songs")
+print(f"📊 Total completed: {newly_processed + skipped_count} songs")
+print("="*60)
+for result in all_results:
+ if result.get('output_path'):
+ print(f"Song {result['song_idx']+1}: {result['output_path']}")
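
The per-song resume logic in the batch loop reduces to a small decision table. The sketch below (function name `next_stage` is mine) makes it explicit: a fully finished song is skipped, otherwise processing resumes at the first unfinished stage.

```python
def next_stage(stage1_done: bool, stage2_done: bool, stage3_done: bool) -> str:
    """Sketch of the skip/resume decision applied per song above."""
    if stage3_done:
        return "skip"      # fully completed, nothing to do
    if stage2_done:
        return "stage3"    # only audio reconstruction/mixing remains
    if stage1_done:
        return "stage2"    # tokens exist, refine them
    return "stage1"        # start from scratch

print(next_stage(True, False, False))
```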
+
diff --git a/baseline_generate/yue/mmtokenizer.py b/baseline_generate/yue/mmtokenizer.py
new file mode 100644
index 0000000000000000000000000000000000000000..2b5e19ce39468783ff27d1185b34ba3d3d1a17e6
--- /dev/null
+++ b/baseline_generate/yue/mmtokenizer.py
@@ -0,0 +1,367 @@
+from abc import ABC
+from abc import abstractmethod
+
+
+class AbstractTokenizer(ABC):
+ """Abstract class for tokenizer."""
+
+ def __init__(self, name):
+ self.name = name
+ super().__init__()
+
+ @property
+ @abstractmethod
+ def vocab_size(self):
+ pass
+
+ @property
+ @abstractmethod
+ def vocab(self):
+ """Dictionary from vocab text token to id token."""
+ pass
+
+ @property
+ @abstractmethod
+ def inv_vocab(self):
+ """Dictionary from vocab id token to text token."""
+ pass
+
+ @abstractmethod
+ def tokenize(self, text):
+ pass
+
+ def detokenize(self, token_ids):
+ raise NotImplementedError('detokenizer is not implemented for {} '
+ 'tokenizer'.format(self.name))
+
+ @property
+ def cls(self):
+ raise NotImplementedError('CLS is not provided for {} '
+ 'tokenizer'.format(self.name))
+
+ @property
+ def sep(self):
+ raise NotImplementedError('SEP is not provided for {} '
+ 'tokenizer'.format(self.name))
+
+ @property
+ def pad(self):
+ raise NotImplementedError('PAD is not provided for {} '
+ 'tokenizer'.format(self.name))
+
+ @property
+ def eod(self):
+ raise NotImplementedError('EOD is not provided for {} '
+ 'tokenizer'.format(self.name))
+
+ @property
+ def mask(self):
+ raise NotImplementedError('MASK is not provided for {} '
+ 'tokenizer'.format(self.name))
+
+
+class _SentencePieceTokenizer(AbstractTokenizer):
+ """SentencePieceTokenizer-Megatron wrapper"""
+
+ def __init__(self, model_file, vocab_extra_ids=0):
+ name = 'SentencePieceTokenizer'
+ super().__init__(name)
+
+ import sentencepiece
+ self.tokenizer = sentencepiece.SentencePieceProcessor(model_file=model_file)
+ self._initalize(vocab_extra_ids)
+
+ def _populate_vocab(self):
+ self._vocab = {}
+ self._inv_vocab = {}
+
+ for i in range(len(self.tokenizer)):
+ t = self.tokenizer.id_to_piece(i)
+ self._inv_vocab[i] = t
+ self._vocab[t] = i
+
+ def _initalize(self, vocab_extra_ids):
+ self._populate_vocab()
+ self._special_tokens = {}
+ self._inv_special_tokens = {}
+
+ self._t5_tokens = []
+
+ def _add_special_token(t):
+ if t not in self._vocab:
+ next_id = len(self._vocab)
+ self._vocab[t] = next_id
+ self._inv_vocab[next_id] = t
+ self._special_tokens[t] = self._vocab[t]
+ self._inv_special_tokens[self._vocab[t]] = t
+
+ _add_special_token('<CLS>')
+ self._cls_id = self._vocab['<CLS>']
+ _add_special_token('<SEP>')
+ self._sep_id = self._vocab['<SEP>']
+ _add_special_token('<EOD>')
+ self._eod_id = self._vocab['<EOD>']
+ _add_special_token('<MASK>')
+ self._mask_id = self._vocab['<MASK>']
+
+ pad_id = self.tokenizer.pad_id()
+ try:
+ pad_token = self.tokenizer.id_to_piece(pad_id)
+ except IndexError:
+ pad_token = '<pad>'
+ _add_special_token(pad_token)
+ self._pad_id = self._vocab[pad_token]
+
+ bos_id = self.tokenizer.bos_id()
+ try:
+ bos_token = self.tokenizer.id_to_piece(bos_id)
+ except IndexError:
+ bos_token = '<bos>'
+ _add_special_token(bos_token)
+ self._bos_id = self._vocab[bos_token]
+
+ eos_id = self.tokenizer.eos_id()
+ try:
+ eos_token = self.tokenizer.id_to_piece(eos_id)
+ except IndexError:
+ eos_token = '<eos>'
+ _add_special_token(eos_token)
+ self._eos_id = self._vocab[eos_token]
+
+ for i in range(vocab_extra_ids):
+ t = "<extra_id_{}>".format(i)
+ _add_special_token(t)
+ self._t5_tokens += [t]
+
+ @property
+ def vocab_size(self):
+ return len(self._vocab)
+
+ @property
+ def vocab(self):
+ return self._vocab
+
+ @property
+ def inv_vocab(self):
+ return self._inv_vocab
+
+ @property
+ def decoder(self):
+ return self._inv_vocab
+
+ @property
+ def encoder(self):
+ return self._vocab
+
+ # From:
+ # https://github.com/NVIDIA/NeMo/blob/c8fa217e811d60d11d014827c7f3845ff6c99ae7/nemo/collections/common/tokenizers/sentencepiece_tokenizer.py#L89
+ def tokenize(self, text):
+ ids = []
+ idx = 0
+
+ while 1:
+ indices = {}
+ for token in self._special_tokens:
+ try:
+ indices[token] = text[idx:].index(token)
+ except ValueError:
+ continue
+ if len(indices) == 0:
+ break
+
+ next_token = min(indices, key=indices.get)
+ next_idx = idx + indices[next_token]
+
+ ids.extend(self.tokenizer.encode_as_ids(text[idx:next_idx]))
+ ids.append(self._special_tokens[next_token])
+ idx = next_idx + len(next_token)
+
+ ids.extend(self.tokenizer.encode_as_ids(text[idx:]))
+ return ids
+
+ # From:
+ # https://github.com/NVIDIA/NeMo/blob/c8fa217e811d60d11d014827c7f3845ff6c99ae7/nemo/collections/common/tokenizers/sentencepiece_tokenizer.py#L125
+ def detokenize(self, ids):
+ text = ""
+ last_i = 0
+
+ for i, id in enumerate(ids):
+ if id in self._inv_special_tokens:
+ text += self.tokenizer.decode_ids(ids[last_i:i]) + " "
+ text += self._inv_special_tokens[id] + " "
+ last_i = i + 1
+
+ text += self.tokenizer.decode_ids(ids[last_i:])
+ return text
+
+ @property
+ def cls(self):
+ return self._cls_id
+
+ @property
+ def sep(self):
+ return self._sep_id
+
+ @property
+ def pad(self):
+ return self._pad_id
+
+ @property
+ def bos_token_id(self):
+ return self._bos_id
+
+ @property
+ def bos(self):
+ return self._bos_id
+
+ @property
+ def eod(self):
+ return self._eod_id
+
+ @property
+ def eos_token_id(self):
+ return self._eos_id
+
+ @property
+ def eos(self):
+ return self._eos_id
+
+ @property
+ def mask(self):
+ return self._mask_id
+
+ @property
+ def additional_special_tokens_ids(self):
+ return [self.vocab[k] for k in self._t5_tokens]
+
+class _MMSentencePieceTokenizer(_SentencePieceTokenizer):
+ """SentencePieceTokenizer-Megatron wrapper"""
+
+ def __init__(self, model_file, vocab_extra_ids=0):
+ super().__init__(model_file, vocab_extra_ids)
+
+
+ def _initalize(self, vocab_extra_ids):
+ self._populate_vocab()
+ self._special_tokens = {}
+ self._inv_special_tokens = {}
+
+ self._t5_tokens = []
+
+ def _add_special_token(t):
+ if t not in self._vocab:
+ next_id = len(self._vocab)
+ self._vocab[t] = next_id
+ self._inv_vocab[next_id] = t
+ self._special_tokens[t] = self._vocab[t]
+ self._inv_special_tokens[self._vocab[t]] = t
+
+ _add_special_token('<CLS>')
+ self._cls_id = self._vocab['<CLS>']
+ _add_special_token('<SEP>')
+ self._sep_id = self._vocab['<SEP>']
+ _add_special_token('<EOD>')
+ self._eod_id = self._vocab['<EOD>']
+ _add_special_token('<MASK>')
+ self._mask_id = self._vocab['<MASK>']
+
+ _add_special_token('<SOA>')
+ self._soa_id = self._vocab['<SOA>']
+ _add_special_token('<EOA>')
+ self._eoa_id = self._vocab['<EOA>']
+ _add_special_token('<SOV>')
+ self._sov_id = self._vocab['<SOV>']
+ _add_special_token('<EOV>')
+ self._eov_id = self._vocab['<EOV>']
+ _add_special_token('<SOI>')
+ self._soi_id = self._vocab['<SOI>']
+ _add_special_token('<EOI>')
+ self._eoi_id = self._vocab['<EOI>']
+ _add_special_token('<s_local>')
+ self._s_local_id = self._vocab['<s_local>']
+ _add_special_token('<e_local>')
+ self._e_local_id = self._vocab['<e_local>']
+ _add_special_token('<s_global>')
+ self._s_global_id = self._vocab['<s_global>']
+ _add_special_token('<e_global>')
+ self._e_global_id = self._vocab['<e_global>']
+ _add_special_token('<stage_1>')
+ self._stage_1_id = self._vocab['<stage_1>']
+ _add_special_token('<stage_2>')
+ self._stage_2_id = self._vocab['<stage_2>']
+ pad_id = self.tokenizer.pad_id()
+ try:
+ pad_token = self.tokenizer.id_to_piece(pad_id)
+ except IndexError:
+ pad_token = '<pad>'
+ _add_special_token(pad_token)
+ self._pad_id = self._vocab[pad_token]
+
+ bos_id = self.tokenizer.bos_id()
+ try:
+ bos_token = self.tokenizer.id_to_piece(bos_id)
+ except IndexError:
+ bos_token = '<bos>'
+ _add_special_token(bos_token)
+ self._bos_id = self._vocab[bos_token]
+
+ eos_id = self.tokenizer.eos_id()
+ try:
+ eos_token = self.tokenizer.id_to_piece(eos_id)
+ except IndexError:
+ eos_token = '<eos>'
+ _add_special_token(eos_token)
+ self._eos_id = self._vocab[eos_token]
+
+ for i in range(vocab_extra_ids):
+ t = "<extra_id_{}>".format(i)
+ _add_special_token(t)
+ self._t5_tokens += [t]
+
+ @property
+ def soa(self):
+ return self._soa_id
+
+ @property
+ def eoa(self):
+ return self._eoa_id
+
+ @property
+ def sov(self):
+ return self._sov_id
+
+ @property
+ def eov(self):
+ return self._eov_id
+
+ @property
+ def soi(self):
+ return self._soi_id
+
+ @property
+ def eoi(self):
+ return self._eoi_id
+
+ @property
+ def s_local(self):
+ return self._s_local_id
+
+ @property
+ def e_local(self):
+ return self._e_local_id
+
+ @property
+ def s_global(self):
+ return self._s_global_id
+
+ @property
+ def e_global(self):
+ return self._e_global_id
+
+ @property
+ def stage_1(self):
+ return self._stage_1_id
+
+ @property
+ def stage_2(self):
+ return self._stage_2_id
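
The special-token-aware `tokenize` method above repeatedly finds the earliest special token, encodes the plain text before it, emits the token's id, and continues after it. The plain-string sketch below (function name `split_on_special_tokens` is mine; the `<SOA>`/`<EOA>` tokens are just example markers) shows the same splitting strategy without SentencePiece.

```python
def split_on_special_tokens(text: str, special_tokens: list[str]) -> list[str]:
    """String-level sketch of _SentencePieceTokenizer.tokenize's
    split-at-earliest-special-token loop."""
    pieces = []
    idx = 0
    while True:
        # Earliest occurrence of any special token at or after idx.
        hits = {t: text.index(t, idx) for t in special_tokens if t in text[idx:]}
        if not hits:
            break
        tok = min(hits, key=hits.get)
        if text[idx:hits[tok]]:
            pieces.append(text[idx:hits[tok]])  # plain text before the token
        pieces.append(tok)                      # the special token itself
        idx = hits[tok] + len(tok)
    if text[idx:]:
        pieces.append(text[idx:])               # trailing plain text
    return pieces

print(split_on_special_tokens("a<SOA>b<EOA>", ["<SOA>", "<EOA>"]))
```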
diff --git a/data_pipeline/lyrics_gene/__pycache__/filter_all_cn.cpython-311.pyc b/data_pipeline/lyrics_gene/__pycache__/filter_all_cn.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5dcd89a9554e4035f6da0fc66311a5a1e550ac5f
Binary files /dev/null and b/data_pipeline/lyrics_gene/__pycache__/filter_all_cn.cpython-311.pyc differ
diff --git a/data_pipeline/lyrics_gene/__pycache__/filter_all_en.cpython-311.pyc b/data_pipeline/lyrics_gene/__pycache__/filter_all_en.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..2ce724911bfc1e687802010ebc4caa2e1c1e84bc
Binary files /dev/null and b/data_pipeline/lyrics_gene/__pycache__/filter_all_en.cpython-311.pyc differ
diff --git a/data_pipeline/lyrics_gene/__pycache__/gen_lyrics_cn.cpython-311.pyc b/data_pipeline/lyrics_gene/__pycache__/gen_lyrics_cn.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..4d7f07422890c67f04967db4d86a1853b2453d6f
Binary files /dev/null and b/data_pipeline/lyrics_gene/__pycache__/gen_lyrics_cn.cpython-311.pyc differ
diff --git a/data_pipeline/lyrics_gene/filter_all_cn.py b/data_pipeline/lyrics_gene/filter_all_cn.py
new file mode 100644
index 0000000000000000000000000000000000000000..aab497d91f0d8676c7bea309cf23a4066a889d5a
--- /dev/null
+++ b/data_pipeline/lyrics_gene/filter_all_cn.py
@@ -0,0 +1,272 @@
+import json
+import os
+import re
+import torch
+from sentence_transformers import SentenceTransformer, util
+from tqdm import tqdm
+
+# os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
+# os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"
+
+# Use HuggingFace mirror site
+os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
+# Set HuggingFace cache directory so SentenceTransformer can recognize downloaded models
+os.environ["HF_HOME"] = os.path.expanduser("~/.cache/huggingface")
+
+
+def clean_lyrics(lyrics):
+ """
+ Clean lyrics by removing segment tags, timestamp tags, and newlines, keeping only pure lyric text
+
+ Args:
+ lyrics: Raw lyrics text (contains segment tags like [Verse 1], timestamps like [00:07.00], and newlines)
+
+ Returns:
+ Cleaned lyrics text (plain text, no tags and newlines)
+ """
+ # Use regex to remove all [tag] format content (including segment tags and timestamps)
+ # Pattern matching [any content]
+ cleaned = re.sub(r'\[.*?\]', '', lyrics)
+
+ # Remove all newlines, replace with spaces
+ cleaned = cleaned.replace('\n', ' ')
+
+ # Remove extra spaces (replace multiple consecutive spaces with single space)
+ cleaned = re.sub(r'\s+', ' ', cleaned)
+
+ # Remove leading and trailing spaces
+ cleaned = cleaned.strip()
+
+ return cleaned
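
The cleaning steps above compose into two regex passes. This standalone mirror (function name `strip_tags_and_newlines` is mine) shows the intended effect on a typical tagged lyric line:

```python
import re

def strip_tags_and_newlines(lyrics: str) -> str:
    """Mirror of clean_lyrics above: drop [tag]/[timestamp] markers and
    collapse newlines plus repeated whitespace into single spaces."""
    cleaned = re.sub(r'\[.*?\]', '', lyrics)          # remove [Verse 1], [00:07.00], ...
    cleaned = re.sub(r'\s+', ' ', cleaned.replace('\n', ' '))
    return cleaned.strip()

print(strip_tags_and_newlines("[Verse 1]\n[00:07.00]hello\nworld"))
```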
+
+
+def load_music_data(input_file, max_count=None):
+ """
+ Load music data from jsonl file
+
+ Args:
+ input_file: Path to input jsonl file
+ max_count: Maximum number to read, None means read all
+
+ Returns:
+ List of music data
+ """
+ music_list = []
+ print(f"Loading music data: {input_file}")
+ if max_count:
+ print(f"Limiting to first {max_count} songs")
+
+ with open(input_file, 'r', encoding='utf-8') as f:
+ for line in tqdm(f, desc="Loading data"):
+ try:
+ data = json.loads(line.strip())
+ # Ensure required fields are present
+ if 'description' in data and 'lyrics' in data:
+ music_list.append(data)
+ # If reached maximum count, stop reading
+ if max_count and len(music_list) >= max_count:
+ break
+ except json.JSONDecodeError:
+ continue
+ print(f"Successfully loaded {len(music_list)} songs")
+ return music_list
+
+
+def deduplicate_music(music_list, texts, model, threshold=0.90, output_file=None, save_interval=10000, matrix_save_dir=None):
+ """
+ Deduplicate music data based on text similarity
+
+ Args:
+ music_list: List of music data
+ texts: List of texts for comparison
+ model: SentenceTransformer model
+ threshold: Similarity threshold
+ output_file: Output file path, if provided supports incremental saving
+ save_interval: Save every N valid songs processed
+ matrix_save_dir: Directory to save matrices
+
+ Returns:
+ Deduplicated music data list
+ """
+ print(f"Computing embeddings for {len(texts)} texts...")
+ embeddings = model.encode(texts, convert_to_tensor=True, show_progress_bar=True)
+
+ print("Computing similarity matrix...")
+ cos_scores = util.pytorch_cos_sim(embeddings, embeddings)
+
+ # Save similarity matrix and embeddings
+ if matrix_save_dir:
+ os.makedirs(matrix_save_dir, exist_ok=True)
+ embeddings_path = os.path.join(matrix_save_dir, 'embeddings.pt')
+ cos_scores_path = os.path.join(matrix_save_dir, 'cos_scores.pt')
+ print(f"Saving embeddings to: {embeddings_path}")
+ torch.save(embeddings.cpu(), embeddings_path)
+ print(f"Saving similarity matrix to: {cos_scores_path}")
+ torch.save(cos_scores.cpu(), cos_scores_path)
+ print("Matrix saving complete!")
+
+ print(f"Deduplicating (threshold: {threshold})...")
+ keep_idx = []
+ removed = set()
+
+ # If output file provided, open in write mode
+ f = None
+ if output_file:
+ os.makedirs(os.path.dirname(output_file), exist_ok=True)
+ f = open(output_file, 'w', encoding='utf-8')
+
+ saved_count = 0
+
+ for i in tqdm(range(len(music_list)), desc="Deduplication progress"):
+ if i in removed:
+ continue
+ keep_idx.append(i)
+
+ # If incremental saving enabled, save every save_interval songs
+ if f and len(keep_idx) - saved_count >= save_interval:
+ # Save all valid songs from saved_count to current
+ for idx in range(saved_count, len(keep_idx)):
+ music = music_list[keep_idx[idx]]
+ f.write(json.dumps(music, ensure_ascii=False) + '\n')
+ f.flush() # Ensure write to disk
+ saved_count = len(keep_idx)
+ print(f"Saved {saved_count} valid songs to file")
+
+ for j in range(i+1, len(music_list)):
+ if cos_scores[i][j] > threshold:
+ removed.add(j)
+
+ # Save remaining valid songs
+ if f:
+ for idx in range(saved_count, len(keep_idx)):
+ music = music_list[keep_idx[idx]]
+ f.write(json.dumps(music, ensure_ascii=False) + '\n')
+ f.close()
+ print(f"Saved all {len(keep_idx)} valid songs to file")
+
+ deduped_music_list = [music_list[i] for i in keep_idx]
+ print(f"Deduplication complete: {len(music_list)} -> {len(deduped_music_list)} (removed {len(removed)} songs)")
+
+ return deduped_music_list
+
+
+def dedup_by_description_and_lyrics(input_file, output_file, threshold=0.95, max_count=None, device='cuda:1', save_interval=10000, matrix_save_dir=None):
+ """
+ Method 2: Deduplicate based on description + lyrics
+
+ Args:
+ input_file: Path to input jsonl file
+ output_file: Path to output jsonl file
+ threshold: Similarity threshold
+ max_count: Maximum number to read, None means read all
+ device: Device to use, default cuda:1 (GPU1)
+ save_interval: Save every N valid songs processed, default 10000
+ matrix_save_dir: Directory to save matrices, if provided saves embeddings and similarity matrix
+ """
+ print("\n========== Method 2: Deduplicate based on description + lyrics ==========")
+ print(f"Using device: {device}")
+
+ # Load data
+ music_list = load_music_data(input_file, max_count=max_count)
+
+ # Extract combined text from description + lyrics
+ combined_texts = []
+ for music in music_list:
+ description = music.get('description', '')
+ lyrics = music.get('lyrics', '')
+ # Clean lyrics, remove structure tags
+ cleaned_lyrics = clean_lyrics(lyrics)
+ # Concatenate description and cleaned lyrics (separated by delimiter)
+ combined_text = f"{description} [SEP] {cleaned_lyrics}"
+ combined_texts.append(combined_text)
+
+ # Load Chinese model and specify device
+ # Check if local model exists, if so use local path directly to avoid re-downloading
+ hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
+ model_cache_dir = os.path.join(hf_home, "hub", "models--shibing624--text2vec-bge-large-chinese", "snapshots")
+
+ # Find model snapshot directory
+ local_model_path = None
+ if os.path.exists(model_cache_dir):
+ snapshots = [d for d in os.listdir(model_cache_dir) if os.path.isdir(os.path.join(model_cache_dir, d))]
+ if snapshots:
+ # Use the first snapshot found (there is usually only one)
+ local_model_path = os.path.join(model_cache_dir, snapshots[0])
+ if os.path.exists(os.path.join(local_model_path, "config.json")):
+ print(f"Detected local model, using path: {local_model_path}")
+ model = SentenceTransformer(local_model_path, device=device)
+ else:
+ local_model_path = None
+
+ if local_model_path is None:
+ print(f"Loading model to {device}...")
+ model = SentenceTransformer('shibing624/text2vec-bge-large-chinese', device=device)
+
+ # Deduplicate (supports incremental saving)
+ deduped_music_list = deduplicate_music(
+ music_list,
+ combined_texts,
+ model,
+ threshold,
+ output_file=output_file,
+ save_interval=save_interval,
+ matrix_save_dir=matrix_save_dir
+ )
+
+ # Deduplication function already handled saving, just print info here
+ if output_file:
+ print(f"✓ Save complete! Remaining {len(deduped_music_list)} songs after deduplication\n")
+ else:
+ # No output file provided: deduplicate_music() did not save, just report the result
+ print(f"✓ Deduplication complete! Remaining {len(deduped_music_list)} songs (no output file specified)\n")
+
+ return deduped_music_list
+
+
+if __name__ == '__main__':
+ # Input file path
+ input_file = 'lrc_4w_single_pro_des.jsonl'
+
+ # Output file path
+ output_file = 'filter_all_4w.jsonl'
+
+ # Matrix save directory
+ matrix_save_dir = 'generate_lrc'
+
+ # Set maximum read count (for testing, None means read all)
+ max_count = None # None means process all songs; set a small number for testing
+
+ # Deduplicate based on description + lyrics
+ print("\nDeduplicating based on description + lyrics")
+ dedup_by_description_and_lyrics(
+ input_file,
+ output_file,
+ threshold=0.90,
+ max_count=max_count,
+ device='cuda:7',
+ save_interval=10000, # Save every 10000 valid songs
+ matrix_save_dir=matrix_save_dir # Save similarity matrix
+ )
+ print(f"\nComplete! Results saved to: {output_file}")
+ print(f"Similarity matrix saved to: {matrix_save_dir}")
+ # Test lyrics cleaning effect
+ # print("\n========== Test Lyrics Cleaning Effect ==========")
+ # music_list = load_music_data(input_file, max_count=max_count)
+
+ # print("\n" + "="*80)
+ # for i, music in enumerate(music_list, 1):
+ # print(f"\n[Song {i}]")
+ # print(f"Description: {music.get('description', '')}")
+ # print("\n--- Original Lyrics ---")
+ # original_lyrics = music.get('lyrics', '')
+ # print(original_lyrics[:500] + "..." if len(original_lyrics) > 500 else original_lyrics)
+ # print("\n--- Cleaned Lyrics ---")
+ # cleaned_lyrics = clean_lyrics(original_lyrics)
+ # print(cleaned_lyrics)
+ # print("\n" + "-"*80)
+
diff --git a/data_pipeline/lyrics_gene/filter_all_en.py b/data_pipeline/lyrics_gene/filter_all_en.py
new file mode 100644
index 0000000000000000000000000000000000000000..dd3a70e920f4b37217c6e75d5aef944f4542586a
--- /dev/null
+++ b/data_pipeline/lyrics_gene/filter_all_en.py
@@ -0,0 +1,256 @@
+import json
+import os
+import re
+import torch
+from sentence_transformers import SentenceTransformer, util
+from tqdm import tqdm
+
+# os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
+# os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"
+
+# Use HuggingFace mirror site
+os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
+# Set HuggingFace cache directory so SentenceTransformer can recognize downloaded models
+os.environ["HF_HOME"] = os.path.expanduser("~/.cache/huggingface")
+
+MODEL_EN = "sentence-transformers/all-MiniLM-L6-v2"
+
+
+def clean_lyrics(lyrics):
+ """
+ Clean lyrics by removing segment tags, timestamp tags, and newlines, keeping only pure lyric text
+
+ Args:
+ lyrics: Raw lyrics text (contains segment tags like [Verse 1], timestamps like [00:07.00], and newlines)
+
+ Returns:
+ Cleaned lyrics text (plain text, no tags and newlines)
+ """
+ # Use regex to remove all [tag] format content (including segment tags and timestamps)
+ # Pattern matching [any content]
+ cleaned = re.sub(r'\[.*?\]', '', lyrics)
+
+ # Remove all newlines, replace with spaces
+ cleaned = cleaned.replace('\n', ' ')
+
+ # Remove extra spaces (replace multiple consecutive spaces with single space)
+ cleaned = re.sub(r'\s+', ' ', cleaned)
+
+ # Remove leading and trailing spaces
+ cleaned = cleaned.strip()
+
+ return cleaned
+
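As a quick illustration of what `clean_lyrics` produces, here is a self-contained sketch of the same regex pipeline (the sample lyrics are invented for illustration):

```python
import re

def clean_lyrics(lyrics: str) -> str:
    # Drop [Verse 1]-style section tags and [mm:ss.xx] timestamps,
    # then collapse newlines and repeated whitespace into single spaces.
    cleaned = re.sub(r'\[.*?\]', '', lyrics)
    cleaned = cleaned.replace('\n', ' ')
    cleaned = re.sub(r'\s+', ' ', cleaned)
    return cleaned.strip()

sample = "[Verse 1]\n[00:07.00]first line\n[00:12.50]second line"
print(clean_lyrics(sample))  # -> first line second line
```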
+
+def load_music_data(input_file, max_count=None):
+ """
+ Load music data from jsonl file
+
+ Args:
+ input_file: Path to input jsonl file
+ max_count: Maximum number to read, None means read all
+
+ Returns:
+ List of music data
+ """
+ music_list = []
+ print(f"Loading music data: {input_file}")
+ if max_count:
+ print(f"Limiting to first {max_count} songs")
+
+ with open(input_file, 'r', encoding='utf-8') as f:
+ for line in tqdm(f, desc="Loading data"):
+ try:
+ data = json.loads(line.strip())
+ # Ensure required fields are present
+ if 'description' in data and 'lyrics' in data:
+ music_list.append(data)
+ # If reached maximum count, stop reading
+ if max_count and len(music_list) >= max_count:
+ break
+ except json.JSONDecodeError:
+ continue
+ print(f"Successfully loaded {len(music_list)} songs")
+ return music_list
+
+
+def deduplicate_music(music_list, texts, model, threshold=0.90, output_file=None, save_interval=10000, matrix_save_dir=None):
+ """
+ Deduplicate music data based on text similarity
+
+ Args:
+ music_list: List of music data
+ texts: List of texts for comparison
+ model: SentenceTransformer model
+ threshold: Similarity threshold
+ output_file: Output file path, if provided supports incremental saving
+ save_interval: Save every N valid songs processed
+ matrix_save_dir: Directory to save matrices
+
+ Returns:
+ Deduplicated music data list
+ """
+ print(f"Computing embeddings for {len(texts)} texts...")
+ embeddings = model.encode(texts, convert_to_tensor=True, show_progress_bar=True)
+
+ print("Computing similarity matrix...")
+ cos_scores = util.pytorch_cos_sim(embeddings, embeddings)
+
+ # Save similarity matrix and embeddings
+ if matrix_save_dir:
+ os.makedirs(matrix_save_dir, exist_ok=True)
+ embeddings_path = os.path.join(matrix_save_dir, 'embeddings.pt')
+ cos_scores_path = os.path.join(matrix_save_dir, 'cos_scores.pt')
+ print(f"Saving embeddings to: {embeddings_path}")
+ torch.save(embeddings.cpu(), embeddings_path)
+ print(f"Saving similarity matrix to: {cos_scores_path}")
+ torch.save(cos_scores.cpu(), cos_scores_path)
+ print("Matrix saving complete!")
+
+ print(f"Deduplicating (threshold: {threshold})...")
+ keep_idx = []
+ removed = set()
+
+ # If output file provided, open in write mode
+ f = None
+ if output_file:
+ out_dir = os.path.dirname(output_file)
+ if out_dir: # makedirs('') raises when the path has no directory component
+ os.makedirs(out_dir, exist_ok=True)
+ f = open(output_file, 'w', encoding='utf-8')
+
+ saved_count = 0
+
+ for i in tqdm(range(len(music_list)), desc="Deduplication progress"):
+ if i in removed:
+ continue
+ keep_idx.append(i)
+
+ # If incremental saving enabled, save every save_interval songs
+ if f and len(keep_idx) - saved_count >= save_interval:
+ # Save all valid songs from saved_count to current
+ for idx in range(saved_count, len(keep_idx)):
+ music = music_list[keep_idx[idx]]
+ f.write(json.dumps(music, ensure_ascii=False) + '\n')
+ f.flush() # Ensure write to disk
+ saved_count = len(keep_idx)
+ print(f"Saved {saved_count} valid songs to file")
+
+ for j in range(i+1, len(music_list)):
+ if cos_scores[i][j] > threshold:
+ removed.add(j)
+
+ # Save remaining valid songs
+ if f:
+ for idx in range(saved_count, len(keep_idx)):
+ music = music_list[keep_idx[idx]]
+ f.write(json.dumps(music, ensure_ascii=False) + '\n')
+ f.close()
+ print(f"Saved all {len(keep_idx)} valid songs to file")
+
+ deduped_music_list = [music_list[i] for i in keep_idx]
+ print(f"Deduplication complete: {len(music_list)} -> {len(deduped_music_list)} (removed {len(removed)} songs)")
+
+ return deduped_music_list
+
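The core of `deduplicate_music` is a greedy single-pass filter over the similarity matrix: an item is kept unless it is too similar to an earlier kept item. A minimal, dependency-free sketch of that loop (the toy scores are invented for illustration):

```python
def greedy_dedup(cos_scores, threshold=0.90):
    # Mirrors the keep_idx / removed bookkeeping in deduplicate_music():
    # scan items in order; keeping item i knocks out every later item j
    # whose similarity to i exceeds the threshold.
    n = len(cos_scores)
    keep_idx, removed = [], set()
    for i in range(n):
        if i in removed:
            continue
        keep_idx.append(i)
        for j in range(i + 1, n):
            if cos_scores[i][j] > threshold:
                removed.add(j)
    return keep_idx

# Toy 3x3 similarity matrix: items 0 and 1 are near-duplicates.
scores = [[1.00, 0.95, 0.10],
          [0.95, 1.00, 0.12],
          [0.10, 0.12, 1.00]]
print(greedy_dedup(scores))  # -> [0, 2]
```

Note that the loop is O(n²) and the full similarity matrix is held in memory, which bounds the dataset size this approach can handle.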
+
+def dedup_by_description_and_lyrics(input_file, output_file, threshold=0.95, max_count=None, device='cuda:1', save_interval=10000, matrix_save_dir=None):
+ """
+ Method 2: Deduplicate based on description + lyrics
+
+ Args:
+ input_file: Path to input jsonl file
+ output_file: Path to output jsonl file
+ threshold: Similarity threshold
+ max_count: Maximum number to read, None means read all
+ device: Device to use, default cuda:1 (GPU1)
+ save_interval: Save every N valid songs processed, default 10000
+ matrix_save_dir: Directory to save matrices, if provided saves embeddings and similarity matrix
+ """
+ print("\n========== Method 2: Deduplicate based on description + lyrics ==========")
+ print(f"Using device: {device}")
+
+ # Load data
+ music_list = load_music_data(input_file, max_count=max_count)
+
+ # Extract combined text from description + lyrics
+ combined_texts = []
+ for music in music_list:
+ description = music.get('description', '')
+ lyrics = music.get('lyrics', '')
+ # Clean lyrics, remove structure tags
+ cleaned_lyrics = clean_lyrics(lyrics)
+ # Concatenate description and cleaned lyrics (separated by delimiter)
+ combined_text = f"{description} [SEP] {cleaned_lyrics}"
+ combined_texts.append(combined_text)
+
+ # Load English model
+ print(f"Loading English model {MODEL_EN} to {device}...")
+ model = SentenceTransformer(MODEL_EN, device=device)
+
+ # Deduplicate (supports incremental saving)
+ deduped_music_list = deduplicate_music(
+ music_list,
+ combined_texts,
+ model,
+ threshold,
+ output_file=output_file,
+ save_interval=save_interval,
+ matrix_save_dir=matrix_save_dir
+ )
+
+ # Deduplication function already handled saving, just print info here
+ if output_file:
+ print(f"✓ Save complete! Remaining {len(deduped_music_list)} songs after deduplication\n")
+ else:
+ # No output file provided: deduplicate_music() did not save, just report the result
+ print(f"✓ Deduplication complete! Remaining {len(deduped_music_list)} songs (no output file specified)\n")
+
+ return deduped_music_list
+
+
+if __name__ == '__main__':
+ # Input file path
+ input_file = 'en_lrc_4w_single_pro_des.jsonl'
+
+ # Output file path
+ output_file = 'filter_en_single_4w(0.9).jsonl'
+
+ # Matrix save directory
+ matrix_save_dir = 'en_matrix'
+
+ # Set maximum read count (for testing, None means read all)
+ max_count = None # None means process all songs; set a small number for testing
+
+ # Deduplicate based on description + lyrics
+ print("\nDeduplicating based on description + lyrics")
+ dedup_by_description_and_lyrics(
+ input_file,
+ output_file,
+ threshold=0.90,
+ max_count=max_count,
+ device='cuda:7',
+ save_interval=10000, # Save every 10000 valid songs
+ matrix_save_dir=matrix_save_dir # Save similarity matrix
+ )
+ print(f"\nComplete! Results saved to: {output_file}")
+ print(f"Similarity matrix saved to: {matrix_save_dir}")
+ # Test lyrics cleaning effect
+ # print("\n========== Test Lyrics Cleaning Effect ==========")
+ # music_list = load_music_data(input_file, max_count=max_count)
+
+ # print("\n" + "="*80)
+ # for i, music in enumerate(music_list, 1):
+ # print(f"\n[Song {i}]")
+ # print(f"Description: {music.get('description', '')}")
+ # print("\n--- Original Lyrics ---")
+ # original_lyrics = music.get('lyrics', '')
+ # print(original_lyrics[:500] + "..." if len(original_lyrics) > 500 else original_lyrics)
+ # print("\n--- Cleaned Lyrics ---")
+ # cleaned_lyrics = clean_lyrics(original_lyrics)
+ # print(cleaned_lyrics)
+ # print("\n" + "-"*80)
+
diff --git a/data_pipeline/lyrics_gene/gen_lyrics_cn.py b/data_pipeline/lyrics_gene/gen_lyrics_cn.py
new file mode 100644
index 0000000000000000000000000000000000000000..22b31a466492355376e580a8e828d8bd9cea26b7
--- /dev/null
+++ b/data_pipeline/lyrics_gene/gen_lyrics_cn.py
@@ -0,0 +1,568 @@
+import os
+import json
+import time
+import random
+import re
+from openai import OpenAI
+from tqdm import tqdm
+import threading
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+# Set environment variables
+# Note: Set these environment variables before running the script
+# export OPENAI_API_KEY="your-api-key"
+# export OPENAI_BASE_URL="https://api.openai.com/v1" # or your custom API URL
+if not os.environ.get("OPENAI_API_KEY"):
+ os.environ["OPENAI_API_KEY"] = "" # Replace with your API key or set via environment variable
+if not os.environ.get("OPENAI_BASE_URL"):
+ os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1" # Replace with API URL or set via environment variable
+
+# Initialize client
+client = OpenAI()
+
+
+def _extract_lyrics_timestamps(lyrics_text):
+ """
+ Extract timestamps from lyrics and convert to seconds
+ Args:
+ lyrics_text: Lyrics string
+ Returns:
+ List[float]: Timestamps in order (seconds)
+ """
+ if not isinstance(lyrics_text, str):
+ return []
+ pattern = re.compile(r'\[(\d{2}):(\d{2})(?:\.(\d{2}))?\]')
+ timestamps = []
+ for match in pattern.finditer(lyrics_text):
+ minutes = int(match.group(1))
+ seconds = int(match.group(2))
+ fraction = match.group(3)
+ total_seconds = minutes * 60 + seconds
+ if fraction is not None:
+ divisor = 100 if len(fraction) == 2 else 10 ** len(fraction)
+ total_seconds += int(fraction) / divisor
+ timestamps.append(total_seconds)
+ return timestamps
+
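The timestamp parser above can be exercised on its own; a standalone sketch with the same regex (sample input invented). Since the pattern only matches two fractional digits, dividing by 100 is equivalent to the divisor logic in the original:

```python
import re

def extract_timestamps(lyrics_text: str) -> list:
    # Parse [mm:ss] or [mm:ss.xx] markers into seconds, in order of appearance.
    pattern = re.compile(r'\[(\d{2}):(\d{2})(?:\.(\d{2}))?\]')
    out = []
    for m in pattern.finditer(lyrics_text):
        total = int(m.group(1)) * 60 + int(m.group(2))
        if m.group(3) is not None:
            total += int(m.group(3)) / 100
        out.append(total)
    return out

print(extract_timestamps("[00:08.00]line one [00:12.50]line two [03:05]outro"))
# -> [8.0, 12.5, 185]
```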
+
+def _validate_timestamps(lyrics_text, min_last_timestamp=170, max_interval=35):
+ """
+ Validate if lyrics timestamps meet requirements
+ Args:
+ lyrics_text: Lyrics string
+ min_last_timestamp: Minimum value of last timestamp (seconds)
+ max_interval: Maximum interval between last two timestamps (seconds)
+ Returns:
+ bool: Whether validation passes
+ """
+ timestamps = _extract_lyrics_timestamps(lyrics_text)
+ if len(timestamps) < 2:
+ print("Validation failed: Timestamp count less than 2")
+ return False
+ last = timestamps[-1]
+ second_last = timestamps[-2]
+ if last < min_last_timestamp:
+ print(f"Validation failed: Last timestamp {last:.2f}s is less than {min_last_timestamp}s")
+ return False
+ if last - second_last > max_interval:
+ print(f"Validation failed: Interval between last two timestamps {last - second_last:.2f}s is greater than {max_interval}s")
+ return False
+ return True
+
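The same three checks, sketched as a standalone function that takes an already-extracted timestamp list (threshold values copied from the defaults above):

```python
def validate_timestamps(timestamps, min_last=170, max_interval=35):
    # A song passes if it has at least two timestamps, its last line starts
    # late enough (>= min_last seconds), and the final gap between the last
    # two lines is not an implausibly long jump (<= max_interval seconds).
    if len(timestamps) < 2:
        return False
    if timestamps[-1] < min_last:
        return False
    return timestamps[-1] - timestamps[-2] <= max_interval

print(validate_timestamps([8.0, 180.0, 200.0]))  # True: ends at 200s, final gap 20s
print(validate_timestamps([8.0, 100.0, 160.0]))  # False: ends before 170s
```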
+
+def chat_gpt(text, model='gpt-4o-mini'):
+ while True:
+ try:
+ # Call OpenAI chat completions API
+ completion = client.chat.completions.create(
+ model=model, # Use the specified model
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": text}
+ ]
+ )
+ # Get reply content
+ if getattr(completion.choices[0].message, 'content', None):
+ content = completion.choices[0].message.content.strip()
+ return content
+ else:
+ print('Empty response, retrying in 2s...')
+ time.sleep(2)
+ except Exception as e:
+ print(f"Error: {e}")
+ time.sleep(2)
+
+
+def chat_gpt_call(text, model='gpt-4o-mini'):
+ # Call OpenAI chat completions API
+ completion = client.chat.completions.create(
+ model=model, # Use the specified model
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": text}
+ ]
+ )
+ # Get reply content
+ if getattr(completion.choices[0].message, 'content', None):
+ content = completion.choices[0].message.content.strip()
+ return content
+ else:
+ print('Empty response from model')
+
+
+def generate_music_descriptions(all_music_data, index_pool, output_file, file_lock, sample_size=20, model='gpt-4o-mini', max_retries=0):
+ """
+ Read music data file, randomly sample, call GPT to generate new music descriptions and lyrics
+
+ Args:
+ all_music_data: All music data list
+ index_pool: Index pool object (thread-safe)
+ output_file: Output jsonl file path
+ file_lock: File write lock
+ sample_size: Number of random samples
+ model: Model name to use
+ max_retries: Maximum retry count
+
+ Returns:
+ (used_indices, success_count): List of used indices and count of successful generations
+ """
+ # duration_ranges = [
+ # ("3.00", "3.15", 50), ("3.15", "3.30", 50), ("3.30", "3.45", 60),
+ # ("3.45", "4.00", 60),
+ # ("4.00", "4.15", 70), ("4.15", "4.30", 70), ("4.30", "4.45", 70)
+ # ]
+ duration_ranges = [
+ ("3.00", "3.15", 60), ("3.15", "3.30", 70), ("3.30", "3.45", 80),("3.45", "4.00", 90),
+ ("4.00", "4.15", 100), ("4.15", "4.30", 100), ("4.30", "4.45", 100)
+ ]
+ selected_range = random.choice(duration_ranges)
+ require_length = selected_range[2]
+
+ # Convert the range endpoints directly to LRC timestamp strings
+ start_timestamp = f"[{selected_range[0].replace('.', ':')}.00]"
+ end_timestamp = f"[{selected_range[1].replace('.', ':')}.00]"
+
+ # Generate duration description
+ start_duration = f"{selected_range[0].replace('.', '分')}秒"
+ end_duration = f"{selected_range[1].replace('.', '分')}秒"
+
+ # Generate random timestamps for examples (randomly generated within time range)
+ # Parse time string to minutes and seconds
+ start_parts = selected_range[0].split('.')
+ end_parts = selected_range[1].split('.')
+
+ start_minutes = int(start_parts[0])
+ start_seconds = int(start_parts[1])
+ end_minutes = int(end_parts[0])
+ end_seconds = int(end_parts[1])
+
+ # Convert to total seconds
+ start_total_seconds = start_minutes * 60 + start_seconds
+ end_total_seconds = end_minutes * 60 + end_seconds
+
+ # Randomly generate within range
+ example1_seconds = random.randint(start_total_seconds, end_total_seconds)
+ example2_seconds = random.randint(start_total_seconds, end_total_seconds)
+
+ example1_minutes = example1_seconds // 60
+ example1_secs = example1_seconds % 60
+ example2_minutes = example2_seconds // 60
+ example2_secs = example2_seconds % 60
+
+ example1_timestamp = f"[{example1_minutes:02d}:{example1_secs:02d}.00]"
+ example2_timestamp = f"[{example2_minutes:02d}:{example2_secs:02d}.00]"
+
+ # Get sample indices from index pool (thread-safe)
+ selected_indices = index_pool.get_indices(sample_size)
+ if not selected_indices:
+ return [], 0
+
+ sample_data = [all_music_data[i] for i in selected_indices]
+
+ # Extract all unique styles
+ styles = []
+ for data in sample_data:
+ style = data.get('style', '')
+ if style and style not in styles:
+ styles.append(style)
+
+ styles_text = "、".join(styles)
+
+ # Build example text - include all sampled data (excluding style)
+ examples = []
+ for i, data in enumerate(sample_data, 1):
+ lyrics_text = " ".join(data.get('lyrics', [])) if isinstance(data.get('lyrics'), list) else data.get('lyrics', '')
+ description = data.get('description', '')
+ examples.append(f"示例{i}:\ndescription: {description}\nlyrics: {lyrics_text}")
+
+ examples_text = "\n\n".join(examples)
+
+ prompt = f"""生成2首完整的歌曲,每首歌必须满足以下硬性指标:
+- 严禁生成小于{require_length}行的歌词!
+- 每首歌的歌词行数必须严格大于{require_length}行,这是硬性要求!
+- 最后一句时间戳必须在{start_timestamp}到{end_timestamp}之间
+- 两首歌的时长、行数必须有差异,严禁最后的时间戳均相同
+- 相邻歌词行的时间戳间隔不得超过10秒!必须保证时间戳连续自然递进
+- 严禁出现如"[03:25.00]在心中[04:25.00]最后一行歌词"的生硬间隔,严禁超过10s的间隔
+如果生成的歌曲不满足以上任意一项,则视为不合格,请重新生成。
+请生成2首新的、具有多样性的音乐描述和LRC格式歌词,语言为中文。
+
+
+创作要求:
+1.风格与流派需要确保多样性。
+2.Description标签化要求(必须严格遵守):
+ description字段必须使用结构化的标签格式,包括以下标签,用逗号分隔:
+ - 音乐风格标签
+ - 音乐流派标签
+ - 乐器标签
+ - 情感基调标签
+ - 氛围标签
+ - 演唱方式和人声标签,仅限男声或女声二选一,单人独唱
+ 注意:每个标签简洁明了,多个同类标签可用斜杠分隔(如"钢琴/小提琴")
+3.歌词创造力:lyrics应该具有深度和艺术性:
+ - 主题可以涉及爱情、人生、社会、自然、哲思、梦想、回忆等各个方面
+ - 运用丰富的文学手法:比喻、意象、对比、排比等
+ - 情感真挚,注重韵律和节奏感
+ - 可以是叙事性、抒情性或意识流风格
+4.歌词结构和长度要求(必须严格遵守):
+ - lyrics必须按照以下结构组织,并使用段落标签标注每个部分
+ - 结构顺序必须严格遵循该顺序,共8个段落标签:[Verse 1]主歌1 → [Pre-Chorus]预副歌 → [Chorus]副歌 → [Verse 2]主歌2 → [Pre-Chorus]预副歌 → [Chorus]副歌 → [Bridge]桥段 → [Chorus (Outro)]副歌(结尾)
+ - 一首歌词的段落标签只有8个,即[Verse 1]和[Verse 2]只出现一次,[Pre-Chorus]和[Chorus]各出现两次,[Bridge]和[Chorus (Outro)]各出现一次,禁止额外添加或重复更多的段落标签
+ - 每个段落标签(如[Verse 1]、[Chorus]等)必须独占一行,后面紧跟该段落的LRC格式歌词
+ - 段落之间用空行分隔
+ - **总行数要求**:整首歌必须包含至少{require_length}行带时间戳的歌词(不包括段落标签行和空行)
+5.LRC格式强制规则(必须严格遵守):
+ - 每行歌词格式必须为 `[mm:ss.xx]歌词内容`,时间戳与歌词间无空格,歌词内容需完整连贯
+ - **每一行只能包含一小句歌词**,遇到逗号、句号等标点符号时必须换行。
+ - **严禁将多句歌词合并在同一行**
+ - 时间戳需自然分布,**第一句歌词起始时间不得为 [00:00.00]**,需考虑前奏空白(建议从[00:05.00]到[00:15.00]之间开始)
+ - 时间戳间隔要求多样性:每首歌内部的时间戳间隔必须多样化,多采用小数点数间隔,严禁使用固定间隔:
+ * 同一首歌内必须包含多种不同的间隔,不要所有句子都使用相同间隔(如不要全部都是4秒间隔)
+ * 根据歌词内容的情感强度和音乐节拍来动态调整间隔
+ * 相邻歌词行的间隔应该有所变化,体现音乐的节奏起伏
+ - 时间戳分配应根据歌曲的风格、情感、节奏来合理推测,而非机械地按照歌词长度分配
+ - 每行歌词长度应自然变化,切勿长度一致
+ - **歌曲总时长必须达到{start_duration}到{end_duration}(即最后一句时间戳必须在{start_timestamp}到{end_timestamp}之间)这是硬性要求!**
+6.歌词长度要求:lyrics字段的歌词行数必须大于{require_length}行,若生成长度过短请重新生成。
+7.独特性和原创性:每首作品都应该是独一无二的,避免简单重复示例的内容。
+8.格式要求:
+ - 直接返回JSON数组格式,包含2个歌曲对象,每个对象只有description和lyrics两个字段
+ - description字段:必须是标签格式,不是叙述性文本
+ - lyrics字段:带段落标签的LRC格式字符串
+ - 严禁在JSON中插入任何额外的符号、标记、注释或说明文字
+
+LRC格式示例(带段落标签):
+[Verse 1]
+[00:08.00]第一句歌词
+[00:12.50]第二句歌词
+[00:17.20]第三句歌词
+
+[Pre-Chorus]
+[00:22.00]预副歌歌词
+[00:26.50]预副歌歌词
+
+[Chorus]
+[00:31.00]副歌歌词
+[00:35.50]副歌歌词
+
+负面示例(禁止出现):
+- 错误:[01:30.00](钢琴间奏) - 禁止在时间戳后加括号注释
+- 错误:[00:00.00]开始的歌词 - 第一句不能从00:00.00开始
+- 错误: [00:05.00]在那片熟悉的田野,阳光洒满金色的麦穗 - 严禁多句歌词放在同一行
+
+现在,请充分发挥你的创造力,生成2首全新的、完整的音乐描述和LRC格式歌词作品。
+特别提醒:每首歌必须是完整歌曲,不要缩写或省略!必须包含完整的8个段落(Verse 1, Pre-Chorus, Chorus, Verse 2, Pre-Chorus, Chorus, Bridge, Chorus Outro),严格确保大于{require_length}行歌词。
+
+直接返回JSON数组格式:
+[
+ {{"description": "...", "lyrics": "..."}},
+ {{"description": "...", "lyrics": "..."}}
+]"""
+ # Try to generate with retry mechanism
+ for attempt in range(max_retries + 1):
+ try:
+ # Call OpenAI API
+ completion = client.chat.completions.create(
+ model=model,
+ messages=[
+ {"role": "system", "content": f"You are a creative music lyricist and composer. Please generate diverse and creative music tag-based descriptions and LRC format lyrics with song structure tags. CRITICAL REQUIREMENTS: 1) Description must be structured tags separated by commas, NOT narrative text. 2) Return ONLY pure, valid JSON format without any extra symbols, markers, or comments. 3) Each song must include structure tags like [Verse 1], [Chorus], [Bridge], etc., followed by LRC format lyrics [mm:ss.xx]lyric_content. 4) MANDATORY: Each song must have MORE than {require_length} lines of lyrics with timestamps. "},
+ {"role": "user", "content": prompt}
+ ],
+ n=1,
+ temperature=1.0,
+ )
+ #print(prompt)
+ # Extract all responses
+ results = []
+ filtered_count = 0
+ last_content = None
+
+ for i, choice in enumerate(completion.choices, 1):
+ try:
+ content = choice.message.content.strip()
+ last_content = content
+ print(f"\n=== GPT Response {i} ===")
+ print(content)
+ print("=" * 50)
+ # Try to extract JSON content
+ if "```json" in content:
+ content = content.split("```json")[1].split("```")[0].strip()
+ elif "```" in content:
+ content = content.split("```")[1].split("```")[0].strip()
+
+ # Clean trailing commas in JSON (extra commas)
+ # Remove commas after last element of object/array
+ content = re.sub(r',(\s*[}\]])', r'\1', content)
+
+ # Parse JSON array
+ result_array = json.loads(content)
+
+ # Ensure it's a list
+ if isinstance(result_array, list):
+ # Validate each object in array
+ for song in result_array:
+ if isinstance(song, dict) and 'description' in song and 'lyrics' in song:
+ if _validate_timestamps(song.get('lyrics', '')):
+ results.append(song)
+ else:
+ filtered_count += 1
+ # If returned a single object (compatibility with old format)
+ elif isinstance(result_array, dict) and 'description' in result_array and 'lyrics' in result_array:
+ if _validate_timestamps(result_array.get('lyrics', '')):
+ results.append(result_array)
+ else:
+ filtered_count += 1
+
+ except json.JSONDecodeError:
+ continue
+
+ if filtered_count:
+ print(f"Total {filtered_count} songs filtered due to timestamp validation failure")
+
+ # Print parsing results
+ print(f"\nParsing complete, results length: {len(results)}")
+ print(f"Results content: {results}")
+ print(start_duration, end_duration, example1_timestamp, example2_timestamp, require_length)
+
+ # If parsed result count is not 2, write the model response to test.txt for inspection
+ if len(results) != 2:
+ print(f"Warning: Parsed result length is not 2, actual is {len(results)}, will write to test.txt")
+ with open('test.txt', 'w', encoding='utf-8') as f:
+ if last_content is not None:
+ f.write(last_content)
+ print("Written to test.txt file")
+
+ # Check whether both requested songs were generated and passed validation
+ if len(results) >= 2:
+ # Append results to file (use lock to ensure thread safety)
+ with file_lock:
+ with open(output_file, 'a', encoding='utf-8') as f:
+ for result in results[:2]: # Only save the first 2 songs
+ f.write(json.dumps(result, ensure_ascii=False) + '\n')
+
+ return selected_indices, min(len(results), 2)
+ elif attempt < max_retries:
+ print(f"Only successfully parsed {len(results)}/2 songs, retrying...")
+ time.sleep(2)
+ else:
+ # Last attempt: save whatever was parsed, even if fewer than 2 songs
+ if len(results) > 0:
+ with file_lock:
+ with open(output_file, 'a', encoding='utf-8') as f:
+ for result in results:
+ f.write(json.dumps(result, ensure_ascii=False) + '\n')
+ return selected_indices, len(results)
+
+ except Exception as e:
+ if attempt < max_retries:
+ print(f"Error occurred during generation: {e}, retrying...")
+ time.sleep(2)
+ else:
+ print(f"Generation failed: {e}")
+ return selected_indices, 0
+
+ return selected_indices, 0
+
+
+class IndexPool:
+ """Thread-safe index pool with automatic reset support"""
+
+ def __init__(self, total_size, selected_file):
+ self.total_size = total_size
+ self.selected_file = selected_file
+ self.lock = threading.Lock()
+ self.available_indices = []
+ self.selected_indices = set()
+ self.reset_count = 0 # Record reset count
+
+ # Load selected indices from file
+ self._load_selected_indices()
+ # Initialize available indices
+ self._reset_pool()
+
+ def _load_selected_indices(self):
+ """Load selected indices from file"""
+ if os.path.exists(self.selected_file):
+ with open(self.selected_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ self.selected_indices.add(int(line.strip()))
+
+ def _reset_pool(self):
+ """Reset index pool"""
+ # Calculate available indices
+ self.available_indices = [i for i in range(self.total_size) if i not in self.selected_indices]
+ random.shuffle(self.available_indices) # Shuffle order
+
+ if len(self.available_indices) == 0:
+ # If no available indices, all have been used, reset selected_indices
+ self.reset_count += 1
+ print(f"\nIndex pool exhausted; reset #{self.reset_count}, re-selecting from all {self.total_size} songs")
+ self.selected_indices.clear()
+ self.available_indices = list(range(self.total_size))
+ random.shuffle(self.available_indices)
+
+ def get_indices(self, count):
+ """
+ Thread-safe get specified number of indices
+
+ Args:
+ count: Number of indices needed
+
+ Returns:
+ List of selected indices
+ """
+ with self.lock:
+ # Check if pool needs to be reset
+ if len(self.available_indices) < count:
+ self._reset_pool()
+
+ # Get indices
+ selected = self.available_indices[:count]
+ self.available_indices = self.available_indices[count:]
+
+ # Add to selected set
+ for idx in selected:
+ self.selected_indices.add(idx)
+
+ # Write to file
+ with open(self.selected_file, 'a', encoding='utf-8') as f:
+ for idx in selected:
+ f.write(f"{idx}\n")
+
+ return selected
+
+ def get_stats(self):
+ """Get statistics"""
+ with self.lock:
+ return {
+ 'available': len(self.available_indices),
+ 'selected': len(self.selected_indices),
+ 'reset_count': self.reset_count
+ }
+
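A stripped-down, in-memory version of the pool (the `MiniIndexPool` name is illustrative; unlike `IndexPool`, it skips the `selected.txt` persistence) shows the core hand-out-and-reset behavior:

```python
import random
import threading

class MiniIndexPool:
    # Simplified, in-memory version of IndexPool: hands out disjoint index
    # batches under a lock and reshuffles the full range once exhausted.
    def __init__(self, total_size):
        self.total_size = total_size
        self.lock = threading.Lock()
        self.available = list(range(total_size))
        random.shuffle(self.available)

    def get_indices(self, count):
        with self.lock:
            if len(self.available) < count:  # pool exhausted: reset and reshuffle
                self.available = list(range(self.total_size))
                random.shuffle(self.available)
            batch = self.available[:count]
            self.available = self.available[count:]
            return batch

pool = MiniIndexPool(10)
a, b = pool.get_indices(4), pool.get_indices(4)
print(set(a).isdisjoint(b))  # True: batches never overlap until a reset
```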
+
+def batch_generate_music(input_file, output_file, selected_file, total_songs=1000, sample_size=20, model='gpt-4o-mini', num_threads=10):
+ """
+ Batch generate music descriptions and lyrics (multi-threaded version)
+
+ Args:
+ input_file: Path to input jsonl file
+ output_file: Path to output jsonl file
+ selected_file: Path to file recording selected indices
+ total_songs: Total number of songs to generate
+ sample_size: Number of samples to extract each time
+ model: Model name to use
+ num_threads: Number of threads
+ """
+ # Load all music data
+ print("Loading music data...")
+ all_music_data = []
+ with open(input_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ data = json.loads(line.strip())
+ all_music_data.append(data)
+ print(f"Loaded {len(all_music_data)} songs")
+
+ # Create thread-safe index pool
+ index_pool = IndexPool(len(all_music_data), selected_file)
+ stats = index_pool.get_stats()
+ print(f"Currently selected indices: {stats['selected']}")
+ print(f"Currently available indices: {stats['available']}")
+
+ # Calculate number of calls needed (2 songs per call)
+ num_iterations = (total_songs + 1) // 2 # Round up
+ print(f"Need to call {num_iterations} times to generate approximately {total_songs} songs (2 per call)")
+ print(f"Using {num_threads} threads for parallel processing\n")
+
+ # Create file write lock
+ file_lock = threading.Lock()
+
+ # Statistics
+ total_generated = 0
+ generated_lock = threading.Lock()
+
+ def worker_task(task_id):
+ """Worker thread task"""
+ try:
+ used_indices, success_count = generate_music_descriptions(
+ all_music_data=all_music_data,
+ index_pool=index_pool,
+ output_file=output_file,
+ file_lock=file_lock,
+ sample_size=sample_size,
+ model=model,
+ max_retries=0 # No retries
+ )
+ return success_count
+ except Exception as e:
+ print(f"Task {task_id} failed: {e}")
+ return 0
+
+ # Use thread pool and progress bar
+ with ThreadPoolExecutor(max_workers=num_threads) as executor:
+ # Submit all tasks
+ futures = {executor.submit(worker_task, i): i for i in range(num_iterations)}
+
+ # Use tqdm to show progress
+ with tqdm(total=num_iterations, desc="Generation progress", unit="batch") as pbar:
+ for future in as_completed(futures):
+ success_count = future.result()
+
+ with generated_lock:
+ total_generated += success_count
+
+ # Get current statistics
+ stats = index_pool.get_stats()
+
+ # Update progress bar
+ pbar.set_postfix({
+ 'Batch': f'{success_count}/2',
+ 'Total': total_generated,
+ 'Remaining': stats['available'],
+ 'Resets': stats['reset_count']
+ })
+ pbar.update(1)
+
+ # Final statistics
+ stats = index_pool.get_stats()
+ print(f"\nGeneration complete!")
+ print(f"Total generated: {total_generated} songs")
+ print(f"Used {stats['selected']} indices")
+ print(f"Remaining available indices: {stats['available']}")
+ print(f"Pool reset count: {stats['reset_count']}")
+
+
+if __name__ == '__main__':
+ input_file = 'tagged_musics.jsonl'
+ output_file = 'generate_lrc_5mini.jsonl'
+ selected_file = 'selected.txt'
+ # n=1, max_retries=0, sample 4 songs per call, generate 2 new songs per call
+ batch_generate_music(
+ input_file=input_file,
+ output_file=output_file,
+ selected_file=selected_file,
+ total_songs=10,
+ sample_size=4,
+ model='gpt-5-mini',
+ num_threads=5
+ )
+ # Append to txt file
\ No newline at end of file
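The `IndexPool` tail above persists consumed sample indices by appending one integer per line to the selected-indices file, so a restarted run resumes without resampling. A minimal standalone sketch of that load/append contract (helper names and the temp-file path are hypothetical; the thread lock and pool reset are omitted):

```python
import os
import tempfile

def load_selected(path):
    # Read previously consumed indices, one integer per line
    if not os.path.exists(path):
        return set()
    with open(path, 'r', encoding='utf-8') as f:
        return {int(line.strip()) for line in f if line.strip()}

def take(path, total_size, count, selected):
    # Hand out `count` unused indices and append them to the record file
    available = [i for i in range(total_size) if i not in selected]
    chosen = available[:count]
    selected.update(chosen)
    with open(path, 'a', encoding='utf-8') as f:
        for idx in chosen:
            f.write(f"{idx}\n")
    return chosen

path = os.path.join(tempfile.mkdtemp(), "selected.txt")
selected = load_selected(path)
first = take(path, 10, 4, selected)
second = take(path, 10, 4, selected)
# A fresh load sees both batches, so a restarted run skips them
assert load_selected(path) == set(first) | set(second)
```

Unlike the real pool, this sketch does not shuffle, and it does not reset when the pool runs dry.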
diff --git a/data_pipeline/lyrics_gene/gen_lyrics_en.py b/data_pipeline/lyrics_gene/gen_lyrics_en.py
new file mode 100644
index 0000000000000000000000000000000000000000..341b59069c8145e61c1f44ce023f8eea10ca815e
--- /dev/null
+++ b/data_pipeline/lyrics_gene/gen_lyrics_en.py
@@ -0,0 +1,577 @@
+import os
+import json
+import time
+import random
+import re
+from openai import OpenAI
+from tqdm import tqdm
+import threading
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+# Set environment variables
+# Note: Set these environment variables before running the script
+# export OPENAI_API_KEY="your-api-key"
+# export OPENAI_BASE_URL="https://api.openai.com/v1" # or your custom API URL
+if not os.environ.get("OPENAI_API_KEY"):
+ os.environ["OPENAI_API_KEY"] = "" # Replace with your API key or set via environment variable
+if not os.environ.get("OPENAI_BASE_URL"):
+ os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1" # Replace with API URL or set via environment variable
+
+# Initialize client
+client = OpenAI()
+
+
+def _extract_lyrics_timestamps(lyrics_text):
+ """
+ Extract timestamps from lyrics and convert to seconds
+ Args:
+ lyrics_text: Lyrics string
+ Returns:
+ List[float]: Timestamps in order (seconds)
+ """
+ if not isinstance(lyrics_text, str):
+ return []
+ pattern = re.compile(r'\[(\d{2}):(\d{2})(?:\.(\d{2}))?\]')
+ timestamps = []
+ for match in pattern.finditer(lyrics_text):
+ minutes = int(match.group(1))
+ seconds = int(match.group(2))
+ fraction = match.group(3)
+ total_seconds = minutes * 60 + seconds
+ if fraction is not None:
+ divisor = 10 ** len(fraction) # the regex captures exactly two digits, so this is 100
+ total_seconds += int(fraction) / divisor
+ timestamps.append(total_seconds)
+ return timestamps
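As a sanity check on the extraction above: the optional two-digit fraction is read as centiseconds. A condensed mirror of the same regex logic (the helper name is for illustration only):

```python
import re

pattern = re.compile(r'\[(\d{2}):(\d{2})(?:\.(\d{2}))?\]')

def extract(lyrics_text):
    # [mm:ss.xx] -> seconds; the fractional part is optional
    out = []
    for m in pattern.finditer(lyrics_text):
        total = int(m.group(1)) * 60 + int(m.group(2))
        if m.group(3) is not None:
            total += int(m.group(3)) / 100
        out.append(total)
    return out

print(extract("[00:08.00]first [00:12.50]second"))
# [8.0, 12.5]
```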
+
+
+def _validate_timestamps(lyrics_text, min_last_timestamp=170, max_interval=35):
+ """
+ Validate if timestamps in lyrics meet requirements
+ Args:
+ lyrics_text: Lyrics string
+ min_last_timestamp: Minimum value of last timestamp (seconds)
+ max_interval: Maximum interval between last two timestamps (seconds)
+ Returns:
+ bool: Whether validation passed
+ """
+ timestamps = _extract_lyrics_timestamps(lyrics_text)
+ if len(timestamps) < 2:
+ print("Validation failed: Timestamp count less than 2")
+ return False
+ last = timestamps[-1]
+ second_last = timestamps[-2]
+ if last < min_last_timestamp:
+ print(f"Validation failed: Last timestamp {last:.2f}s is less than {min_last_timestamp}s")
+ return False
+ if last - second_last > max_interval:
+ print(f"Validation failed: Interval between last two timestamps {last - second_last:.2f}s is greater than {max_interval}s")
+ return False
+ return True
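The two rejection rules compose as follows; a compact mirror operating on already-extracted second values (names illustrative). Note the last-timestamp check is inclusive: a song ending at exactly 170s passes.

```python
def validate(timestamps, min_last=170, max_interval=35):
    # Mirrors _validate_timestamps on pre-extracted seconds
    if len(timestamps) < 2:
        return False
    if timestamps[-1] < min_last:
        return False
    return timestamps[-1] - timestamps[-2] <= max_interval

print(validate([10.0, 160.0, 180.0]))  # True: ends late enough, 20s final gap
print(validate([10.0, 150.0]))         # False: song ends before 170s
print(validate([10.0, 120.0, 180.0]))  # False: 60s final gap exceeds 35s
```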
+
+
+def chat_gpt(text, model='gpt-4o-mini'):
+ while True:
+ try:
+ # Call OpenAI chat completions API
+ completion = client.chat.completions.create(
+ model=model, # Use GPT-4o-mini model
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": text}
+ ]
+ )
+ # Get response content
+ if getattr(completion.choices[0].message, 'content', None):
+ content = completion.choices[0].message.content.strip()
+ return content
+ else:
+ print('Empty response, retrying in 2s')
+ time.sleep(2)
+ except Exception as e:
+ print(f"Error: {e}")
+ time.sleep(2)
+
+
+def chat_gpt_call(text, model='gpt-4o-mini'):
+ # Call OpenAI chat completions API
+ completion = client.chat.completions.create(
+ model=model, # Use GPT-4o-mini model
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": text}
+ ]
+ )
+ # Get response content
+ if getattr(completion.choices[0].message, 'content', None):
+ content = completion.choices[0].message.content.strip()
+ return content
+ else:
+ print('Empty response')
+
+
+def generate_music_descriptions(all_music_data, index_pool, output_file, file_lock, sample_size=20, model='gpt-4o-mini', max_retries=0):
+ """
+ Read music data file, randomly sample, call GPT to generate new music descriptions and lyrics
+
+ Args:
+ all_music_data: List of all music data
+ index_pool: Index pool object (thread-safe)
+ output_file: Path to output jsonl file
+ file_lock: File write lock
+ sample_size: Number of samples to randomly extract
+ model: Model name to use
+ max_retries: Maximum retry count
+
+ Returns:
+ (used_indices, success_count): List of used indices and number of successful generations
+ """
+ # duration_ranges = [
+ # ("3.00", "3.15", 50), ("3.15", "3.30", 50), ("3.30", "3.45", 60),
+ # ("3.45", "4.00", 60),
+ # ("4.00", "4.15", 70), ("4.15", "4.30", 70), ("4.30", "4.45", 70)
+ # ]
+ duration_ranges = [
+ ("3.00", "3.15", 60), ("3.15", "3.30", 70), ("3.30", "3.45", 80),("3.45", "4.00", 90),
+ ("4.00", "4.15", 100), ("4.15", "4.30", 100), ("4.30", "4.45", 100)
+ ]
+ selected_range = random.choice(duration_ranges)
+ require_length = selected_range[2]
+
+ # Directly convert to timestamp format (strictly corresponding to left and right ends of tuple)
+ start_timestamp = f"[{selected_range[0].replace('.', ':')}.00]"
+ end_timestamp = f"[{selected_range[1].replace('.', ':')}.00]"
+
+ # Generate duration description
+ # Convert seconds to minutes and seconds format
+ start_seconds = float(selected_range[0])
+ start_minutes = int(start_seconds // 60)
+ start_secs = int(start_seconds % 60)
+ start_duration = f"{start_minutes}min {start_secs}sec"
+
+ end_seconds = float(selected_range[1])
+ end_minutes = int(end_seconds // 60)
+ end_secs = int(end_seconds % 60)
+ end_duration = f"{end_minutes}min {end_secs}sec"
+
+ # Generate random timestamps in examples (randomly generated within time range)
+ # Parse time string to minutes and seconds
+ start_parts = selected_range[0].split('.')
+ end_parts = selected_range[1].split('.')
+
+ start_minutes = int(start_parts[0])
+ start_seconds = int(start_parts[1])
+ end_minutes = int(end_parts[0])
+ end_seconds = int(end_parts[1])
+
+ # Convert to total seconds
+ start_total_seconds = start_minutes * 60 + start_seconds
+ end_total_seconds = end_minutes * 60 + end_seconds
+
+ # Randomly generate within range
+ example1_seconds = random.randint(start_total_seconds, end_total_seconds)
+ example2_seconds = random.randint(start_total_seconds, end_total_seconds)
+
+ example1_minutes = example1_seconds // 60
+ example1_secs = example1_seconds % 60
+ example2_minutes = example2_seconds // 60
+ example2_secs = example2_seconds % 60
+
+ example1_timestamp = f"[{example1_minutes:02d}:{example1_secs:02d}.00]"
+ example2_timestamp = f"[{example2_minutes:02d}:{example2_secs:02d}.00]"
+
+ # Get sample indices from index pool (thread-safe)
+ selected_indices = index_pool.get_indices(sample_size)
+ if not selected_indices:
+ return [], 0
+
+ sample_data = [all_music_data[i] for i in selected_indices]
+
+ # Extract all unique styles
+ styles = []
+ for data in sample_data:
+ style = data.get('style', '')
+ if style and style not in styles:
+ styles.append(style)
+
+ styles_text = ", ".join(styles)
+
+ # Build example text - include all sampled data (excluding style)
+ examples = []
+ for i, data in enumerate(sample_data, 1):
+ lyrics_text = " ".join(data.get('lyrics', [])) if isinstance(data.get('lyrics'), list) else data.get('lyrics', '')
+ description = data.get('description', '')
+ examples.append(f"Example {i}:\ndescription: {description}\nlyrics: {lyrics_text}")
+
+ examples_text = "\n\n".join(examples)
+
+ prompt = f"""Generate 2 complete songs. Each song must meet the following hard requirements:
+- Strictly forbidden to generate lyrics with fewer than {require_length} lines!
+- The number of lyric lines for each song must be strictly greater than {require_length}. This is a hard requirement!
+- The timestamp of the final line must be between {start_timestamp} and {end_timestamp}.
+- The two songs must differ in duration and line count; their final timestamps must not be identical.
+- The timestamp interval between adjacent lyric lines must not exceed 10 seconds! Timestamps must be continuous and progress naturally.
+- Awkward gaps like "[03:25.00]in the heart[04:25.00]the last lyric" are strictly forbidden. Do not exceed a 10-second interval.
+- It is strictly forbidden to repeat the entire structure or its sections after one iteration is complete. It is also strictly forbidden to repeat the same lyric line multiple times.
+If any of the above requirements are not met, the generation is considered a failure. Please regenerate.
+Please generate 2 new, diverse music descriptions and LRC format lyrics. The language should be English.
+
+
+Creative Requirements:
+1. Style and Genre must be diverse.
+2. Description Tagging Requirements (Must be strictly followed):
+ The description field must use a structured tag format, including the following tags, separated by commas:
+ - Music Style tag
+ - Music Genre tag
+ - Instruments tag
+ - Emotional Tone tag
+ - Mood/Atmosphere tag
+ - Vocal Style and Voice tag, limited to either "male voice" or "female voice", solo performance only.
+ Note: Each tag should be concise. Multiple tags of the same category can be separated by a slash (e.g., "Piano/Violin").
+3. Lyric Creativity: The lyrics should have depth and artistry:
+ - Themes can cover various aspects such as love, life, society, nature, philosophy, dreams, memories, etc.
+ - Use rich literary devices: metaphors, imagery, contrast, parallelism, etc.
+ - Express sincere emotions with a focus on rhyme and rhythm.
+ - The style can be narrative, lyrical, or stream-of-consciousness.
+4. Lyric Structure and Length Requirements (Must be strictly followed):
+ - The lyrics must be organized using the following structure, with section tags annotating each part.
+ - The structure must strictly follow this order, for a total of 8 section tags: [Verse 1] → [Pre-Chorus] → [Chorus] → [Verse 2] → [Pre-Chorus] → [Chorus] → [Bridge] → [Chorus (Outro)].
+ - A single song can only have these 8 section tags. [Verse 1] and [Verse 2] appear once; [Pre-Chorus] and [Chorus] appear twice; [Bridge] and [Chorus (Outro)] appear once. Do not add or repeat extra section tags.
+ - Each section tag (e.g., [Verse 1], [Chorus]) must be on its own line, immediately followed by the LRC format lyrics for that section.
+ - Separate sections with a blank line.
+ - **Total Line Count Requirement**: The entire song must contain at least {require_length} lines of timestamped lyrics (not including section tags or blank lines).
+5. LRC Format Mandatory Rules (Must be strictly followed):
+ - Each line of lyrics must be in the format `[mm:ss.xx]Lyric content`, with no space between the timestamp and the lyrics. The lyric content should be coherent.
+ - **Each line must contain only one short phrase of lyrics.** Start a new line when encountering punctuation like commas or periods.
+ - **Strictly forbidden to merge multiple sentences or clauses onto the same line.**
+ - Timestamps must be distributed naturally. **The first line's timestamp must not be [00:00.00]**. Allow for an instrumental intro (suggestion: start between [00:05.00] and [00:15.00]).
+ - Timestamp intervals must be varied: The intervals within each song must be diverse, often using decimal values. Do not use a fixed interval:
+ * A single song must contain a variety of different intervals; do not use the same interval for all lines (e.g., not all 4-second gaps).
+ * Dynamically adjust intervals based on the emotional intensity and rhythm of the lyrics.
+ * The gap between adjacent lines should vary to reflect the musical rhythm.
+ - Timestamp allocation should be reasonably inferred based on the song's style, emotion, and rhythm, not mechanically assigned based on lyric length.
+ - The length of each lyric line should vary naturally; do not make them all uniform.
+ - **The total song duration must be between {start_duration} and {end_duration} (meaning the final line's timestamp must be between {start_timestamp} and {end_timestamp}). This is a hard requirement!**
+6. Lyric Length Requirement: The number of lyric lines in the lyrics field must be greater than {require_length}. If the generated length is too short, please regenerate.
+7. Uniqueness and Originality: Each piece should be unique. Avoid simply repeating the content from examples.
+8. Format Requirements:
+ - Directly return a JSON array containing 2 song objects. Each object must have only "description" and "lyrics" fields.
+ - `description` field: Must be in tag format, not narrative text.
+ - `lyrics` field: A string in LRC format with section tags.
+ - Strictly forbidden to insert any extra symbols, markers, comments, or explanatory text within the JSON.
+
+LRC Format Example (with section tags):
+[Verse 1]
+[00:08.00]First line of lyrics
+[00:12.50]Second line of lyrics
+[00:17.20]Third line of lyrics
+
+[Pre-Chorus]
+[00:22.00]Pre-chorus lyrics
+[00:26.50]Pre-chorus lyrics
+
+[Chorus]
+[00:31.00]Chorus lyrics
+[00:35.50]Chorus lyrics
+
+Negative Examples (to avoid):
+- Incorrect: [01:30.00](Piano Interlude) - Do not add parenthetical comments after the timestamp.
+- Incorrect: [00:00.00]Starting lyric - The first line cannot start at 00:00.00.
+- Incorrect: [00:05.00]In the familiar field, the sun casts golden rays upon the wheat - Strictly forbidden to place multiple clauses on the same line.
+- Incorrect: [03:00.00] In the light of hope[03:05.50] In the light of hope[03:10.20] In the light of hope - Excessive repetition of the exact same lyric line is strictly forbidden. Lyrical content must show variation.
+Now, please fully unleash your creativity and generate 2 new, complete works of music descriptions and LRC format lyrics.
+Special Reminder: Each song must be complete, not abbreviated or omitted! It must contain the full 8 sections (Verse 1, Pre-Chorus, Chorus, Verse 2, Pre-Chorus, Chorus, Bridge, Chorus Outro) and strictly ensure more than {require_length} lines of lyrics.
+
+Directly return in JSON array format:
+[
+ {{"description": "...", "lyrics": "..."}},
+ {{"description": "...", "lyrics": "..."}}
+]"""
+ # Try to generate with retry mechanism
+ for attempt in range(max_retries + 1):
+ try:
+ # Call OpenAI API
+ completion = client.chat.completions.create(
+ model=model,
+ messages=[
+ {"role": "system", "content": f"You are a creative music lyricist and composer. Please generate diverse and creative music tag-based descriptions and LRC format lyrics with song structure tags. CRITICAL REQUIREMENTS: 1) Description must be structured tags separated by commas, NOT narrative text. 2) Return ONLY pure, valid JSON format without any extra symbols, markers, or comments. 3) Each song must include structure tags like [Verse 1], [Chorus], [Bridge], etc., followed by LRC format lyrics [mm:ss.xx]lyric_content. 4) MANDATORY: Each song must have MORE than {require_length} lines of lyrics with timestamps. "},
+ {"role": "user", "content": prompt}
+ ],
+ n=1,
+ temperature=1.0,
+ )
+ #print(prompt)
+ # Extract all responses
+ results = []
+ filtered_count = 0
+ last_content = None
+
+ for i, choice in enumerate(completion.choices, 1):
+ try:
+ content = choice.message.content.strip()
+ last_content = content
+ print(f"\n=== GPT Response {i} ===")
+ print(content)
+ print("=" * 50)
+ # Try to extract JSON content
+ if "```json" in content:
+ content = content.split("```json")[1].split("```")[0].strip()
+ elif "```" in content:
+ content = content.split("```")[1].split("```")[0].strip()
+
+ # Clean trailing commas in JSON (extra commas)
+ # Remove commas after last element of object/array
+ content = re.sub(r',(\s*[}\]])', r'\1', content)
+
+ # Parse JSON array
+ result_array = json.loads(content)
+
+ # Ensure it's a list
+ if isinstance(result_array, list):
+ # Validate each object in array
+ for song in result_array:
+ if isinstance(song, dict) and 'description' in song and 'lyrics' in song:
+ if _validate_timestamps(song.get('lyrics', '')):
+ results.append(song)
+ else:
+ filtered_count += 1
+ # If returned a single object (compatibility with old format)
+ elif isinstance(result_array, dict) and 'description' in result_array and 'lyrics' in result_array:
+ if _validate_timestamps(result_array.get('lyrics', '')):
+ results.append(result_array)
+ else:
+ filtered_count += 1
+
+ except json.JSONDecodeError:
+ continue
+
+ if filtered_count:
+ print(f"Total {filtered_count} songs filtered due to timestamp validation failure")
+
+ # Print parsing results
+ print(f"\nParsing complete, results length: {len(results)}")
+ print(f"Results content: {results}")
+ print(start_duration, end_duration, example1_timestamp, example2_timestamp, require_length)
+
+ # If parsed result length is not 2, write model response content to test.txt
+ if len(results) != 2:
+ print(f"Warning: Parsed result length is not 2, actual is {len(results)}, will write to test.txt")
+ with open('test.txt', 'w', encoding='utf-8') as f:
+ if last_content is not None:
+ f.write(last_content)
+ print("Written to test.txt file")
+
+ # Check whether the expected 2 songs were generated (n=1, the prompt asks for 2 per response)
+ if len(results) >= 2:
+ # Append results to file (use lock to ensure thread safety)
+ with file_lock:
+ with open(output_file, 'a', encoding='utf-8') as f:
+ for result in results[:2]: # Only save the first 2 songs
+ f.write(json.dumps(result, ensure_ascii=False) + '\n')
+
+ return selected_indices, min(len(results), 2)
+ elif attempt < max_retries:
+ print(f"Only successfully parsed {len(results)}/2 songs, retrying...")
+ time.sleep(2)
+ else:
+ # Last attempt, save even if fewer than 2 songs
+ if len(results) > 0:
+ with file_lock:
+ with open(output_file, 'a', encoding='utf-8') as f:
+ for result in results:
+ f.write(json.dumps(result, ensure_ascii=False) + '\n')
+ return selected_indices, len(results)
+
+ except Exception as e:
+ if attempt < max_retries:
+ print(f"Error occurred during generation: {e}, retrying...")
+ time.sleep(2)
+ else:
+ print(f"Generation failed: {e}")
+ return selected_indices, 0
+
+ return selected_indices, 0
+
+
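The response-parsing path above first strips a Markdown code fence, then removes trailing commas before `json.loads`. A condensed sketch of just that cleanup (the fence string is built dynamically only so this snippet stays self-delimiting):

```python
import json
import re

FENCE = "`" * 3

def clean_and_parse(content):
    # Strip a fenced json block (or a bare fence) if present
    if FENCE + "json" in content:
        content = content.split(FENCE + "json")[1].split(FENCE)[0].strip()
    elif FENCE in content:
        content = content.split(FENCE)[1].split(FENCE)[0].strip()
    # Drop trailing commas before a closing brace/bracket
    content = re.sub(r',(\s*[}\]])', r'\1', content)
    return json.loads(content)

raw = FENCE + 'json\n[{"description": "pop", "lyrics": "[00:08.00]hi",}]\n' + FENCE
print(clean_and_parse(raw))
# [{'description': 'pop', 'lyrics': '[00:08.00]hi'}]
```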
+class IndexPool:
+ """Thread-safe index pool with automatic reset support"""
+
+ def __init__(self, total_size, selected_file):
+ self.total_size = total_size
+ self.selected_file = selected_file
+ self.lock = threading.Lock()
+ self.available_indices = []
+ self.selected_indices = set()
+ self.reset_count = 0 # Record reset count
+
+ # Load selected indices from file
+ self._load_selected_indices()
+ # Initialize available indices
+ self._reset_pool()
+
+ def _load_selected_indices(self):
+ """Load selected indices from file"""
+ if os.path.exists(self.selected_file):
+ with open(self.selected_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ self.selected_indices.add(int(line.strip()))
+
+ def _reset_pool(self):
+ """Reset index pool"""
+ # Calculate available indices
+ self.available_indices = [i for i in range(self.total_size) if i not in self.selected_indices]
+ random.shuffle(self.available_indices) # Shuffle order
+
+ if len(self.available_indices) == 0:
+ # If no available indices, all have been used, reset selected_indices
+ self.reset_count += 1
+ print(f"\nIndex pool exhausted, resetting pool for the {self.reset_count}th time, re-selecting from {self.total_size} songs")
+ self.selected_indices.clear()
+ self.available_indices = list(range(self.total_size))
+ random.shuffle(self.available_indices)
+
+ def get_indices(self, count):
+ """
+ Thread-safe get specified number of indices
+
+ Args:
+ count: Number of indices needed
+
+ Returns:
+ List of selected indices
+ """
+ with self.lock:
+ # Check if pool needs to be reset
+ if len(self.available_indices) < count:
+ self._reset_pool()
+
+ # Get indices
+ selected = self.available_indices[:count]
+ self.available_indices = self.available_indices[count:]
+
+ # Add to selected set
+ for idx in selected:
+ self.selected_indices.add(idx)
+
+ # Write to file
+ with open(self.selected_file, 'a', encoding='utf-8') as f:
+ for idx in selected:
+ f.write(f"{idx}\n")
+
+ return selected
+
+ def get_stats(self):
+ """Get statistics"""
+ with self.lock:
+ return {
+ 'available': len(self.available_indices),
+ 'selected': len(self.selected_indices),
+ 'reset_count': self.reset_count
+ }
+
+
+def batch_generate_music(input_file, output_file, selected_file, total_songs=1000, sample_size=20, model='gpt-4o-mini', num_threads=10):
+ """
+ Batch generate music descriptions and lyrics (multi-threaded version)
+
+ Args:
+ input_file: Path to input jsonl file
+ output_file: Path to output jsonl file
+ selected_file: Path to file recording selected indices
+ total_songs: Total number of songs to generate
+ sample_size: Number of samples to extract each time
+ model: Model name to use
+ num_threads: Number of threads
+ """
+ # Load all music data
+ print("Loading music data...")
+ all_music_data = []
+ with open(input_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ data = json.loads(line.strip())
+ all_music_data.append(data)
+ print(f"Loaded {len(all_music_data)} songs")
+
+ # Create thread-safe index pool
+ index_pool = IndexPool(len(all_music_data), selected_file)
+ stats = index_pool.get_stats()
+ print(f"Currently selected indices: {stats['selected']}")
+ print(f"Currently available indices: {stats['available']}")
+
+ # Calculate number of calls needed (2 songs per call)
+ num_iterations = (total_songs + 1) // 2 # Round up
+ print(f"Need to call {num_iterations} times to generate approximately {total_songs} songs (2 per call)")
+ print(f"Using {num_threads} threads for parallel processing\n")
+
+ # Create file write lock
+ file_lock = threading.Lock()
+
+ # Statistics
+ total_generated = 0
+ generated_lock = threading.Lock()
+
+ def worker_task(task_id):
+ """Worker thread task"""
+ try:
+ used_indices, success_count = generate_music_descriptions(
+ all_music_data=all_music_data,
+ index_pool=index_pool,
+ output_file=output_file,
+ file_lock=file_lock,
+ sample_size=sample_size,
+ model=model,
+ max_retries=0 # No retries
+ )
+ return success_count
+ except Exception as e:
+ print(f"Task {task_id} failed: {e}")
+ return 0
+
+ # Use thread pool and progress bar
+ with ThreadPoolExecutor(max_workers=num_threads) as executor:
+ # Submit all tasks
+ futures = {executor.submit(worker_task, i): i for i in range(num_iterations)}
+
+ # Use tqdm to show progress
+ with tqdm(total=num_iterations, desc="Generation progress", unit="batch") as pbar:
+ for future in as_completed(futures):
+ success_count = future.result()
+
+ with generated_lock:
+ total_generated += success_count
+
+ # Get current statistics
+ stats = index_pool.get_stats()
+
+ # Update progress bar
+ pbar.set_postfix({
+ 'Batch': f'{success_count}/2',
+ 'Total': total_generated,
+ 'Remaining': stats['available'],
+ 'Resets': stats['reset_count']
+ })
+ pbar.update(1)
+
+ # Final statistics
+ stats = index_pool.get_stats()
+ print(f"\nGeneration complete!")
+ print(f"Total generated: {total_generated} songs")
+ print(f"Used {stats['selected']} indices")
+ print(f"Remaining available indices: {stats['available']}")
+ print(f"Pool reset count: {stats['reset_count']}")
+
+
+if __name__ == '__main__':
+ input_file = 'tagged_musics.jsonl'
+ output_file = 'generate_en_lrc.jsonl'
+ selected_file = 'selected.txt'
+ # n=1, max_retries=0, sample 4 songs per call, generate 2 new songs per call
+ batch_generate_music(
+ input_file=input_file,
+ output_file=output_file,
+ selected_file=selected_file,
+ total_songs=100,
+ sample_size=4,
+ model='gpt-4o-mini',
+ num_threads=20
+ )
+ # Append to txt file
\ No newline at end of file
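Since each API call is prompted to return 2 songs, the `(total_songs + 1) // 2` in `batch_generate_music` is ceiling division. A generalized sketch (function name is illustrative):

```python
def num_calls(total_songs, songs_per_call=2):
    # Ceiling division: one extra call covers any remainder
    return (total_songs + songs_per_call - 1) // songs_per_call

print(num_calls(10))  # 5
print(num_calls(11))  # 6
```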
diff --git a/data_pipeline/meta_process/convert_convs.py b/data_pipeline/meta_process/convert_convs.py
new file mode 100644
index 0000000000000000000000000000000000000000..cc886913dffec3c779a0a04ce9642afc32461c85
--- /dev/null
+++ b/data_pipeline/meta_process/convert_convs.py
@@ -0,0 +1,98 @@
+"""
+Generate multi-turn dialogue data, each turn contains lyric text and corresponding audio token slices
+"""
+
+import json
+import re
+import os
+import torch
+from tqdm import tqdm
+from my_tool import load_jsonl
+
+TOKEN_PER_SECOND = 25 # Number of tokens per second of audio
+NUM_ITEMS = 100000 # Process N items first
+
+timestamp_pattern = re.compile(r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]")
+
+def _parse_lyric_with_timestamps(lyric: str):
+ """
+ Return [(start_time_s, text), ...] sorted by timestamp
+ """
+ result = []
+ for match in timestamp_pattern.finditer(lyric):
+ start_idx = match.end()
+ end_idx = lyric.find("[", start_idx)
+ text = lyric[start_idx:end_idx].strip() if end_idx != -1 else lyric[start_idx:].strip()
+ if not text:
+ continue
+ minute = int(match.group(1))
+ second = int(match.group(2))
+ frac = match.group(3)
+ # Scale by digit count: ".5" and ".50" both mean half a second, ".500" means 500 ms
+ total_seconds = minute * 60 + second + (int(frac) / 10 ** len(frac) if frac else 0)
+ result.append((total_seconds, text))
+ return result
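The parser above pairs each timestamp with the text running up to the next `[`, dropping timestamps that have no following text. A compact mirror without the fractional-seconds handling (helper name illustrative):

```python
import re

ts = re.compile(r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]")

def parse(lyric):
    result = []
    for m in ts.finditer(lyric):
        start = m.end()
        nxt = lyric.find("[", start)
        text = lyric[start:nxt].strip() if nxt != -1 else lyric[start:].strip()
        if not text:
            continue  # a trailing timestamp with no lyric is dropped
        result.append((int(m.group(1)) * 60 + int(m.group(2)), text))
    return result

print(parse("[00:08]hello world [00:12]next line [00:20]"))
# [(8, 'hello world'), (12, 'next line')]
```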
+
+def _load_audio_tokens(pt_file):
+ """
+ Load MuCodec encoding of audio
+ """
+ audio_ids = torch.load(pt_file, map_location="cpu").squeeze().long()
+ return audio_ids
+
+def _get_token_slice(audio_tokens, start_s, end_s):
+ """Split encoding by time segment"""
+ start_idx = int(start_s * TOKEN_PER_SECOND)
+ end_idx = int(end_s * TOKEN_PER_SECOND)
+ sliced = audio_tokens[start_idx:end_idx]
+ return "[SOA]" + "".join([f"" for i in sliced]) + "[EOA]"
+
+def _process_item(item, pt_dir:str):
+ song_name = item.get("song") or item.get("name")
+ song_name = song_name.split('.mp3')[0] # For mucodec, remove extension
+ pt_file = os.path.join(pt_dir, f"{song_name}.pt")
+ if not os.path.exists(pt_file):
+ return None
+
+ audio_tokens = _load_audio_tokens(pt_file)
+ tlyric_ = item.get('tlyric', "")
+ lyric_ = item.get('lyric', "")
+ lyric = tlyric_ if len(tlyric_) > len(lyric_) else lyric_
+ lyrics_ts = _parse_lyric_with_timestamps(lyric)
+
+ if not lyrics_ts:
+ # Skip if no lyrics
+ return None
+
+ rounds = []
+
+ # First generate a system message containing song information
+ intro_text = (
+ f"请生成一首歌曲,歌名为《{item.get('name', '')}》,风格是{item.get('style','')}"
+ f",情绪为{item.get('emotion','')},节奏:{item.get('rhythm','')},"
+ f"{item.get('description','')},由{item.get('singer','')}演唱,语言:{item.get('lang','')}。"
+ f"歌词如下:" + " ".join([text for _, text in lyrics_ts]) + "接下来我会逐句告诉你需要生成歌曲片段的歌词,\n请先生成前奏"
+ )
+ rounds.append({"role": "user", "content": intro_text})
+ rounds.append({"role": "assistant", "content": _get_token_slice(audio_tokens, 0, lyrics_ts[0][0])}) # Intro tokens
+
+ # Each lyric line corresponds to one round
+ for idx, (start_s, text) in enumerate(lyrics_ts[:-1]): ## Last line handled separately
+ end_s = lyrics_ts[idx + 1][0] # Next line's start is this line's end; the final line is handled below
+ rounds.append({"role": "user", "content": text})
+ rounds.append({"role": "assistant", "content": _get_token_slice(audio_tokens, start_s, end_s)})
+
+ # Tail processing logic
+ rounds.append({"role": "user", "content": f"请生成歌词{lyrics_ts[-1][1]}以及歌曲结尾"})
+ rounds.append({"role": "assistant", "content": _get_token_slice(audio_tokens, lyrics_ts[-1][0], len(audio_tokens)/TOKEN_PER_SECOND)})
+
+ return rounds
+
+# ===== External Interface =====
+
+def get_convert_convs(dataset:list[dict], pt_dir:str, save_path:str):
+ with open(save_path, "w", encoding="utf-8") as fout:
+ for item in tqdm(dataset, desc="Converting convs"):
+ rounds = _process_item(item, pt_dir)
+ if not rounds:
+ continue
+ fout.write(json.dumps({"messages": rounds}, ensure_ascii=False) + "\n")
\ No newline at end of file
diff --git a/data_pipeline/meta_process/convert_lyrics.py b/data_pipeline/meta_process/convert_lyrics.py
new file mode 100644
index 0000000000000000000000000000000000000000..c00b4478a22cad8f4aeb0a414471a9b4c682a762
--- /dev/null
+++ b/data_pipeline/meta_process/convert_lyrics.py
@@ -0,0 +1,180 @@
+import os
+import re
+import json
+import copy
+from tqdm import tqdm
+from my_tool import dict_sort_print
+from collections import defaultdict
+from convert_convs import _parse_lyric_with_timestamps
+
+# ===== Lyric Parsing =====
+
+def _parse_lyrics(text:str) -> dict:
+ """Parse metadata, lyrics and timestamps from lyric information"""
+ segs = text.split("\n")
+ metadata = {
+ "lyrics_meta": {},
+ "lyrics": [],
+ "lyrics_time": [],
+ }
+ for seg in segs:
+ # Format: [time] metadata / lyrics
+ results = _parse_lyric_with_timestamps(seg)
+ for time, content in results:
+ if ":" in content or ":" in content:
+ # Metadata
+ pos1 = content.find(":")
+ pos2 = content.find(":")
+ pos = pos1 if pos1 != -1 else pos2
+ key = content[:pos].strip()
+ value = content[pos+1:].strip()
+ metadata["lyrics_meta"][key] = value
+ elif time == 0: # Timestamps are floats (seconds), so compare numerically
+ # Unstructured metadata at the beginning
+ continue
+ elif len(metadata['lyrics']) == 0 and "/" in content:
+ # Unstructured metadata at the beginning
+ continue
+ else:
+ # Only keep English and space punctuation
+ if len(content) == 0:
+ # Middle gap/end
+ if len(metadata['lyrics']) != 0 and metadata['lyrics'][-1] != "":
+ # If there's no previous segment (beginning), or previous segment is empty, don't record (merge)
+ metadata['lyrics'].append("")
+ metadata['lyrics_time'].append(time)
+ else:
+ if len(metadata['lyrics_time']) != 0 and metadata['lyrics_time'][-1] == time:
+ # Same timestamp means it's a translation (don't record)
+ continue
+ # Actual lyrics
+ metadata['lyrics'].append(content)
+ metadata['lyrics_time'].append(time)
+ return metadata
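The colon rule above routes credit lines such as lyricist/composer tags into `lyrics_meta`; a sketch of just that split, accepting either a half-width or full-width colon (helper name illustrative):

```python
def split_meta(content):
    # "key : value" lines are metadata; either colon variant counts
    pos1 = content.find(":")
    pos2 = content.find("：")
    pos = pos1 if pos1 != -1 else pos2
    if pos == -1:
        return None  # plain lyric line, not metadata
    return content[:pos].strip(), content[pos + 1:].strip()

print(split_meta("作词 : Someone"))  # ('作词', 'Someone')
print(split_meta("制作人：X"))        # ('制作人', 'X')
print(split_meta("just a lyric"))    # None
```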
+
+# ===== Language Detection =====
+
+def _count_ch_nan(text:str):
+ """Count the number of Chinese and other non-English characters in a string"""
+ ch_num = 0
+ nan_num = 0
+ nan = ""
+ for c in text:
+ if '\u4e00' <= c <= '\u9fff':
+ ch_num += 1
+ elif ('a' <= c <= 'z') or ('A' <= c <= 'Z') or len(c.strip()) == 0:
+ continue
+ else:
+ nan_num += 1
+ nan += c
+ # if len(nan) > 0:
+ # print(nan)
+ return ch_num, nan_num
+
+def _lang_decide(lyrics:list[str], val_limit:int=5, word_limit=3) -> str:
+ """
+ Determine the language type of lyrics (en/zh/ez/instrument/nan)
+ - val_limit: Only count if there are at least this many sentences
+ - word_limit: Only count if a sentence has at least this many words
+ """
+ ch_lyrics = 0
+ en_lyrics = 0
+ nan_lyrics = 0
+ for lyric in lyrics:
+ lyric = copy.deepcopy(lyric)
+ if lyric.strip() == "":
+ continue
+ lyric = re.sub(r"[''¥·′´(),。?""!@#$%^&*()?.'/,=+_—— !…《》<>0-9~※~;-・\"、☆|△【】#「」‖{}\[\]-]", " ", lyric)
+ ch_num, nan_num = _count_ch_nan(lyric)
+
+ if nan_num > word_limit:
+ nan_lyrics += 1
+ continue
+ elif ch_num > word_limit:
+ ch_lyrics += 1
+
+ lyric = re.sub(r'[\u4e00-\u9fff]+', '', lyric)
+ # Count English words; split() without an argument collapses repeated whitespace
+ # (split(" ") would count empty strings left behind after removing Chinese characters)
+ en_num = len(lyric.split())
+ if en_num > word_limit:
+ en_lyrics += 1
+
+ if nan_lyrics > val_limit:
+ return "nan"
+ if ch_lyrics > val_limit and en_lyrics > val_limit:
+ return "ez"
+ if ch_lyrics > val_limit:
+ return "zh"
+ if en_lyrics > val_limit:
+ return "en"
+ return "instrument"
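
The thresholds above lean on `_count_ch_nan`'s three character classes. A minimal standalone sketch of that classification (same Unicode range as the code above; the sample string is illustrative only):

```python
# Standalone version of the character classifier behind _lang_decide:
# CJK ideographs count as Chinese, ASCII letters and whitespace are
# ignored, and everything else (kana, Hangul, symbols) counts as "nan".
def count_ch_nan(text: str) -> tuple[int, int]:
    ch_num = nan_num = 0
    for c in text:
        if '\u4e00' <= c <= '\u9fff':
            ch_num += 1
        elif ('a' <= c <= 'z') or ('A' <= c <= 'Z') or not c.strip():
            continue
        else:
            nan_num += 1
    return ch_num, nan_num

print(count_ch_nan("你好 world こんにちは"))  # → (2, 5)
```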
+
+# ===== External Interface =====
+
+def get_convert_lyrics(dataset:list[dict], save_path:str, dir:str, src_subfix:str=""):
+ """Convert lyrics and annotate language type (need to locate corresponding song)"""
+ new_dataset = []
+ lang_count = defaultdict(int)
+ unmatch = []
+ with open(save_path, 'w', encoding='utf-8') as file:
+ for ele in tqdm(dataset, desc="Converting Lyrics"):
+ ele = copy.deepcopy(ele)
+ # Skip if no lyrics
+ if not ele['has_lyric']:
+ # Don't add to final result
+ continue
+ # Get lyrics
+ lyric = ele['lyric']
+ if lyric == "":
+ lyric = ele['tlyric']
+
+ # Parse lyrics
+ new_data = _parse_lyrics(lyric)
+
+ # Language detection
+ lang = _lang_decide(new_data['lyrics'])
+ lang_count[lang] += 1
+
+ # Remove redundant fields
+ del ele['artists']
+ del ele['lyric']
+ del ele['tlyric']
+ del ele['has_lyric']
+
+ # Add new fields
+ ele['lyric_lang'] = lang
+ ele['source'] += src_subfix
+ for key, value in new_data.items():
+ ele[key] = value
+
+ new_dataset.append(ele)
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+
+ dict_sort_print(lang_count)
+ return new_dataset, unmatch
+
+def get_match_music(music_data:list[dict], lyric_data:list[dict]):
+ """Get songs that match or don't match with lyrics"""
+ # 1. Build lookup set from songs
+ name_map = {}
+ for ele in tqdm(lyric_data, desc="Existing Lyrics"):
+ name = ele['name']
+ name = re.sub(" ", "", name)
+ artist = ele['artist']
+ complete_name = f"{name} - {artist}.mp3"
+ name_map[complete_name] = ele
+
+ # 2. Iterate through songs to find remaining ones
+ matches = []
+ unmatches = []
+ for ele in tqdm(music_data, desc="Check Matching"):
+ path = ele['path']
+ name = os.path.basename(path)
+ if name not in name_map:
+ unmatches.append(ele)
+ else:
+ meta = name_map[name]
+ meta['path'] = path
+ matches.append(meta)
+ return matches, unmatches
diff --git a/data_pipeline/meta_process/convert_messages.py b/data_pipeline/meta_process/convert_messages.py
new file mode 100644
index 0000000000000000000000000000000000000000..5c1325bbe54d0ead83ff97f6ff71ed2e482b1671
--- /dev/null
+++ b/data_pipeline/meta_process/convert_messages.py
@@ -0,0 +1,593 @@
+"""
+Generate multi-turn dialogue data based on segment descriptions (segment-by-segment generation).
+
+Input:
+- MuseData/data/meta_suno_cn.jsonl # Base meta (contains src_path, lyrics, lang, tag, etc.)
+- 3_block_80000_cn_desc.jsonl # Segment descriptions, contains startS/endS/text/desc for each section
+- mucodec pt directory: same as PT_DIR_CN in multi_data_suno.py
+
+Output:
+- MuseData/sft_dataset_suno_cn.jsonl # Multi-turn dialogues generated segment by segment
+
+Dialogue format example (refer to temp1.jsonl):
+- First user: Summary prompt (Chinese), explains "segment-by-segment" + provides [Intro desc...] description
+- First assistant: Intro tokens (0 ~ first section's startS)
+- Subsequent segments:
+ user.content = "[{Section} desc]{desc}\\n{text}"
+ assistant.content = corresponding time segment tokens
+"""
+
+import json
+import os
+import re
+from typing import Dict, List, Tuple, Optional
+
+import torch
+from tqdm import tqdm
+from my_tool import load_jsonl, clean_newlines
+
+# Language configuration
+LANG = "en"
+# Path configuration
+META_FILE = f"meta_suno_{LANG}.jsonl"
+# desc folder, read all *.jsonl files in it
+DESC_DIR = "desc"
+PT_DIR = f"suno_mucodec_{LANG}"
+# Output directory (different from meta directory), each desc file generates a set of three files
+OUTPUT_DIR = "outputs"
+OUTPUT_BASENAME = "minus_phonemes"
+
+TOKEN_PER_SECOND = 25
+
+
+LOG_FILE = os.path.join(OUTPUT_DIR, f"section_mismatch_{LANG}.log") # Place in output directory; follow LANG instead of hard-coding "cn"
+
+def _log_warning(msg: str):
+ """Print and save to file"""
+ print(msg)
+ with open(LOG_FILE, 'a', encoding='utf-8') as f:
+ f.write(msg + '\n')
+
+
+# Timestamp parsing regex
+timestamp_pattern = re.compile(
+ r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]"
+)
+
+
+def load_pt(pt_file: str) -> torch.Tensor:
+ return torch.load(pt_file, map_location="cpu").squeeze().long()
+
+
+def get_token_slice(audio: torch.Tensor, start_s: float, end_s: float) -> str:
+ if start_s < 0:
+ start_s = 0
+ if end_s < 0:
+ end_s = 0
+ s_idx = int(start_s * TOKEN_PER_SECOND)
+ e_idx = int(end_s * TOKEN_PER_SECOND)
+ s_idx = max(0, min(s_idx, audio.shape[0]))
+ e_idx = max(0, min(e_idx, audio.shape[0]))
+ if e_idx <= s_idx:
+ sliced = []
+ else:
+ sliced = audio[s_idx:e_idx]
+ # NOTE: the per-token wrapper "<{i}>" is reconstructed here; the original angle-bracket
+ # markup was likely stripped during extraction. Adjust to the tokenizer's actual format.
+ return "[SOA]" + "".join(f"<{i}>" for i in sliced) + "[EOA]"
+
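The second-to-token mapping in `get_token_slice` is plain scaling with clamping at the tensor bounds; that arithmetic in isolation (using the `TOKEN_PER_SECOND = 25` constant defined above):

```python
TOKEN_PER_SECOND = 25  # mucodec emits 25 tokens per second of audio

def seconds_to_token_range(start_s: float, end_s: float, n_tokens: int) -> tuple[int, int]:
    # Clamp negative times to 0, then clamp indices into [0, n_tokens].
    s_idx = min(max(int(max(start_s, 0.0) * TOKEN_PER_SECOND), 0), n_tokens)
    e_idx = min(max(int(max(end_s, 0.0) * TOKEN_PER_SECOND), 0), n_tokens)
    return s_idx, e_idx

print(seconds_to_token_range(1.0, 3.5, 1000))      # → (25, 87)
print(seconds_to_token_range(-2.0, 9999.0, 1000))  # → (0, 1000)
```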
+
+def infer_pt_path(src_path: str) -> Optional[str]:
+ if not src_path:
+ return None
+ stem = os.path.splitext(os.path.basename(src_path))[0]
+ return os.path.join(PT_DIR, f"{stem}.pt")
+
+
+def parse_lyric_with_timestamps(lyric: str) -> List[Tuple[float, str]]:
+ """
+ Parse [(start_time_s, text), ...] from lyrics with timestamps, sorted by time ascending.
+ The returned timestamps come from the lyrics field in the meta file.
+ """
+ result: List[Tuple[float, str]] = []
+ matches = list(timestamp_pattern.finditer(lyric))
+
+ for i, match in enumerate(matches):
+ start_idx = match.end()
+ if i + 1 < len(matches):
+ end_idx = matches[i + 1].start()
+ else:
+ end_idx = len(lyric)
+
+ text = lyric[start_idx:end_idx].strip()
+ minutes = int(match.group(1))
+ seconds = int(match.group(2))
+ ms_str = match.group(3) if match.group(3) else "0"
+
+ # Scale by digit count: "5" -> 0.5, "50" -> 0.50, "300" -> 0.300
+ # (the old else-branch divided a single digit by 1000, misreading e.g. ".5" as 0.0005)
+ fractional_seconds = int(ms_str) / (10 ** len(ms_str))
+
+ total_seconds = minutes * 60 + seconds + fractional_seconds
+ result.append((total_seconds, text))
+ return result
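A quick check of the timestamp regex and the fractional-digit handling on a synthetic LRC-style lyric (the sample lines are made up):

```python
import re

# Same pattern as above: [mm:ss], [mm:ss.xx], or [mm:ss.xxx]
timestamp_pattern = re.compile(r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]")

def parse_stamps(lyric: str) -> list[float]:
    times = []
    for m in timestamp_pattern.finditer(lyric):
        ms_str = m.group(3) or "0"
        frac = int(ms_str) / (10 ** len(ms_str))  # "50" -> 0.50, "300" -> 0.300
        times.append(int(m.group(1)) * 60 + int(m.group(2)) + frac)
    return times

stamps = parse_stamps("[00:12.50]line one\n[01:05.300]line two\n[02:00]bridge")
print([round(t, 2) for t in stamps])  # → [12.5, 65.3, 120.0]
```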
+
+
+def extract_section_from_text(text: str) -> Optional[str]:
+ """
+ Relaxed rule: As long as [] contains English words (≥2 letters), return the entire bracket content as-is.
+ """
+ # Match if English words appear in the first []
+ m = re.search(r'\[([A-Za-z][A-Za-z0-9\s\-\(\)]*)\]', text)
+ if m:
+ return m.group(1).strip() # Remove leading and trailing spaces
+ return None
+
+def format_section_label(sec_name: str) -> str:
+ """Keep original spaces, only trim leading and trailing whitespace."""
+ return sec_name.strip()
+
+
+def normalize_section_name(sec_name: str) -> str:
+ """
+ Normalize section name for matching:
+ - Remove all spaces
+ - Convert to lowercase
+ - Remove trailing digits (if any)
+ """
+ # Remove all spaces
+ normalized = sec_name.replace(" ", "").lower()
+ # Remove trailing digits (e.g., "verse1" -> "verse", "chorus1" -> "chorus")
+ normalized = re.sub(r"\d+$", "", normalized)
+ return normalized
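For instance, the normalization maps differently-written labels onto one matching key:

```python
import re

def normalize_section_name(sec_name: str) -> str:
    # Drop spaces, lowercase, then strip any trailing counter digits.
    normalized = sec_name.replace(" ", "").lower()
    return re.sub(r"\d+$", "", normalized)

print(normalize_section_name("Verse 1"))   # → verse
print(normalize_section_name("CHORUS 2"))  # → chorus
```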
+
+
+def clean_desc(desc: str) -> str:
+ """
+ Clean desc field:
+ 1. If starts with [desc], remove it
+ 2. If both ends are brackets, remove brackets
+ """
+ if not desc:
+ return desc
+
+ desc = desc.strip()
+
+ # If starts with [desc], remove it
+ if desc.startswith("[desc]"):
+ desc = desc[6:].strip()
+
+ # If both ends are brackets, remove brackets
+ if desc.startswith("[") and desc.endswith("]"):
+ desc = desc[1:-1].strip()
+
+ return desc
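Applied to a typical raw description, the two stripping rules compose like this (the input string is illustrative):

```python
def clean_desc(desc: str) -> str:
    if not desc:
        return desc
    desc = desc.strip()
    if desc.startswith("[desc]"):           # rule 1: drop the [desc] prefix
        desc = desc[6:].strip()
    if desc.startswith("[") and desc.endswith("]"):  # rule 2: unwrap brackets
        desc = desc[1:-1].strip()
    return desc

print(clean_desc("[desc] [soft piano intro]"))  # → soft piano intro
```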
+
+
+def build_desc_map(desc_path_or_dir: str) -> Dict[Tuple[str, int], List[dict]]:
+ """
+ Support passing a single jsonl file or a directory containing multiple jsonl files.
+ In directory scenario, files are sorted by name and read sequentially, later records with the same key will overwrite earlier ones.
+ """
+ mapping: Dict[Tuple[str, int], List[dict]] = {}
+
+ paths: List[str] = []
+ if os.path.isdir(desc_path_or_dir):
+ for name in sorted(os.listdir(desc_path_or_dir)):
+ if name.endswith(".jsonl"):
+ paths.append(os.path.join(desc_path_or_dir, name))
+ else:
+ paths.append(desc_path_or_dir)
+
+ for path in paths:
+ with open(path, "r", encoding="utf-8") as f:
+ for line in f:
+ try:
+ obj = json.loads(line)
+ except Exception:
+ with open("error.txt", 'a', encoding='utf-8') as error_file:
+ error_file.write(line + "\n")
+ continue
+
+ song_id = obj.get("song_id")
+ track_idx = obj.get("track_index", 0)
+
+ # Change: Put the entire object
+ # sections = obj.get("sections", [])
+ # mapping[(song_id, track_idx)] = sections
+ mapping[(song_id, track_idx)] = obj
+ return mapping
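The "later records overwrite earlier ones" behavior follows directly from plain dict assignment keyed by `(song_id, track_index)`; a tiny sketch with made-up records:

```python
# Records as read from two files in sorted filename order; the second
# assignment to the same key silently replaces the first.
records_a = [{"song_id": "s1", "track_index": 0, "desc": "old"}]
records_b = [{"song_id": "s1", "track_index": 0, "desc": "new"}]

mapping = {}
for obj in records_a + records_b:
    mapping[(obj["song_id"], obj.get("track_index", 0))] = obj

print(mapping[("s1", 0)]["desc"])  # → new
```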
+
+
+def extract_suffix_from_desc_path(desc_path: str, fallback_idx: int) -> str:
+ """
+ Extract suffix from desc filename for output file naming.
+ Rule: Try to extract the block number from the filename pattern "3_block_<num>_(cn|en)_desc.jsonl".
+ If no match, use fallback_idx (starting from 0) converted to string.
+ """
+ fname = os.path.basename(desc_path)
+ m = re.search(r"3_block_([^_]+)_(?:cn|en)_desc\.jsonl", fname, re.IGNORECASE)
+ if m:
+ return m.group(1)
+ return str(fallback_idx)
+
+
+def extract_suffix_num(desc_path: str, fallback_idx: int) -> int:
+ """
+ Extract sortable numeric suffix for processing desc files in numeric order.
+ Use fallback_idx if parsing fails.
+ """
+ suffix = extract_suffix_from_desc_path(desc_path, fallback_idx)
+ try:
+ return int(suffix)
+ except ValueError:
+ return fallback_idx
+
+
+def build_messages(item: dict, obj: dict, audio: torch.Tensor) -> Optional[dict]:
+ """
+ Use timestamps from meta to split audio, desc_sections provide desc and text.
+ """
+ if not obj:
+ return None
+ desc_sections = obj.get("sections", [])
+
+ # Parse timestamps from meta's lyrics
+ lyrics_raw = item.get("lyrics", "") or ""
+ meta_ts_list = parse_lyric_with_timestamps(lyrics_raw)
+ if not meta_ts_list:
+ return None
+
+ total_seconds = audio.shape[0] / float(TOKEN_PER_SECOND)
+
+ # Sort desc_sections (by startS, for matching)
+ desc_sections = sorted(desc_sections, key=lambda x: x.get("startS", 0.0))
+
+ # Get middle sections from desc_sections (skip first Intro and last Outro)
+ # Intro and Outro are not in meta's lyrics, need separate handling
+ # Middle sections are matched in order
+ if len(desc_sections) > 2:
+ middle_desc_sections = desc_sections[1:-1]
+ elif len(desc_sections) > 1:
+ middle_desc_sections = desc_sections[1:]
+ else:
+ middle_desc_sections = []
+
+ # Identify sections from meta timestamps and build mapping
+ # One section may contain multiple lyric lines, need to merge
+ section_timestamps: List[Tuple[str, float, float, str, str]] = [] # (section_name, start_s, end_s, text, desc)
+ current_section: Optional[Tuple[str, float, str]] = None # (section_name, start_s, accumulated_text)
+ desc_idx = 0 # For matching desc in order (from middle_desc_sections)
+
+ for idx, (start_s, text) in enumerate(meta_ts_list):
+ # Extract section name
+ section_name = extract_section_from_text(text)
+
+ if section_name:
+ # Encountered new section label
+ # First save previous section (if any)
+ if current_section:
+ # Determine end time of previous section (current timestamp)
+ prev_sec_name, prev_start_s, prev_text = current_section
+ # Only remove timestamp, keep all other content (including line breaks, section labels, etc.)
+ clean_prev_text = re.sub(r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]", "", prev_text)
+
+ # Get desc in order (match from middle sections in order)
+ prev_desc = ""
+ if desc_idx < len(middle_desc_sections):
+ prev_desc = clean_desc(middle_desc_sections[desc_idx].get("desc", ""))
+ desc_idx += 1
+
+ section_timestamps.append((prev_sec_name, prev_start_s, start_s, clean_prev_text, prev_desc))
+
+ # Start new section
+ current_section = (section_name, start_s, text)
+ else:
+ # No section label, belongs to subsequent lines of current section
+ if current_section:
+ sec_name, sec_start, sec_text = current_section
+ # Preserve line breaks, connect with line breaks
+ current_section = (sec_name, sec_start, sec_text + "\n" + text)
+ # If no current_section, skip (might be empty line before Intro)
+
+ # Process last section
+ # Check if last timestamp is empty text (indicates end marker)
+ outro_start_s: Optional[float] = None
+ if meta_ts_list and not meta_ts_list[-1][1].strip():
+ # Last timestamp is empty text, indicates end marker
+ outro_start_s = meta_ts_list[-1][0]
+
+ if current_section:
+ sec_name, sec_start, sec_text = current_section
+ # If last timestamp is empty text, last section's end time should be this timestamp
+ # Otherwise use total duration
+ if outro_start_s is not None:
+ end_s = outro_start_s
+ else:
+ end_s = total_seconds
+
+ # Only remove timestamp, keep all other content
+ clean_text = re.sub(r"\[([0-9]{1,2}):([0-9]{1,2})(?:[.:]([0-9]{1,3}))?\]", "", sec_text)
+
+ # Get desc in order (match from middle sections in order)
+ desc = ""
+ if desc_idx < len(middle_desc_sections):
+ desc = clean_desc(middle_desc_sections[desc_idx].get("desc", ""))
+ desc_idx += 1
+
+ section_timestamps.append((sec_name, sec_start, end_s, clean_text, desc))
+
+ # Sanity check: every middle desc section should have been consumed exactly once
+ if desc_idx != len(middle_desc_sections):
+ _log_warning(f"⚠️ Warning: Section count mismatch! meta yielded {len(section_timestamps)} sections (consuming {desc_idx} descs), but desc has {len(middle_desc_sections)} middle sections (excluding Intro and Outro) (song_id: {item.get('song_id')}, track_index: {item.get('track_index')})")
+
+ if not section_timestamps:
+ return None
+
+ # Intro segment: from 0 to first section's start time
+ # Intro's desc should have been obtained in sequential matching, but Intro itself is not in meta's lyrics
+ # So need to get from desc_sections' first section (usually Intro)
+ first_section_start = section_timestamps[0][1] if section_timestamps else total_seconds
+ intro_desc = ""
+ if desc_sections and desc_sections[0].get("section", "").lower() == "intro":
+ intro_desc = clean_desc(desc_sections[0].get("desc", ""))
+
+ # Change: Use desc tag
+ # tag = item.get("tag", "")
+ song_id:str = obj.get("song_id", "")
+ omni_tag = obj.get("omni", "")
+ style_tag = obj.get("style", "")
+
+ if song_id.find("cn") != -1:
+ # Chinese songs use omni directly
+ tag = omni_tag
+ else:
+ # English songs compare omni / style
+ style_sim = obj.get("style_sim", 0)
+ omni_sim = obj.get("omni_sim", 0)
+ try:
+ tag = omni_tag if omni_sim > style_sim else style_tag
+ except Exception as e:
+ # If sim score is invalid, default to omni
+ tag = omni_tag
+ print(f"Error: {song_id}, {e}")
+
+ # Change: English
+ intro_prompt = (
+ f"Please generate a song in the following style:{tag}.\n"
+ "Next, I will tell you the requirements and lyrics for the song fragment to be generated, section by section.\n"
+ f"[Intro][desc:{intro_desc}]"
+ )
+
+ messages: List[dict] = []
+ messages.append({"role": "user", "content": intro_prompt})
+ messages.append(
+ {
+ "role": "assistant",
+ "content": get_token_slice(audio, 0.0, first_section_start),
+ }
+ )
+
+ # Process segment by segment (using timestamps from meta)
+ for idx, (sec_name, start_s, end_s, text, desc) in enumerate(section_timestamps):
+ # user content: [Section][desc:...] then optional [lyrics:...] (preserve original spaces)
+ label = format_section_label(sec_name)
+ content = f"[{label}]"
+ content += f"[desc:{desc}]"
+ lyrics_text = re.sub(r'^\[.*?\]\s*\n?', '', text.strip())
+ if lyrics_text:
+ content += f"[lyrics:\n{lyrics_text}]"
+ messages.append({"role": "user", "content": content})
+ messages.append(
+ {
+ "role": "assistant",
+ "content": get_token_slice(audio, start_s, end_s),
+ }
+ )
+
+ # If last timestamp is empty text, add Outro segment
+ # Outro's desc should be obtained from desc_sections' last section (usually Outro)
+ if outro_start_s is not None and outro_start_s < total_seconds:
+ outro_desc = ""
+ if desc_sections and desc_sections[-1].get("section", "").lower() == "outro":
+ outro_desc = clean_desc(desc_sections[-1].get("desc", ""))
+
+ messages.append({"role": "user", "content": f"[Outro][desc:{outro_desc}]"})
+ messages.append(
+ {
+ "role": "assistant",
+ "content": get_token_slice(audio, outro_start_s, total_seconds),
+ }
+ )
+
+ sample = {
+ "song_id": item.get("song_id"),
+ "track_index": item.get("track_index"),
+ "src_path": item.get("src_path"),
+ "tag": item.get("tag"),
+ "lang": item.get("lang"),
+ "duration": item.get("duration"),
+ "messages": messages,
+ }
+ return sample
+
+
+def process_with_desc(desc_path: str, suffix: str) -> None:
+ """
+ Generate output files using a single desc file (messages-only / meta-only).
+ Does not generate main file.
+ """
+ desc_map = build_desc_map(desc_path)
+
+ os.makedirs(OUTPUT_DIR, exist_ok=True)
+ # messages-only naming: remove suffix, only keep block number (e.g., ..._8000.jsonl)
+ out_msg = os.path.join(OUTPUT_DIR, f"{OUTPUT_BASENAME}_{suffix}.jsonl")
+ out_meta = os.path.join(OUTPUT_DIR, f"{OUTPUT_BASENAME}_{suffix}_meta_only.jsonl")
+
+ dataset = load_jsonl(META_FILE)
+ total = len(dataset)
+ kept = 0
+ skipped = 0
+
+ # Resume from a previous run: skip records already written.
+ # Check both files in one condition; looping over them would slice the dataset twice.
+ if os.path.exists(out_msg) and os.path.exists(out_meta):
+ already_msg = load_jsonl(out_msg)
+ already_meta = load_jsonl(out_meta)
+ assert len(already_meta) == len(already_msg)
+ dataset = dataset[len(already_msg):]
+
+ with open(out_msg, "a", encoding="utf-8") as fout_msg, \
+ open(out_meta, "a", encoding="utf-8") as fout_meta:
+
+ for item in tqdm(dataset, desc=f"Processing meta_suno_{LANG}.jsonl (desc: {suffix})"):
+ key = (item.get("song_id"), item.get("track_index", 0))
+
+ obj = desc_map.get(key)
+
+ if not obj:
+ skipped += 1
+ continue
+
+ pt_path = infer_pt_path(item.get("src_path", ""))
+ if not pt_path or not os.path.exists(pt_path):
+ skipped += 1
+ continue
+
+ audio = load_pt(pt_path)
+ sample = build_messages(item, obj, audio)
+ if not sample:
+ skipped += 1
+ continue
+
+ # Write messages-only (remove _messages_only suffix)
+ messages_only = {"messages": sample.get("messages", [])}
+ fout_msg.write(json.dumps(messages_only, ensure_ascii=False) + "\n")
+ # Write meta-only
+ meta_only = {k: v for k, v in sample.items() if k != "messages"}
+ fout_meta.write(json.dumps(meta_only, ensure_ascii=False) + "\n")
+ kept += 1
+
+ print(f"✅ messages-only: {out_msg}")
+ print(f"✅ meta-only: {out_meta}")
+ print(f"Total {total}, kept {kept}, skipped {skipped}")
+
+
+def convert_train_valid():
+ # Collect desc files
+ if not os.path.isdir(DESC_DIR):
+ print(f"⚠️ DESC_DIR does not exist: {DESC_DIR}")
+ return
+
+ unsorted_files = [
+ os.path.join(DESC_DIR, name)
+ for name in os.listdir(DESC_DIR)
+ if name.endswith(".jsonl")
+ ]
+
+ # Sort by extracted numeric suffix, ensure 8000 comes before 16000
+ desc_files = sorted(
+ unsorted_files,
+ key=lambda p: extract_suffix_num(p, 0),
+ )
+ if not desc_files:
+ print(f"⚠️ No desc files found: {DESC_DIR}")
+ return
+
+ # Change
+ # for idx, desc_path in enumerate(desc_files):
+ # suffix = extract_suffix_from_desc_path(desc_path, idx)
+ # process_with_desc(desc_path, suffix)
+
+ for desc_path in desc_files:
+ name = os.path.splitext(os.path.basename(desc_path))[0]
+ if name.endswith(LANG):
+ process_with_desc(desc_path, name)
+
+# Notes for the test-set conversion below:
+# - each assistant turn is required by the chat format, but its content is left empty
+# - each section carries three candidate descs; take the one with the highest similarity score
+# - omni likewise comes in two variants; take the higher-scoring one (style is not used)
+
+from meta_phonemes import _get_lyrics, _trans_sentences
+
+def _form_section(section:dict, en:bool) -> str:
+ """Process inside a section, determine which desc to select"""
+ # Segment label
+ section_tag = f"[{section['section']}]"
+ # Segment description
+ descs = [section['desc1'], section['desc2'], section['desc3']]
+ sims = [section['desc1_sim'], section['desc2_sim'], section['desc3_sim']]
+ max_sim = max(sims)
+ max_index = sims.index(max_sim)
+ desc:str = descs[max_index]
+
+ if desc == "音频过短": # "Audio too short" - keep original Chinese as it's part of data processing logic
+ desc = "[desc:]"
+ else:
+ DESC_START = "[desc] "
+ if desc.startswith(DESC_START):
+ desc = desc[len(DESC_START):] + "]"
+ desc = "[desc:" + desc[1:]
+
+ # Lyrics & phonemes
+ text:str = section['text']
+ if text.find(']') != -1:
+ # Remove preceding segment label
+ start = text.rfind(']')
+ text = text[start+1:]
+ if len(text.strip()) == 0:
+ # Opening segment has no lyrics/phonemes
+ lyrics = ""
+ phonemes = ""
+ else:
+ if en:
+ lyrics = "[lyrics:\n" + clean_newlines(text) + "]"
+ else:
+ lyrics = "[lyrics:" + text + "]"
+
+ sentences, lyrics = _get_lyrics(lyrics)
+ phonemes = _trans_sentences(sentences)
+ return section_tag + desc + lyrics + phonemes
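The desc selection in `_form_section` is an argmax over the three similarity scores; isolated, it reads:

```python
# Pick the candidate description whose similarity score is highest
# (ties resolve to the first maximum, matching list.index behavior).
descs = ["desc A", "desc B", "desc C"]  # illustrative values
sims = [0.41, 0.87, 0.55]

best = descs[sims.index(max(sims))]
print(best)  # → desc B
```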
+
+def _form_intro(ele:dict) -> str:
+ """Process to get multi-turn dialogue opening"""
+ omni1 = ele['omni1']
+ omni2 = ele['omni2']
+ omni_sim1 = ele['omni1_sim']
+ omni_sim2 = ele['omni2_sim']
+ tag = omni1 if omni_sim1 > omni_sim2 else omni2
+ return (
+ f"Please generate a song in the following style:{tag}.\n"
+ "Next, I will tell you the requirements and lyrics for the song fragment to be generated, section by section.\n"
+ )
+
+def convert_test():
+ path = "filter.jsonl"
+ dataset = load_jsonl(path)
+ save_path = "messages.jsonl"
+ with open(save_path, 'w', encoding='utf-8') as file:
+ for ele in tqdm(dataset, desc=f"Converting {path}"):
+ messages = []
+ # Segment processing
+ sections = ele['sections']
+ id:str = ele['song_id']
+ english = id.startswith("suno_test_en")
+ for section in sections:
+ content = _form_section(section, english)
+ messages += [
+ {
+ "role": "user",
+ "content": content
+ },
+ {
+ "role": "assistant",
+ "content": ""
+ }
+ ]
+ # Initial addition
+ first_content = messages[0]['content']
+ intro = _form_intro(ele)
+ messages[0]['content'] = intro + first_content
+
+ data = {"messages": messages}
+ json.dump(data, file, ensure_ascii=False)
+ file.write("\n")
+
+if __name__ == "__main__":
+ convert_test()
\ No newline at end of file
diff --git a/data_pipeline/meta_process/convert_segments.py b/data_pipeline/meta_process/convert_segments.py
new file mode 100644
index 0000000000000000000000000000000000000000..4109bfaaabfc1b567c67eb6dbba850ebb13bb796
--- /dev/null
+++ b/data_pipeline/meta_process/convert_segments.py
@@ -0,0 +1,93 @@
+import os
+import json
+from tqdm import tqdm
+from my_tool import path_join, load_json
+from concurrent.futures import ProcessPoolExecutor, as_completed
+
+def _check_label(label:str, max_length:int=30) -> bool:
+ """Check if label is valid (non-empty, not timestamp, not long lyrics)"""
+ length = len(label.strip())
+ if length == 0:
+ # print("Error Label: Empty")
+ return False
+ if length > max_length:
+ # print(f"Error Label: Words - {label}")
+ return False
+ if label.find(":") != -1 and label.find(".") != -1:
+ # Considered as timestamp
+ # print(f"Error Label: Timestamp - {label}")
+ return False
+ return True
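The three rejection rules in `_check_label` can be exercised directly (same logic, standalone):

```python
def check_label(label: str, max_length: int = 30) -> bool:
    length = len(label.strip())
    if length == 0:
        return False  # empty label
    if length > max_length:
        return False  # too long: probably a lyric line, not a section label
    if ":" in label and "." in label:
        return False  # looks like a timestamp such as 01:23.45
    return True

print(check_label("Chorus"))    # → True
print(check_label("01:23.45"))  # → False
```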
+
+def _convert_one(path:str):
+ """Segment a song's metadata, remove redundant content"""
+ data = load_json(path)
+ dir = os.path.dirname(path)
+ name = f"{data['song_id']}_{data['track_index']}.mp3"
+ path = path_join(dir, name)
+ new_data = {
+ "path": path,
+ "song_id": data['song_id'],
+ "segments": []
+ }
+ words_info = data['timestamped_lyrics']['alignedWords'] # Sentence-by-sentence information
+ seg_info = None
+
+ empty_head = False
+ for id, word_info in enumerate(words_info):
+ if not word_info['success']:
+ continue
+ word:str = word_info['word']
+
+ label = ""
+ if word.startswith('['):
+ if seg_info is not None:
+ new_data['segments'].append(seg_info)
+ label_end = word.find(']')
+ label = word[1:label_end]
+ if not _check_label(label):
+ label = ""
+
+ if label != "":
+ seg_info = {
+ "start": word_info['startS'],
+ "end": 0,
+ "label": label,
+ "word": word[label_end+2:]
+ }
+ elif seg_info is not None:
+ seg_info['end'] = word_info['endS']
+ seg_info['word'] += word
+ else:
+ empty_head = True
+ if seg_info is not None:
+ seg_info['end'] = word_info['endS']
+ seg_info['word'] += word
+ else:
+ empty_head = True
+ if empty_head:
+ # print(f"Empty Head, segment: {len(new_data['segments'])}, path: {path}")
+ pass
+ return new_data
+
+# ===== External Interface =====
+
+def get_convert_segments(data_dir:str, save_path:str, max_workers:int=10):
+ paths = []
+ for name in tqdm(os.listdir(data_dir), desc="Getting the JSON Paths"):
+ if name.endswith(".json"):
+ path = path_join(data_dir, name)
+ paths.append(path)
+
+ dataset = []
+ with open(save_path, 'w', encoding='utf-8') as file:
+ with ProcessPoolExecutor(max_workers=max_workers) as executor:
+ futures = [executor.submit(_convert_one, path) for path in paths]
+ with tqdm(total=len(futures), desc="Converting Segments") as pbar:
+ for future in as_completed(futures):
+ result = future.result()
+ dataset.append(result)
+ json.dump(result, file, ensure_ascii=False)
+ file.write("\n")
+ pbar.update(1)
+ return dataset
\ No newline at end of file
diff --git a/data_pipeline/meta_process/evaluate_polyphones.py b/data_pipeline/meta_process/evaluate_polyphones.py
new file mode 100644
index 0000000000000000000000000000000000000000..814b0c18579d4b5861026cbd8a29afb59ceeb25f
--- /dev/null
+++ b/data_pipeline/meta_process/evaluate_polyphones.py
@@ -0,0 +1,62 @@
+import re
+import jieba
+from tqdm import tqdm
+from my_tool import load_jsonl, save_json, load_json
+from pypinyin import pinyin, Style, load_phrases_dict
+from pypinyin_dict.phrase_pinyin_data import cc_cedict
+
+cc_cedict.load()
+re_special_pinyin = re.compile(r'^(n|ng|m)$')
+
+# Add
+reference = load_json("poly_correct.json")
+load_phrases_dict(reference)
+
+def _filter(dataset:list[dict]):
+ """Filter non-polyphone characters in test set"""
+ new_dataset = []
+ for ele in tqdm(dataset, desc="Filtering"):
+ pos = ele['pos']
+ sentence = ele['sentence']
+ word = sentence[pos]
+ phones = pinyin(word, style=Style.NORMAL, heteronym=True)[0]
+ if len(phones) > 1:
+ new_dataset.append(ele)
+ print(f"Filter non polyphone, {len(dataset)} -> {len(new_dataset)}")
+ return new_dataset
+
+def evaluate_polyphones(dataset:list[dict], save_fail:str):
+ """Check pinyin processing accuracy for polyphones"""
+ dataset = _filter(dataset)
+ total = len(dataset)
+ right = 0
+ correct_dic = {}
+ for ele in tqdm(dataset):
+ pos = ele['pos']
+ phone = ele['phone']
+ sentence = ele['sentence']
+ seg_list = jieba.cut(sentence)
+ length = 0
+ for seg in seg_list:
+ if length <= pos and length + len(seg) > pos:
+ delta = pos - length # Position in segment
+ break
+ length += len(seg)
+ pred_phones = pinyin(seg, style=Style.NORMAL)
+ pred_phone = pred_phones[delta][0]
+ if pred_phone == phone or pred_phone.endswith("v"):
+ right += 1
+ elif len(pred_phones) > 1:
+ # Corrected pronunciation (only meaningful for phrases)
+ pred_phones[delta] = [phone]
+ correct_dic[seg] = pred_phones
+ print(f"Acc: {(right / total):.2f}")
+
+ origin_dic = load_json(save_fail)
+ merge_dic = origin_dic | correct_dic
+ save_json(merge_dic, save_fail)
+
+if __name__ == "__main__":
+ path = "polyphones.jsonl"
+ dataset = load_jsonl(path)
+ evaluate_polyphones(dataset, "poly_correct.json")
\ No newline at end of file
diff --git a/data_pipeline/meta_process/filter.py b/data_pipeline/meta_process/filter.py
new file mode 100644
index 0000000000000000000000000000000000000000..9f63e6dc6aeed25778368c36ea28cfd0c40a6186
--- /dev/null
+++ b/data_pipeline/meta_process/filter.py
@@ -0,0 +1,46 @@
+import librosa
+from tqdm import tqdm
+from concurrent.futures import ProcessPoolExecutor, as_completed
+
+def filter_lang(dataset:list[dict], langs:list[str]) -> list[dict]:
+ """Filter dataset, only keep items with lang tag matching specified languages"""
+ new_dataset = []
+ for ele in tqdm(dataset, desc="Filtering Lang"):
+ if 'lang' not in ele or ele['lang'] not in langs :
+ continue
+ new_dataset.append(ele)
+ print(f"filter: {len(dataset)} -> {len(new_dataset)}")
+ return new_dataset
+
+def _check_duration(ele, lower_bound, upper_bound):
+ """Subprocess task: Check if audio duration is within range"""
+ duration = librosa.get_duration(filename=ele['path']) # note: librosa >= 0.10 renames this keyword argument to path=
+ if lower_bound != -1 and duration < lower_bound:
+ return None
+ if upper_bound != -1 and duration > upper_bound:
+ return None
+ return ele
+
+def filter_length(dataset:list[dict], lower_bound:int=-1, upper_bound:int=-1, max_worker:int=4) -> list[dict]:
+ """Filter dataset, only keep items with length in [lower_bound, upper_bound], if set to -1 then no limit on that side"""
+ new_dataset = []
+ with ProcessPoolExecutor(max_workers=max_worker) as executor:
+ futures = [
+ executor.submit(_check_duration, ele, lower_bound, upper_bound)
+ for ele in dataset
+ ]
+ with tqdm(total=len(futures), desc="Filtering Length") as pbar:
+ for future in as_completed(futures):
+ result = future.result()
+ if result is not None:
+ new_dataset.append(result)
+ pbar.update(1)
+ # for ele in tqdm(dataset, desc="Filtering Length"):
+ # duration = librosa.get_duration(filename=ele['path'])
+ # if lower_bound != -1 and duration < lower_bound:
+ # continue
+ # if upper_bound != -1 and duration > upper_bound:
+ # continue
+ # new_dataset.append(ele)
+ print(f"filter: {len(dataset)} -> {len(new_dataset)}")
+ return new_dataset
\ No newline at end of file
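The parallel-filter pattern in `filter_length` (submit one check per item, keep non-None results) works with any pure predicate; a self-contained sketch using a trivial numeric check as a stand-in for the `librosa.get_duration` probe:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def check_in_range(value, lower, upper):
    # Stand-in for the per-item duration probe: return the item if it
    # passes, None if it should be filtered out; -1 disables a bound.
    if lower != -1 and value < lower:
        return None
    if upper != -1 and value > upper:
        return None
    return value

def parallel_filter(values, lower, upper, max_workers=4):
    kept = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(check_in_range, v, lower, upper) for v in values]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                kept.append(result)
    return sorted(kept)  # as_completed yields in completion order, so re-sort

if __name__ == "__main__":
    print(parallel_filter([60, 150, 200, 400], 120, 360))  # → [150, 200]
```

Note that `check_in_range` must be a top-level function so it can be pickled for the worker processes, the same constraint `_check_duration` satisfies above.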
diff --git a/data_pipeline/meta_process/main.py b/data_pipeline/meta_process/main.py
new file mode 100644
index 0000000000000000000000000000000000000000..ad377faf5ac954e072203b7d3cea55ce8134fac8
--- /dev/null
+++ b/data_pipeline/meta_process/main.py
@@ -0,0 +1,77 @@
+from my_tool import (
+ load_json,
+ load_jsonl,
+ load_txt,
+ save_jsonl,
+ format_meta,
+ pure_name,
+ BASE_DIR,
+ compose_analyze,
+ get_sample,
+ get_field_suno,
+ tags_analyze,
+ find_json,
+ show_dir,
+ convert_mp3,
+ tar_dir,
+ tar_size_check,
+ clean_newlines,
+ dict_sort_print,
+)
+from meta_lang import load_asr_model, get_lang_meta
+from meta_tags import load_tag_model, get_tags_meta
+from meta_endpoints import get_endpoints_meta
+from meta_phonemes import get_phonemes_meta
+from filter import filter_lang, filter_length
+from convert_convs import get_convert_convs
+from convert_segments import get_convert_segments
+from convert_lyrics import get_convert_lyrics, get_match_music
+
+def pipeline():
+ import os
+ dir = "suno_batch"
+ name = pure_name(dir)
+ save_dir = BASE_DIR / f"data/{name}"
+
+ # Initialize paths (only once)
+ os.makedirs(save_dir, exist_ok=True)
+ raw_path = os.path.join(save_dir, "raw.jsonl")
+ if os.path.exists(raw_path):
+ dataset = load_jsonl(raw_path)
+ else:
+ dataset = format_meta(dir)
+ save_jsonl(dataset, raw_path)
+
+ # Length filtering
+ dataset = dataset[:1000]
+ max_workers = 10
+ dataset = filter_length(dataset, 120, 360, max_workers)
+
+ # Language tagging
+ # lang_bs = 8
+ # model = load_asr_model(lang_bs)
+ # lang_path = os.path.join(save_dir, "meta_lang.jsonl")
+ # dataset = get_lang_meta(model, dataset, lang_bs, lang_path)
+
+ # Language filtering
+ # dataset = filter_lang(dataset, ['zh', 'en'])
+
+ # Style tagging
+ tag_bs = 4
+ tag_path = os.path.join(save_dir, "meta_tags.jsonl")
+ model, processor = load_tag_model()
+ prompt_path = BASE_DIR / "prompts/new_tags.md"
+ prompt = load_txt(prompt_path)
+ get_tags_meta(model, processor, dataset, prompt, tag_bs, tag_path)
+
+def repeat(func):
+ while True:
+ try:
+ func()
+ break
+ except Exception as e:
+ print(f"Error: {e}")
+ continue
+
+if __name__ == "__main__":
+ repeat(pipeline)
\ No newline at end of file
diff --git a/data_pipeline/meta_process/meta_endpoints.py b/data_pipeline/meta_process/meta_endpoints.py
new file mode 100644
index 0000000000000000000000000000000000000000..8412aecb924c57edf05996fe641070fc510fc20d
--- /dev/null
+++ b/data_pipeline/meta_process/meta_endpoints.py
@@ -0,0 +1,118 @@
+import json
+import webrtcvad
+import collections
+from tqdm import tqdm
+from my_tool import dup_remove
+from pydub import AudioSegment
+from concurrent.futures import ProcessPoolExecutor, as_completed
+
+def _frame_generator(frame_duration_ms, audio, sample_rate):
+ """Split audio into frames"""
+ bytes_per_sample = 2
+ frame_size = int(sample_rate * frame_duration_ms / 1000.0) * bytes_per_sample
+ offset = 0
+ timestamp = 0.0
+ frame_duration = frame_duration_ms / 1000.0
+ while offset + frame_size <= len(audio): # keep a frame that ends exactly at the buffer edge
+ yield audio[offset:offset + frame_size], timestamp
+ timestamp += frame_duration
+ offset += frame_size
+
+def _vad_collector(sample_rate, frame_duration_ms, padding_duration_ms, vad, frames):
+ """Merge continuous vocal segments based on webrtcvad"""
+ num_padding_frames = int(padding_duration_ms / frame_duration_ms)
+ ring_buffer = collections.deque(maxlen=num_padding_frames)
+
+ triggered = False
+ speech_segments = []
+
+ for frame_bytes, timestamp in frames:
+ is_speech = vad.is_speech(frame_bytes, sample_rate)
+
+ if not triggered:
+ ring_buffer.append((frame_bytes, timestamp, is_speech))
+ num_voiced = len([f for f in ring_buffer if f[2]])
+ if num_voiced > 0.9 * ring_buffer.maxlen:
+ triggered = True
+ start_time = ring_buffer[0][1]
+ ring_buffer.clear()
+ else:
+ ring_buffer.append((frame_bytes, timestamp, is_speech))
+ num_unvoiced = len([f for f in ring_buffer if not f[2]])
+ if num_unvoiced > 0.9 * ring_buffer.maxlen:
+ end_time = timestamp + (frame_duration_ms / 1000.0)
+ speech_segments.append((start_time, end_time))
+ triggered = False
+ ring_buffer.clear()
+
+ # If still in speech state at the end, close the last segment
+ if triggered:
+ end_time = timestamp + (frame_duration_ms / 1000.0)
+ speech_segments.append((start_time, end_time))
+
+ return speech_segments
+
+def _one_process(path):
+ """Detect vocal segments in an audio"""
+ # 1. Compress audio
+ audio = AudioSegment.from_file(path)
+ audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
+ sample_rate = audio.frame_rate
+ audio_data = audio.raw_data
+
+ # 2. Initialize VAD (aggressiveness 0-3; higher filters non-speech more aggressively, 0 is most permissive)
+ vad = webrtcvad.Vad(0)
+
+ # 3. Generate frames
+ frames = list(_frame_generator(30, audio_data, sample_rate))
+
+ # 4. Detect vocal intervals
+ segments = _vad_collector(sample_rate, 30, 300, vad, frames)
+
+ # If no vocals, set both start and end to -1
+ if len(segments) == 0:
+ return {
+ "start": -1,
+ "end": -1,
+ "segments": [],
+ }
+
+ return {
+ "start": segments[0][0],
+ "end": segments[-1][1],
+ "segments": segments,
+ }
+
+# ===== External Interface =====
+
+def get_endpoints_meta(dataset:list[dict], save_path:str, max_workers:int=4, save_middle:bool=True):
+ """
+ Add endpoint labels to each audio in dataset (mainly for separated vocal audio)
+ - Requires 'path' field in each data entry in dataset
+ - Write fields: endpoints.start/end
+ - Write to save_path in real-time
+ - save_middle determines whether to record each sentence's endpoints to save.segments field
+ """
+ dataset = dup_remove(dataset, save_path, 'path', 'endpoints')
+ new_dataset = []
+ with open(save_path, 'a', encoding='utf-8') as file:
+ with ProcessPoolExecutor(max_workers=max_workers) as executor:
+ futures = {executor.submit(_one_process, ele['path']): ele for ele in dataset}
+ for future in tqdm(as_completed(futures), total=len(futures), desc="Detecting endpoints"):
+ ele = futures[future] # Get original element
+ try:
+ result = future.result()
+ ele['endpoints'] = {
+ "start": result['start'],
+ "end": result['end']
+ }
+ if save_middle:
+ if "save" not in ele:
+ ele['save'] = {}
+ ele['save']['segments'] = result['segments']
+ new_dataset.append(ele)
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+ except Exception:
+ pass # Skip files that fail to decode; they simply get no endpoints
+ return new_dataset
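The 30 ms framing that feeds webrtcvad above can be sketched without any audio dependencies; `split_frames` is an illustrative stand-in for `_frame_generator`, assuming 16 kHz mono 16-bit PCM (2 bytes per sample), so each 30 ms frame is 960 bytes:

```python
# Minimal sketch of the fixed-size PCM framing used by _frame_generator.
# Assumption: 16 kHz mono, 16-bit samples -> 2 bytes per sample.

def split_frames(audio: bytes, sample_rate: int = 16000, frame_ms: int = 30):
    """Yield (frame_bytes, timestamp_s) pairs of fixed-size PCM frames."""
    bytes_per_sample = 2
    frame_size = int(sample_rate * frame_ms / 1000.0) * bytes_per_sample
    offset, timestamp = 0, 0.0
    # <= so a frame ending exactly at the buffer edge is kept
    while offset + frame_size <= len(audio):
        yield audio[offset:offset + frame_size], timestamp
        timestamp += frame_ms / 1000.0
        offset += frame_size

# One second of silence at 16 kHz / 16-bit mono is 32000 bytes,
# which yields 33 full 960-byte frames (the 320-byte tail is dropped).
frames = list(split_frames(b"\x00" * 32000))
```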
diff --git a/data_pipeline/meta_process/meta_lang.py b/data_pipeline/meta_process/meta_lang.py
new file mode 100644
index 0000000000000000000000000000000000000000..78bfbaca461b7f3825b3134442e0769fea6341e4
--- /dev/null
+++ b/data_pipeline/meta_process/meta_lang.py
@@ -0,0 +1,125 @@
+import os
+import json
+from tqdm import tqdm
+from funasr import AutoModel
+from typing import List, Tuple
+from collections import defaultdict
+from my_tool import get_free_gpu, dup_remove
+
+# ===== ASR Model (External) =====
+
+def load_asr_model(bs:int):
+ """Load lyric recognition model"""
+ device = f"cuda:{get_free_gpu()}"
+ model = AutoModel(
+ model="iic/SenseVoiceSmall",
+ trust_remote_code=True,
+ vad_model="fsmn-vad",
+ vad_kwargs={"max_single_segment_time": 30000},
+ device=device,
+ batch_size=bs,
+ max_batch_size=bs * 2,
+ )
+ print(f"Using {device}")
+ return model
+
+# ===== ASR Parsing =====
+
+def _struct2lang(text: str) -> str:
+ """Extract language identifier from structured representation"""
+ start = text.find("|")
+ text = text[start+1:]
+ end = text.find("|")
+ return text[:end]
+
+def _struct2lyrics(text:str) -> str:
+ start = text.rfind(">")
+ lyric = text[start+1:]
+ return lyric
+
+def _struct_parse(text: str) -> Tuple[List[str], List[str]]:
+ """Split structured information sentence by sentence and then parse"""
+ texts = text.split(" <")
+ langs, lyrics = [], []
+ for ele in texts:
+ langs.append(_struct2lang(ele))
+ lyrics.append(_struct2lyrics(ele))
+ return langs, lyrics
+
+# ===== ASR Processing =====
+
+def _batch_asr(model, paths:List[str]) -> List[Tuple[List[str], List[str]]]:
+ """Batch speech recognition"""
+ outputs = model.generate(
+ input=paths,
+ cache=None,
+ language="auto",
+ use_itn=True,
+ batch_size_s=240,
+ merge_vad=True,
+ merge_length_s=15,
+ )
+ return [_struct_parse(output['text']) for output in outputs]
+
+# ===== Overall Language Detection =====
+
+def _lang_decide(lang_lyrics:tuple[list[str], list[str]], val_limit:int=5, word_limit=5) -> str:
+ """
+ Determine language based on sentence recognition information
+ - val_limit: Only count if there are at least this many sentences
+ - word_limit: Only count if a sentence has at least this many words
+ """
+ lang_count = defaultdict(int)
+ seg_langs, seg_lyrics = lang_lyrics
+ for lang, lyric in zip(seg_langs, seg_lyrics):
+ lyric = lyric.strip()
+ if lang == "en":
+ words_num = len(lyric.split())
+ else:
+ words_num = len(lyric)
+ if words_num >= word_limit:
+ lang_count[lang] += 1
+ langs = []
+ for lang, count in lang_count.items():
+ if count >= val_limit:
+ langs.append(lang)
+ if len(langs) == 0:
+ return "pure"
+ elif len(langs) == 1:
+ return langs[0]
+ else:
+ return "multi: " + " ".join(langs)
+
+# ===== External Interface =====
+
+def get_lang_meta(model, dataset:list[dict], bs:int, save_path:str, save_middle:bool=True) -> list[dict]:
+ """
+ Perform language recognition on a JSONL dataset
+ - Final language tag is saved to lang field, types include zh, en, ja, ko, yue, pure, multi, etc.
+ - save_middle determines whether to save intermediate recognition results (sentence languages and lyrics) to save.langs, save.lyrics
+ """
+ dataset = dup_remove(dataset, save_path, 'path', 'lang')
+ data_num = len(dataset)
+ new_dataset = []
+ with open(save_path, 'a', encoding='utf-8') as file:
+ for i in tqdm(range(0, data_num, bs), desc="Lang detecting"):
+ batch = []
+ paths = []
+ for ele in dataset[i:i+bs]:
+ path = ele['path']
+ if os.path.exists(path):
+ batch.append(ele)
+ paths.append(path)
+ lang_lyrics_lis = _batch_asr(model, paths)
+ langs = [_lang_decide(lang_lyrics) for lang_lyrics in lang_lyrics_lis]
+ for ele, (seg_langs, seg_lyrics), lang in zip(batch, lang_lyrics_lis, langs):
+ ele['lang'] = lang
+ if save_middle:
+ if 'save' not in ele:
+ ele['save'] = {}
+ ele['save']['langs'] = seg_langs
+ ele['save']['lyrics'] = seg_lyrics
+ new_dataset.append(ele)
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+ return new_dataset
\ No newline at end of file
diff --git a/data_pipeline/meta_process/meta_phonemes.py b/data_pipeline/meta_process/meta_phonemes.py
new file mode 100644
index 0000000000000000000000000000000000000000..f69a9393827c0a0698fc92da609b6e30e17fe35d
--- /dev/null
+++ b/data_pipeline/meta_process/meta_phonemes.py
@@ -0,0 +1,283 @@
+import os
+import re
+import json
+import copy
+import jieba
+import string
+from tqdm import tqdm
+from g2p_en import G2p
+from my_tool import BASE_DIR, load_json
+from pypinyin import pinyin, Style, load_phrases_dict
+from pypinyin_dict.phrase_pinyin_data import cc_cedict
+
+cc_cedict.load()
+re_special_pinyin = re.compile(r'^(n|ng|m)$')
+reference = load_json("poly_correct.json")
+load_phrases_dict(reference)
+
+# ===== Chinese Conversion =====
+
+def _split_py(py):
+ """Split pinyin with tone number into initial (sm) and final (ym) parts"""
+ tone = py[-1]
+ py = py[:-1]
+ sm = ""
+ ym = ""
+ suf_r = ""
+ if re_special_pinyin.match(py):
+ py = 'e' + py
+ if py[-1] == 'r':
+ suf_r = 'r'
+ py = py[:-1]
+
+ if len(py) == 0:
+ # Only an 'r' was left after stripping (erhua); no initial, final is r + tone
+ return "", suf_r + tone
+
+ if py == 'zi' or py == 'ci' or py == 'si' or py == 'ri':
+ sm = py[:1]
+ ym = "ii"
+ elif py == 'zhi' or py == 'chi' or py == 'shi':
+ sm = py[:2]
+ ym = "iii"
+ elif py == 'ya' or py == 'yan' or py == 'yang' or py == 'yao' or py == 'ye' or py == 'yong' or py == 'you':
+ sm = ""
+ ym = 'i' + py[1:]
+ elif py == 'yi' or py == 'yin' or py == 'ying':
+ sm = ""
+ ym = py[1:]
+ elif py == 'yu' or py == 'yv' or py == 'yuan' or py == 'yvan' or py == 'yue' or py == 'yve' or py == 'yun' or py == 'yvn':
+ sm = ""
+ ym = 'v' + py[2:]
+ elif py == 'wu':
+ sm = ""
+ ym = "u"
+ elif py[0] == 'w':
+ sm = ""
+ ym = "u" + py[1:]
+ elif len(py) >= 2 and (py[0] == 'j' or py[0] == 'q' or py[0] == 'x') and py[1] == 'u':
+ sm = py[0]
+ ym = 'v' + py[2:]
+ else:
+ seg_pos = re.search('a|e|i|o|u|v', py)
+ try:
+ sm = py[:seg_pos.start()]
+ ym = py[seg_pos.start():]
+ if ym == 'ui':
+ ym = 'uei'
+ elif ym == 'iu':
+ ym = 'iou'
+ elif ym == 'un':
+ ym = 'uen'
+ elif ym == 'ue':
+ ym = 've'
+ except Exception:
+ sm = ym = ""
+ return sm, ym
+ ym += suf_r + tone
+ return sm, ym
+
+# All Chinese punctuation
+chinese_punctuation_pattern = r'[\u3002\uff0c\uff1f\uff01\uff1b\uff1a\u201c\u201d\u2018\u2019\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u2014\u2026\u3001\uff08\uff09]'
+
+def _has_ch_punc(text):
+ match = re.search(chinese_punctuation_pattern, text)
+ return match is not None
+
+def _has_en_punc(text):
+ return text in string.punctuation
+
+def _trans_cn(text:str, with_sp=True):
+ """Convert Chinese to phonemes"""
+ phonemes = []
+ # Word segmentation
+ seg_list = jieba.cut(text)
+ # Process word by word
+ for seg in seg_list:
+ # String validity
+ if seg.strip() == "": continue
+ # seg_tn = tn_chinese(seg)
+ # Convert to pinyin (Style.TONE3 appends the tone number; neutral tone becomes 5)
+ py = [_py[0] for _py in pinyin(seg, style=Style.TONE3, neutral_tone_with_five=True)]
+ # Punctuation detection (skip if present)
+ if any([_has_ch_punc(_py) for _py in py]) or any([_has_en_punc(_py) for _py in py]):
+ continue
+ # Split pinyin
+ # phonemes += _split_py(_py)
+ for _py in py:
+ sm, ym = _split_py(_py)
+ if sm != "":
+ phonemes.append(sm)
+ if ym != "":
+ phonemes.append(ym)
+ if with_sp:
+ phonemes += ["sp"]
+ return phonemes
+
+# ===== English Conversion =====
+
+def _read_lexicon(lex_path):
+ """Read English lexicon"""
+ lexicon = {}
+ with open(lex_path) as f:
+ for line in f:
+ temp = re.split(r"\s+", line.strip("\n"))
+ word = temp[0]
+ phones = temp[1:]
+ if word.lower() not in lexicon:
+ lexicon[word.lower()] = phones
+ return lexicon
+
+LEX_PATH = BASE_DIR / "data/ref/lexion.txt"
+lexicon = _read_lexicon(LEX_PATH)
+
+g2p = G2p()
+
+def _trans_en(word:str, with_sp=True):
+ """Convert English (word) to phonemes"""
+ w_lower = word.lower()
+ phonemes = []
+ if w_lower in lexicon:
+ # Use lexicon if available (cannot directly get reference)
+ phonemes += lexicon[w_lower]
+ else:
+ # Use G2P if not in lexicon
+ phonemes = g2p(w_lower)
+ if not phonemes:
+ phonemes = []
+ # Add to lexicon
+ lexicon[w_lower] = phonemes
+ if len(phonemes) > 0 and with_sp:
+ phonemes.append("sp")
+ return phonemes
+
+# ===== Single Sentence Processing =====
+
+def _char_lang(c:str) -> int:
+ """
+ Check if a character is Chinese, English, or other
+ 0 - Chinese
+ 1 - English
+ 2 - Number
+ 3 - Other
+ """
+ if '\u4e00' <= c <= '\u9fff':
+ return 0
+ elif ('a' <= c <= 'z') or ('A' <= c <= 'Z'):
+ return 1
+ elif c.isdigit():
+ return 2
+ else:
+ return 3
+
+NUMBER_MAP = {
+ "0": "zero",
+ "1": "one",
+ "2": "two",
+ "3": "three",
+ "4": "four",
+ "5": "five",
+ "6": "six",
+ "7": "seven",
+ "8": "eight",
+ "9": "nine",
+}
+
+def _lang_seperate(text:str) -> tuple[list[str], list[int]]:
+ """Split string by language"""
+ lang_segs = [] # Set of split strings
+ lang_tags = [] # Tags for each string segment
+ lang_seg = "" # Previous continuous language string
+ lang_tag = -1 # Language type of previous character
+ en_count = 0
+ for c in text:
+ lang = _char_lang(c)
+ if lang_tag != lang:
+ # Different from previous character type
+ if lang_seg != "":
+ lang_segs.append(lang_seg)
+ lang_tags.append(lang_tag)
+ if lang_tag == 1:
+ en_count += 1
+ lang_seg = ""
+ if lang == 2 and en_count >= 4:
+ # Number conversion in English
+ lang_segs.append(NUMBER_MAP[c])
+ lang_tags.append(1)
+ lang_tag = lang
+ if lang < 2:
+ lang_seg += c
+ if lang_seg != "":
+ # Last valid segment
+ lang_segs.append(lang_seg)
+ lang_tags.append(lang_tag)
+ return lang_segs, lang_tags
+
+def _phoneme_trans(text:str, with_sp=True):
+ """Convert a lyric segment to phonemes"""
+ # Split by language
+ lang_segs, lang_tags = _lang_seperate(text)
+ # Convert segment by segment
+ phonemes = []
+ for lang_seg, lang_tag in zip(lang_segs, lang_tags):
+ if lang_tag == 0:
+ # Chinese
+ phonemes += _trans_cn(lang_seg, with_sp)
+ else:
+ # English
+ phonemes += _trans_en(lang_seg, with_sp)
+ return phonemes
+
+# ===== Dynamic Adaptation =====
+
+def _get_lyrics(raw_content:str) -> tuple[list[str] | None, str | None]:
+ """Extract lyric content from dialogue, format like '[stage][dsec:xxx][lyrics:xxx\nxxx]'"""
+ START_FORMAT = "[lyrics:"
+ start = raw_content.find(START_FORMAT)
+ if start == -1:
+ return None, None
+ content = raw_content[start+len(START_FORMAT):-1]
+ # Filter brackets
+ content = re.sub(r'\[.*?\]', '', content) # Complete brackets
+ content = re.sub(r'[\[\]]', '', content) # Unclosed brackets
+ # Split sentences
+ sentences = content.split("\n")
+ # Reconstruct
+ new_content = raw_content[:start] + START_FORMAT + content + "]"
+ return sentences, new_content
+
+def _trans_sentences(sentences:list[str], with_sp:bool=True) -> str:
+ """Convert sentence list to wrapped phoneme string"""
+ phonemes_lis = []
+ for sentence in sentences:
+ phonemes = _phoneme_trans(sentence, with_sp)
+ phonemes_lis.append(" ".join(phonemes))
+ # Wrap
+ phonemes_str = '\n'.join(phonemes_lis)
+ envelope = f"[phoneme:{phonemes_str}]"
+ envelope = re.sub(r'\d+', '', envelope) # Remove tones
+ return envelope
+
+# ===== External Interface =====
+
+def get_phonemes_meta(dataset:list[dict], save_path:str, with_sp:bool=True):
+ """Add phonemes to lyrics in dataset"""
+ new_dataset = []
+ with open(save_path, 'w', encoding='utf-8') as file:
+ for ele in tqdm(dataset, desc="Phoneme trans"):
+ ele = copy.deepcopy(ele)
+ messages = ele['messages']
+ # Skip first message, process subsequent ones sentence by sentence
+ for message in messages[1:]:
+ if message['role'] == "assistant":
+ continue
+ content = message['content']
+ sentences, new_content = _get_lyrics(content)
+ if sentences is None:
+ continue
+ phonemes = _trans_sentences(sentences, with_sp)
+ message['content'] = new_content + phonemes
+ new_dataset.append(ele)
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+ return new_dataset
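The bracket filtering inside `_get_lyrics` runs in two passes: complete `[...]` structure tags are deleted, then any stray unmatched brackets are dropped. Isolated as a sketch (`strip_brackets` is an illustrative name):

```python
import re

# Two-pass bracket filter from _get_lyrics: structural tags like [verse]
# are removed whole, then leftover unmatched brackets are stripped.

def strip_brackets(content: str) -> str:
    content = re.sub(r'\[.*?\]', '', content)  # complete [...] tags
    content = re.sub(r'[\[\]]', '', content)   # unmatched brackets
    return content

cleaned = strip_brackets("hello [verse] world [oops")
```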
diff --git a/data_pipeline/meta_process/meta_tags.py b/data_pipeline/meta_process/meta_tags.py
new file mode 100644
index 0000000000000000000000000000000000000000..752187f2f62cb489218eb8ada0c852a9a0b04203
--- /dev/null
+++ b/data_pipeline/meta_process/meta_tags.py
@@ -0,0 +1,124 @@
+import os
+import json
+import torch
+from tqdm import tqdm
+from transformers import (
+ Qwen3OmniMoeProcessor,
+ Qwen3OmniMoeForConditionalGeneration
+)
+from qwen_omni_utils import process_mm_info
+from my_tool import get_free_gpu, audio_cut, extract_json, dup_remove, BASE_DIR
+
+# ===== Tag Model and Processor (External) =====
+
+def load_tag_model():
+ """Load tag model"""
+ device = f"cuda:{get_free_gpu()}"
+ print(f"Using {device}")
+ model_name = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
+ model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
+ model_name,
+ dtype=torch.bfloat16,
+ # local_files_only=True
+ ).to(device)
+ model.disable_talker()
+ model.eval()
+
+ processor = Qwen3OmniMoeProcessor.from_pretrained(
+ model_name,
+ # local_files_only=True
+ )
+ return model, processor
+
+# ===== Tag Annotation =====
+
+def _format_messages(prompt:str, path:str) -> list[dict]:
+ """Construct messages to pass to omni"""
+ messages = [
+ {
+ "role": "system",
+ "content": [
+ {"type": "text", "text": prompt}
+ ]
+ },
+ {
+ "role": "user",
+ "content": [
+ {"type": "audio", "audio": path},
+ ]
+ }
+ ]
+ return messages
+
+def _batch_tagging(model, processor, paths:list[str], prompt:str, mode="random"):
+ """Annotate a batch of songs"""
+ convs = []
+ middle_paths = []
+ output_dir = BASE_DIR / "data/temp"
+ for path in paths:
+ seg_path = audio_cut(path, mode, output_dir)
+ middle_paths.append(seg_path)
+ messages = _format_messages(prompt, seg_path)
+ convs.append(messages)
+
+ USE_AUDIO_IN_VIDEO = False
+
+ with torch.no_grad():
+ text = processor.apply_chat_template(convs, add_generation_prompt=True, tokenize=False)
+ audios, images, videos = process_mm_info(convs, use_audio_in_video=USE_AUDIO_IN_VIDEO)
+ inputs = processor(
+ text=text,
+ audio=audios,
+ padding=True,
+ images=images,
+ videos=videos,
+ return_tensors="pt",
+ use_audio_in_video=USE_AUDIO_IN_VIDEO
+ )
+ inputs = inputs.to(model.device).to(model.dtype)
+
+ text_ids = model.generate(
+ **inputs,
+ max_new_tokens=2048,
+ return_audio=False,
+ thinker_return_dict_in_generate=True,
+ use_audio_in_video=USE_AUDIO_IN_VIDEO
+ )
+ gene_texts = processor.batch_decode(
+ text_ids[0].sequences[:, inputs["input_ids"].shape[1] :],
+ skip_special_tokens=True,
+ clean_up_tokenization_spaces=False
+ )
+
+ torch.cuda.empty_cache()
+ # Delete audio segments
+ for path in middle_paths:
+ if os.path.exists(path):
+ os.remove(path)
+ return gene_texts
+
+# ===== External Interface =====
+
+def get_tags_meta(model, processor, dataset:list[dict], prompt:str, bs:int, save_path:str):
+ dataset = dup_remove(dataset, save_path, 'path', 'tags')
+ data_num = len(dataset)
+ new_dataset = []
+ with open(save_path, 'a', encoding="utf-8") as file:
+ for i in tqdm(range(0, data_num, bs)):
+ batch = []
+ paths = []
+ for ele in dataset[i:i+bs]:
+ path = ele['path']
+ if os.path.exists(path):
+ batch.append(ele)
+ paths.append(path)
+ contents = _batch_tagging(model, processor, paths, prompt)
+ for ele, content in zip(batch, contents):
+ check, json_data = extract_json(content)
+ if not check:
+ continue
+ ele['tags'] = json_data['tags']
+ new_dataset.append(ele)
+ json.dump(ele, file, ensure_ascii=False)
+ file.write('\n')
+ return new_dataset
\ No newline at end of file
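Both `get_tags_meta` and `get_lang_meta` walk the dataset in fixed-size windows and drop entries whose audio file is missing before calling the model. The windowing can be sketched on its own; `iter_batches` and the `exists` flag are illustrative stand-ins (the real code checks `os.path.exists`):

```python
# Sketch of the batch loop: fixed-size windows over the dataset,
# skipping unavailable entries, and never yielding an empty batch.

def iter_batches(dataset, bs):
    for i in range(0, len(dataset), bs):
        batch = [ele for ele in dataset[i:i + bs] if ele.get("exists", True)]
        if batch:  # avoid invoking the model on an empty batch
            yield batch

data = [{"path": f"a{i}.mp3", "exists": i % 2 == 0} for i in range(5)]
batches = list(iter_batches(data, 2))
```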
diff --git a/data_pipeline/meta_process/meta_vocal.py b/data_pipeline/meta_process/meta_vocal.py
new file mode 100644
index 0000000000000000000000000000000000000000..da25cae23eaca007f4bba473396093155314ffcf
--- /dev/null
+++ b/data_pipeline/meta_process/meta_vocal.py
@@ -0,0 +1,141 @@
+import os
+import math
+from typing import List, Dict
+
+import torch
+import librosa
+import numpy as np
+
+from demucs.pretrained import get_model
+from demucs.apply import apply_model
+
+
+# ======================
+# Basic Configuration
+# ======================
+
+SAMPLE_RATE = 44100
+MAX_DURATION = 5 # Seconds of audio to load from the start of each file
+BATCH_SIZE = 8 # Can be raised to 16~32 on an 80G GPU
+VOCAL_DB_THRESHOLD = -35.0 # Vocal presence threshold (empirical value)
+# DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+DEVICE = "cuda:1"
+
+
+# ======================
+# Audio Loading (first 30s only)
+# ======================
+
+def load_audio_30s(path: str, sr: int = SAMPLE_RATE) -> torch.Tensor:
+ """
+ Load the first MAX_DURATION seconds of an audio file (despite the name).
+ Returns shape: [channels=2, samples]
+ """
+ y, _ = librosa.load(
+ path,
+ sr=sr,
+ mono=False,
+ duration=MAX_DURATION
+ )
+
+ if y.ndim == 1:
+ y = np.stack([y, y], axis=0)
+
+ return torch.from_numpy(y)
+
+
+# ======================
+# dB Calculation
+# ======================
+
+def rms_db(wav: torch.Tensor) -> float:
+ """
+ wav: [channels, samples]
+ """
+ rms = torch.sqrt(torch.mean(wav ** 2))
+ db = 20 * torch.log10(rms + 1e-8)
+ return db.item()
+
+
+# ======================
+# Demucs Vocal Detection
+# ======================
+
+class DemucsVocalDetector:
+ def __init__(self):
+ self.model = (
+ get_model("htdemucs")
+ .to(DEVICE)
+ .eval()
+ )
+ self.vocal_idx = self.model.sources.index("vocals")
+
+ @torch.no_grad()
+ def batch_has_vocal(self, audio_paths: List[str]) -> Dict[str, bool]:
+ """
+ Input: List of audio paths
+ Output: {path: whether has vocals}
+ """
+ results = {}
+
+ batch_wavs = []
+ batch_paths = []
+
+ print("start load")
+ for path in audio_paths:
+ try:
+ wav = load_audio_30s(path)
+ batch_wavs.append(wav)
+ batch_paths.append(path)
+ except Exception as e:
+ results[path] = False
+ continue
+
+ print("finish load")
+ self._process_batch(batch_wavs, batch_paths, results)
+
+ return results
+
+ def _process_batch(self, wavs, paths, results):
+ max_len = max(w.shape[1] for w in wavs)
+ padded = []
+
+ for w in wavs:
+ if w.shape[1] < max_len:
+ w = torch.nn.functional.pad(w, (0, max_len - w.shape[1]))
+ padded.append(w)
+
+ batch = torch.stack(padded, dim=0).to(DEVICE)
+
+ print("start demucs")
+ torch.cuda.synchronize()
+
+ sources = apply_model(
+ self.model,
+ batch,
+ SAMPLE_RATE,
+ device=DEVICE,
+ split=False, # Core setting: each short clip is processed in one pass, no chunked splitting
+ progress=False
+ )
+
+ torch.cuda.synchronize()
+ print("demucs done")
+
+ vocals = sources[:, self.vocal_idx]
+
+ for i, path in enumerate(paths):
+ db = rms_db(vocals[i])
+ results[path] = db > VOCAL_DB_THRESHOLD
+
+# ======================
+# Usage Example
+# ======================
+
+if __name__ == "__main__":
+ audio_list = []
+
+ detector = DemucsVocalDetector()
+ result = detector.batch_has_vocal(audio_list)
+
+ for k, v in result.items():
+ print(f"{k}: {'Has' if v else 'No'} vocals")
\ No newline at end of file
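The vocal-presence decision above reduces to simple arithmetic: RMS energy of the separated vocal stem, converted to decibels, compared against the empirical -35 dB threshold. The same math with plain floats instead of tensors (a sketch, not the repo's torch-based `rms_db`):

```python
import math

# dB check from rms_db / _process_batch, torch-free: RMS -> decibels,
# compared against the empirical vocal-presence threshold.

VOCAL_DB_THRESHOLD = -35.0

def rms_db(samples):
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # 1e-8 guards log10 against exact silence
    return 20 * math.log10(rms + 1e-8)

loud = [0.5, -0.5] * 100     # strong signal, about -6 dB
quiet = [1e-6, -1e-6] * 100  # near-silence, about -120 dB
has_vocal = rms_db(loud) > VOCAL_DB_THRESHOLD
```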
diff --git a/data_pipeline/meta_process/my_tool.py b/data_pipeline/meta_process/my_tool.py
new file mode 100644
index 0000000000000000000000000000000000000000..a0823e98a28d8ad584850d953355be6dfee12354
--- /dev/null
+++ b/data_pipeline/meta_process/my_tool.py
@@ -0,0 +1,551 @@
+import os
+import re
+import json
+import random
+import tarfile
+import subprocess
+import json_repair
+from tqdm import tqdm
+from pathlib import Path
+from pydub import AudioSegment
+from collections import defaultdict
+from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
+
+# ===== Macros =====
+
+BASE_DIR = Path(__file__).parent
+
+# ===== Helper Functions =====
+
+def pure_name(path:str):
+ """Get the original name of a file path (without extension)"""
+ basename = os.path.basename(path)
+ dot_pos = basename.rfind('.')
+ if dot_pos == -1:
+ return basename
+ return basename[:dot_pos]
+
+def extract_json(text: str) -> tuple[bool, dict]:
+ """Extract and repair JSON data from text (enhanced error-tolerant version)
+
+ Features:
+ 1. Automatically identify code block markers (```json``` or ```)
+ 2. Fix common JSON errors (mismatched quotes, trailing commas, etc.)
+ 3. Support lenient parsing mode
+
+ Returns: (success, parsed dictionary)
+ """
+ # Preprocessing: extract possible JSON content area
+ content = text
+
+ # Case 1: Check ```json``` code block
+ if '```json' in text:
+ start = text.find('```json')
+ end = text.find('```', start + 7)
+ content = text[start + 7:end].strip() # '```json' is 7 characters
+ # Case 2: Check regular ``` code block
+ elif '```' in text:
+ start = text.find('```')
+ end = text.find('```', start + 3)
+ content = text[start + 3:end].strip()
+
+ # Clean common interference items in content
+ content = re.sub(r'^[^{[]*', '', content) # Remove unstructured content before JSON
+ content = re.sub(r'[^}\]]*$', '', content) # Remove unstructured content after JSON
+
+ # Try standard parsing
+ try:
+ json_data = json.loads(content)
+ return True, json_data
+ except json.JSONDecodeError as e:
+ standard_error = e
+
+ # Try to repair with json_repair
+ try:
+ repaired = json_repair.repair_json(content)
+ json_data = json.loads(repaired)
+ return True, json_data
+ except Exception as e:
+ repair_error = e
+ return False, {
+ "standard_error": standard_error,
+ "repair_error": repair_error
+ }
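The fence-stripping preamble of `extract_json` can be expressed with the standard library alone; `json_repair` is only needed when this strict parse fails. A sketch (`parse_fenced_json` is an illustrative name):

```python
import json
import re

# Stdlib-only version of extract_json's happy path: pull the body out of
# a ``` / ```json fence, trim non-JSON noise on both ends, strict-parse.

def parse_fenced_json(text: str):
    m = re.search(r'```(?:json)?\s*(.*?)```', text, re.DOTALL)
    content = m.group(1) if m else text
    content = re.sub(r'^[^{\[]*', '', content)   # noise before the JSON
    content = re.sub(r'[^}\]]*$', '', content)   # noise after the JSON
    return json.loads(content)

data = parse_fenced_json('Here:\n```json\n{"tags": ["pop"]}\n```\n')
```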
+
+def path_join(dir, name):
+ return os.path.join(dir, name)
+
+def dict_sort_print(dic:dict, value:bool=True, reverse=True):
+ """Sort a dictionary by value size and output"""
+ idx = 1 if value else 0
+ sorted_lis = sorted(dic.items(), key=lambda x: x[idx], reverse=reverse)
+ sorted_dic = {}
+ for key, val in sorted_lis:
+ sorted_dic[key] = val
+ print(json.dumps(sorted_dic, indent=4, ensure_ascii=False))
+
+def clean_newlines(text: str) -> str:
+ """
+ Clean lyric line breaks:
+ 1. Keep line breaks after punctuation
+ 2. Convert line breaks after non-punctuation → space
+ 3. Fix extra spaces after English apostrophes
+ 4. Merge redundant spaces
+ 5. Preserve paragraph structure, ensure line breaks after punctuation
+ """
+ if not text:
+ return ""
+
+ text = text.strip()
+
+ # First unify line breaks to \n
+ text = text.replace('\r\n', '\n').replace('\r', '\n')
+
+ # Merge non-empty lines into one sentence (remove original line breaks first)
+ lines = [line.strip() for line in text.split('\n')]
+ text = ' '.join(line for line in lines if line)
+
+ # Add line break after sentence-ending punctuation (Chinese and English punctuation)
+ text = re.sub(r'([.,!?:;,。!?;])\s*', r'\1\n', text)
+
+ # Fix spaces after English apostrophes
+ text = re.sub(r"'\s+", "'", text)
+
+ # Merge redundant spaces
+ text = re.sub(r'[ \t]+', ' ', text)
+
+ # Remove leading and trailing spaces from lines
+ text = '\n'.join(line.strip() for line in text.split('\n'))
+
+ return text.strip()
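The heart of `clean_newlines` is one rule: sentence-ending punctuation forces a line break, everything else is joined with spaces. A condensed sketch of just that rule (`break_after_punct` is illustrative and omits the apostrophe fix):

```python
import re

# Core of clean_newlines: flatten lines to one string, then re-break
# after Chinese or English sentence-ending punctuation.

def break_after_punct(text: str) -> str:
    text = ' '.join(line.strip() for line in text.splitlines() if line.strip())
    text = re.sub(r'([.,!?:;,。!?;])\s*', r'\1\n', text)
    return '\n'.join(line.strip() for line in text.split('\n')).strip()

out = break_after_punct("hello world\nfoo, bar\nbaz.")
```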
+
+# ===== Detection Functions =====
+def is_ch_char(char:str):
+ """Determine if a single character is a Chinese character"""
+ if len(char) != 1:
+ return False
+
+ # Unicode ranges for Chinese characters
+ # 1. Basic Chinese: 0x4E00-0x9FFF
+ # 2. Extension A: 0x3400-0x4DBF
+ # 3. Extension B: 0x20000-0x2A6DF
+ # 4. Extension C: 0x2A700-0x2B73F
+ # 5. Extension D: 0x2B740-0x2B81F
+ # 6. Extension E: 0x2B820-0x2CEAF
+
+ code = ord(char)
+
+ # Common check (covers most cases)
+ if 0x4E00 <= code <= 0x9FFF:
+ return True
+ # Extension A
+ if 0x3400 <= code <= 0x4DBF:
+ return True
+ # Other extensions not considered for now
+
+ return False
+
+# ===== File Operations =====
+
+def load_txt(path:str) -> str:
+ """Load a file as plain text"""
+ with open(path, 'r') as file:
+ content = file.read()
+ return content
+
+def load_json(path:str):
+ """Load a JSON file"""
+ if not os.path.exists(path):
+ return {}
+ with open(path, 'r') as file:
+ data = json.load(file)
+ return data
+
+def load_jsonl(path:str, limit=-1) -> list[dict]:
+ """Load a JSONL file"""
+ data = []
+ with open(path, 'r') as file:
+ for id, line in tqdm(enumerate(file), desc=f"Loading {path}"):
+ if limit != -1 and id == limit:
+ break
+ data.append(json.loads(line))
+ return data
+
+def save_json(data, path:str):
+ """Save a JSON file"""
+ with open(path, 'w', encoding='utf-8') as file:
+ json.dump(data, file, ensure_ascii=False, indent=4)
+
+def save_jsonl(data:list[dict], path:str, mode='w'):
+ """Save a JSONL file"""
+ with open(path, mode, encoding='utf-8') as file:
+ for ele in tqdm(data, desc=f"Saving to {path}"):
+ json.dump(ele, file, ensure_ascii=False)
+ file.write("\n")
+
+def audio_cut(input_path, mode:str, output_dir:str, segment_length:int=30000):
+ """
+ Extract a segment of specified length from an audio file
+ - mode: Cut type (random / middle)
+ - output_dir: Output folder
+ - segment_length: Segment length (milliseconds)
+ """
+ assert mode in ['random', 'middle']
+
+ # Check if file exists
+ if not os.path.exists(input_path):
+ raise FileNotFoundError(f"Audio file not found: {input_path}")
+
+ # Load audio file
+ audio = AudioSegment.from_file(input_path)
+ audio = audio.set_frame_rate(44100).set_channels(1) # Set sample rate and channels
+ audio_duration = len(audio) # Duration control
+
+ # If audio length is less than target segment length, use entire audio
+ if audio_duration <= segment_length:
+ print(f"Warning: Audio too short ({audio_duration}ms), using full audio: {input_path}")
+ segment = audio
+ else:
+ # Calculate slice position based on mode
+ if mode == "random":
+ # Random cut
+ max_start = max(0, audio_duration - segment_length)
+ start = random.randint(0, max_start)
+ end = start + segment_length
+ else:
+ # Cut from middle
+ middle_point = audio_duration // 2
+ start = max(0, middle_point - (segment_length // 2))
+ end = min(audio_duration, start + segment_length)
+
+ # If cutting from middle would exceed boundaries, adjust start position
+ if end > audio_duration:
+ end = audio_duration
+ start = end - segment_length
+ elif start < 0:
+ start = 0
+ end = segment_length
+
+ # Ensure slice range is valid
+ start = max(0, min(start, audio_duration))
+ end = max(0, min(end, audio_duration))
+
+ if start >= end:
+ raise ValueError(f"Invalid slice range: start={start}, end={end}, duration={audio_duration}")
+
+ # Execute slice
+ segment = audio[start:end]
+
+ # Generate output path
+ basename = pure_name(input_path)
+ output_path = os.path.join(output_dir, f"seg_{basename}.wav")
+
+ # Save segment
+ segment.export(
+ output_path,
+ format="wav",
+ codec="pcm_s16le", # 16-bit little-endian encoding
+ parameters=["-acodec", "pcm_s16le"] # ffmpeg parameters
+ )
+ return output_path
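The slice-position arithmetic inside `audio_cut` is independent of pydub and easy to check on its own: a random window anywhere in range, or a window centered on the midpoint, with short audio returned whole. A sketch (`slice_window` is an illustrative name; units are milliseconds, as in `audio_cut`):

```python
import random

# Slice-position math from audio_cut: given duration and target segment
# length in ms, return the (start, end) window for the chosen mode.

def slice_window(duration_ms: int, segment_ms: int, mode: str, rng=random):
    if duration_ms <= segment_ms:
        return 0, duration_ms  # audio shorter than target: take it all
    if mode == "random":
        start = rng.randint(0, duration_ms - segment_ms)
    else:  # "middle": center the window on the midpoint
        start = (duration_ms // 2) - (segment_ms // 2)
    return start, start + segment_ms

start, end = slice_window(240000, 30000, "middle")  # 4-minute track, 30 s cut
```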
+
+def format_meta(dir:str, show:bool=True) -> list[dict]:
+ """Recursively collect all audio paths (wav / mp3) in a folder and build JSONL entries"""
+ if not os.path.isdir(dir):
+ return []
+ dataset = []
+ names = os.listdir(dir)
+ iterator = tqdm(names, desc=f"Formatting {dir}") if show else names
+ for name in iterator:
+ path = os.path.join(dir, name)
+ if os.path.isdir(path):
+ dataset += format_meta(path, False)
+ elif name.endswith('.mp3') or name.endswith('.wav'):
+ dataset.append({"path": path})
+ return dataset
+
+def dup_remove(raw_data:list[dict], save_path:str, key:str, seg:str):
+ """
+ Remove already-processed items from the dataset
+ - key: field used to match entries between raw_data and the save file
+ - seg: target field; a saved entry only counts as done if this field is present
+ """
+ if not os.path.exists(save_path):
+ print(f"Dup num: 0")
+ return raw_data
+ save_data = load_jsonl(save_path)
+ keys = set()
+ for ele in tqdm(save_data, desc="Constructing Dup Set"):
+ if seg in ele:
+ keys.add(ele[key])
+ rest_data = []
+ dup_count = 0
+ for ele in tqdm(raw_data, desc="Checking Dup"):
+ if ele[key] not in keys:
+ rest_data.append(ele)
+ else:
+ dup_count += 1
+ print(f"Dup num: {dup_count}")
+ return rest_data
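`dup_remove` is what makes every `get_*_meta` stage resumable: entries already in the save file, and already carrying the target field, are skipped on re-runs. The same logic with in-memory lists instead of a JSONL file (`drop_done` is an illustrative name):

```python
# dup_remove in miniature: skip raw entries whose key already appears in
# the saved results with the target field filled in.

def drop_done(raw, done, key, seg):
    done_keys = {ele[key] for ele in done if seg in ele}
    return [ele for ele in raw if ele[key] not in done_keys]

raw = [{"path": "a"}, {"path": "b"}, {"path": "c"}]
done = [{"path": "b", "tags": ["rock"]}, {"path": "c"}]  # "c" lacks tags, so it is redone
rest = drop_done(raw, done, "path", "tags")
```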
+
+def tar_size_check(data_dir:str, subfixes:list[str], per:int, max_size:int):
+ """
+ Determine the number of files that can fit in a block before compression (assuming uniform file sizes)
+ - data_dir: Folder to compress
+ - subfixes: File suffixes to compress (e.g., .mp3)
+ - per: Check every N files on average
+ - max_size: Maximum limit in GB
+ """
+ names = sorted(list(os.listdir(data_dir)))
+ count = 0
+ size_sum = 0
+ for name in tqdm(names, desc="Size Checking"):
+ path = os.path.join(data_dir, name)
+ subfix = os.path.splitext(name)[1]
+ if subfix not in subfixes:
+ continue
+ count += 1
+ size_sum += os.path.getsize(path)
+ if count % per == 0:
+ gb_size = size_sum / 1024 / 1024 / 1024
+ if gb_size > max_size:
+ break
+ gb_size = size_sum / 1024 / 1024 / 1024 # Recompute so the final report is always defined
+ print(f"Count: {count}, Size: {gb_size:.2f}GB")
+
+def tar_dir(
+ data_dir:str,
+ subfixes:list[str],
+ save_dir:str,
+ group_size:int,
+ tmp_dir:str,
+ mark:str,
+ max_workers:int=10,
+ ):
+ """Compress files in a directory in chunks (non-recursive)"""
+ names = sorted(list(os.listdir(data_dir)))
+ file_num = len(names)
+ for i in range(0, file_num, group_size):
+ names_subset = names[i:i+group_size]
+ size_sum = 0
+ name_path = os.path.join(tmp_dir, f"name_{i}_{mark}")
+ with open(name_path, 'w', encoding='utf-8') as file:
+ for name in tqdm(names_subset, desc=f"Counting Block {i}"):
+ path = os.path.join(data_dir, name)
+ subfix = os.path.splitext(path)[1]
+ if subfix not in subfixes:
+ continue
+ file.write("./" + name + "\n")
+ size_sum += os.path.getsize(path)
+ gb_size = size_sum / 1024 / 1024 / 1024
+ print(f"Zipping block {i}, size: {gb_size:.2f}GB") # i matches the block_{i} archive name
+
+ tar_cmd = [
+ 'tar',
+ '--no-recursion',
+ '--files-from', str(name_path),
+ '-cf', '-'
+ ]
+ pigz_cmd = ['pigz', '-p', str(max_workers), '-c']
+
+ tar_process = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE, cwd=data_dir)
+ pigz_process = subprocess.Popen(pigz_cmd, stdin=tar_process.stdout, stdout=subprocess.PIPE, cwd=data_dir)
+ tar_process.stdout.close() # Let tar receive SIGPIPE if pigz exits early
+
+ save_path = os.path.join(save_dir, f"block_{i}_{mark}.tar.gz")
+ with open(save_path, 'wb') as out_file:
+ while True:
+ data = pigz_process.stdout.read(4096)
+ if not data:
+ break
+ out_file.write(data)
+
+ tar_process.wait()
+ pigz_process.wait()
+
+ if tar_process.returncode == 0 and pigz_process.returncode == 0:
+ print(f"Compression completed: {save_path}")
+ else:
+ print(f"Compression failed: tar return code={tar_process.returncode}, pigz return code={pigz_process.returncode}")
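The `tar | pigz` chain above follows the standard `Popen` pipeline pattern. A minimal sketch with generic commands (`printf | cat`, since `tar`/`pigz` may not be installed everywhere) showing the chaining and the close of the intermediate stdout:

```python
import subprocess

# First process writes to a pipe, second reads from it
p1 = subprocess.Popen(["printf", "hello"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["cat"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # so p1 gets SIGPIPE if p2 exits early
out, _ = p2.communicate()
p1.wait()
print(out)  # b'hello'
```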
+
+def music_avg_size(dir:str):
+ """Average music size (MB) and length (s), estimated from the first 50 files"""
+ dataset = format_meta(dir)
+ dataset = dataset[:50]
+ size_sum = 0
+ length_sum = 0
+ for ele in tqdm(dataset, desc=f"Counting Music Size in {dir}"):
+ path = ele['path']
+ audio = AudioSegment.from_file(path)
+ length_sum += len(audio) / 1000.0
+ size_sum += os.path.getsize(path)
+ size_avg = size_sum / len(dataset) / 1024 / 1024
+ length_avg = length_sum / len(dataset)
+ return size_avg, length_avg
+
+def get_sample(path:str, save_path:str="tmp.jsonl", num:int=100):
+ """Get N records from a JSONL file"""
+ if not os.path.exists(path):
+ return
+ if path.endswith(".jsonl"):
+ dataset = load_jsonl(path)
+ elif path.endswith(".json"):
+ dataset = load_json(path)
+ else:
+ print(f"Unsupported file: {path}")
+ return
+ sub_dataset = random.sample(dataset, min(num, len(dataset))) # Avoid ValueError when the file has fewer than num records
+ save_jsonl(sub_dataset, save_path)
+
+def _get_field_one(path:str, field:str):
+ """Process data from one path"""
+ with open(path, 'r', encoding='utf-8') as file:
+ data = json.load(file)
+ new_data = {
+ "id": f"{data['song_id']}_{data['track_index']}",
+ field: data[field]
+ }
+ return new_data
+
+def get_field_suno(dir:str, save_path:str, field:str, max_workers:int=8):
+ """Extract a specific field from scattered JSON files in suno"""
+ paths = []
+ for name in tqdm(os.listdir(dir), desc="Getting names"):
+ if not name.endswith(".json"):
+ continue
+ paths.append(os.path.join(dir, name))
+
+ with ProcessPoolExecutor(max_workers=max_workers) as executor:
+ futures = [executor.submit(_get_field_one, path, field) for path in paths]
+ with open(save_path, 'w', encoding='utf-8') as file:
+ with tqdm(total=len(paths), desc="Processing the JSONs") as pbar:
+ for future in as_completed(futures):
+ result = future.result()
+ json.dump(result, file, ensure_ascii=False)
+ file.write("\n")
+ pbar.update(1)
+
+def find_json(dir:str) -> list[str]:
+ """Find JSONL / JSON files in a folder"""
+ names = []
+ for name in tqdm(os.listdir(dir), desc="Finding JSON/JSONL"):
+ if name.endswith(".json") or name.endswith(".jsonl"):
+ names.append(name)
+ return names
+
+def show_dir(dir:str):
+ """Display all contents in a directory"""
+ if not os.path.isdir(dir):
+ return
+ for name in os.listdir(dir):
+ print(name)
+
+def _convert_mp3(path:str, dir:str):
+ """Process a single audio file"""
+ purename = pure_name(path)
+ output_path = os.path.join(dir, purename + ".mp3")
+ if os.path.exists(output_path):
+ # Already completed
+ return "pass"
+ try:
+ audio = AudioSegment.from_file(path)
+ except Exception:
+ # Failed to read file
+ print(f"fail to load {path}")
+ return "fail"
+ audio.export(output_path, format='mp3')
+ return "finish"
+
+def convert_mp3(meta_path:str, dir:str, max_workers:int=10):
+ """Convert all specified audio files to mp3 and save in specified directory"""
+ os.makedirs(dir, exist_ok=True)
+ dataset = load_jsonl(meta_path)
+ pass_num = 0
+ finish_num = 0
+ with ThreadPoolExecutor(max_workers=max_workers) as executor:
+ futures = [executor.submit(_convert_mp3, ele['path'], dir) for ele in dataset]
+ with tqdm(total=len(dataset), desc=f"Converting {meta_path}") as pbar:
+ for future in as_completed(futures):
+ res = future.result()
+ if res == "pass":
+ pass_num += 1
+ else:
+ finish_num += 1
+ pbar.update(1)
+ print(f"Finish {finish_num}, Pass {pass_num}")
+
+# ===== GPU and Models =====
+
+def get_free_gpu() -> int:
+ """Return the ID of the GPU with the most free memory"""
+ cmd = "nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits"
+ result = subprocess.check_output(cmd.split()).decode().strip().split("\n")
+
+ free_list = []
+ for line in result:
+ idx, free_mem = line.split(",")
+ free_list.append((int(idx), int(free_mem))) # (GPU id, free memory MiB)
+
+ # Sort by remaining memory
+ free_list.sort(key=lambda x: x[1], reverse=True)
+ return free_list[0][0]
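The parsing step can be exercised without a GPU by feeding it a sample of the `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits` output format (the values below are made up):

```python
# Hypothetical nvidia-smi output: "index, free-memory-MiB" per line
sample = "0, 10240\n1, 20480\n2, 512"

free_list = []
for line in sample.strip().split("\n"):
    idx, free_mem = line.split(",")
    free_list.append((int(idx), int(free_mem)))

free_list.sort(key=lambda x: x[1], reverse=True)
print(free_list[0][0])  # 1 -- the GPU with the most free memory
```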
+
+# ===== Data Analysis =====
+
+def compose_analyze(dataset:list[dict]):
+ """Statistical analysis of music structure composition"""
+ # Label count
+ labels = defaultdict(int)
+ for ele in tqdm(dataset):
+ segments = ele['segments']
+ for segment in segments:
+ label = segment['label']
+ labels[label] += 1
+ print(f"Number of labels: {len(labels)}")
+ print(dict_sort_print(labels))
+
+ # Different combinations
+ label_combs = defaultdict(int)
+ for ele in tqdm(dataset):
+ segments = ele['segments']
+ seg_labels = [segment['label'] for segment in segments] # Renamed to avoid shadowing the label counts above
+ if len(seg_labels) == 0:
+ continue
+ label_comb = " | ".join(seg_labels)
+ label_combs[label_comb] += 1
+ print(f"Number of combinations: {len(label_combs)}")
+ print(dict_sort_print(label_combs))
+
+def _filter_tag(content:str) -> list[str]:
+ """Split and format tag fields"""
+ tags = []
+ raws = re.split(r'[,,.]', content)
+ for raw in raws:
+ raw = raw.strip().lower() # Remove spaces and convert to lowercase
+ if raw == "":
+ continue
+ seg_pos = raw.find(":")
+ if seg_pos != -1:
+ # If colon exists, only take the part after it
+ tag = raw[seg_pos+1:].strip()
+ else:
+ tag = raw
+ tags.append(tag)
+ return tags
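Reproduced standalone, the tag splitter lowercases each fragment, splits on ASCII/fullwidth commas and periods, and drops any `prefix:` label:

```python
import re

def _filter_tag(content: str) -> list[str]:
    tags = []
    for raw in re.split(r'[,,.]', content):
        raw = raw.strip().lower()
        if raw == "":
            continue
        seg_pos = raw.find(":")
        # If a colon exists, keep only the part after it
        tags.append(raw[seg_pos + 1:].strip() if seg_pos != -1 else raw)
    return tags

print(_filter_tag("Style: Pop, Rock. Acoustic Guitar"))
# ['pop', 'rock', 'acoustic guitar']
```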
+
+def tags_analyze(dataset:list[dict]):
+ """Song tag analysis"""
+ tag_count = defaultdict(int)
+ for ele in tqdm(dataset, desc="Tag analyzing"):
+ tags = _filter_tag(ele['style'])
+ for tag in tags:
+ tag_count[tag] += 1
+ print(f"Number of tags: {len(tag_count.keys())}")
+ print(dict_sort_print(tag_count))
\ No newline at end of file
diff --git a/data_pipeline/music_gene/__pycache__/suno_gene.cpython-311.pyc b/data_pipeline/music_gene/__pycache__/suno_gene.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..10fd3b3220dfacb2e112d85d0e4d2e41a16e621c
Binary files /dev/null and b/data_pipeline/music_gene/__pycache__/suno_gene.cpython-311.pyc differ
diff --git a/data_pipeline/music_gene/config.py b/data_pipeline/music_gene/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..6d622c71d7897f6452fa7a03e5003a69696d1c4a
--- /dev/null
+++ b/data_pipeline/music_gene/config.py
@@ -0,0 +1,70 @@
+"""
+Configuration file
+Fill in your actual configuration below; keep real API keys out of version control
+"""
+
+# ============== API Configuration ==============
+
+# Suno API key (obtain from https://sunoapi.org)
+SUNO_API_KEY = ""
+
+# API base URL (usually no need to modify)
+SUNO_API_BASE_URL = "https://api.sunoapi.org"
+
+# ============== Generation Configuration ==============
+
+# Default model version
+DEFAULT_MODEL_VERSION = "V5" # Options: V3_5, V4, V4_5, V4_5PLUS, V5
+
+# Whether to enable custom mode
+DEFAULT_CUSTOM_MODE = True
+
+# Whether to generate instrumental by default
+DEFAULT_INSTRUMENTAL = False
+
+# ============== Task Configuration ==============
+
+# Maximum wait time (seconds)
+MAX_WAIT_TIME = 300
+
+# Check interval (seconds)
+CHECK_INTERVAL = 10
+
+# Retry count
+MAX_RETRIES = 3
+
+# ============== File Configuration ==============
+
+# Music file save directory
+OUTPUT_DIRECTORY = "./generated_music"
+
+# Audio format
+AUDIO_FORMAT = "mp3" # Options: mp3, wav
+
+# ============== Batch Generation Configuration ==============
+
+# Concurrency for batch generation
+BATCH_CONCURRENCY = 5
+
+# Batch generation delay (seconds, to avoid rate limiting)
+BATCH_DELAY = 2
+
+# ============== Logging Configuration ==============
+
+# Log level
+LOG_LEVEL = "INFO" # Options: DEBUG, INFO, WARNING, ERROR
+
+# Log file path
+LOG_FILE = "./suno_api.log"
+
+# Whether to output to console
+LOG_TO_CONSOLE = True
+
+# ============== Webhook Configuration ==============
+
+# Webhook callback URL (optional)
+WEBHOOK_URL = None # Example: "https://your-domain.com/webhook"
+
+# Webhook secret (for verifying callback requests)
+WEBHOOK_SECRET = None
+
diff --git a/data_pipeline/music_gene/suno_gene.py b/data_pipeline/music_gene/suno_gene.py
new file mode 100644
index 0000000000000000000000000000000000000000..4319caa99daa1dad0287ba3e39548d42760f7f92
--- /dev/null
+++ b/data_pipeline/music_gene/suno_gene.py
@@ -0,0 +1,784 @@
+# -*- coding: utf-8 -*-
+"""
+Suno API Batch Generation - Enhanced Time Statistics Version
+Improvements:
+1. Precise rate control: sliding-window limiter (default 20 requests per 10 seconds)
+2. Separate request submission and result waiting for improved concurrency efficiency
+3. Dynamic concurrency adjustment to avoid resource waste
+4. Comprehensive time statistics and performance analysis
+"""
+import json
+import time
+import requests
+import os
+import logging
+import csv
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from collections import deque
+from threading import Lock, Semaphore
+from tqdm import tqdm
+from config import SUNO_API_KEY
+
+
+# Configure logging
+def setup_logging(output_dir):
+ log_file = os.path.join(output_dir, f"run_log_{time.strftime('%Y%m%d_%H%M%S')}.txt")
+
+ # Create logger
+ logger = logging.getLogger('SunoBatch')
+ logger.setLevel(logging.INFO)
+
+ # Clear old handlers
+ if logger.hasHandlers():
+ logger.handlers.clear()
+
+ # File handler
+ file_handler = logging.FileHandler(log_file, encoding='utf-8')
+ file_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(file_handler)
+
+ # Console handler
+ console_handler = logging.StreamHandler()
+ console_handler.setFormatter(logging.Formatter('%(message)s'))
+ logger.addHandler(console_handler)
+
+ return logger, log_file
+
+# Global logger
+logger = logging.getLogger('SunoBatch')
+
+# Replace print with logger.info
+def print_log(msg):
+ logger.info(msg)
+
+
+
+class SunoAPI:
+ """Simplified Suno API client"""
+
+ def __init__(self, api_key):
+ self.api_key = api_key
+ self.base_url = 'https://api.sunoapi.org/api/v1'
+ self.headers = {
+ 'Authorization': f'Bearer {api_key}',
+ 'Content-Type': 'application/json'
+ }
+
+ # Configure retry strategy
+ self.session = requests.Session()
+ retry_strategy = Retry(
+ total=5, # Maximum retry count
+ backoff_factor=1, # Retry interval (1s, 2s, 4s, 8s...)
+ status_forcelist=[500, 502, 503, 504], # Status codes that require retry
+ allowed_methods=["HEAD", "GET", "POST", "OPTIONS"] # Methods allowed for retry
+ )
+ adapter = HTTPAdapter(max_retries=retry_strategy)
+ self.session.mount("https://", adapter)
+ self.session.mount("http://", adapter)
+
+ def generate_music(self, prompt, model='V5', vocalGender=None, **options):
+ """Generate music"""
+ payload = {
+ 'prompt': prompt,
+ 'model': model,
+ 'callBackUrl': 'https://example.com/callback',
+ **options
+ }
+
+ if vocalGender:
+ payload['vocalGender'] = vocalGender
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+
+ # Check HTTP error
+ response.raise_for_status()
+
+ # Try to parse JSON
+ try:
+ result = response.json()
+ except json.JSONDecodeError:
+ raise Exception(f"API returned non-JSON response: {response.text[:200]}")
+
+ if result.get('code') != 200:
+ raise Exception(f"Generation failed: {result.get('msg', result)}")
+
+ return result['data']['taskId']
+
+ except requests.exceptions.RequestException as e:
+ raise Exception(f"Request exception: {str(e)}")
+
+ def get_task_status(self, task_id):
+ """Get task status"""
+ try:
+ response = self.session.get(
+ f'{self.base_url}/generate/record-info?taskId={task_id}',
+ headers={'Authorization': f'Bearer {self.api_key}'},
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json().get('data', {})
+ except Exception:
+ # Propagate the error; wait_for_completion retries transient query failures
+ raise
+
+ def get_timestamped_lyrics(self, task_id, audio_id):
+ """Get timestamped lyrics"""
+ payload = {
+ 'taskId': task_id,
+ 'audioId': audio_id
+ }
+
+ try:
+ response = self.session.post(
+ f'{self.base_url}/generate/get-timestamped-lyrics',
+ headers=self.headers,
+ json=payload,
+ timeout=30
+ )
+ response.raise_for_status()
+ return response.json()
+ except Exception:
+ return {} # Lyrics fetch failure is non-fatal error
+
+ def wait_for_completion(self, task_id, max_wait_time=600, check_interval=5):
+ """Wait for task completion, return results and polling statistics"""
+ start_time = time.time()
+ poll_count = 0
+ total_poll_time = 0
+
+ while time.time() - start_time < max_wait_time:
+ try:
+ poll_start = time.time()
+ status = self.get_task_status(task_id)
+ poll_time = time.time() - poll_start
+ poll_count += 1
+ total_poll_time += poll_time
+
+ current_status = status.get('status')
+
+ if current_status == 'SUCCESS':
+ return {
+ 'result': status.get('response'),
+ 'wait_time': time.time() - start_time,
+ 'poll_count': poll_count,
+ 'avg_poll_time': total_poll_time / poll_count if poll_count > 0 else 0
+ }
+ elif current_status == 'FAILED':
+ raise RuntimeError(f"Task failed: {status.get('errorMessage')}")
+
+ time.sleep(check_interval)
+ except RuntimeError:
+ # A definitive FAILED status must propagate instead of being retried
+ raise
+ except Exception:
+ if time.time() - start_time >= max_wait_time:
+ raise
+ time.sleep(check_interval)
+
+ raise Exception('Task timeout')
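The wait loop above is a poll-until-terminal-state pattern. A minimal generic sketch (the status function and state names here are stand-ins, not the Suno API):

```python
import time

def poll_until(get_status, max_wait=2.0, interval=0.01):
    """Poll until SUCCESS/FAILED or timeout, mirroring wait_for_completion."""
    start = time.time()
    while time.time() - start < max_wait:
        status = get_status()
        if status == "SUCCESS":
            return "done"
        if status == "FAILED":
            raise RuntimeError("task failed")
        time.sleep(interval)
    raise TimeoutError("task timeout")

# Simulated task that succeeds on the third poll
states = iter(["PENDING", "PENDING", "SUCCESS"])
print(poll_until(lambda: next(states)))  # done
```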
+
+ def download_file(self, url, save_path):
+ """Download file to local, return download statistics"""
+ try:
+ start_time = time.time()
+ downloaded_bytes = 0
+
+ # Use session to download
+ with self.session.get(url, stream=True, timeout=60) as r:
+ r.raise_for_status()
+ with open(save_path, 'wb') as f:
+ for chunk in r.iter_content(chunk_size=8192):
+ f.write(chunk)
+ downloaded_bytes += len(chunk)
+
+ download_time = time.time() - start_time
+ return {
+ 'success': True,
+ 'bytes': downloaded_bytes,
+ 'time': download_time,
+ 'speed': downloaded_bytes / download_time if download_time > 0 else 0
+ }
+ except Exception as e:
+ print_log(f"Download failed {url}: {e}")
+ return {'success': False, 'error': str(e)}
+
+
+# Result record lock
+result_lock = Lock()
+
+def save_result_record(output_dir, record):
+ """Save single result to CSV in real-time"""
+ file_path = os.path.join(output_dir, "generation_results.csv")
+ file_exists = os.path.isfile(file_path)
+
+ # Only record key information
+ row = {
+ 'song_id': record.get('song_id'),
+ 'task_id': record.get('task_id'),
+ 'status': 'SUCCESS' if record.get('success') else 'FAILED',
+ 'error': record.get('error', ''),
+ 'submit_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(record.get('submit_time', 0))),
+ 'total_time': f"{record.get('total_time', 0):.1f}",
+ 'tracks_count': record.get('tracks_count', 0)
+ }
+
+ with result_lock:
+ with open(file_path, 'a', newline='', encoding='utf-8') as f:
+ writer = csv.DictWriter(f, fieldnames=['song_id', 'task_id', 'status', 'error', 'submit_time', 'total_time', 'tracks_count'])
+ if not file_exists:
+ writer.writeheader()
+ writer.writerow(row)
+
+
+class ImprovedRateLimiter:
+ """Improved rate limiter (with statistics)
+
+ Precise control: Maximum 20 requests per 10 seconds
+ Uses sliding window algorithm to ensure no more than 20 requests in any 10-second time window
+ """
+
+ def __init__(self, max_requests=20, time_window=10):
+ self.max_requests = max_requests
+ self.time_window = time_window
+ self.request_times = deque()
+ self.lock = Lock()
+ self.semaphore = Semaphore(max_requests)
+
+ # Statistics
+ self.total_wait_time = 0
+ self.wait_count = 0
+ self.total_requests = 0
+
+ def acquire(self):
+ """Acquire request permission"""
+ with self.lock:
+ now = time.time()
+
+ # Clean expired request records
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # If limit reached, calculate wait time needed
+ wait_time = 0
+ if len(self.request_times) >= self.max_requests:
+ oldest_request = self.request_times[0]
+ wait_time = self.time_window - (now - oldest_request) + 0.05 # Add buffer
+
+ if wait_time > 0:
+ print_log(f" [Rate Limit] Waiting {wait_time:.2f} seconds...")
+ time.sleep(wait_time)
+
+ # Record wait time
+ self.total_wait_time += wait_time
+ self.wait_count += 1
+
+ # Re-clean
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+
+ # Record this request time
+ self.request_times.append(time.time())
+ self.total_requests += 1
+
+ def get_current_rate(self):
+ """Get current rate (number of requests in last 10 seconds)"""
+ with self.lock:
+ now = time.time()
+ while self.request_times and now - self.request_times[0] >= self.time_window:
+ self.request_times.popleft()
+ return len(self.request_times)
+
+ def get_stats(self):
+ """Get statistics"""
+ with self.lock:
+ return {
+ 'total_requests': self.total_requests,
+ 'total_wait_time': self.total_wait_time,
+ 'wait_count': self.wait_count,
+ 'avg_wait_time': self.total_wait_time / self.wait_count if self.wait_count > 0 else 0
+ }
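The sliding-window idea can be sketched in a few lines (a simplified, single-purpose version of `ImprovedRateLimiter.acquire`, without the lock and statistics; the parameters here are small for demonstration only):

```python
import time
from collections import deque

def make_limiter(max_requests, window):
    times = deque()  # timestamps of requests inside the current window
    def acquire():
        now = time.monotonic()
        while times and now - times[0] >= window:
            times.popleft()          # drop requests that left the window
        if len(times) >= max_requests:
            time.sleep(window - (now - times[0]))  # wait until the oldest expires
        times.append(time.monotonic())
    return acquire

acquire = make_limiter(max_requests=3, window=0.2)
start = time.monotonic()
for _ in range(4):   # the 4th call must wait for the window to slide
    acquire()
print(time.monotonic() - start >= 0.15)  # True
```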
+
+
+# Global rate limiter
+rate_limiter = ImprovedRateLimiter(max_requests=20, time_window=10)
+
+
+def submit_generation_task(api, song_index, data):
+ """Phase 1: Submit generation task (rate limited)"""
+ # Use suno_test_cn_000001 format
+ song_id = data.get("id", f"suno_test_cn_{song_index:06d}")
+
+ try:
+ description = data.get("description", "")
+ lyrics = data.get("lyrics", "")
+ vocal_gender = data.get("vocalGender")
+
+ print_log(f"[Song {song_id}] Submitting task... (Current rate: {rate_limiter.get_current_rate()}/20)")
+
+ # Record request start time
+ request_start = time.time()
+
+ # Rate limiting
+ rate_limiter.acquire()
+
+ # Submit task
+ submit_start = time.time()
+ task_id = api.generate_music(
+ prompt=lyrics,
+ style=description,
+ title=f"Song_{song_id}",
+ model='V5',
+ customMode=True,
+ instrumental=False,
+ vocalGender=vocal_gender
+ )
+ request_time = time.time() - submit_start
+
+ print_log(f"[Song {song_id}] ✓ Task submitted, ID: {task_id}")
+
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'data': data,
+ 'submit_time': time.time(),
+ 'request_time': request_time,
+ 'success': True
+ }
+
+ except Exception as e:
+ print_log(f"[Song {song_id}] ✗ Submission failed: {e}")
+ # If submission fails, also record it (though not yet at download stage)
+ # Temporarily not recording to generation_results.csv, as that file is mainly for final results
+ # But if full audit is needed, records can be added here
+ return {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'success': False,
+ 'error': str(e)
+ }
+
+
+def wait_and_download_result(api, task_info, output_dir):
+ """Phase 2: Wait for results and download (not rate limited)"""
+ if not task_info['success']:
+ return task_info
+
+ song_id = task_info['song_id']
+ song_index = task_info['song_index']
+ task_id = task_info['task_id']
+ data = task_info['data']
+ start_time = task_info['submit_time']
+
+ try:
+ original_lyrics = data.get("original_lyrics", data.get("lyrics", ""))
+ lyrics = data.get("lyrics", "")
+ description = data.get("description", "")
+
+ print_log(f"[Song {song_id}] Waiting for generation to complete...")
+
+ # Wait for completion (returns detailed statistics)
+ wait_result = api.wait_for_completion(task_id, max_wait_time=600, check_interval=8)
+ result = wait_result['result']
+
+ # Process returned results
+ tracks = []
+ if isinstance(result, dict):
+ if 'data' in result:
+ tracks = result['data']
+ elif 'sunoData' in result:
+ tracks = result['sunoData']
+ else:
+ for key, value in result.items():
+ if isinstance(value, list) and len(value) > 0 and 'audioUrl' in value[0]:
+ tracks = value
+ break
+
+ if not tracks:
+ raise Exception("Audio track data not found")
+
+ # Download phase statistics
+ download_start = time.time()
+ downloaded_files = []
+ total_download_bytes = 0
+ download_count = 0
+
+ # Process each track
+ for track_idx, track in enumerate(tracks):
+ audio_url = track.get('audioUrl') or track.get('audio_url')
+ audio_id = track.get('id')
+
+ base_filename = f"{song_id}_{track_idx}"
+ audio_path = os.path.join(output_dir, f"{base_filename}.mp3")
+ lyrics_path = os.path.join(output_dir, f"{base_filename}_lyrics.json")
+
+ # Download audio
+ if audio_url:
+ download_result = api.download_file(audio_url, audio_path)
+ if download_result['success']:
+ downloaded_files.append(audio_path)
+ total_download_bytes += download_result['bytes']
+ download_count += 1
+
+ # Get timestamped lyrics
+ timestamped_lyrics_data = None
+ if audio_id:
+ try:
+ lyrics_response = api.get_timestamped_lyrics(task_id, audio_id)
+ if lyrics_response.get('code') == 200:
+ timestamped_lyrics_data = lyrics_response.get('data')
+ except Exception as e:
+ print_log(f"[Song {song_id}] Track {track_idx+1}: Failed to get lyrics: {e}")
+
+ # Save lyrics and metadata
+ lyrics_content = {
+ "song_id": song_id,
+ "song_index": song_index,
+ "track_index": track_idx,
+ "original_lyrics": original_lyrics,
+ "cleaned_lyrics": lyrics,
+ "timestamped_lyrics": timestamped_lyrics_data,
+ "style": description,
+ "full_track_data": track
+ }
+
+ with open(lyrics_path, 'w', encoding='utf-8') as f:
+ json.dump(lyrics_content, f, ensure_ascii=False, indent=2)
+ downloaded_files.append(lyrics_path)
+
+ download_time = time.time() - download_start
+ total_time = time.time() - start_time
+
+ print_log(f"[Song {song_id}] ✓ Complete! {len(tracks)} tracks, took {total_time:.1f} seconds")
+
+ final_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': True,
+ 'tracks_count': len(tracks),
+ 'files': downloaded_files,
+ 'total_time': total_time,
+ 'submit_time': start_time,
+ 'wait_time': wait_result['wait_time'],
+ 'poll_count': wait_result['poll_count'],
+ 'avg_poll_time': wait_result['avg_poll_time'],
+ 'download_time': download_time,
+ 'download_bytes': total_download_bytes,
+ 'download_count': download_count,
+ 'avg_download_speed': total_download_bytes / download_time if download_time > 0 else 0
+ }
+
+ # Save results in real-time
+ save_result_record(output_dir, final_result)
+ return final_result
+
+ except Exception as e:
+ total_time = time.time() - start_time
+ print_log(f"[Song {song_id}] ✗ Processing failed: {e} (took {total_time:.1f} seconds)")
+
+ error_result = {
+ 'song_id': song_id,
+ 'song_index': song_index,
+ 'task_id': task_id,
+ 'success': False,
+ 'error': str(e),
+ 'total_time': total_time,
+ 'submit_time': start_time
+ }
+
+ # Save results in real-time
+ save_result_record(output_dir, error_result)
+ return error_result
+
+
+def format_bytes(bytes_size):
+ """Format byte size"""
+ for unit in ['B', 'KB', 'MB', 'GB']:
+ if bytes_size < 1024.0:
+ return f"{bytes_size:.2f} {unit}"
+ bytes_size /= 1024.0
+ return f"{bytes_size:.2f} TB"
+
+
+def format_speed(bytes_per_sec):
+ """Format speed"""
+ return f"{format_bytes(bytes_per_sec)}/s"
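`format_bytes` walks unit prefixes in factors of 1024 until the value drops below 1024; reproduced standalone:

```python
def format_bytes(bytes_size):
    for unit in ['B', 'KB', 'MB', 'GB']:
        if bytes_size < 1024.0:
            return f"{bytes_size:.2f} {unit}"
        bytes_size /= 1024.0
    return f"{bytes_size:.2f} TB"

print(format_bytes(1536))         # 1.50 KB
print(format_bytes(3 * 1024**3))  # 3.00 GB
```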
+
+
+def main():
+ """Main program - two-phase concurrent processing"""
+ input_file = "test_invalid.json"
+ output_dir = "test_invalid"
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Initialize logging
+ global logger
+ logger, log_file = setup_logging(output_dir)
+
+ print_log("=" * 70)
+ print_log("Suno API Batch Generation - Enhanced Time Statistics Version")
+ print_log("Strategy: Fast submission (20/10s) + Parallel waiting + Detailed performance analysis")
+ print_log(f"Log file: {log_file}")
+ print_log("=" * 70)
+
+ # Read input file
+ try:
+ all_data = []
+ if input_file.endswith('.jsonl'):
+ try:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ # Try reading first line to determine format
+ first_line = f.readline().strip()
+ if first_line.startswith('['):
+ # Looks like normal JSON array
+ f.seek(0)
+ all_data = json.load(f)
+ else:
+ # Try reading line by line
+ f.seek(0)
+ for i, line in enumerate(f):
+ # # Limit reading to items 11-20 (indices 10-19)
+ # if i < 10:
+ # continue
+ # if i >= 20:
+ # break
+
+ line = line.strip()
+ if line:
+ all_data.append(json.loads(line))
+ except json.JSONDecodeError:
+ # If above parsing fails, try reading as regular JSON one more time
+ print_log(f"Note: Failed to parse {input_file} as JSONL format, trying to read as regular JSON...")
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+ else:
+ with open(input_file, 'r', encoding='utf-8') as f:
+ all_data = json.load(f)
+
+ # Remove 20000 item limit
+ # if len(all_data) > 20000:
+ # print_log(f"Data exceeds 20000 items, truncating to first 20000.")
+ # all_data = all_data[:20000]
+
+ except FileNotFoundError:
+ print_log(f"File {input_file} not found.")
+ return
+ except json.JSONDecodeError as e:
+ print_log(f"JSON parsing error: {e}")
+ return
+
+
+ # Initialize API
+ api = SunoAPI(SUNO_API_KEY)
+
+ print_log(f"\nReady to generate {len(all_data)} songs...")
+ print_log(f"Start time: {time.strftime('%H:%M:%S')}\n")
+
+ overall_start_time = time.time()
+
+ # ===== Phase 1: Batch Submission =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 1: Batch Submission")
+ print_log("=" * 70 + "\n")
+
+ submit_start_time = time.time()
+ submitted_tasks = []
+ total_request_time = 0
+
+ # Adjust rate limit: Maximum 10 requests per 10 seconds
+ rate_limiter.max_requests = 10
+ rate_limiter.time_window = 10
+ rate_limiter.request_times.clear()
+ print_log(f"Rate limit: {rate_limiter.max_requests} requests / {rate_limiter.time_window} seconds")
+
+ # Only submit tasks that need to run
+ tasks_to_run = []
+ for i, data in enumerate(all_data, 1):
+ tasks_to_run.append((i, data))
+
+ print_log(f"Number of tasks to submit: {len(tasks_to_run)}")
+
+ # Use thread pool for submission
+ # Submission concurrency is controlled by rate_limiter, can be set to 10
+ with ThreadPoolExecutor(max_workers=10) as executor:
+ submit_futures = {
+ executor.submit(submit_generation_task, api, idx, data): idx
+ for idx, data in tasks_to_run
+ }
+
+ with tqdm(total=len(tasks_to_run), desc="Submitting tasks", unit="song") as pbar:
+ for future in as_completed(submit_futures):
+ result = future.result()
+ submitted_tasks.append(result)
+ if result.get('success') and 'request_time' in result:
+ total_request_time += result['request_time']
+ pbar.update(1)
+
+ submit_phase_time = time.time() - submit_start_time
+ success_submits = sum(1 for t in submitted_tasks if t['success'])
+
+ # Get rate limit statistics
+ rate_limit_stats = rate_limiter.get_stats()
+
+ print_log(f"\nSubmission phase complete: {success_submits}/{len(tasks_to_run)} successful")
+ print_log(f" Total time: {submit_phase_time:.1f} seconds")
+ print_log(f" Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds ({rate_limit_stats['wait_count']} times)")
+ if rate_limit_stats['wait_count'] > 0:
+ print_log(f" Average wait time: {rate_limit_stats['avg_wait_time']:.2f} seconds/time")
+
+ # ===== Phase 2: Parallel Waiting and Download =====
+ print_log("\n" + "=" * 70)
+ print_log("Phase 2: Wait for Generation and Download")
+ print_log("=" * 70 + "\n")
+
+ wait_start_time = time.time()
+ final_results = []
+
+ # Use more threads for parallel waiting (not rate limited)
+ with ThreadPoolExecutor(max_workers=20) as executor:
+ download_futures = {
+ executor.submit(wait_and_download_result, api, task, output_dir): task
+ for task in submitted_tasks if task['success']
+ }
+
+ # Add failed submission tasks to results
+ for task in submitted_tasks:
+ if not task['success']:
+ final_results.append(task)
+
+ with tqdm(total=len(download_futures), desc="Downloading results", unit="song") as pbar:
+ for future in as_completed(download_futures):
+ result = future.result()
+ final_results.append(result)
+ pbar.update(1)
+
+ wait_phase_time = time.time() - wait_start_time
+
+ # ===== Detailed Statistics and Report =====
+ overall_time = time.time() - overall_start_time
+
+ print_log("\n" + "=" * 70)
+ print_log("Batch Generation Complete - Detailed Performance Report")
+ print_log("=" * 70)
+
+ success_count = sum(1 for r in final_results if r.get('success'))
+ fail_count = len(final_results) - success_count
+ total_tracks = sum(r.get('tracks_count', 0) for r in final_results if r.get('success'))
+
+ successful_results = [r for r in final_results if r.get('success')]
+
+ # Basic Statistics
+ print_log(f"\n[Basic Statistics]")
+ print_log(f" Total songs: {len(all_data)}")
+ print_log(f" Successful: {success_count}")
+ print_log(f" Failed: {fail_count}")
+ print_log(f" Total tracks: {total_tracks}")
+ if success_count > 0:
+ avg_tracks = total_tracks / success_count
+ print_log(f" Average tracks per song: {avg_tracks:.2f}")
+
+ # Time Statistics
+ print_log(f"\n[Time Statistics]")
+ print_log(f" ├── Submission phase: {submit_phase_time:.1f} seconds")
+ print_log(f" │ ├── Actual request time: {total_request_time:.2f} seconds")
+ print_log(f" │ └── Rate limit waiting: {rate_limit_stats['total_wait_time']:.2f} seconds")
+ print_log(f" ├── Generation waiting phase: {wait_phase_time:.1f} seconds")
+
+ if successful_results:
+ wait_times = [r.get('wait_time', 0) for r in successful_results if 'wait_time' in r]
+ download_times = [r.get('download_time', 0) for r in successful_results if 'download_time' in r]
+
+ if wait_times:
+ avg_wait = sum(wait_times) / len(wait_times)
+ min_wait = min(wait_times)
+ max_wait = max(wait_times)
+ print_log(f" │ ├── Average wait time: {avg_wait:.1f} seconds/song")
+ print_log(f" │ ├── Fastest: {min_wait:.1f} seconds")
+ print_log(f" │ └── Slowest: {max_wait:.1f} seconds")
+
+ if download_times:
+ total_download_time = sum(download_times)
+ avg_download = total_download_time / len(download_times)
+ print_log(f" ├── Download phase: {total_download_time:.1f} seconds")
+ print_log(f" │ └── Average download time: {avg_download:.2f} seconds/song")
+
+ print_log(f" └── Total time: {overall_time:.1f} seconds ({overall_time/60:.1f} minutes)")
+
+ # Single Song Generation Statistics
+ if successful_results:
+ total_times = [r.get('total_time', 0) for r in successful_results if 'total_time' in r]
+ if total_times:
+ print_log(f"\n[Single Song Generation Statistics]")
+ avg_time = sum(total_times) / len(total_times)
+ min_time = min(total_times)
+ max_time = max(total_times)
+ print_log(f" Average total time per song: {avg_time:.1f} seconds")
+ print_log(f" Fastest generation: {min_time:.1f} seconds")
+ print_log(f" Slowest generation: {max_time:.1f} seconds")
+
+ # Download Statistics
+ total_download_bytes = sum(r.get('download_bytes', 0) for r in successful_results)
+ total_download_count = sum(r.get('download_count', 0) for r in successful_results)
+
+ if total_download_bytes > 0:
+ print_log(f"\n[Download Statistics]")
+ print_log(f" Total download: {format_bytes(total_download_bytes)}")
+ print_log(f" Number of files: {total_download_count}")
+ print_log(f" Average file size: {format_bytes(total_download_bytes / total_download_count)}")
+
+ download_speeds = [r.get('avg_download_speed', 0) for r in successful_results if r.get('avg_download_speed', 0) > 0]
+ if download_speeds:
+ avg_speed = sum(download_speeds) / len(download_speeds)
+ print_log(f" Average download speed: {format_speed(avg_speed)}")
+
+ # Polling Statistics
+ poll_counts = [r.get('poll_count', 0) for r in successful_results if 'poll_count' in r]
+ if poll_counts:
+ total_polls = sum(poll_counts)
+ avg_polls = total_polls / len(poll_counts)
+ print_log(f"\n[Polling Statistics]")
+ print_log(f" Total polling count: {total_polls}")
+ print_log(f" Average polling per song: {avg_polls:.1f}")
+
+ # Efficiency Analysis
+ print_log(f"\n[Efficiency Analysis]")
+ if success_count > 0:
+ throughput = success_count / (overall_time / 60)
+ print_log(f" Actual throughput: {throughput:.2f} songs/minute")
+
+ # Theoretical fastest time (assuming no rate limit)
+ if wait_times:
+ ideal_time = submit_phase_time - rate_limit_stats['total_wait_time'] + max(wait_times)
+ efficiency = (ideal_time / overall_time) * 100
+ print_log(f" Theoretical fastest time: {ideal_time:.1f} seconds")
+ print_log(f" Concurrency efficiency: {efficiency:.1f}%")
+
+ # Show failed songs
+ if fail_count > 0:
+ print_log("\n" + "=" * 70)
+ print_log("Failed Songs List")
+ print_log("=" * 70)
+ for r in sorted(final_results, key=lambda x: x.get('song_index', 0)):
+ if not r.get('success'):
+ song_id = r.get('song_id', r.get('song_index', 'Unknown'))
+ print_log(f" [{song_id}] {r.get('error', 'Unknown error')}")
+
+ print_log("\n" + "=" * 70)
+ print_log(f"All files saved to: {os.path.abspath(output_dir)}")
+ print_log("=" * 70)
+
+
+if __name__ == '__main__':
+ main()
\ No newline at end of file
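The efficiency analysis above combines the timing totals into two derived figures: throughput and concurrency efficiency. A standalone sketch of those two formulas (function names are illustrative, not part of the script):

```python
def throughput_songs_per_minute(success_count: int, overall_seconds: float) -> float:
    """Completed songs per minute of wall-clock time."""
    return success_count / (overall_seconds / 60)

def concurrency_efficiency(submit_seconds: float, rate_limit_wait: float,
                           slowest_wait: float, overall_seconds: float) -> float:
    """Theoretical fastest time (submission time minus rate-limit waiting,
    plus the slowest single generation wait) as a percentage of observed time."""
    ideal = submit_seconds - rate_limit_wait + slowest_wait
    return ideal / overall_seconds * 100
```

The ideal-time estimate assumes all submissions could fire without throttling and that the run is bounded by the slowest single generation.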
diff --git a/eval_pipeline/api_key.py b/eval_pipeline/api_key.py
new file mode 100644
index 0000000000000000000000000000000000000000..58fd4ffb316120e8c8eb6cef109e70d49bfa1985
--- /dev/null
+++ b/eval_pipeline/api_key.py
@@ -0,0 +1,4 @@
+def get_key():
+ # Please fill in your DashScope API Key here
+ return "YOUR_API_KEY_HERE"
+
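Hard-coding the key works for a quick start; a slightly safer variant (an assumption, not part of this repo) falls back to an environment variable so the key never has to be committed:

```python
import os

def get_key() -> str:
    # Prefer the DASHSCOPE_API_KEY environment variable; keep the placeholder
    # as the fallback so a missing key still fails loudly downstream.
    return os.environ.get("DASHSCOPE_API_KEY", "YOUR_API_KEY_HERE")
```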
diff --git a/eval_pipeline/calc_per.py b/eval_pipeline/calc_per.py
new file mode 100644
index 0000000000000000000000000000000000000000..c9f0cc874d40d925aa618831d8734e85c6457ec0
--- /dev/null
+++ b/eval_pipeline/calc_per.py
@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+"""
+PER Calculation: Calculate Phoneme Error Rate based on transcription results and GT lyrics
+Usage: python calc_per.py --hyp_file HYP.jsonl --gt_file GT.jsonl --model_name NAME --output OUT.json
+"""
+import argparse, json, os, re
+from tqdm import tqdm
+import phoneme_utils
+
+def extract_idx(filename):
+ """Extract index from filename"""
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def signal_filter(text: str):
+    """Replace punctuation and symbols with spaces, then collapse repeated spaces"""
+ pattern = r'[ ,。",:;&—''\'.\]\[()?\n-]'
+ text = re.sub(pattern, ' ', text)
+ while text.find(" ") != -1:
+ text = text.replace(" ", " ")
+ return text
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--hyp_file", required=True, help="Transcription file (jsonl)")
+ parser.add_argument("--gt_file", required=True, help="GT lyrics file (jsonl)")
+ parser.add_argument("--model_name", required=True, help="Model name")
+ parser.add_argument("--output", required=True, help="Output result file")
+ parser.add_argument("--offset", type=int, default=0, help="Index offset")
+ args = parser.parse_args()
+
+ # Load GT
+ gt = {}
+ with open(args.gt_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ try:
+ rec = json.loads(line)
+ idx = rec.get('file_index')
+ if idx is not None:
+ gt[idx] = rec['lyrics']
+            except Exception:
+ continue
+ print(f"Loaded {len(gt)} ground truth lyrics")
+
+ # Load transcription and calculate PER
+ per_scores = []
+ details = []
+
+ with open(args.hyp_file, 'r', encoding='utf-8') as f:
+        for line in tqdm(f, desc="Calculating PER"):
+ try:
+ rec = json.loads(line)
+ hyp_text = rec.get('hyp_text', '')
+ filename = rec.get('file_name', '')
+
+ idx = rec.get('file_idx')
+ if idx is None:
+ idx = extract_idx(filename)
+ if idx is None:
+ continue
+
+ gt_idx = idx + args.offset
+ if gt_idx not in gt:
+ continue
+
+ ref_text = gt[gt_idx]
+
+ # Punctuation processing
+ ref_text = signal_filter(ref_text)
+ hyp_text = signal_filter(hyp_text)
+
+ # Extract common length
+ min_len = min(len(ref_text), len(hyp_text))
+ ref_text = ref_text[:min_len]
+ hyp_text = hyp_text[:min_len]
+
+ # Convert to phonemes and calculate PER
+ ref_phonemes = phoneme_utils.get_phonemes(ref_text)
+ hyp_phonemes = phoneme_utils.get_phonemes(hyp_text)
+ per = phoneme_utils.calc_per(ref_phonemes, hyp_phonemes)
+
+ per_scores.append(per)
+ details.append({
+ "file": filename,
+ "idx": idx,
+ "per": per,
+ "ref_text": ref_text,
+ "hyp_text": hyp_text,
+ "ref_phonemes": " ".join(ref_phonemes),
+ "hyp_phonemes": " ".join(hyp_phonemes)
+ })
+            except Exception:
+ continue
+
+ # Save results
+ os.makedirs(os.path.dirname(args.output), exist_ok=True)
+ avg_per = sum(per_scores) / len(per_scores) if per_scores else 1.0
+
+ with open(args.output, 'w', encoding='utf-8') as f:
+ json.dump({
+ "model": args.model_name,
+ "metrics": {"PER": avg_per},
+ "count": len(per_scores)
+ }, f, indent=2)
+
+ # Save detailed results
+ details_file = args.output.replace('.json', '_details.jsonl')
+ with open(details_file, 'w', encoding='utf-8') as f:
+ for d in details:
+ f.write(json.dumps(d, ensure_ascii=False) + '\n')
+
+ print(f"Average PER: {avg_per:.4f} ({len(per_scores)} samples)")
+ print(f"Saved: {args.output}")
+
+if __name__ == "__main__":
+ main()
+
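`phoneme_utils.calc_per` is not shown in this diff; PER is conventionally the edit distance between phoneme sequences normalized by reference length. A minimal sketch of that convention (the actual implementation in `phoneme_utils` may differ):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def calc_per(ref_phonemes, hyp_phonemes):
    """Phoneme Error Rate: edit distance normalized by reference length."""
    if not ref_phonemes:
        return 1.0 if hyp_phonemes else 0.0
    return levenshtein(ref_phonemes, hyp_phonemes) / len(ref_phonemes)
```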
diff --git a/eval_pipeline/config.sh b/eval_pipeline/config.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1de84251d9094827abf8df43e90298b134ddaca3
--- /dev/null
+++ b/eval_pipeline/config.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+# ========================================
+# Evaluation Configuration File
+# Usage: source config.sh
+# ========================================
+
+# Models to evaluate (name:audio_directory)
+MODELS=(
+ "levo:xxx/SongGeneration/data/levo"
+ # "ace-step:xxx/ACE-Step/data/ace-step"
+ # "yue:xxx/YuE/inference/yue"
+ # "diffRhythm2:xxx/diffrhythm2/results/diffRhythm2"
+ # "..."
+)
+
+# Base paths
+BASE_DIR="eval_pipeline"
+RESULTS_DIR="$BASE_DIR/results"
+GT_DIR="$BASE_DIR/gt_lyrics"
+PROMPTS_DIR="$BASE_DIR/prompts"
+
+# Model checkpoints
+SONGEVAL_CKPT="xxx/SongEval/ckpt/model.safetensors"
+SONGEVAL_CONFIG="xxx/SongEval/config.yaml"
+MUQ_MODEL="MuQ-large-msd-iter"
+MULAN_MODEL="MuQ-MuLan-large"
+AUDIOBOX_CKPT="xxx/audiobox-aesthetics_ckpt/checkpoint.pt"
+
+# Conda environments
+ENV_SONGEVAL="xxx"
+ENV_AUDIOBOX="xxx"
+ENV_MULAN="xxx"
+ENV_PER="xxx"
+
+# Default GPU
+DEFAULT_GPU=0
diff --git a/eval_pipeline/convert_lyrics.py b/eval_pipeline/convert_lyrics.py
new file mode 100644
index 0000000000000000000000000000000000000000..92cb6f7c65ebedea3a498ba75570aae6a850660b
--- /dev/null
+++ b/eval_pipeline/convert_lyrics.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+"""
+Convert existing lyric files to transcription.jsonl format
+Usage: python convert_lyrics.py --input_dir DIR [--output FILE]
+
+Input format (xxx.txt):
+ First line: Chinese/English (optional, will be ignored)
+ Second line and after: Lyric content
+
+Output format (transcription.jsonl):
+ {"file_path": "...", "file_name": "xxx.mp3", "file_idx": 1, "hyp_text": "lyrics"}
+"""
+import argparse, json, os, re, glob
+from pathlib import Path
+
+def extract_idx(filename):
+ """Extract index from filename (last number sequence)"""
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def read_lyrics(txt_path):
+ """Read txt file and extract lyrics"""
+ with open(txt_path, 'r', encoding='utf-8') as f:
+ lines = f.readlines()
+
+ # Skip first line if it's a language identifier
+ if lines and lines[0].strip().lower() in ['chinese', 'english', 'zh', 'en']:
+ lines = lines[1:]
+
+ # Merge remaining lines as lyrics
+ lyrics = ' '.join(line.strip() for line in lines if line.strip())
+ return lyrics
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--input_dir", required=True, help="Directory containing txt lyric files")
+ parser.add_argument("--output", default="", help="Output file (default: input_dir/transcription.jsonl)")
+ args = parser.parse_args()
+
+ input_dir = Path(args.input_dir)
+ output_file = args.output if args.output else input_dir / "transcription.jsonl"
+
+ # Find all txt files
+ txt_files = sorted(glob.glob(str(input_dir / "*.txt")))
+ print(f"Found {len(txt_files)} txt files in {input_dir}")
+
+ records = []
+ for txt_path in txt_files:
+ txt_name = os.path.basename(txt_path)
+ idx = extract_idx(txt_name)
+
+ # Infer corresponding audio filename
+ base_name = os.path.splitext(txt_name)[0]
+ # Try to find corresponding audio file
+ audio_name = None
+ for ext in ['.mp3', '.wav']:
+ candidate = input_dir / f"{base_name}{ext}"
+ if candidate.exists():
+ audio_name = f"{base_name}{ext}"
+ break
+ if not audio_name:
+ audio_name = f"{base_name}.mp3" # Default
+
+ lyrics = read_lyrics(txt_path)
+
+ rec = {
+ "file_path": str(input_dir / audio_name),
+ "file_name": audio_name,
+ "file_idx": idx,
+ "hyp_text": lyrics
+ }
+ records.append(rec)
+
+ # Sort by index
+ records.sort(key=lambda x: x["file_idx"] if x["file_idx"] is not None else 999999)
+
+ # Write output
+ with open(output_file, 'w', encoding='utf-8') as f:
+ for rec in records:
+ f.write(json.dumps(rec, ensure_ascii=False) + '\n')
+
+ print(f"Converted {len(records)} files -> {output_file}")
+
+if __name__ == "__main__":
+ main()
+
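The converter keys everything on the last run of digits in a filename, which is what makes mixed naming schemes sort predictably. Restating `extract_idx` on its own to show that behavior:

```python
import os
import re

def extract_idx(filename):
    """Last run of digits in the filename stem, as used throughout the pipeline."""
    matches = re.findall(r'\d+', os.path.splitext(filename)[0])
    return int(matches[-1]) if matches else None

print(extract_idx("song_12.txt"))       # 12
print(extract_idx("take2_song_7.mp3"))  # 7 (the *last* digit run wins)
print(extract_idx("demo.txt"))          # None
```

Note the `take2_song_7` case: an earlier digit run (a version prefix) is ignored in favor of the trailing index.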
diff --git a/eval_pipeline/cut_audio.py b/eval_pipeline/cut_audio.py
new file mode 100644
index 0000000000000000000000000000000000000000..eb8d064db04c6265e7da689dbe1339b3b229a00e
--- /dev/null
+++ b/eval_pipeline/cut_audio.py
@@ -0,0 +1,22 @@
+import os
+from tqdm import tqdm
+from pydub import AudioSegment
+
+def cut_dir(dir:str, save_dir:str):
+ os.makedirs(save_dir, exist_ok=True)
+ for name in tqdm(os.listdir(dir), desc="Cutting Audios"):
+ if name.endswith(".txt") or name.endswith(".jsonl"):
+ continue
+ path = os.path.join(dir, name)
+ audio = AudioSegment.from_file(path)
+        three_minutes = 3 * 60 * 1000  # pydub slices in milliseconds
+ audio_3min = audio[:three_minutes]
+
+ new_path = os.path.join(save_dir, name)
+ audio_3min.export(new_path, format="mp3")
+
+dirs = ["./audio/yue_cn", "./audio/yue_en", "./audio/ace-step_cn", "./audio/ace-step_en"]
+save_dirs = ["./audio/yue_cut2_cn", "./audio/yue_cut2_en", "./audio/ace-step_cut2_cn", "./audio/ace-step_cut2_en"]
+
+for dir, save_dir in zip(dirs, save_dirs):
+ cut_dir(dir, save_dir)
\ No newline at end of file
diff --git a/eval_pipeline/eval_audiobox.py b/eval_pipeline/eval_audiobox.py
new file mode 100644
index 0000000000000000000000000000000000000000..40253ca0b5d4320cf814c755685394a97af2de40
--- /dev/null
+++ b/eval_pipeline/eval_audiobox.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+"""
+Audiobox Evaluation: Evaluate audio aesthetic scores
+Usage: python eval_audiobox.py --input_dir DIR --model_name NAME --output OUT.json
+Output: Summary results + _details.jsonl detailed results
+Requires sao environment
+"""
+import argparse, json, os, glob, subprocess, tempfile, re
+
+def extract_idx(filename):
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--input_dir", required=True)
+ parser.add_argument("--model_name", required=True)
+ parser.add_argument("--output", required=True)
+ parser.add_argument("--ckpt", default="xxx/audiobox-aesthetics_ckpt/checkpoint.pt")
+ parser.add_argument("--batch_size", type=int, default=4)
+ args = parser.parse_args()
+
+ # Collect files
+ files = sorted(glob.glob(f"{args.input_dir}/*.wav") + glob.glob(f"{args.input_dir}/*.mp3"))
+
+ # Write temporary file
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
+ for p in files:
+ f.write(json.dumps({"path": os.path.abspath(p)}) + '\n')
+ paths_file = f.name
+
+ # Run audio-aes
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
+ scores_file = f.name
+
+ cmd = f'audio-aes "{paths_file}" --batch-size {args.batch_size}'
+ if os.path.exists(args.ckpt):
+ cmd += f' --ckpt "{args.ckpt}"'
+
+ with open(scores_file, 'w') as out:
+ subprocess.run(cmd, shell=True, stdout=out)
+
+ # Parse results - match input files in order
+ metrics = {"CE": [], "CU": [], "PC": [], "PQ": []}
+ details = []
+
+ # Read all valid score lines
+ score_records = []
+ with open(scores_file) as f:
+ for line in f:
+ if not line.strip(): continue
+ try:
+ rec = json.loads(line)
+ # Check if contains score fields
+ if all(k in rec for k in ["CE", "CU", "PC", "PQ"]):
+ score_records.append(rec)
+            except Exception: pass
+
+ # Match files and scores in order
+ for i, file_path in enumerate(files):
+ if i >= len(score_records):
+ break
+
+ rec = score_records[i]
+ filename = os.path.basename(file_path)
+
+ file_scores = {
+ "CE": rec["CE"],
+ "CU": rec["CU"],
+ "PC": rec["PC"],
+ "PQ": rec["PQ"]
+ }
+ file_scores["Score"] = sum(file_scores.values())
+
+ for k in ["CE", "CU", "PC", "PQ"]:
+ metrics[k].append(rec[k])
+
+ details.append({
+ "file": filename,
+ "idx": extract_idx(filename),
+ "scores": file_scores
+ })
+
+ # Calculate average
+ avg = {k: sum(v)/len(v) if v else 0 for k, v in metrics.items()}
+ avg["Score"] = sum(avg.values())
+
+ os.makedirs(os.path.dirname(args.output), exist_ok=True)
+ with open(args.output, 'w') as f:
+ json.dump({"model": args.model_name, "metrics": avg, "count": len(files)}, f, indent=2)
+
+ # Save detailed results
+ details_file = args.output.replace('.json', '_details.jsonl')
+ with open(details_file, 'w', encoding='utf-8') as f:
+ for d in details:
+ f.write(json.dumps(d, ensure_ascii=False) + '\n')
+
+ # Cleanup
+ os.unlink(paths_file)
+ os.unlink(scores_file)
+ print(f"Saved: {args.output}")
+ print(f"Details: {details_file}")
+
+if __name__ == "__main__":
+ main()
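The summary step above collapses per-file axis scores into per-axis means plus a combined `Score` (the sum of the four means). That aggregation in isolation, with illustrative names:

```python
def aggregate(metrics: dict) -> dict:
    """Average each aesthetic axis across files, then sum the axis means."""
    avg = {k: sum(v) / len(v) if v else 0 for k, v in metrics.items()}
    avg["Score"] = sum(avg.values())
    return avg
```

Because `Score` is a plain sum of means, a model that trades one axis against another one-for-one keeps the same combined score.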
diff --git a/eval_pipeline/eval_mulan_t.py b/eval_pipeline/eval_mulan_t.py
new file mode 100644
index 0000000000000000000000000000000000000000..32396bb512d772b85288661cd54782f5e05f527d
--- /dev/null
+++ b/eval_pipeline/eval_mulan_t.py
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+"""
+Mulan-T Evaluation: Calculate similarity between audio and text prompts
+Usage: python eval_mulan_t.py --input_dir DIR --prompts PROMPTS.json --model_name NAME --output OUT.json
+Output: Summary results + _details.jsonl detailed results
+"""
+import argparse, json, os, re, sys, glob
+import librosa, torch
+from tqdm import tqdm
+
+sys.path.append("Music_eval")
+from muq import MuQMuLan
+
+def extract_idx(filename):
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--input_dir", required=True)
+ parser.add_argument("--prompts", required=True)
+ parser.add_argument("--model_name", required=True)
+ parser.add_argument("--output", required=True)
+ parser.add_argument("--model", default="MuQ-MuLan-large")
+ parser.add_argument("--gpu", type=int, default=0)
+ args = parser.parse_args()
+
+ device = f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu"
+
+ with open(args.prompts) as f:
+ prompts = json.load(f)
+
+ model = MuQMuLan.from_pretrained(args.model).to(device).eval()
+
+ files = sorted(glob.glob(f"{args.input_dir}/*.wav") + glob.glob(f"{args.input_dir}/*.mp3"))
+ scores = []
+ details = []
+
+ for f in tqdm(files, desc="Mulan-T"):
+ idx = extract_idx(os.path.basename(f))
+ if idx is None or idx >= len(prompts): continue
+
+ try:
+ wav, _ = librosa.load(f, sr=24000)
+ wavs = torch.tensor(wav).unsqueeze(0).to(device)
+ with torch.no_grad():
+ audio_emb = model(wavs=wavs)
+ text_emb = model(texts=[prompts[idx]])
+ sim = model.calc_similarity(audio_emb, text_emb).item()
+ scores.append(sim)
+
+ details.append({
+ "file": os.path.basename(f),
+ "idx": idx,
+ "prompt": prompts[idx],
+ "similarity": sim
+ })
+ except Exception as e:
+ print(f"Error {f}: {e}")
+
+ os.makedirs(os.path.dirname(args.output), exist_ok=True)
+ avg = sum(scores)/len(scores) if scores else 0
+ with open(args.output, 'w') as f:
+ json.dump({"model": args.model_name, "metrics": {"Mulan-T": avg}, "count": len(scores)}, f, indent=2)
+
+ # Save detailed results
+ details_file = args.output.replace('.json', '_details.jsonl')
+ with open(details_file, 'w', encoding='utf-8') as f:
+ for d in details:
+ f.write(json.dumps(d, ensure_ascii=False) + '\n')
+
+ print(f"Saved: {args.output}")
+ print(f"Details: {details_file}")
+
+if __name__ == "__main__":
+ main()
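`MuQMuLan.calc_similarity` is used as-is above; if it computes cosine similarity between the audio and text embeddings (an assumption about the library, not confirmed by this diff), the underlying operation reduces to:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```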
diff --git a/eval_pipeline/eval_songeval.py b/eval_pipeline/eval_songeval.py
new file mode 100644
index 0000000000000000000000000000000000000000..58abd44cb1a350cd9f1af6ade332fc31a1e3568f
--- /dev/null
+++ b/eval_pipeline/eval_songeval.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+"""
+SongEval Evaluation: Evaluate audio quality in 5 dimensions
+Usage: python eval_songeval.py --input_dir DIR --model_name NAME --output OUT.json
+Output: Summary results + _details.jsonl detailed results
+"""
+import argparse, json, os, sys, glob, re
+import librosa, torch
+from tqdm import tqdm
+
+sys.path.append("SongEval")
+sys.path.append("xxx/MuQ/src")
+from hydra.utils import instantiate
+from muq import MuQ
+from omegaconf import OmegaConf
+from safetensors.torch import load_file
+
+METRICS = ['Coherence', 'Musicality', 'Memorability', 'Clarity', 'Naturalness']
+
+def extract_idx(filename):
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--input_dir", required=True)
+ parser.add_argument("--model_name", required=True)
+ parser.add_argument("--output", required=True)
+ parser.add_argument("--ckpt", default="xxx/SongEval/ckpt/model.safetensors")
+ parser.add_argument("--config", default="xxx/SongEval/config.yaml")
+ parser.add_argument("--muq", default="xxx/MuQ-large-msd-iter")
+ parser.add_argument("--gpu", type=int, default=0)
+ args = parser.parse_args()
+
+ device = f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu"
+
+ # Load model
+ config = OmegaConf.load(args.config)
+ model = instantiate(config.generator).to(device).eval()
+ model.load_state_dict(load_file(args.ckpt, device="cpu"), strict=False)
+ muq = MuQ.from_pretrained(args.muq).to(device).eval()
+
+ # Evaluate
+ files = sorted(glob.glob(f"{args.input_dir}/*.wav") + glob.glob(f"{args.input_dir}/*.mp3"))
+ scores_all = {m: [] for m in METRICS}
+ details = []
+
+ for f in tqdm(files, desc="SongEval"):
+ try:
+ wav, _ = librosa.load(f, sr=24000)
+ audio = torch.tensor(wav).unsqueeze(0).to(device)
+
+ with torch.no_grad():
+ features = muq(audio, output_hidden_states=True)["hidden_states"][6]
+ scores = model(features).squeeze(0)
+
+ file_scores = {}
+ for i, m in enumerate(METRICS):
+ val = scores[i].item()
+ scores_all[m].append(val)
+ file_scores[m] = val
+
+ details.append({
+ "file": os.path.basename(f),
+ "idx": extract_idx(os.path.basename(f)),
+ "scores": file_scores
+ })
+ except Exception as e:
+ print(f"Error {f}: {e}")
+ torch.cuda.empty_cache()
+
+ # Save summary
+ os.makedirs(os.path.dirname(args.output), exist_ok=True)
+ avg = {m: sum(v)/len(v) if v else 0 for m, v in scores_all.items()}
+ with open(args.output, 'w') as f:
+ json.dump({"model": args.model_name, "metrics": avg, "count": len(files)}, f, indent=2)
+
+ # Save detailed results
+ details_file = args.output.replace('.json', '_details.jsonl')
+ with open(details_file, 'w', encoding='utf-8') as f:
+ for d in details:
+ f.write(json.dumps(d, ensure_ascii=False) + '\n')
+
+ print(f"Saved: {args.output}")
+ print(f"Details: {details_file}")
+
+if __name__ == "__main__":
+ main()
diff --git a/eval_pipeline/fill_missing.py b/eval_pipeline/fill_missing.py
new file mode 100644
index 0000000000000000000000000000000000000000..fceecd17802ce95100d671d1842ee6c0d76f04ad
--- /dev/null
+++ b/eval_pipeline/fill_missing.py
@@ -0,0 +1,145 @@
+#!/usr/bin/env python3
+"""
+Fill missing transcriptions: Check for missing transcriptions in split directory and call ASR to fill them
+Usage: python fill_missing.py INPUT_DIR [--api_key KEY]
+Example: python fill_missing.py ./audio/sunov4_5_cn
+ Check for missing entries in transcription.jsonl, call ASR on missing audio and fill them
+"""
+import argparse, json, os, re, glob, subprocess, sys
+from pathlib import Path
+from tqdm import tqdm
+
+# Import API key
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from api_key import get_key
+
+def extract_idx(filename):
+ matches = re.findall(r'\d+', os.path.splitext(filename)[0])
+ return int(matches[-1]) if matches else None
+
+def transcribe(audio_path, api_key):
+ """Call qwen3-asr and filter redundant output"""
+ try:
+ result = subprocess.run(
+ ['qwen3-asr', '-i', audio_path, '-key', api_key],
+ capture_output=True, text=True, timeout=120
+ )
+ output = result.stdout.strip()
+
+ # Filter redundant logs
+ lines = output.split('\n')
+ transcription = ""
+ for line in lines:
+ line = line.strip()
+ if not line:
+ continue
+ # Filter log lines
+ if any(skip in line for skip in [
+ "Loaded wav duration:", "DETECTED LANGUAGE", "Detected Language:",
+ "FULL TRANSCRIPTION OF", "Wav duration is longer than",
+ "Silero VAD model for segmenting", "saved to", "Retry",
+ "status_code", "Throttling.RateQuota"
+ ]):
+ continue
+ # Handle Full Transcription: prefix
+ if "Full Transcription:" in line:
+ parts = line.split("Full Transcription:", 1)
+ if len(parts) > 1:
+ line = parts[1].strip()
+ else:
+ continue
+ # Handle Segmenting done line
+ if "Segmenting done, total segments" in line:
+ if "segments:" in line:
+ parts = line.split("segments:", 1)
+ remaining = parts[1].strip()
+ match = re.match(r'^\d+\s*(.*)', remaining)
+ if match and match.group(1):
+ line = match.group(1)
+ else:
+ continue
+ transcription += line + " "
+
+ return transcription.strip()
+ except Exception as e:
+ print(f"ASR Error {audio_path}: {e}")
+ return ""
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("input_dir", help="Split directory (contains audio and transcription.jsonl)")
+ parser.add_argument("--api_key", default="", help="API Key (default: read from api_key.py)")
+ args = parser.parse_args()
+
+ # Get API key
+ api_key = args.api_key if args.api_key else get_key()
+ args.api_key = api_key
+
+ input_dir = Path(args.input_dir)
+ trans_file = input_dir / "transcription.jsonl"
+
+ # Get all audio files
+ audio_files = sorted(glob.glob(str(input_dir / "*.mp3")) + glob.glob(str(input_dir / "*.wav")))
+ audio_indices = {}
+ for f in audio_files:
+ idx = extract_idx(os.path.basename(f))
+ if idx is not None:
+ audio_indices[idx] = f
+
+ print(f"Found {len(audio_indices)} audio files")
+
+ # Read existing transcriptions
+ existing = set()
+ records = []
+ if trans_file.exists():
+ with open(trans_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ try:
+ rec = json.loads(line)
+ records.append(rec)
+ idx = rec.get('file_idx')
+ if idx is not None:
+ existing.add(idx)
+                except Exception:
+ continue
+
+ print(f"Existing transcriptions: {len(existing)}")
+
+ # Find missing ones
+ missing = [idx for idx in audio_indices if idx not in existing]
+ missing.sort()
+
+ if not missing:
+ print("No missing transcriptions!")
+ return
+
+ print(f"Missing {len(missing)} transcriptions: {missing}")
+
+ # Transcribe missing ones
+ new_records = []
+ for idx in tqdm(missing, desc="Transcribing missing"):
+ audio_path = audio_indices[idx]
+ hyp_text = transcribe(audio_path, args.api_key)
+
+ rec = {
+ "file_path": audio_path,
+ "file_name": os.path.basename(audio_path),
+ "file_idx": idx,
+ "hyp_text": hyp_text
+ }
+ new_records.append(rec)
+
+ # Merge and sort
+ all_records = records + new_records
+ all_records.sort(key=lambda x: x.get("file_idx", 999999))
+
+ # Write back
+ with open(trans_file, 'w', encoding='utf-8') as f:
+ for rec in all_records:
+ f.write(json.dumps(rec, ensure_ascii=False) + '\n')
+
+ print(f"Added {len(new_records)} transcriptions, total: {len(all_records)}")
+
+if __name__ == "__main__":
+ main()
+
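The filtering inside `transcribe` is line-oriented string matching over the CLI's mixed stdout. Extracted into a pure function, the core of that logic can be unit-tested without running the ASR tool (a simplified sketch with a shortened skip list):

```python
# Subset of the log markers filtered by transcribe(); the real list is longer.
SKIP_MARKERS = ["Loaded wav duration:", "Detected Language:", "saved to", "Retry"]

def filter_asr_output(raw: str) -> str:
    """Keep only transcription text from the ASR CLI's mixed stdout."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or any(m in line for m in SKIP_MARKERS):
            continue
        if "Full Transcription:" in line:
            line = line.split("Full Transcription:", 1)[1].strip()
            if not line:
                continue
        kept.append(line)
    return " ".join(kept)
```

Keeping the filter separate from the subprocess call makes it easy to extend the skip list when the CLI adds new log lines.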
diff --git a/eval_pipeline/gt_lyrics/en.jsonl b/eval_pipeline/gt_lyrics/en.jsonl
new file mode 100644
index 0000000000000000000000000000000000000000..605e5ce42e1e369b3d8357b7b4206f5517fe5079
--- /dev/null
+++ b/eval_pipeline/gt_lyrics/en.jsonl
@@ -0,0 +1,50 @@
+{"id": 0, "file_origin": "final_en_test_multi.jsonl", "file_index": 0, "lyrics": "Stepping into shadows, where the secrets lie A dance between the whispers and the silent cry The city never sleeps, it pulses through the night Each heartbeat tells a story, wrapped in city light We’re searching for the answers, in the moon’s embrace Footsteps on the pavement, in this timeless space We are the wanderers, lost along the way Chasing all the echoes, where the shadows play We are the dreamers, in a world so wide Finding all the colors, in the gray inside The stories of the streetlights, illuminating fate The laughter and the heartache, every twist and weight In the depths of longing, we ignite the fire Painting life in motion, fueled with pure desire So let’s walk this journey, hand in hand we’ll roam Between the shades of twilight, we’ll find our way home We are the wanderers, lost along the way Chasing all the echoes, where the shadows play We are the dreamers, in a world so wide Finding all the colors, in the gray inside In the tangled alleys, secrets intertwine Every twist and turn, the past and dreams combine And in this fleeting moment, magic comes alive With every single heartbeat, we learn to survive We are the wanderers, lost along the way Chasing all the echoes, where the shadows play We are the dreamers, in a world so wide Finding all the colors, in the gray inside We are the wanderers, reaching for the sky With hearts wide open, we will learn to fly ... We are the dreamers, painting life anew With colors of the sunset, we’ll write our own view Finding in the shadows, there's a light inside We are the wanderers, where our dreams collide"}
+{"id": 1, "file_origin": "final_en_test_multi.jsonl", "file_index": 1, "lyrics": "In a world of roses, I saw thorns Where laughter hides behind the scorn Each raindrop holds a secret pain Yet comes a chance to dance again The streets are weary with the past But hope is flickering, I grasp I won't let shadows steal my dreams I'll find the gold in hidden seams Underneath, the grit and grime There lies a rhythm, a pulse of time A river flows, carrying my heart Navigating where we fell apart Each wave brings lessons soft as lace Lessons taught in every trace Yet in the twilight, there’s a song Of resilience, where we belong I won't let shadows steal my dreams I'll find the gold in hidden seams Underneath, the grit and grime There lies a rhythm, a pulse of time The daffodils bloom where the sun meets rain A dance of joy born from the pain Let every heartbeat drum with love We rise, like stars in skies above I won't let shadows steal my dreams I'll find the gold in hidden seams Underneath, the grit and grime There lies a rhythm, a pulse of time I won’t let shadows dim my light I’ll keep on dancing through the night With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds Of dreams fulfilled, my heart is freed With every tear, I sow the seeds"}
+{"id": 2, "file_origin": "final_en_test_multi.jsonl", "file_index": 2, "lyrics": "In the amber glow of twilight's embrace Memories dance like shadows in the evening Whispers of laughter echo through the air Every heartbeat a fragment of what we've known Worn-out stories tethered to the stars With every echo, a part of the past returns The horizon holds dreams we once let go A tapestry woven with threads of the heart And as the sun dips low into the sea I can feel the warmth lingering on my skin A promise of tomorrows wrapped in the night Dreams flicker like fireflies in the dark So here's to the nights we won't forget Where love was a poem penned by the moon With every heartbeat, a secret to share In the quiet moments, we found who we are Through the winding paths of the forest deep We chased the echoes that time tried to steal With hands intertwined, we wove our fate A symphony sung beneath the cover of leaves Each heartbeat a note in the song of our lives As seasons change, we gather the pieces The colors may fade, but the love will remain A chorus of hearts that forever beats true And as the stars emerge to light the sky I'll wrap your dreams in a silken embrace The journey ahead, filled with whispers of hope Together we’ll rise like the dawn’s gentle kiss So here's to the nights we won't forget Where love was a poem penned by the moon With every heartbeat, a secret to share In the quiet moments, we found who we are In the silence between our words, we found a truth The paths we’ve wandered shaped us like the rivers Every tear shed has been a stepping stone To the laughter that glimmers in our soulful eyes The future ahead won't hold us back If we stand united, hearts open wide So here's to the nights we won't forget Where love was a poem penned by the moon With every heartbeat, a secret to share In the quiet moments, we found who we are In the quiet moments, we found who we are In the quiet moments, we found who we are In the quiet moments, we found who we are In the quiet moments, we found who we are"}
+{"id": 3, "file_origin": "final_en_test_multi.jsonl", "file_index": 3, "lyrics": "We met under neon signs at midnight Lost in the rhythm of a city alive Your laughter was the spark igniting flames In a world where dreams collided like stars Every step on the pavement shines with hope Let’s ride this wave until the break of dawn For every moment we chased felt like a dream With you, I’m dancing on the edges of time Raise your hands and let the music take control Feel the heartbeat of the night fueling our souls With every beat, we’re closer to the stars Together we can light up everything in sight So let’s run wild under the starlit sky With colors exploding, we’re forever young In this moment, nothing can hold us back With our hearts beating as one Splashing through puddles in the pouring rain Every drop is a sign we’re living free We’ll paint the town with laughter and fire Creating memories that no one can erase Hands held tight on this beautiful ride We’ll carve our names into the fabric of night In every heartbeat, a brand new beginning Together we’ll write our never-ending story Raise your hands and let the music take control Feel the heartbeat of the night fueling our souls With every beat, we’re closer to the stars Together we can light up everything in sight So let’s run wild under the starlit sky With colors exploding, we’re forever young In this moment, nothing can hold us back With our hearts beating as one Feel the rush, don’t let go of this high We’re invincible, shining under the stars Every step is a chance to chase our fate This dance we create, a love mantra in moonlight With the world watching, we’ll never back down This night will be ours until the last call So let’s run wild under the starlit sky With colors exploding, we’re forever young In this moment, nothing can hold us back With our hearts beating as one So let’s run wild under the starlit sky With colors exploding, we’re forever young In this moment, nothing can hold us back With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one With our hearts beating as one"}
+{"id": 4, "file_origin": "final_en_test_multi.jsonl", "file_index": 4, "lyrics": "When the day fades away In the golden haze of twilight Memories dance through my mind As the stars begin their flight I wander through the fields of time Each whisper a gentle chime Oh, can you feel the echoes of our days? In every laugh, in every gaze A tapestry of dreams unspooled In these moments, we're forever fueled The river flows, it never stays But in our hearts, the glow remains With every step, the world unfolds A story in the silence told Each note of life a thread we weave In times of joy, when we believe Oh, can you feel the echoes of our days? In every laugh, in every gaze A tapestry of dreams unspooled In these moments, we're forever fueled Underneath the weight of the night I find your spirit, holding tight In the shadows, we draw light A compass guiding us through the fight Oh, can you feel the echoes of our days? In every laugh, in every gaze A tapestry of dreams unspooled In these moments, we're forever fueled With every heartbeat, we will rise Through the storms and the clear skies In this journey, hand in hand We’ll follow the melodies we planned Oh, can you feel it weaving through our nights? Together forever, our souls ignite From dusk till dawn, we’ll make our way In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay In the echoes of love, we’ll forever stay"}
+{"id": 5, "file_origin": "final_en_test_multi.jsonl", "file_index": 5, "lyrics": "In a faded photograph, we smile together Sunlight dances on your curls, forever bright Whispers of the past linger in the breeze While the world around us slowly blurs from sight Oh, time may wear its footprints on our hearts Yet memories like threads, weaves a fabric so fine Sing to me the lullabies of yesterday Where innocence was a game we played Let me soak in the echoes of that melody As I dance beneath the echoing stars' parade Every step we took, in that secret park Each corner held a story written in the dark As autumn leaves would swirl, painting the ground With colors of laughter, in happiness we drowned Oh, the time may wear its footprints on our hearts Yet memories like threads, weaves a fabric so fine Sing to me the lullabies of yesterday Where innocence was a game we played Let me soak in the echoes of that melody As I dance beneath the echoing stars' parade But seasons fade to gray and dreams unwind Yet deep within this heart, the warmth will linger No more shall I fear those shadows of time For our melody, forever, we will remember So sing to me the lullabies of yesterday Where innocence was a game we played Let me soak in the echoes of that melody As I dance beneath the echoing stars' parade Till we meet again, through waves of dreams Our laughter woven into the fabric of dreams As I send my love to the skies that gleam Forever in this song, forever it redeems Oh, sing to me again, in the twilight’s glow For in every note, our story unfolds In every shadowed tear, our love grew bold And in this endless dance, our hearts will hold So sing, sing to me, for we are never apart Every love letter written upon my heart Till I close my eyes, and let my dreams restart With the whispers of you eternally in my art In the heart's whisper, our love will prolong As echoes of the past, forever in this song So sing, sing to me, say we will belong In symphonies of love, we will write 
our tale so strong."}
+{"id": 6, "file_origin": "final_en_test_multi.jsonl", "file_index": 6, "lyrics": "Wake me up, the world's on fire Dancing shadows under city lights, Echoes of our laughter fill the air, Electric pulses guide my every flight Take my hand, we’ll chase the night, Together breaking all the rules, We were born to shine, oh oh, In this cosmic race, we’ll run Feel the rhythm of our hearts collide, We’ll light the sky until the sun is done Through the chaos, we’ll find a way, Every heartbeat echoing the thrill, Lost in the music, we find our place, Every dream ignited, every heart we fill Take my hand, we’ll chase the night, Together breaking all the rules, We were born to shine, oh oh, In this cosmic race, we’ll run Feel the rhythm of our hearts collide, We’ll light the sky until the sun is done And when the stars begin to fade, We’ll write our story in the boldest way, With every beat, we elevate, Together dancing on the edge of fate We were born to shine, oh oh, In this cosmic race, we’ll run Feel the rhythm of our hearts collide, We’ll light the sky until the sun is done In the last glow of twilight's kiss, Let the music fade into the night, Hold me close, and let’s reminisce, For we were stars that were never meant to fight Every glance a spark, every whisper a flame, In this universe, we’ll forever remain, Let the melody guide us home, In this dance, no need to feel alone We were born to shine, oh oh, In this cosmic race, we’ll run Feel the rhythm of our hearts collide, We’ll light the sky until the sun is done Every heartbeat pounding, every note aligned, In the depths of silence, our spirits combined, For love is our rhythm, and dreams are our song, Together forever, where we belong In this melody, we are the light, Shining through the darkness, forever bright. Won't stop the music; let it play, We were born to shine, in a cosmic play."}
+{"id": 7, "file_origin": "final_en_test_multi.jsonl", "file_index": 7, "lyrics": "In the dawn's first light, shadows break away A whisper in the air, secrets of the day Clouds drift like thoughts, floating far and wide Chasing every dream, I feel alive inside With every heartbeat, I can feel the change Colors bleed together, nothing feels the same So let the sun shine down on me Wash away the doubts I'm free Together we can write this song Harmony where we all belong In fields of wildflowers, we laugh and play Every little moment, these memories stay Through valleys and mountains, we journey on The strength of the bond, it keeps us strong With every heartbeat, I can feel the change Colors bleed together, nothing feels the same So let the sun shine down on me Wash away the doubts I'm free Together we can write this song Harmony where we all belong In this crowded world, it's easy to lose sight But I’ll keep your dreams safe, hold them tight Like fireflies in jars, lighting up the night We’ll dance in the dark until morning light So let the sun shine down on me Wash away the doubts I'm free Together we can write this song Harmony where we all belong So let the sun shine down on me Wash away the doubts I'm free Together we can write this song Harmony where we all belong In the dawn's first light, we'll dance on the breeze Writing endless stories, hearts at ease In the glow of a daydream, we rise above This moment's ours forever, filled with love In this beautiful world, we’ll find our way Chasing every sunset, living in the sway So let the sun shine down, let the story unfold Every little heartbeat is a tale to be told In fields of wildflowers, where the laughter flows We’ll cherish these moments, as the journey grows With every heartbeat, I can feel the change Together we can write this song, never to be estranged So let the sun shine down on me And let the harmony set us free Together we can write this song In the dawn's first light, we all belong So let the sun 
shine down on me Wash away the doubts I'm free Together we can write this song Harmony where we all belong."}
+{"id": 8, "file_origin": "final_en_test_multi.jsonl", "file_index": 8, "lyrics": "Underneath the fading stars, memories intertwine Whispers of forgotten nights, like vintage wine Walking through the echoes, where shadows dance Every step a flicker, a fleeting glance And every corner turned holds a tale untold The stories of our youth in the heart behold Oh, show me the way back home tonight Where the laughter lingers in the pale moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight In photographs where laughter echoes softly still Every smile a reminder, time’s gentle thrill Through autumn leaves, a whisper of the past Time sways like branches, memories meant to last And every corner turned holds a tale untold The stories of our youth in the heart behold Oh, show me the way back home tonight Where the laughter lingers in the pale moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight The dreams of yesterday, they echo in my soul Like waves crashing softly, they’re part of the whole In the quiet moments, I can still feel you near A whisper of love that we used to hold dear Oh, show me the way back home tonight Where the laughter lingers in the pale moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight Oh, show me the way back home tonight Where the laughter lingers in the pale moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight In the silence where the shadows softly breathe I find solace in the memories we weave With every step, I trace the path we walked Recalling every moment, all the dreams we talked The warmth of your smile, it lingers in my heart In the fleeting echoes, you’re never far apart Tell me the stories, let them dance in my mind As the stars twinkle softly, our past intertwined Oh, show me the way back home tonight Where the laughter lingers in the pale 
moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight Oh, show me the way back home tonight Where the laughter lingers in the pale moonlight Underneath the stars, where our dreams took flight Take me back to those days, hold me tight. In the fading twilight, we’ll make things right Back to the laughter, the warmth, the light."}
+{"id": 9, "file_origin": "final_en_test_multi.jsonl", "file_index": 9, "lyrics": "In the shadows of this derelict town Where dreams lay shattered, scattered around I hear the echoes of lives left behind Finding my way through the dark of my mind Rising up to break the silence Fighting back from the violent balance Hold on tight, don’t lose your grip We’ll take this road, let it rip Through the chaos, we’ll ignite An anthem for our restless fight With every heartbeat, I can feel the charge Chasing the future, never looking far In a world so cold, I’ll gather my flame Celebrating life, never feeling shame Shatter the boundaries, break the mold With each scream, I’ll be bold Hold on tight, don’t lose your grip We’ll take this road, let it rip Through the chaos, we’ll ignite An anthem for our restless fight These ties that bind are fraying fast In the fire, I’ll reclaim my past Rising higher, fueled by the pain No more shadows, I’ll break every chain Hold on tight, don’t lose your grip We’ll take this road, let it rip Through the chaos, we’ll ignite An anthem for our restless fight Through the darkness, we’ll find our light In every heart, we’ll fan the flame Echoing through, we’ll rise from the same With every heartbeat, never constrained Together we’ll break, together we’ll reign With every scream and every vow We’ll stand together, stronger now On this road, we find our flight An anthem for our restless fight Through every moment, sharp as a knife We’ll sing together, claiming our life In the fire, we burn so bright We’ll be the stars that light the night We’ll be the stars that light the night Hold on tight; we’re ready to ignite An anthem for our restless fight Together we’ll rise; we’ll find our flight In every heart, we’ll fan the flame"}
+{"id": 10, "file_origin": "final_en_test_multi.jsonl", "file_index": 10, "lyrics": "In the morning light, dreams unfold Whispers of stories, waiting to be told Golden fields stretch, horizons wide Chasing the shadows, where memories hide Feeling the rhythm of the breeze Nature sings softly, puts my heart at ease With every heartbeat, I'm alive In this journey, hope will thrive Casting away all fears and doubt In the music of life, we’ll find our route Clouds drift above, painting the sky A canvas of moments, never passing by Through every tear, a lesson learned In the dance of the flames, passion burned Every heartbeat echoes through the night A symphony of feelings, reaching new heights With every heartbeat, I'm alive In this journey, hope will thrive Casting away all fears and doubt In the music of life, we’ll find our route Stars guide us through the darkest days With dreams as our compass, we'll find our ways In every sorrow, joy will bloom Together we'll rise, chasing the moon With every heartbeat, I'm alive In this journey, hope will thrive Casting away all fears and doubt In the music of life, we’ll find our route With love as our echo, we'll dance along In the melody of being, we forever belong With every heartbeat, I’m alive. We’ll find our route. With every heartbeat, I'm alive. Together we’ll thrive. In the dance of the stars. In the light of our hearts. With every heartbeat, I’m alive. We’ll find our dreams. Forever we’ll flow. Together, we will grow. And we’ll always know."}
+{"id": 11, "file_origin": "final_en_test_multi.jsonl", "file_index": 11, "lyrics": "Underneath the willow tree Lost in whispers of the breeze Memories of times long gone Echo softly in the dawn Holding onto faded dreams Rivers flowing through our seams Time moves on but stays forever In the heart, we are together Dance with me until the night Feel the stars and catch the light Whirling in this sweet refrain Letting go of all the pain Days are drifting like the clouds Hidden dreams beneath the shrouds Walking paths we used to share Searching for the love we dare Every laugh still lingers near In the silence, we can hear Beauty in the moment's grace In this fleeting, sacred space Dance with me until the night Feel the stars and catch the light Whirling in this sweet refrain Letting go of all the pain And if the shadows come to call We will rise above it all Hand in hand, we’ll face the storm In each other, we are warm Dance with me until the night Feel the stars and catch the light Whirling in this sweet refrain Letting go of all the pain Dance with me, we’ll find our way In the dawn of a brand new day Together, love will always stay In our hearts, forever sway Underneath the willow tree We will share our destiny Holding on through thick and thin In our love, we always win With the sun, forever shine Join your heart forever mine Listen to the joyful song In our arms, we both belong Underneath the stars above In this sweet, eternal love"}
+{"id": 12, "file_origin": "final_en_test_multi.jsonl", "file_index": 12, "lyrics": "In the twilight of fading dreams Whispers of time dance with the moon Promises linger like shadows cast Upon the fields where we once bloomed Through the echoes of laughter at dawn We chased the glimmers of sunlight Oh, how the seasons silently turn Carrying stories held in the breeze Every moment a page in the book Of love that bends beneath the trees Footsteps fade on cobblestone paths Where wildflowers once leaned for a kiss But in the silence, there's a whisper still An echo of memories I can't dismiss Like fireflies twirling in the night Their light dances, soft and bright Oh, how the seasons silently turn Carrying stories held in the breeze Every moment a page in the book Of love that bends beneath the trees What if we could pause this time? To relive every heartbeat, every sigh In the tapestry woven fine Together we would dare to fly Oh, how the seasons silently turn Carrying stories held in the breeze Every moment a page in the book Of love that bends beneath the trees Oh, how the seasons silently turn In the heart of this endless embrace With every breath we learn to breathe A melody lost in time and space A melody lost in time and space In the twilight of fading dreams Whispers of time dance with the moon Promises linger like shadows cast Upon the fields where we once bloomed Through the echoes of laughter at dawn We chased the glimmers of sunlight Oh, how the seasons silently turn Carrying stories held in the breeze Every moment a page in the book Of love that bends beneath the trees Footsteps fade on cobblestone paths Where wildflowers once leaned for a kiss But in the silence, there's a whisper still An echo of memories I can't dismiss Like fireflies twirling in the night Their light dances, soft and bright Oh, how the seasons silently turn Carrying stories held in the breeze Every moment a page in the book Of love that bends beneath the trees What if we could pause 
this time? To relive every heartbeat, every sigh In the tapestry woven fine"}
+{"id": 13, "file_origin": "final_en_test_multi.jsonl", "file_index": 13, "lyrics": "On the streets where shadows lay We chase our dreams, day by day The city lights, they call our name In this wild life, we're all the same Voices echo in the mad rush Finding solace in the hush We wander far, we wander wide Beneath the fireworks, we confide We'll rise like the sun at dawn Leaving behind what’s overrun Through the chaos, color the night Together we’ll seize the light A tapestry woven with laughter and tears Through every moment, confronting our fears In the rhythm of hearts, in every stride A symphony of dreams we will not hide Each step forward, a beacon of hope Learning to dance, learning to cope Life is a song, let’s sing it loud In the silence, we'll stand proud We'll rise like the sun at dawn Leaving behind what’s overrun Through the chaos, color the night Together we’ll seize the light Every heartbeat is a chance to grow Breaking the chains, letting it flow We are the dreamers who dare to fly In this world, we won’t ask why We'll rise like the sun at dawn Leaving behind what’s overrun Through the chaos, color the night Together we’ll seize the light In this dance of life, hear our song In every step, where we belong We'll light the path for those who strayed With hope and love, we won't be swayed So take my hand, let’s run away Into the night where dreams hold sway We’ll rise together, hearts aglow In this wild life, let our spirits flow."}
+{"id": 14, "file_origin": "final_en_test_multi.jsonl", "file_index": 14, "lyrics": "Riding down this road, skies painting dreams With the rumble of the engine, hear the freedom’s screams Every mile a story, colors splashed in time Chasing sunsets, with rhythm and rhyme Ghosts of memories drift, but the road is clear With every sunset, I’m drawing near Take a chance, leave your fears behind Follow the stars, let your heart unwind Across the valleys, let your spirit soar In the moments we share, find what we’re looking for Echoes of laughter dance in the breeze Conversations linger like autumn leaves Underneath the sky, we’ll make our mark Light up the night, igniting the spark Life is a journey, a wondrous ride With every heartbeat, you’ll be my guide Take a chance, leave your fears behind Follow the stars, let your heart unwind Across the valleys, let your spirit soar In the moments we share, find what we’re looking for Scaling mountains high, riding tides of change With every step we take, nothing feels strange Each heartbeat a promise, each voice a song In the tapestry of us, it’s where we belong Take a chance, leave your fears behind Follow the stars, let your heart unwind Across the valleys, let your spirit soar In the moments we share, find what we’re looking for Oh, take a chance, leave your fears behind Follow the stars, let your heart unwind Across the valleys, let your spirit soar In the moments we share, forever we explore With every heartbeat, we’re writing our lore And in every journey, we’ll open new doors Life’s an adventure, so boldly we’ll steer With the heart as our compass, nothing to fear Through winding roads, under starlit skies In the freedom of love, our spirits rise Together forever, this path we’ll trace Through every beat, we’ll find our place In the journey of life, you’ll always find In the tapestry of us, we’re intertwined With every step forward, let our hopes align And in every heartbeat, you’ll be mine In the journey of life, you’ll 
always find In the tapestry of us, we’re intertwined"}
+{"id": 15, "file_origin": "final_en_test_multi.jsonl", "file_index": 15, "lyrics": "In the shimmering light of dawn Whispers of dreams beckon us near A vibrant world awaits our step With every heartbeat, let’s disappear We chase the colors of the wind With laughter echoing bright and clear So take my hand and fly away Dance with the stars, feel the sway Lose ourselves in this endless play Love will guide us, come what may Through the meadows, fields of gold We’ll leave behind all our fears With each step, our hearts unfold Collecting memories through the years We chase the colors of the wind As shadows dance, we persevere So take my hand and fly away Dance with the stars, feel the sway Lose ourselves in this endless play Love will guide us, come what may Every moment, like a spark Ignites the fire within our hearts With every dream, we leave the dark Together, we’ll never drift apart So take my hand and fly away Dance with the stars, feel the sway Lose ourselves in this endless play Love will guide us, come what may So take my hand and fly away Chasing sunsets, where hopes stay In this moment, we’ll forever sway Hold on to love, come what may ------ ------"}
+{"id": 16, "file_origin": "final_en_test_multi.jsonl", "file_index": 16, "lyrics": "Waking from a slumber deep The morning sun brings golden cheer In dreams that linger, shadows seep Echoes of laughter fill the air Remember when the days were bright With whispered dreams beneath the stars As time danced on, we took flight With heartbeats merging in our cars Run wild, let the rivers flow In fields where wildflowers grow Embrace the change of every season With love, we find our truest reason Through valleys wide and hills so tall We journey on, a path uncharted With every rise, we’ll never fall Together, mended, never parted Remember nights by the firelight When dreams took flight on wings of gold With every story spun so bright In every moment, life unfolds Run wild, let the rivers flow In fields where wildflowers grow Embrace the change of every season With love, we find our truest reason The hands of time spin like a wheel Holding us close, then letting go Through every scar, we learn to heal In futures bright from seeds we sow Run wild, let the rivers flow In fields where wildflowers grow Embrace the change of every season With love, we find our truest reason Run wild, let the rivers flow In fields where wildflowers grow Embrace the change of every season With love, we find our truest reason In the nights of firelight dreams Whispers echo through the trees With every heartbeat, life redeems Our loving bond that flows with ease So here we stand, hand in hand Facing futures bright and clear From every tear, we make a stand In every moment, hold you near Let laughter rise and fill the skies As melodies dance through the night In every hug, a sweet surprise Together, we will face the light Run wild, let the rivers flow In fields where wildflowers grow Embrace the change of every season With love, we find our truest reason"}
+{"id": 17, "file_origin": "final_en_test_multi.jsonl", "file_index": 17, "lyrics": "In the quiet corners of my mind Where shadows dance with light so kind I saw the colors fade away And dreams we made begin to fray But then you came with open hands To take me back to distant lands Where laughter filled the empty space And we could find our special place Hold on tight, don't let go now We'll chase the stars with hearts unbowed Every step, we'll find our way Through shadows that will never sway In the fields where daisies swayed We whispered secrets, unafraid With winds of change that sang our song A melody where we belong But time can twist, it weaves a thread Yet through it all, our hearts are fed With endless dreams and hopes anew With every glance, I see us through Hold on tight, don't let go now We'll chase the stars with hearts unbowed Every step, we'll find our way Through shadows that will never sway And when the night begins to fall We'll dance together, hear the call In every heartbeat, every sigh We'll weave our dreams into the sky Hold on tight, don’t let go now We'll chase the stars with hearts unbowed Every step, we’ll find our way Through shadows that will never sway With every laugh, and every tear Together we’ll erase all fear Hold on tight, don’t let go now With you, I know our love's a vow In the quiet corners of my mind Where shadows dance with light so kind I saw the colors fade away And dreams we made begin to fray But then you came with open hands To take me back to distant lands Where laughter filled the empty space And we could find our special place Hold on tight, don't let go now We'll chase the stars with hearts unbowed Every step, we'll find our way Through shadows that will never sway In the fields where daisies swayed We whispered secrets, unafraid With winds of change that sang our song A melody where we belong But time can twist, it weaves a thread Yet through it all, our hearts are fed With endless dreams and hopes anew With every 
glance, I see us through Hold on tight, don’t let go now"}
+{"id": 18, "file_origin": "final_en_test_multi.jsonl", "file_index": 18, "lyrics": "In the dawn where whispers fade A pathway forged in memories made The echoes of a summer's song Call me back where I belong Worn-out sneakers on the ground Every heartbeat shares the sound Of laughter ringing in the breeze A spirit wild, it longs to be Take me home, where the heart can find The threads of love that bind the mind Through the storms, and through the rain There’s a light that breaks the chain On these roads where shadows roam I feel the pull of cherished home Where every laugh just seems to bloom And every breath clears out the gloom The dusty trails, they lead me there To all my hopes and every care With every memory etched in time I hold them close, they feel so fine Take me home, where the heart can find The threads of love that bind the mind Through the storms, and through the rain There’s a light that breaks the chain And as the sun begins to fall I hear your voice, it calls my all In every heartbeat, I can feel The warmth of love that’s oh so real Take me home, where the heart can find The threads of love that bind the mind Through the storms, and through the rain There’s a light that breaks the chain To the laughter and the tears we’ve shared In this journey, I know you’ve cared In the dawn where whispers fade A pathway forged in memories made The echoes of a summer’s song Call me back where I belong Worn-out sneakers on the ground Every heartbeat shares the sound Of laughter ringing in the breeze A spirit wild, it longs to be Take me home, where the heart can find The threads of love that bind the mind Through the storms, and through the rain There’s a light that breaks the chain On these roads where shadows roam I feel the pull of cherished home Where every laugh just seems to bloom And every breath clears out the gloom The dusty trails, they lead me there To all my hopes and every care With every memory etched in time I hold them close, they feel so fine 
Take me home, where the heart can find"}
+{"id": 19, "file_origin": "final_en_test_multi.jsonl", "file_index": 19, "lyrics": "In the twilight of a fading day Whispers echo where shadows play Beneath the stars that gleam above Lies a heart yearning for lost love The silence hums a bittersweet tune As the moon cradles dreams in its embrace Oh, carry me where the wild winds blow Through fields of gold where the rivers flow I'd trade the world for a fleeting glance Of the life we lived, a forgotten dance In every tear, a tale unfolds Of sunlit days and nights so cold Each memory like a feather light Drifting gently into the night I hear your laughter in the breeze An echo of hope, a soft reprise Oh, carry me where the wild winds blow Through fields of gold where the rivers flow I'd trade the world for a fleeting glance Of the life we lived, a forgotten dance Though the road ahead is steep and long I’ll find the strength in our old song With every heartbeat, every sigh I’ll chase the shadows, I’ll learn to fly Oh, carry me where the wild winds blow Through fields of gold where the rivers flow I'd trade the world for a fleeting glance Of the life we lived, a forgotten dance In fading twilight, we’ll take our chance In the dance of dreams, in the dance of dreams In the dance of dreams we knew We'll find our way back home to you We'll find our way back home to you"}
+{"id": 20, "file_origin": "final_en_test_multi.jsonl", "file_index": 20, "lyrics": "In a vibrant night sky, Stars whisper through the veil. A melody calls out, To hearts that never pale. Chasing echoes softly, We glide on moonlit streams. The world is wide and open, Wrapped in technicolor dreams. So take my hand, we're soaring, Into the unknown bright. With every pulse, we're dancing, Through shadows into light. Feel the rush, ignite the fire, With every beat, we amplify. Together we rise, never tire, Chasing stars that can’t deny. In the rhythm of the heartbeat, Life unfolds like magic threads. Every moment painted golden, In whispers of what’s said. Through the swirling labyrinth, We find the paths untold. Caught in a sweet surrender, Our story yet unfolds. So take my hand, we're soaring, Into the unknown bright. With every pulse, we're dancing, Through shadows into light. Feel the rush, ignite the fire, With every beat, we amplify. Together we rise, never tire, Chasing stars that can’t deny. Close your eyes, embrace the night, Let the wonder guide your soul. In this dance, we'll find our light, With dreams that make us whole. The universe is calling, In harmony we blend, With every upward rising, Our spirits will ascend. Feel the rush, ignite the fire, With every beat, we amplify. Together we rise, never tire, Chasing stars that can’t deny. In a vibrant night sky, Stars whisper through the veil. A melody calls out, To hearts that never pale. Together we rise, forever higher, In this dream, we'll never die. Chasing on, our spirits fire, In the night, we'll touch the sky. Oh-oh, oh-oh. Chasing stars that can’t deny. Oh-oh, oh-oh."}
+{"id": 21, "file_origin": "final_en_test_multi.jsonl", "file_index": 21, "lyrics": "Beneath the willow's bending, I carve my dreams in stone, Where the river plays and wanders, In whispers all alone. The stories of the ancients, Leak through the cracks of time, In every leaf and shadow, A tale in rhythm and rhyme. The wind sings of the seasons, The changes we all face, But in this sacred moment, I feel the universe's grace. Take me back to simpler days, When laughter filled the light, In fields where dreams would dance, And the stars shone ever bright. With every step I'm taking, The past would guide my way, In the footprints left behind me, The echoes softly sway. I trace the paths forgotten, Through laughter, tears, and pain, In the hope that, just maybe, The light will shine again. The wind sings of the seasons, The changes we all face, But in this sacred moment, I feel the universe's grace. Take me back to simpler days, When laughter filled the light, In fields where dreams would dance, And the stars shone ever bright. Let the melody surround me, Like the rain on thirsty ground, With every note, a heartbeat, The echoes all around. The love and pain entwined here, Like the branches overhead, The stories never-ending, In the dreams that I've been fed. Take me back to simpler days, When laughter filled the light, In fields where dreams would dance, And the stars shone ever bright. So here beneath the willow, I carve my dreams in stone, In whispers of the past, In memories I've known. Take me back to those days, When hearts would intertwine, And laughter filled the air, In moments pure and fine. Oh-oh, oh-oh. We'll find our way, in time. Oh-oh, oh-oh."}
+{"id": 22, "file_origin": "final_en_test_multi.jsonl", "file_index": 22, "lyrics": "Underneath the willow tree Whispers of the autumn breeze Memories are bittersweet Dancing on the amber leaves I can hear your laughter still Echoes of a time we knew Every shadow holds your name In the golden hour's hue Hold me close, don’t let me go With you, I feel the world unfold Whispers floating on the breeze Telling tales that won't let me be Remember when we chased the stars Counting all the dreams we had Fingers intertwined in fate A promise that could never fade I can feel your heartbeat slow Silhouettes in twilight’s glow Every moment strikes a chord In the echo of the world Hold me close, don’t let me go With you, I feel the world unfold Whispers floating on the breeze Telling tales that won't let me be Time can paint us in shades of gray But the memories, they never sway Like a river running deep, I carry you, in dreams I keep Hold me close, don’t let me go With you, I feel the world unfold Whispers floating on the breeze Telling tales that won't let me be Hold me close, you’re still my home In every heartbeat, I am never alone In these shadows, you'll remain Forever in my heart's refrain In every shadow, in every name Time can’t erase what we became Love remains in whispers’ claim Tears may fall on stained glass nights But deep inside, your light ignites Underneath the willow tree Awaits a bond that’s meant to be A garden where our dreams take flight In the hush of the fading light With every breath, I’ll carry you In the dance of the twilight blue Hold me close, I’ll never go In the night where starlight glows Forever echoes in the trees Stay with me, on every breeze Whispers floating through the years You’re the melody in my tears."}
+{"id": 23, "file_origin": "final_en_test_multi.jsonl", "file_index": 23, "lyrics": "In the shadows of the morning light Where whispers dance among the trees I hear the echo of dreams left behind Beneath the weight of memories Every step I took was a page unturned But now I’m lost in this endless maze And every heartbeat sings a tune Of yearning hope amidst the haze Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth With every crack of dawn, I rise and fall Fading sunlight drapes the sky in gold I chase the shadows that stretch on the wall In search of stories yet untold Through the veil of night, I hold your name A fleeting spark in a darkened space But promises linger, embedded in time Guiding me home to your embrace Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth And when the stars fade into dawn I’ll find my way through shadows past With every heartbeat, love still shines In memories locked, forever cast Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth In the whispers of the autumn leaves I find the laughter of days gone by A melody hung on the evening breeze Sings of love that can never die Every path we carved in the soft, warm earth Still speaks to souls that wander near Echoes of promises ring in the air As we hold close what we hold dear Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth And through the seasons, 
love remains A constant in an ever-changing world In the tapestry of time we weave Are stories of the hearts unfurled Take me back to where the sun first kissed the sea In that meadow where we danced so free Moments crafted in the essence of our youth Cradled softly in the heart of truth Cradled softly in the heart of truth"}
+{"id": 24, "file_origin": "final_en_test_multi.jsonl", "file_index": 24, "lyrics": "In the shadow of the mountains Memories linger on the breeze Time whispers secrets softly Echoes dance through the trees I chase the sun dipped in gold Searching for stories yet untold Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high Cross the river, feel the current Waves beckon with a longing sigh The world outside is breathing While the night slowly drifts by I found a path in the clutter Treading lightly on sacred ground Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high In the silence where whispers bloom Our souls dance in the quiet gloom A symphony of heartbeats calls As shadows waltz upon the walls Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high In the heart of the forest Where the wildflowers bloom anew I’ll weave my thoughts like ribbons In a tapestry for two The moon cradles the night’s embrace Illuminating every trace Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high When the dawn begins to whisper Drawing shadows from the past We’ll carry on forever And let this dream last The rivers run with stories old In every heartbeat, truth unfolds Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high In the stillness where shadows weep Our dreams awaken from their sleep A melody softly calls my name And nothing will ever be the same Hold me close like the twilight Let the stars paint the night sky With a heart full of dreams We’ll reach up and fly high"}
+{"id": 25, "file_origin": "final_en_test_multi.jsonl", "file_index": 25, "lyrics": "Riding on the waves of tomorrow We’re breaking free from the chains With every step, our spirits soar Dancing in the pouring rain We’re the dreamers, we’ve come alive With hearts of fire and a hunger to thrive Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite Chasing dreams down the boulevard Where the city never sleeps We’re racing through the tangled streets Where the night laughs and we leap All the stories we’ve yet to tell Are waiting in the wishing well Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite We are the voices in the crowd Singing out loud, standing proud With the rhythm of our hearts combined We’ll rise above and never fall behind Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite Building dreams on the horizon Where the sun begins to rise Every moment a new beginning Count the stars in the endless skies We’ll gather strength from what we feel Breaking limits, this is real Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite With every heartbeat, we’re stronger Together we'll face the tide Riding waves of endless wonder With the universe as our guide In these moments, we embrace Every challenge, every face Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite We are the voices in the crowd Singing out loud, standing proud With the rhythm of our hearts combined We’ll rise above and never fall behind Lift your hands to the skyline Feel 
the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite Lift your hands to the skyline Feel the pulse of the night With every heartbeat, we’ll find our way Into the starlight, let’s ignite"}
+{"id": 26, "file_origin": "final_en_test_multi.jsonl", "file_index": 26, "lyrics": "Underneath the city lights Where the heartbeats find their way In the rhythm of our lives We hold on to what we say Glass and steel reaching high Like our dreams that kiss the sky Every hope caught in the breeze We are stitched with memories Walking on the edge of night With our shadows long and free Listening to soft delight In the echo of your plea And I’ll fly on the wings of time With the stars as my guide In this beautiful design With you right by my side Every moment feels like gold In the gallery of the night Painting dreams as we grow old Every tingle, every light The canvas of our story flows In the laughter and the tears Every heartbeat, every glow Crafted through the passing years Walking on the edge of night With our shadows long and free Listening to soft delight In the echo of your plea And I’ll fly on the wings of time With the stars as my guide In this beautiful design With you right by my side In the silent corners of your heart Hold the dreams we've yet to find In every story, every part In the love that we rewind With every breath, we take a chance To dance upon this open road With hands entwined in sweet romance We are limitless, bestowed And I’ll fly on the wings of time With the stars as my guide In this beautiful design With you right by my side In the tapestry we weave In the laughter and light In every moment, we believe Together, we ignite Together, we ignite"}
+{"id": 27, "file_origin": "final_en_test_multi.jsonl", "file_index": 27, "lyrics": "Whispers of the past in the quiet night Like shadows dancing in the moonlight Memories flutter like leaves in the breeze Painted skies hold secrets that never cease Every heartbeat echoes in the silent dark Chasing dreams that once ignited the spark Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong Every road we've walked carved into our fate The laughter shared, the battles we create In the garden of our souls, wildflowers bloom A tapestry of colors breaks through the gloom Every whisper beckons a distant dream Drawing us forth, like a flowing stream Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong And when the stars align in the velvet sky We'll find our way, not afraid to fly With every beat, our hearts declare In this beautiful chaos, we are laid bare Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong In the dance of life, our light will shine strong In the dance of life, our light will shine strong I walk the line of dreams, in faded hues In the gallery of my heart, where memories accrue Each moment a brushstroke in the canvas wide With whispers of love, forever a guide Every heartbeat echoes in the silent dark Chasing dreams that once ignited the spark Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong Every road we've walked carved into our fate The laughter shared, the battles we create In the garden of our souls, wildflowers bloom A tapestry of colors breaks through the gloom Every 
whisper beckons a distant dream Drawing us forth, like a flowing stream Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong And when the stars align in the velvet sky We'll find our way, not afraid to fly With every beat, our hearts declare In this beautiful chaos, we are laid bare Oh, the stories that we told, so long ago In the gale of youth, the seeds we sowed Time stands still, but we keep moving on In the dance of life, our light will shine strong In the dance of life, our light will shine strong In the dance of life, our light will shine strong"}
+{"id": 28, "file_origin": "final_en_test_multi.jsonl", "file_index": 28, "lyrics": "In the quiet of the morning light I seek the solace, chasing dreams in flight The whispers of your name linger in the air With every heartbeat, I know you’re somewhere near With every thought, I’m lost in time In pieces of you, our worlds entwined Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere Every turn teaches me a new refrain In echoes of joy, in whispers of pain Through the sunsets and the dawns of grace In the tapestry of life, I’ve found my place With every breath, our spirits sing In the warmth of love, life’s offering Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere And when the stars fall in the velvet night I’ll gather their light and make it right With every promise that we make and break In the rhythm of love, we’ll find our fate Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged In the quiet of the morning light I seek the solace, chasing dreams in flight The whispers of your name linger in the air With every heartbeat, I know you’re somewhere near With every thought, I’m lost in time In pieces of you, our worlds entwined Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere Every turn teaches me a new refrain In echoes of joy, in whispers of pain Through the sunsets and the dawns of grace In the tapestry of life, I’ve found my place With every breath, our spirits 
sing In the warmth of love, life’s offering Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere And when the stars fall in the velvet night I’ll gather their light and make it right With every promise that we make and break In the rhythm of love, we’ll find our fate Oh, I’ll hold on, through the tides of change In the waves of love, I’ll never feel estranged Through the laughter and the tears we share In this journey, I’ll find you everywhere In this journey, I’ll find you everywhere"}
+{"id": 29, "file_origin": "final_en_test_multi.jsonl", "file_index": 29, "lyrics": "In the midnight glow, stars collide Whispers of dreams that never hide Caught in a moment, the world fades Chasing the shadows, where time cascades Feel the pulse, it ignites the night We are the rhythm, hearts taking flight So let the light guide us through In every color, I see you With every heartbeat, we dance together In the storm, love is our tether Fragments of laughter echo the street In every stranger, a soul to meet Lost in a symphony, dreams intertwine Finding the pathways that twist and align Feel the warmth, in the air we breathe Through the chaos, we choose to believe So let the light guide us through In every color, I see you With every heartbeat, we dance together In the storm, love is our tether Time is a canvas, we paint our way With every stroke, we greet the day And if the shadows dare to creep Hold on to dreams, take the leap So let the light guide us through In every color, I see you With every heartbeat, we dance together In the storm, love is our tether So let the light guide us through With every heartbeat, I choose you Together we shine, forever and ever In the storm, love is our tether In the storm, love is our tether Silence lingers, but dreams unfold Every story, a treasure to hold With the night as our blanket, so wide In the glow, it's you by my side Every moment, a step in the dance The universe spins in a beautiful trance With every heartbeat, we're writing our song In this journey, we found where we belong So let the memories pave our way In every whisper, in every sway Together we rise, together we fall Hand in hand, we'll conquer it all In the silence that aches for a song In this moment, we both belong With every heartbeat, we dance forever In a dream that binds us together In a dream that binds us together In every heartbeat, we'll dance forever Underneath the starlit sky, oh so bright We are the dreams that bloom in the night We are the dreams 
that bloom in the night"}
+{"id": 30, "file_origin": "final_en_test_multi.jsonl", "file_index": 30, "lyrics": "Underneath the willow tree, I sit so still Whispers of the past, echo through the hills Counting on the fingers of fate's old hand Dreams like leaves are scattered on the land With shadows dancing on the ground The haunting echoes of a love I found Take me back to where we began Paths entwined beneath this burning sun In your arms, I felt the world sway Now I'm lost and slowly fade away Every sunrise brings the bittersweet Memories linger, where our hearts would meet In the twilight, laughter feels so far Searching for the traces in the evening star With whispers drifting on the breeze The longing echoes through the swaying trees Take me back to where we began Paths entwined beneath this burning sun In your arms, I felt the world sway Now I'm lost and slowly fade away Time may weather every stone Yet love's true essence remains our own And even through the years that fly I'll hold onto this goodbye Take me back to where we began Paths entwined beneath this burning sun In your arms, I felt the world sway Now I'm lost and slowly fade away Fade away, let the shadows drown In every heartbeat, I hear the sound Of love that lingers like a fleeting sigh In the essence of the midnight sky Underneath the willow's embrace Where every tear fell from my face In the silence, I hear your name But the echoes fade, and time feels the same Guess we were stars caught in the storm But in this daylight, I feel the warm With weary footsteps, I turn to the past In the tapestry of days that never last Take me back, oh, let me roam To the space where the heart found home In the shadows of the memories swell In every story, I'm under your spell So here's the song that we once knew A melody that still rings true Carry these words, let them fly In the echoes of the long goodbye With every note that strums along I'll hold your heart inside this song And in the fading light, we'll see the way As the night 
unfolds into another day As the night unfolds into another day"}
+{"id": 31, "file_origin": "final_en_test_multi.jsonl", "file_index": 31, "lyrics": "In the morning light, you fade away Whispers echo in the softest breeze Memories linger like the evening stars Time, a thief in the shadows of trees And I reach for moments, lost in the haze Each breath a reminder of how it was In dreams, we wander through open spaces Where laughter dances, like ripples on waves Holding onto whispers, the warmth embraces In the heart of memories, love still paves Every sunset paints a canvas of gold And I trace the outlines of what we had Songs we once shared, now echo like tales Of joy and sorrow, the happy, the sad With every heartbeat, a story unfolds Caught in the rhythm of life’s sweet parade In dreams, we wander through open spaces Where laughter dances, like ripples on waves Holding onto whispers, the warmth embraces In the heart of memories, love still paves And if I could turn back the sands of time I’d find you waiting, under the willow tree In echoes of laughter, our fingers entwined In the twilight glow, just you and me In dreams, we wander through open spaces Where laughter dances, like ripples on waves Holding onto whispers, the warmth embraces In the heart of memories, love still paves In dreams we wander, forever we stay In the heart of memories, we won't drift away The stars will guide us, through night’s gentle gaze In the heart of memories, love still paves In the heart of memories, love still paves In the heart of memories, love will always stay In the heart of memories, we find our way And we will find our way Beneath the silver skies, our love will never fray In every sunrise, a promise we will say In the heart of memories, love will always stay Through the years and all the spaces In the heart of dreams, we'll always replay And I will find you, in all the right places In the heart of memories, love still paves In the heart of memories, love still paves In the heart of memories, it forever stays In the heart of memories, we 
will never stray In the heart of memories, love still paves In the heart, love forever stays In the heart, love forever stays In the heart, love forever stays"}
+{"id": 32, "file_origin": "final_en_test_multi.jsonl", "file_index": 32, "lyrics": "In the shadow of the trees Where the whispers of the breeze Carry tales from days of old Stories waiting to be told Footsteps echo on the ground Memories lost, yet to be found Underneath the arched skyline Painting dreams in soft sunshine Time, it dances like a flame Each second plays a different game Familiar faces, distant sights Still they haunt the starry nights Oh, take me back where the rivers flow To the places I used to know Where the laughter fills the air And the burdens seem so rare In the alleys of forgotten dreams Life is woven at the seams Chasing shadows, spinning light In the magic of the night Every corner holds a tale Of the heart that set sail Echoes linger, sweet refrain In every loss, in every gain Time, it paints a vivid hue Memories, old and new Stitching life with threads of gold In the stories left untold Oh, take me back where the rivers flow To the places I used to know Where the laughter fills the air And the burdens seem so rare In the silence, hear the dreams Flow like water, bursting at the seams Reaching for the stars above Memories wrapped in songs of love Oh, take me back where the rivers flow To the places I used to know Where the dreams live and the heart can soar Take me back, I need it more Oh, take me back where the rivers flow To the places I used to know Where the laughter fills the air And the burdens seem so rare As the sun fills the air, I swear Oh, take me back, take me there"}
+{"id": 33, "file_origin": "final_en_test_multi.jsonl", "file_index": 33, "lyrics": "In the quiet of the night Whispers call through shadows deep Stars ablaze with wishes bright Holding secrets we must keep Every heartbeat syncs with time Every breath a step to sway The melody in rhythm's climb Guides me where the dreams come play Lift your eyes to the skies so blue Let the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew Through the valleys, we will roam With every step, we carve our song Rivers call us to their home Where echoes of our hearts belong Every dream we've spun like silk Threads of hope, we intertwine In the night where love is built Every spark ignites divine Lift your eyes to the skies so blue Let the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew Time won't steal our light away With each heartbeat, we'll embrace In the dawn where dreams are made Forever swaying in this place Lift your eyes to the skies so blue Let the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew Lift your eyes, oh, lift them high As the stars begin to die Together dreaming ‘neath the sky With our hearts, forever fly Hearts entwined like vines that climb In gardens where the wildflowers bloom Each petal holds a story's rhyme In the fragrant night, the world in tune Feel the whispers of the breeze Carrying our laughter through the air In this dance, we'll move with ease Lost in moments, we are rare Lift your eyes to the skies so blue Let the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew We'll write our names in cosmic dust As galaxies spin with dreams above In this journey, we must trust For our hearts are woven deep in love Every step leads us closer now Every breath ignites the fire We'll take a vow, we’ll learn just how To reach the heights of our desire Lift your eyes to the skies so blue Let 
the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew So hold my hand; we'll run away Into the dawn, we find our place In dreams where love will always stay Together wrapped in warm embrace Lift your eyes to the skies so blue Let the moonlight dance on you In this moment, it's just us two Together, we'll chase the morning dew"}
+{"id": 34, "file_origin": "final_en_test_multi.jsonl", "file_index": 34, "lyrics": "In a world that spins so fast We lose sight of what we know Chasing shadows of the past Through the tides of ebb and flow Every glance holds a story untold Every corner, a mystery wreathed In the silence where secrets unfold We find treasures, our spirits freed So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth resound In this journey, we’re unbound With every step, we redefine What it means to stand our ground In a chorus of ancient lines Where the lost and found are crowned Every heartbeat screams for life Every moment a chance to rise From the struggles, find the light In the chaos, wisdom lies So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth resound In this journey, we’re unbound When the shadows loom so near And hope feels like it’s fading fast Stand your ground, show no fear This storm, like all, won’t last So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth resound In this journey, we’re unbound So break the chains above the ground Feel the pulse of hallowed ground With our voices, we’ll drown the sound In this end, we’re homeward bound With the dawn upon the crest We begin to paint the skies Finding solace, heart at rest As the morning sun will rise Every heartbeat syncs with fate Every whisper is a call In the quiet, we await To break free from it all So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth resound In this journey, we’re unbound Past the ruins of our fears We will forge a path anew Through the struggles and the tears In the light, find something true Every glance holds a story untold Every corner, a mystery wreathed In the silence where secrets unfold We find treasures, our spirits freed So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth 
resound In this journey, we’re unbound When the shadows loom so near And hope feels like it’s fading fast Stand your ground, show no fear This storm, like all, won’t last So break the chains that hold you down Embrace the fire, feel the sound Let the echoes of truth resound In this journey, we’re unbound"}
+{"id": 35, "file_origin": "final_en_test_multi.jsonl", "file_index": 35, "lyrics": "When darkness falls, the stars align I feel the weight of time's design Amidst the whispers of the night Your memory shines, a guiding light In dreams we chase, the moments lost For every heartbeat, we pay the cost In shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies Across the ocean, tides of grace I'll search the world for your embrace The echoes of your laughter dance In every moment, a fleeting chance In dreams we chase, the moments lost For every heartbeat, we pay the cost In shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies When the dawn breaks, and darkness fades I hold you close in the warmth of days Through every moment, your spirit stays In memories shared, in love’s deep maze In shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies In a world turned dark, I seek a flame Your laughter echoes, still the same With every heartbeat, I call your name Fate's woven threads, they play this game In dreams we chase, the moments lost For every heartbeat, we pay the cost In shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies As stars descend, I find my way With every whisper, you softly sway Your presence lingers, though far apart In every beat, you dwell in my heart In dreams we chase, the moments lost For every heartbeat, we pay the cost In shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies In every dawn, the promise calls A love transcending through time's walls I'll carry on with hope ablaze In every heartbeat, love always stays In 
shadows deep, where silence lays I find your voice in the haunting haze Through every teardrop, love will rise A tale of longing in the skies Yes, a tale of longing in the skies"}
+{"id": 36, "file_origin": "final_en_test_multi.jsonl", "file_index": 36, "lyrics": "In the twilight glow, I wander slow Through the whispers of the pines, they call my name The moonlight dances on the surface of the stream Every shadow tells a tale, a haunting game I breathe in secrets of the night so wild Every moment carries dreams of a forgotten child Oh, breathe with me, under the starlit sky Feel the world unravel, let the echoes lie In the arms of silence, we’ll find our way Chasing shadows, until the break of day In the forest deep, where echoes softly sweep I hear the laughter of the leaves in the breeze Every footstep on this path, a memory to keep Whispers mingling with the hum of honeybees I slip between the realms of dusk and dawn Every heartbeat drumming to a haunting song Oh, breathe with me, under the starlit sky Feel the world unravel, let the echoes lie In the arms of silence, we’ll find our way Chasing shadows, until the break of day And in the stillness, I will find my peace In the stories of the night, my soul will cease Moonlit paths will guide me through the dark A symphony of stars igniting my heart Oh, breathe with me, under the starlit sky Feel the world unravel, let the echoes lie In the arms of silence, we’ll find our way Chasing shadows, until the break of day Oh, breathe with me, we shine like a flame Together in the night, we’ll never be the same In this dreamy world, we’ll make our stand Forever in the twilight, hand in hand We’ll chase the shadows, until the break of day In this wild reverie, love will always stay Oh, breathe with me, with every breath we take In the quiet of the night, our hearts will awake With the moon as our witness, we’ll ride the night tide Through every memory, together, side by side In the twilight glow, let our souls fly high Chasing dreams forever, beneath the endless sky"}
+{"id": 37, "file_origin": "final_en_test_multi.jsonl", "file_index": 37, "lyrics": "In a sunlit field, the dreams do sway Whispers of the past, calling out to play Faded photographs in a weathered book Faces ever smiling, yet life took a look Time blooms like petals, drifting soft and slow Memories like rivers, where will they flow? Let's chase the echoes through the misty past Holding on to whispers, memories that last In the heart of the meadow, where the sky is vast We'll find our way home, despite the shadows cast Underneath the stars, stories intertwine With each word we weave, the universe aligns Light of the moon, guiding us tonight Promises unbroken, shining ever bright Time blooms like petals, drifting soft and slow Memories like rivers, where will they flow? Let's chase the echoes through the misty past Holding on to whispers, memories that last In the heart of the meadow, where the sky is vast We'll find our way home, despite the shadows cast Every step, we're dancing through the years With every step, let go of all the fears Love the tangled paths, that led us here today In this vast expanse, we'll find our way Let's chase the echoes through the misty past Holding on to whispers, memories that last In the heart of the meadow, where the sky is vast We'll find our way home, despite the shadows cast We'll find our way home, despite the shadows cast In the heart of the meadow, where the sky is vast Holding on to whispers, memories that last Let's chase the echoes through the misty past Embrace every moment, don’t let it pass In the sunlit field, where dreams do sway We'll find our way back, we'll find our way Underneath the stars, stories intertwine Whispers of the night, guiding the divine In a timeless dance, we'll never lose our track Through the ages and seasons, we’ll always come back Through the ages and seasons, we’ll always come back With every fleeting moment, we’re building our stack In the heart of our stories, we’re never alone Echoes through 
the meadows, leading us home Holding on to whispers, memories that last Let's chase the echoes through the misty past We'll find our way home, despite the shadows cast In the heart of the meadow, where the sky is vast"}
+{"id": 38, "file_origin": "final_en_test_multi.jsonl", "file_index": 38, "lyrics": "In the twilight shadow, dreams start to fade Whispers in the breeze, secrets cascade Footprints in the dust, stories of old Each step echoes softly, stories unfold Time, like a river, flows unconfined Carrying memories, we cannot rewind So we laugh, we cry, beneath the starlit sky Chasing fleeting moments, as the years go by In the heart of the night, where love intertwines Our souls whisper softly, in mesmerizing rhymes Through the rising dawn, promises made In the warmth of your gaze, fears start to fade With each gentle heartbeat, our worlds align Every breath shared, like a vintage wine Time, like a canvas, paints what we feel Coloring our moments, so vividly real So we laugh, we cry, beneath the starlit sky Chasing fleeting moments, as the years go by In the heart of the night, where love intertwines Our souls whisper softly, in mesmerizing rhymes When shadows grow long, and the stars ignite We dance in the silence, guardians of light In each single tear, there's beauty to find In the stories we share, we're forever entwined So we laugh, we cry, beneath the starlit sky Chasing fleeting moments, as the years go by In the heart of the night, where love intertwines Our souls whisper softly, in mesmerizing rhymes Forever in echoes, in love we rely Together we stand, watching dreams fly by By the candle's glow, let our spirits soar Hand in hand, we wander, forever explore In the twilight shadow, where hopes reside We'll keep on singing, with love as our guide With memories painted, in colors so sweet Our hearts keep on dancing, to this timeless beat With every sunset, a story to tell In the rhythm of life, let our love swell Through the echoes of time, our voices will rise For deep in the night, our spirits will fly Into the light of a brand new dawn Where hopes and dreams together are drawn So we smile, we dream, as the night draws near In the melody of life, forever sincere So 
let’s laugh, let’s cry, under this vast sky In the heart of our stories, as our spirits fly"}
+{"id": 39, "file_origin": "final_en_test_multi.jsonl", "file_index": 39, "lyrics": "Riding the midnight train, beneath the moon's gleam Chasing shadows down, fueled by a dream With the wind on my face, and guitar in hand Singing the blues, making my stand Echoes in the night, searching for the truth In this restless journey, reclaiming my youth So let the music play, let it drown my pain In the rhythm and the sound, I’ll break these chains With every note I strike, I’m alive once more In the heart of the storm, I’m ready to roar City lights flicker, like stars on the ground Lost souls wander, everywhere around But I won't give in, to this weary night Fuel the fire within, ignite my fight Every whispered doubt, I’ll turn into breeze Guiding my way, as I dance with ease So let the music play, let it drown my pain In the rhythm and the sound, I’ll break these chains With every note I strike, I’m alive once more In the heart of the storm, I’m ready to roar Take me to the edge, let me feel the fire Let the sound consume me, fuel my desire With electric dreams, I’m ready to fly In the heat of the moment, my spirit won’t die So let the music play, let it drown my pain In the rhythm and the sound, I’ll break these chains With every note I strike, I’m alive once more In the heart of the storm, I’m ready to roar On the edge of tomorrow, where shadows collide I’ll keep on moving, with the world as my guide In the pulse of the city, the heart of the night I’ll stand in the spotlight, bathed in the light With every heartbeat, the passion ablaze In the echoing halls, where hope always stays Let me break these borders; let me take flight In the chaos of sound, I’ll find my right So let the music play, let it wash away tears In the melody’s arms, I’ll conquer my fears As the stage lights dim and the crowd starts to cheer This moment is mine, I can finally steer With the power of music, I’ll soar above In the rhythm of life, I’ll find what I love Every strum from the strings, 
every beat in my soul Takes me farther than dreams, where I’m finally whole So let it all out, let it flow with the tide In the bluesy embrace, forever I’ll ride In this journey of sound, where the wild hearts play I’ll keep on pushing, no fear in the way For the music will guide me, until the end of time And in every last note, I’ll find my prime"}
+{"id": 40, "file_origin": "final_en_test_multi.jsonl", "file_index": 40, "lyrics": "Underneath the willow's sway I found memories that drift away Whispers of laughter, echoes of past Moments like shadows that never last Time like a river flows so free Carrying dreams, lost bits of me Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I feel you near, I feel you there Walking through these streets so bare Each corner whispers secrets of care I chase the laughter, I chase the light But darkness sometimes steals the night Time like a river flows so free Carrying dreams, lost bits of me Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I feel you near, I feel you there So I close my eyes and I escape In memories where love takes shape Dancing lights, the stars above A timeless tune, a song of love Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I feel you near, I feel you there Oh, sweet nostalgia, stay with me In this heart where you’re forever free In every note, I find our song In the tender moments where we belong Underneath the willow, under the sky I’ll hold our memories, you and I Underneath the willow's sway I found memories that drift away Whispers of laughter, echoes of past Moments like shadows that never last Time like a river flows so free Carrying dreams, lost bits of me Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I feel you near, I feel you there Walking through these streets so bare Each corner whispers secrets of care I chase the laughter, I chase the light But darkness sometimes steals the night Time like a river flows so free Carrying dreams, lost bits of me Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I 
feel you near, I feel you there So I close my eyes and I escape In memories where love takes shape Dancing lights, the stars above A timeless tune, a song of love Oh, sweet nostalgia, take me back To those golden days, where life’s in full track With every song that fills the air I feel you near, I feel you there"}
+{"id": 41, "file_origin": "final_en_test_multi.jsonl", "file_index": 41, "lyrics": "City lights are shining bright We’re chasing dreams through the night Hands up high, we’re feeling free This is where we're meant to be Heartbeat racing to the sound In this rhythm, we’re unbound Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive Colors flash, the night ignites Dancing shadows, city lights Moving fast, we own the floor Living dreams, we crave for more Heartbeat racing to the sound In this rhythm, we’re unbound Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive When the world starts to fade away Close your eyes, let the night play Feel the beat, it’s all that we need Let the passion set you free Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive City lights are shining bright We’re chasing dreams through the night Hands up high, we’re feeling free This is where we're meant to be Heartbeat racing to the sound In this rhythm, we’re unbound Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive Colors flash, the night ignites Dancing shadows, city lights Moving fast, we own the floor Living dreams, we crave for more Heartbeat racing to the sound In this rhythm, we’re unbound Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive When the world starts to fade away Close your eyes, let the night play Feel the beat, it’s all that we need Let the passion set you free Let the music take control Feel it pumping in your soul We’re alive, 
we’ve come alive In this moment, let's survive Let the music take control Feel it pumping in your soul We’re alive, we’ve come alive In this moment, let's survive Let the music take control Feel it pumping in your soul"}
+{"id": 42, "file_origin": "final_en_test_multi.jsonl", "file_index": 42, "lyrics": "In the neon glow, I take my flight Chasing dreams through the starry night Footprints echo on electric streets Every heartbeat dances with the beats Lost in the moment, I’m alive With every rhythm, I will thrive Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned As the dawn breaks, colors arise Painting hope with the sunlit skies Voices carry on the morning breeze Whispers of love in the rustling leaves We’ll write our story, page by page With every chapter, we’ll engage Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned If the stars fall down tonight We’ll catch them, hold them tight With our dreams, we’ll light the fire Together, we’ll rise, never tire Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned Take me higher, feel the light In your arms, everything’s alright Nothing can break this bond we share In this moment, there's magic in the air We’ll chime with the sounds of a new refrain With every heartbeat, we’ll dance again Through the shadows, we’ll find our way Hand in hand, we’ll seize the day Lost in the moment, I’m alive With every rhythm, I will thrive Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned And when the night falls, we’ll still be free Under moonlight’s gaze, you and me Forever chasing our slice of the stars Together forever, no matter how far We’ll write our story, page by page With every chapter, we’ll engage Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned If the stars fall down tonight We’ll catch them, hold them tight With our dreams, we’ll light the 
fire Together, we’ll rise, never tire Take me higher, into the sky We can soar, just you and I Through the echoes of our hearts entwined In this world, our dreams aligned Take me higher, feel the light In your arms, everything’s alright Nothing can break this bond we share In this moment, there's magic in the air Take me higher, into the light Forever with you, it feels so right From dusk till dawn, we’ll never part In this life, you’re my heart"}
+{"id": 43, "file_origin": "final_en_test_multi.jsonl", "file_index": 43, "lyrics": "In the shadows where the truth lies Echoes of whispers haunt my sighs Memories painted in shades of grey Lost in a world, the heart decays A broken mirror reflects my soul Searching the pieces, can I be whole? Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Wading through waters, cold and deep The pain is a promise that I must keep Voices consume, they taint my view In the chaos, I’m searching for you A silhouette haunts the edge of my mind Fading away, only shadows I find Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Every heartbeat echoes a plea Can you hear it calling, calling for me? In the silence, I drown in despair Yet every moment, I’m wishing you were there Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Holding on, but the grip slips away Days turn to shadows, lost in the fray With every tear that stains the ground I cling to the silence, my haunting sound In the shadows where the truth lies Echoes of whispers haunt my sighs Memories painted in shades of grey Lost in a world, the heart decays A broken mirror reflects my soul Searching the pieces, can I be whole? 
Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Wading through waters, cold and deep The pain is a promise that I must keep Voices consume, they taint my view In the chaos, I’m searching for you A silhouette haunts the edge of my mind Fading away, only shadows I find Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Every heartbeat echoes a plea Can you hear it calling, calling for me? In the silence, I drown in despair Yet every moment, I’m wishing you were there Holding on to the remnants of time Dreams fade away, like a forgotten rhyme Trapped in a cage, but still I fight For the flicker of hope in the dead of night Holding on, but the grip slips away Days turn to shadows, lost in the fray With every tear that stains the ground I cling to the silence, my haunting sound"}
+{"id": 44, "file_origin": "final_en_test_multi.jsonl", "file_index": 44, "lyrics": "In shadows where silence resides I wander through the tangled streets Whispers of dreams long gone Echo in the empty nights Can you hear my heart beating It's drowning in the echoes Searching for warmth in the cold Paths that we've left untold So hold my hand and don't let go We're tracing stars, in skies so low Caught in a whirl of faded grace Time slips by, we embrace the space Through memories painted in grey I find the colors of our past Glimmers of laughter drift away Yet shadows will not let them last Can you hear my heart pleading It's begging for the daylight A flicker of hope in twilight We'll find our way to the bright So hold my hand and don't let go We're tracing stars, in skies so low Caught in a whirl of faded grace Time slips by, we embrace the space In the heart of the storm we will dance Whirling thoughts, lost in the trance If we fade, let it be in light In every breath, we dance with night So hold my hand and don't let go We're tracing stars, in skies so low Caught in a whirl of faded grace Time slips by, we embrace the space So hold my hand, we'll drift away Into the night where shadows play Together we'll create our song Where every heartache will be drawn Into the light of a brand new dawn We'll rise again, forever strong So hold my hand, we'll chase the stars In a universe that's truly ours Searching for what time cannot stain In dreams we'll fly above the rain And as the night begins to fade We'll find the love that we once made So take my heart and set it free In every moment, it's you and me"}
+{"id": 45, "file_origin": "final_en_test_multi.jsonl", "file_index": 45, "lyrics": "Beneath the willow by the stream I carved our names in the bark Whispers carried on the breeze Hold our memories in the dark Days that once felt like forever Fleeting moments held so close Through the echoes of the seasons Time will drift, but we’ll be ghosts And I'll sing to the stars above In the night, they shine like our love Every chord I strum feels true A melody, it sings of you From autumn leaves to winter’s chill Every season held our touch In the warmth of the sunlit hill I find the days where I miss you much We found solace in the quiet In the laughter and the tears Chasing fireflies in twilight Holding dreams that disappear And I'll sing to the stars above In the night, they shine like our love Every chord I strum feels true A melody, it sings of you When the world is dark and heavy Close your eyes, and listen still In the whispers of the night sky Feel the magic, feel the thrill And I'll sing to the stars above In the night, they shine like our love Every chord I strum feels true A melody, it sings of you So take my hand, we’ll walk anew In every moment, it’s me and you Together as the seasons change We’ll find our way, it remains the same And as the dawn begins to break I’ll stitch the dreams we never make In every note that calls your name Our hearts will dance in love's sweet flame Through the fields where flowers grow With every step, my heart will show In this life, we’ll wander free Just you and I, our legacy"}
+{"id": 46, "file_origin": "final_en_test_multi.jsonl", "file_index": 46, "lyrics": "In the quiet of the morning sun Whispers dance on golden rays Dreams awaken, shadows run Chasing time in heartfelt plays With every note the heartstrings pull A melody that stirs the soul So here we stand, beneath the sky Letting go of things we tried With open hearts and arms wide Together, we will rise and fly Through the storms and through the pain Every teardrop tells a tale Laughter echoes in the rain Love endures, it will prevail With every note the heartstrings pull A melody that stirs the soul So here we stand, beneath the sky Letting go of things we tried With open hearts and arms wide Together, we will rise and fly In the gaze of a distant star I found a spark, a guiding light Every tear, every scar Shaped the journey to this night So here we stand, beneath the sky Letting go of things we tried With open hearts and arms wide Together, we will rise and fly With open hearts and arms wide Together, we will rise and fly Forever free, we will touch the sky Together, we will rise and fly Together, we will rise and fly Together, we will rise and fly Together, we will rise and fly Together, we will rise and fly Together, we will rise and fly Together, we will rise and fly So let the light in, oh let it glow And let us dance with the world we know Together, we will rise and fly"}
+{"id": 47, "file_origin": "final_en_test_multi.jsonl", "file_index": 47, "lyrics": "In a garden where the shadows play Whispers of the trees softly sway Memories tangled in sunlight's gaze In fleeting moments, time lays its praise The petals fall like secrets shared Echoes of laughter drift through the air We dance under stars, a moment in flight Chasing the shadows that fade into night With every heartbeat, we write our own song In this world of wonder, we know we belong Through winding paths that we used to roam Carved in the earth, a place we call home The stories we crafted, wild and free Live in the corners of you and me With painted skies, our dreams take flight Courage to chase the unknown tonight We dance under stars, a moment in flight Chasing the shadows that fade into night With every heartbeat, we write our own song In this world of wonder, we know we belong Hold onto hope, let it be our guide In the silence, love will abide The echoes of dreams will carry us home In the light of hope, we're never alone We dance under stars, a moment in flight Chasing the shadows that fade into night With every heartbeat, we write our own song In this world of wonder, we know we belong In the garden where the shadows sway We'll treasure this love, come what may As whispers of trees softly sway In fleeting moments, time lays its praise In this world of wonder, we know we belong In this world of wonder, we know we belong In this world of wonder, where we both are strong Together we find where our hearts belong Forever we'll dance, forever we'll sing In these moments of magic, our love takes wing With every heartbeat, we'll write our own song In this world of wonder, oh won't you come along? In this world of wonder, we know we belong."}
+{"id": 48, "file_origin": "final_en_test_multi.jsonl", "file_index": 48, "lyrics": "Underneath the willow tree Memories drift like leaves with glee Whispers carry stories of old In the twilight, secrets unfold Time has painted shadows on our skins Dancing through the dreams we’ve lived in Faded photographs in a dusty book Inviting us, we’re drawn to look Let the fire warm our souls In the echoes, we are whole As the stars illuminate our plight We'll find our way, holding tight And in the morning sun we’ll rise With open hearts and endless skies Through the laughter and the tears We’ll uncover all our fears Gentle rivers flow beside the trails Guiding wandering hearts like sails In the softness of the evening glow We find the paths we didn't know Lanterns light the way through darkened woods Blueprints drawn in ancient moods And every moment carved in stone Reminds us we’re never alone Let the fire warm our souls In the echoes, we are whole As the stars illuminate our plight We'll find our way, holding tight And in the morning sun we’ll rise With open hearts and endless skies Through the laughter and the tears We’ll uncover all our fears Through the quiet, we’ll understand In this journey, take my hand Moments linger, hearts entwine In the echoes, love will shine And in the morning sun we’ll rise With open hearts and endless skies Through the laughter and the tears We’ll uncover all our fears And in the morning sun we’ll rise With open hearts and endless skies Through the laughter and the tears We’ll uncover all our fears And in the morning sun we’ll rise With open hearts, we’ll claim our prize In the stories yet to tell Together, we’ll weave our spell Together, we’ll weave our spell Together, we’ll weave our spell Together, we’ll weave our spell Together, we’ll weave our spell Together, we’ll weave our spell"}
+{"id": 49, "file_origin": "final_en_test_multi.jsonl", "file_index": 49, "lyrics": "In the quiet dawn, the sun awakes Whispers of dreams, on winds that shake The world is turning, colors ignite A tapestry woven, of day and night With open arms, I chase the light Leaving shadows, embracing the bright Rise from the ashes, higher we soar Hearts, like the rivers, forever explore Together we dance, in harmony's song Finding our place, where we belong Through valleys of doubt, I'll walk this road Each step a story, a truth to unfold With every heartbeat, a promise made In the silence, love won't fade With open arms, I chase the light Leaving shadows, embracing the bright Rise from the ashes, higher we soar Hearts, like the rivers, forever explore Together we dance, in harmony's song Finding our place, where we belong Through storms that crash, and skies that cry I'll hold your hand, we'll touch the sky With every heartbeat, forever bound In this symphony, our love resounds Rise from the ashes, higher we soar Hearts, like the rivers, forever explore Together we dance, in harmony's song Finding our place, where we belong Finding our place, where we belong Where we belong, where we belong In this world, we're never alone With open hearts, we'll find our home We'll find our home, our story unfolds With each breath, our future is bold Together we'll rise, together we'll stand Hand in hand, across this land Together we'll rise, together we'll stand In this love that forever expands In this love that forever expands We'll write a saga in the stars above A tale of courage, a tale of love With open hearts, we'll find our home Together forever, we'll never be alone Together forever, never be alone Never be alone."}
diff --git a/eval_pipeline/gt_lyrics/zh.jsonl b/eval_pipeline/gt_lyrics/zh.jsonl
new file mode 100644
index 0000000000000000000000000000000000000000..4ee135150c1e979db5157d7270da41bf56331d09
--- /dev/null
+++ b/eval_pipeline/gt_lyrics/zh.jsonl
@@ -0,0 +1,50 @@
+{"id": 0, "file_origin": "final_zh_test_multi.jsonl", "file_index": 0, "lyrics": "在晨曦中,我看到你 你的笑容,如花般绽放 仿佛时间停止,世界在静止 心跳轻轻回响,伴随晨风 我们一起追逐,梦想的轮廓 在无尽的天空留下足迹 你是我心中,最美的旋律 在每一个瞬间,陪伴我 与你的故事,如歌般动人 在永恒的时光,绽放无尽 当夜幕降临,星空闪烁 我在梦中,依然与你相拥 每个瞬间,都是无价 你的眼神,是唯一的灯塔 我们一起翻阅,岁月的篇章 铭记每一个微笑,和泪光 你是我心中,最美的旋律 在每一个瞬间,陪伴我 与你的故事,如歌般动人 在永恒的时光,绽放无尽 岁月如歌,声声入耳 只愿与你,一同走过 不管未来,有多少风雨 只想与你,共同面对 你是我心中,最美的旋律 在每一个瞬间,陪伴我 与你的故事,如歌般动人 在永恒的时光,绽放无尽 在心底深处,唯有你 如星辰般,映照我的梦 永远不会消逝,柔和光辉 与爱的记忆,交织成歌 一起走向那,未知的旅程 因为有你,光芒闪耀 在我的生命,写下篇章 不再惧怕,未来的每个转角 与你并肩,直到天荒地老 在爱中,找到归宿,勇敢飞翔"}
+{"id": 1, "file_origin": "final_zh_test_multi.jsonl", "file_index": 1, "lyrics": "在晨光中我醒来 回忆昨夜的梦境 那梦游走在星河间 带我去那未知的世界 阳光洒在我的脸庞 风轻轻拂过,似你低语 让我寻觅心中的声音 在世界的每一个角落 找回那些失去的光辉 跟随梦想,飞翔在天际 在繁华中我独自漫步 人潮涌动,似浮云散去 我在寻找一个出口 通往内心深处的宁静 每一次彷徨都让我更坚强 勇敢面向,未来的方向 让我寻觅心中的声音 在世界的每一个角落 找回那些失去的光辉 跟随梦想,飞翔在天际 无论黑暗如何吞噬 我心中的光永不熄灭 我的灵魂如风一样 在宇宙中自由翱翔 让我寻觅心中的声音 在世界的每一个角落 找回那些失去的光辉 跟随梦想,飞翔在天际 让爱与希望点亮夜空 如星辰般耀眼,永不消逝 我的梦想在光中延续 在每一个晨曦中重生 让我自由,追寻未来的旅程 在生命的歌声中飞翔 回到自己心中的那片海 无畏无惧,勇往直前 我将找到我自我的方向 跨越每一个梦的边界 心灵的和鸣,在此刻绽放 每一个瞬间都如此珍贵 让我随风起舞,无所畏惧 在音符中寻找生命的灵魂 与那自由相融,直到永恒 让我的心和梦同行 如星光明亮,点亮每一个夜 在浩瀚宇宙中,我是那蜡烛的光 永不熄灭,照亮前方的路"}
+{"id": 2, "file_origin": "final_zh_test_multi.jsonl", "file_index": 2, "lyrics": "夜幕降临,城市的心跳 灯光闪烁,如梦幻般围绕 我走在这铁和水的交响 耳边低语,历史的回响 一阵狂风,掀起心动 在嘶吼中找到方向 我们在虚空遨游,追寻声音 遗忘的真相在呼唤 每一个灵魂都有它的节奏 在这平行的宇宙中飞翔 一路狂奔,追逐时间 碎片拼凑出未来的蓝图 黎明来临,燃起希望 无惧风暴,继续航行 每一次挣扎都让我更坚强 心中的火焰永不熄灭 我们在虚空遨游,追寻声音 遗忘的真相在呼唤 每一个灵魂都有它的节奏 在这平行的宇宙中飞翔 时间在流逝,未来尚不明朗 但我愿意为此一搏 与星辰对话,聆听它们的秘密 在无尽的黑暗中划出光芒 我们在虚空遨游,追寻声音 遗忘的真相在呼唤 每一个灵魂都有它的节奏 在这平行的宇宙中飞翔 让每一次鼓声传达不屈的意志 如雷霆般震撼,打破沉默 时间在走,我们依旧坚定 走向未来,拥抱每一个瞬间 让旋律成为心灵的呐喊 在每个音符里找到希望 让节奏引领我们到达 那未知的征途,燃烧每一个夜晚 纵然前方荆棘丛生 我仍将飞跃重重阻碍 在每一道伤痕中获得成长 燃烧的灵魂,随着曲调起舞 让心潮澎湃,继续前行 迎接新的曙光,未来正在召唤 让我们一起创造自我的传说 在这片永恒的星空下 追寻属于我们自己的声音 在迷茫中,坚定的信仰 将会引导我走向光明"}
+{"id": 3, "file_origin": "final_zh_test_multi.jsonl", "file_index": 3, "lyrics": "在温柔的晨光中 我听见鸟儿轻轻歌唱 那是关于希望的旋律 似乎在诉说每一个梦 如果潮水带走了安静 是否能留下爱的痕迹 我在风中等待这一刻 心中的温暖如潮水般回归 我愿与你分享这份美 在每一个清晨的光影中长存 漫步在满是花香的小径 每一朵花都是我对你的念想 你是那温暖的阳光 驱散我心中的阴霾 如果潮水带走了安静 是否能留下爱的痕迹 我在风中等待这一刻 心中的温暖如潮水般回归 我愿与你分享这份美 在每一个清晨的光影中长存 或许人生就是一场旅行 每一段遇见都有意义 让我随风而行,不再彷徨 只要有你,就足够 我在风中等待这一刻 心中的温暖如潮水般回归 我愿与你分享这份美 在每一个清晨的光影中长存 与你一起谱写爱的旋律 让每个音符都镌刻心底 即使岁月改变模样 我愿与你走过每一个春夏 梦中醒来依然感动 愿爱在心中永远不灭 在每一个瞬间,永恒不变 在未来的日子里 携手走向属于我们的明天 让这份情感如星辰闪烁 直到时光的尽头,依然如初 在爱的海洋中,一同遨游 不再孤单,每一步都坚定 直到最后,我要与你并肩 在这漫漫旅途中,共赴明天 爱是最美的旋律,我心的归宿 永远相伴,因此勇敢追梦"}
+{"id": 4, "file_origin": "final_zh_test_multi.jsonl", "file_index": 4, "lyrics": "在古老的街上,回忆如潮 风吹过,带走了青春的梦 转身之间,似乎一切都已改变 心中那份,难以言说的期待 思念如烟,很缭绕在心头 每个夜晚,孤独伴我入眠 那些年华,光辉与暗淡交织 我走过的路,留下无数痕迹 每一次呼唤,都在心底回响 在这条路上,寻找失落的自己 繁星点点,夜空如水 熟悉的旋律,像是旧时光在诉说 那些青涩,曾是无畏的勇气 时光荏苒,心却始终惦念 思念如烟,很缭绕在心头 每个夜晚,孤独伴我入眠 那些年华,光辉与暗淡交织 我走过的路,留下无数痕迹 每一次呼唤,都在心底回响 在这条路上,寻找失落的自己 岁月如歌,千万旋律轻轻吟唱 长河流尽,带走了少年的欢笑 愿我的心,在这微风中等候 再一次感受,生命的温暖 那些年华,光辉与暗淡交织 我走过的路,留下无数痕迹 每一次呼唤,都在心底回响 在这条路上,寻找失落的自己 过去的梦,仍在心中闪烁 我的歌声,唤醒沉睡的灵魂 等待那一天,温暖的阳光再现 在古老的街上,回忆如潮 风吹过,带走了青春的梦 转身之间,似乎一切都已改变 心中那份,难以言说的期待 思念如烟,很缭绕在心头 每个夜晚,孤独伴我入眠 那些年华,光辉与暗淡交织 我走过的路,留下无数痕迹 每一次呼唤,都在心底回响 在这条路上,寻找失落的自己 繁星点点,夜空如水 熟悉的旋律,像是旧时光在诉说 那些青涩,曾是无畏的勇气 时光荏苒,心却始终惦念 思念如烟,很缭绕在心头 每个夜晚,孤独伴我入眠 那些年华,光辉与暗淡交织 我走过的路,留下无数痕迹 每一次呼唤,都在心底回响 在这条路上,寻找失落的自己 岁月如歌,千万旋律轻轻吟唱 长河流尽,带走了少年的欢笑 愿我的心,在这微风中等候 再一次感受,生命的温暖 那些年华,光辉与暗淡交织"}
+{"id": 5, "file_origin": "final_zh_test_multi.jsonl", "file_index": 5, "lyrics": "荒野的风在呼啸 孤独的灵魂在游荡 破旧的吉他在手边 歌声划破这寂静 每一个夜晚都漫长 心中燃起不灭的梦想 在广袤土地间奔跑 随心所欲,跟着旋律 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 痛苦的回忆在缠绕 但我不会停下脚步 勇敢的心正在挣扎 向前冲,向前飞 每一个夜晚都漫长 心中燃起不灭的梦想 在广袤土地间奔跑 随心所欲,跟着旋律 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 探求自由的灵魂 无畏无惧向前走 梦在荒野中闪烁 紧握那星光 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 荒野的风在呼啸 孤独的灵魂在游荡 破旧的吉他在手边 歌声划破这寂静 每一个夜晚都漫长 心中燃起不灭的梦想 在广袤土地间奔跑 随心所欲,跟着旋律 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 痛苦的回忆在缠绕 但我不会停下脚步 勇敢的心正在挣扎 向前冲,向前飞 每一个夜晚都漫长 心中燃起不灭的梦想 在广袤土地间奔跑 随心所欲,跟着旋律 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃 我在这片天际狂欢 绝不轻言放弃 燃烧的爱在心中 把黑夜点燃"}
+{"id": 6, "file_origin": "final_zh_test_multi.jsonl", "file_index": 6, "lyrics": "指尖划过那时光, 似梦似真又何妨, 流星划过夜空, 你我都在那方向。 岁月如歌的旋律, 心中燃烧着火焰, 在这个缤纷的季节, 让我们将爱缠绵。 在浪潮里奔涌, 像潮水推着我向前, 无畏天边那雷电, 只要你在身边。 光影流转的瞬间, 我依偎在你的肩, 风中耳边低语, 似吟唱的恋曲般缠绵。 追逐每个夜晚, 梦想如星辰闪烁, 在这个梦幻的过程中, 我愿与你并肩。 在浪潮里奔涌, 像潮水推着我向前, 无畏天边那雷电, 只要你在身边。 即使前路漫漫, 我不会退缩,我会追寻, 每一道光都指引着你, 在这旅途中勇往直前。 在浪潮里奔涌, 像潮水推着我向前, 无畏天边那雷电, 只要你在身边。 让我们一起飞翔, 在梦中追逐赤子心, 时间的河流不断向前, 爱在每个晨曦中重现。 你的笑容是那光辉, 永驻我心间, 即使世界变幻,我愿紧握这份情, 永不分离,仿佛从未有过距离。 在这无尽的岁月里, 一起走向那片光明, 我们的歌在风中起舞, 铸就不灭的旋律。 再遇见你时, 我会说出那句,我爱你, 直到时间的尽头, 愿能与你共舞, 在任何的天边, 永不停息的歌声, 直到最后的那一幕, 只想与你共享这一切。 共享这一切。"}
+{"id": 7, "file_origin": "final_zh_test_multi.jsonl", "file_index": 7, "lyrics": "月光洒落在湖面, 如你的笑容映照, 当我们并肩走过, 心中燃起了希望。 那些未曾说出的情话, 在风中轻轻飘荡, 我默默守候的未来, 是否能与你分享。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。 那片金色的麦田, 是我们曾相遇的地方, 一缕微风拂过, 何尝不是爱的传递。 时光静好如水, 珍藏那一瞬的美, 无论世界如何变迁, 我心永远属于你。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。 即使前路多荆棘, 我也要勇敢向前, 让爱成为指引, 照亮每一个晨曦。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。 不再畏惧风雨, 我会勇敢向前, 每一步都是信念, 与你的未来相连。 岁月静好如歌, 我们携手并肩, 就算风雨飘摇, 也不曾放手的决心。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。 想要与你见证, 岁月温柔的流转, 每一次心心相印, 都在我生命中铭刻。 时光流转如歌, 让爱成为永恒, 无论前方的路途, 我愿与你同行。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。 前方的路不再孤单, 我与你共享未来, 让爱成为旋律, 终将奏响人间。 放眼天际的梦想, 在云层间追逐, 你是我心底的渴望, 永远不言放弃。"}
+{"id": 8, "file_origin": "final_zh_test_multi.jsonl", "file_index": 8, "lyrics": "缓缓的风, 吹过这片林间, 我闭上双眼, 想着你的笑颜。 每一个梦中, 都和你相拥, 温暖的节奏, 心跳开始柔和。 你是我心中的光, 照亮漫长的夜晖, 无论多远的地方, 你都是我的归依。 余晖透过天际, 思念在蔓延, 我这一刻, 却只想与你相见。 每一段记忆, 都是你的声音, 让我在心中, 对你的渴望更清。 你是我心中的光, 照亮漫长的夜晖, 无论多远的地方, 你都是我的归依。 时间在流逝, 而爱从未减退, 我的心永远, 为你而等待。 你是我心中的光, 永远不会熄灭, 无论多远的地方, 你都是我的归依。 风继续轻拂, 带走思念的香, 我在这片土地, 默默守护希望。 每个夜晚, 都有星星伴我行, 我张开双手, 将这份爱传递。 我知道那天, 阳光洒在我们身上, 仿佛一切, 都在为你奏响。 我会一直在, 等待那轮回的盈, 愿你听见, 这份爱的心声。 每个晨曦, 我都在期待, 与你相拥, 再无离开。 我在这里, 静静守候, 直到有一天, 你会又回来。 那时的我们, 一定会更开心, 爱会漫延, 再无忧伤。 你是我心中的光, 照亮这片海洋, 无论岁月如何变, 你都是我的爱藏。"}
+{"id": 9, "file_origin": "final_zh_test_multi.jsonl", "file_index": 9, "lyrics": "在那一片宽广的天地 风轻轻吹过,带走思念 每一粒尘埃,都闪着光 故事在晨曦中缓缓展开 我在梦中游荡,追寻着你 那遥远的声音,仿佛在呼唤 爱如星辰,永不停歇 指引我漫步,在这片天空 心中的花朵,悄然绽放 与你的点滴,铭刻心海 这个瞬间如此美丽 掌心温暖着,无尽的期待 岁月轮转,记忆悠长 我在每一个日出中等待 我在梦中游荡,追寻着你 那遥远的声音,仿佛在呼唤 爱如星辰,永不停歇 指引我漫步,在这片天空 心中的花朵,悄然绽放 与你的点滴,铭刻心海 多少个秋冬春夏 你我她他,无处不在 那一抹微笑,温暖了安静 让时光静止,万物复苏 爱如星辰,永不停歇 指引我漫步,在这片天空 心中的花朵,悄然绽放 与你的点滴,铭刻心海 在这片天地,永不分离 直到最后一缕光辉散去 我依然珍藏,你的每一句 在那一片宽广的天地里 你的笑声,是我心中唯一"}
+{"id": 10, "file_origin": "final_zh_test_multi.jsonl", "file_index": 10, "lyrics": "月光洒落城市的街道 心跳如雷,燃烧着希望 我在夜色中奔跑 追逐那遥远的梦想 挥洒汗水,不怕风雨 相信未来,照亮前方 飞越那高墙,化作一只鹰 在刀锋之上,展翅翱翔 无畏风暴,勇往直前 心中激情,点燃狂热 每一次跌倒,重新站起 我不是孤单,齐心协力 未来的路,洒下汗水 携手并肩,共同迎接 挥洒汗水,不怕风雨 相信未来,照亮前方 飞越那高墙,化作一只鹰 在刀锋之上,展翅翱翔 无畏风暴,勇往直前 心中激情,点燃狂热 时间是唯一的答案 勇气伴随,每一个选择 让梦想开花,闪烁璀璨 我绝不放弃,继续追寻 飞越那高墙,化作一只鹰 在刀锋之上,展翅翱翔 无畏风暴,勇往直前 心中激情,点燃狂热 在烈火中,找到我自己 直到最后一刻,我依然存在 在这片舞台,放声歌唱 我永远不会,停止追寻 梦想的光芒,指引前方"}
+{"id": 11, "file_origin": "final_zh_test_multi.jsonl", "file_index": 11, "lyrics": "那片天空蔚蓝如梦 每当我仰望心中有你 岁月流转如风拂面 回忆每个笑容依旧清晰 不知不觉已走过多少年 时光荏苒我依然在这里 爱与梦在心底交织 袭来如潮水的温暖 纵然世界再大也不怕 我会牵着你不放 那片海浪轻轻拍打 藏着你我故事的印记 每一个黄昏沉醉其中 仿佛时间都停在那一刻 夕阳染红了天空的边际 你的眼眸如星星点亮 爱与梦在心底交织 袭来如潮水的温暖 纵然世界再大也不怕 我会牵着你不放 岁月如歌随着风起舞 每一步我都记得你的足迹 心跳在瞬间交错而至 愿这一生有你相伴 爱与梦在心底交织 袭来如潮水的温暖 纵然世界再大也不怕 我会牵着你不放 我会永远守护你 在每个瞬间不离开 我会牵着你不断前行 永远,到永远 直到海枯石烂,心不变 我会牵着你,无论在哪里 永远,你是我唯一 爱与梦在心底交织 我会把你珍藏在心 直到时间的尽头 我只想牵着你 无畏过往,勇往直前 我将一直陪着你 直到最后一刻,爱你如此 不离不弃,看穿岁月 我会牵着你飞翔 直到最后一刻,爱你如此 无畏前路,坚定不移 我将一直爱你,接受你的爱 在每个瞬间,共同走过人生"}
+{"id": 12, "file_origin": "final_zh_test_multi.jsonl", "file_index": 12, "lyrics": "在夜空中闪烁的星辰 是我思念的片段 每一颗都在诉说 那段爱过的瞬间 回忆如流水倾泻而下 涌进我心房的每一个角落 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 岁月像棋局,难以预测 我们曾是无畏的旅人 穿越风雨的磨砺 依然选择携手共进 让每个瞬间都闪耀光芒 不让遗憾掩埋梦想 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 即使时光荏苒,变迁万千 我依然愿意为你守候 每次回首,都能看到 那段永恒的爱恋 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 在夜空中闪烁的星辰 是我思念的片段 每一颗都在诉说 那段爱过的瞬间 回忆如流水倾泻而下 涌进我心房的每一个角落 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 岁月像棋局,难以预测 我们曾是无畏的旅人 穿越风雨的磨砺 依然选择携手共进 让每个瞬间都闪耀光芒 不让遗憾掩埋梦想 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 即使时光荏苒,变迁万千 我依然愿意为你守候 每次回首,都能看到 那段永恒的爱恋 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 在夜空中闪烁的星辰 是我思念的片段 每一颗都在诉说 那段爱过的瞬间 回忆如流水倾泻而下 涌进我心房的每一个角落 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行 岁月像棋局,难以预测 我们曾是无畏的旅人 穿越风雨的磨砺 依然选择携手共进 让每个瞬间都闪耀光芒 不让遗憾掩埋梦想 我想起你的笑 如同晨曦的温暖 在梦中牵着我的手 不再孤单漫步前行"}
+{"id": 13, "file_origin": "final_zh_test_multi.jsonl", "file_index": 13, "lyrics": "在暴风雨中呐喊 撕裂这无尽的黑暗 心中的火焰燃烧 引领我走向明天 不再沉默,勇往直前 每一次跌倒都是起点 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 挑战无畏,心跳加速 追逐梦想不能停下 生命的旋律在悠扬 唤醒我最初的渴望 全力以赴,迎接未来 绝不放弃心中信念 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 奔跑吧,不要停下 全世界是我的舞台 狂欢的旋律唤醒了我 在梦想中自由飞翔 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 在暴风雨中呐喊 撕裂这无尽的黑暗 心中的火焰燃烧 引领我走向明天 不再沉默,勇往直前 每一次跌倒都是起点 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 挑战无畏,心跳加速 追逐梦想不能停下 生命的旋律在悠扬 唤醒我最初的渴望 全力以赴,迎接未来 绝不放弃心中信念 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 奔跑吧,不要停下 全世界是我的舞台 狂欢的旋律唤醒了我 在梦想中自由飞翔 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 在暴风雨中呐喊 撕裂这无尽的黑暗 心中的火焰燃烧 引领我走向明天 不再沉默,勇往直前 每一次跌倒都是起点 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野 挑战无畏,心跳加速 追逐梦想不能停下 生命的旋律在悠扬 唤醒我最初的渴望 全力以赴,迎接未来 绝不放弃心中信念 我会挣脱束缚 在希望的前方 重获自由的力量 像风一样狂野"}
+{"id": 14, "file_origin": "final_zh_test_multi.jsonl", "file_index": 14, "lyrics": "随风飘散的记忆 像那细雨,轻轻滴落 我坐在老树下,眺望空白 曾经的岁月,是否还在 世界在变换,情绪难解 不知不觉,谁在叹息 有没有人,在夜深人静时 和我一起,述说这一切 在那远方的呼唤中 是否有温暖,伴我同行 走过的路,藏着梦 那些笑容,以及失落 时光匆匆,未曾停下 如影随形,藏在心底 世界在变幻,情绪难解 不知不觉,谁在叹息 有没有人,在夜深人静时 和我一起,述说这一切 在那远方的呼唤中 是否有温暖,伴我同行 往事如烟,随风飘逝 小路尽头,有你微笑 日子用力而过,常常迷惘 但有回忆,永不模糊 有没有人,在夜深人静时 和我一起,述说这一切 在那远方的呼唤中 是否有温暖,伴我同行 在黎明的阳光下,我们相遇 点亮心中,每一份期待 在漫长的路途上,继续启航 用爱去编织,那些美好 无论何时,梦回的瞬间 我会记得,有你同行 燃起心中,温暖的火焰 让视线交汇,重回最初 在这片时光的海洋 与你相拥,再次启航 清风徐来,温暖依然 记得那份,难舍难分 我们在心中,化作永恒 在这片时光里,与你同行 直到天涯,直到海角 无悔的时光,写下一段传奇 我们的故事,永流传"}
+{"id": 15, "file_origin": "final_zh_test_multi.jsonl", "file_index": 15, "lyrics": "在这个寂静的夜晚 思绪如潮涌上心头 你是否也在远方 轻声呼喊着我的名字 过往的点滴如烟消散 我却在这空荡的心里 追寻那些消逝的光 是否能再重现 我在黑暗中呐喊 渴望找到那份温暖 即便路途再艰辛 我也不放弃希望 月光照亮我孤独的身影 回忆如影随形不去 你是否也会想起 那些一起走过的日子 每个瞬间都恍若电影 在我脑海中不停循环 回忆虽美却无法追 我只能默默承受 我在黑暗中呐喊 渴望找到那份温暖 即便路途再艰辛 我也不放弃希望 破碎的梦依然闪耀 在我心底呼喊着 那道光明会再次出现 指引我前行的方向 我在黑暗中呐喊 渴望找到那份温暖 即便路途再艰辛 我也不放弃希望 我也不会放弃希望 直到有一天,光明会照亮 我的前路,也让我再一次爱"}
+{"id": 16, "file_origin": "final_zh_test_multi.jsonl", "file_index": 16, "lyrics": "在夜空下,我独自漫步 思绪如星辰,闪烁不定 轻风拂面,回忆如潮水 每一步都印着你微笑的痕迹 那些年华,如烟火绽放 你我曾拥抱,梦中恍若依然 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 阳光洒下,映出你的影子 我在角落,静静望着远方 岁月的河流,带不走我们的梦 只愿与你,共度这一生 那些承诺,如同星空闪烁 即使世界变幻,我心不变 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 不论前方的路,多么曲折 只要有你,这些都不算什么 随风飘荡的梦,永不停歇 因为你是,我心中的光 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 亲爱的,你是否听见我呼唤 在这个时空,寻找那段岁月 无论时间如何流转 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳 我将永远铭记,爱你的心跳"}
+{"id": 17, "file_origin": "final_zh_test_multi.jsonl", "file_index": 17, "lyrics": "在这个静谧的清晨 小村庄的路,依旧蜿蜒 我提着梦想,走过那条老街 阳光洒下,温暖了我的心 回忆的风,正轻轻吹拂 我在每一瞬间,找寻失落的您 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 星空下,我仰望远方 想象你的笑,宛如晨曦 岁月的长河,承载着我的希冀 不论多远,你永在心里 这些年来,我从未忘记 每个细节,铭刻在心里 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 时光荏苒,花开又谢 每一个瞬间,都是爱的见证 无论前路多曲折 我都会微笑,继续前行 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底 我将永远铭记,烙印在心底 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底 我将永远铭记,烙印在心底 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底 我将永远铭记,烙印在心底 你知否,我心中那片蓝天 浸染着思念,如此悠远 每一缕阳光,都是你温暖的手 在我心底,轻声呢喃 我将永远铭记,烙印在心底"}
+{"id": 18, "file_origin": "final_zh_test_multi.jsonl", "file_index": 18, "lyrics": "片片落叶在心间, 思念的风轻轻拂面, 温暖的阳光照进梦, 你似乎依然在身边. 那段往事如流水, 在我心头挥之不去, 你的一笑似春风, 轻轻摇曳我沉默的心. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣. 星辰洒满了夜空, 而你却在我梦中, 轻声呼唤着我的名字, 仿佛依然不曾远去. 那段往事如流水, 在我心头挥之不去, 你的一笑似春风, 轻轻摇曳我沉默的心. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣. 追逐那隐约的身影, 在岁月的长河中渐行渐远, 我想把你的微笑, 封存成每一个片段. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣. 片片落叶在心间, 思念的风轻轻拂面, 温暖的阳光照进梦, 你似乎依然在身边. 那段往事如流水, 在我心头挥之不去, 你的一笑似春风, 轻轻摇曳我沉默的心. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣. 星辰洒满了夜空, 而你却在我梦中, 轻声呼唤着我的名字, 仿佛依然不曾远去. 那段往事如流水, 在我心头挥之不去, 你的一笑似春风, 轻轻摇曳我沉默的心. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣. 追逐那隐约的身影, 在岁月的长河中渐行渐远, 我想把你的微笑, 封存成每一个片段. 我在这片温暖的土地, 怀抱着过去的回忆, 念你的声音在耳边, 如同那遥远的歌谣."}
+{"id": 19, "file_origin": "final_zh_test_multi.jsonl", "file_index": 19, "lyrics": "在星空下漫步 感受风的低语 回忆涌上心头 如同那时的甜蜜 我握紧你的手 温暖抵御寒冷 心中种下希望 即使阴霾再深 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃 在那里日出时刻 阳光洒在大地 你我笑容灿烂 如花般的绚丽 岁月静好如歌 时间为我们停留 希望的种子静静 在心中悄然生长 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃 生命如歌悠扬 在每一个晨光中 我们的歌声飞扬 谱写爱的篇章 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃 在那天边的霞光 有你陪我同行 彼此间的默契 让我们勇敢前行 请相信这份情 永不会消失不见 愿我们携手走过 人生的每个季节 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃 即使前方风雨 也会把我锤炼 梦想在我心中 如同那颗璀璨 流星划过夜空 许下了愿望 无论多艰难的路 我将勇敢前行 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃 生活的每一次选择 如花瓣纷飞舞动 让我们在这旅程 感受爱的温暖 我们一起追梦 在这段旅程里 无论多遥远的路 我永远不放弃"}
+{"id": 20, "file_origin": "final_zh_test_multi.jsonl", "file_index": 20, "lyrics": "在那条熟悉的街道上 我依稀看见你的笑脸 阳光洒在树梢上 记录着我们的点滴 时光如流水,无法停留 你是否也在回忆里游走 只想再一次,握住你的手 倾听那份未说的心声 岁月轻轻走,我们依然在此 你是我心中永远的牵挂 春风轻拂过我脸庞 带走了多少过往 我站在这座老地方 思念像潮水般涌来 每当夜幕降临,星星闪烁 我会默默地希望你能再归来 只想再一次,握住你的手 倾听那份未说的心声 岁月轻轻走,我们依然在此 你是我心中永远的牵挂 时间的沙漏,流淌着我们的梦 在心底的角落,依然在等待 若有一天,你能听见我心声 请记得我曾如此深爱 只想再一次,拥抱你的影子 在这个漫长的夜晚里 我依然在这里,等候着你 直到钟声敲响,带走我的思念 只想再一次,握住你的手 倾听那份未说的心声 岁月轻轻走,我们依然在此 你是我心中永远的牵挂 在那条熟悉的街道上 我依稀看见你的笑脸 阳光洒在树梢上 记录着我们的点滴 时光如流水,无法停留 你是否也在回忆里游走 只想再一次,握住你的手 倾听那份未说的心声 岁月轻轻走,我们依然在此 你是我心中永远的牵挂 时间的沙漏,流淌着我们的梦 在心底的角落,依然在等待 若有一天,你能听见我心声 请记得我曾如此深爱 只想再一次,拥抱你的影子 在这个漫长的夜晚里 我依然在这里,等候着你 直到钟声敲响,带走我的思念 只想再一次,握住你的手 倾听那份未说的心声 岁月轻轻走,我们依然在此 你是我心中永远的牵挂 春风轻拂过我脸庞 带走了多少过往 我站在这座老地方 思念像潮水般涌来 每当夜幕降临,星星闪烁 我会默默地希望你能再归来"}
+{"id": 21, "file_origin": "final_zh_test_multi.jsonl", "file_index": 21, "lyrics": "在静谧的森林之中 回响着古老的歌谣 树影摇曳仿佛在诉说 岁月的真相不能遗忘 每一片叶子都在倾听 大地在低语着秘密 这里的空气如此清新 心灵在自然中徜徉 让我们在此共舞 与风一起旋转 生命如露珠般晶莹 在阳光下闪烁 越过蜿蜒的溪流 踏上那条未走的路 迎接每个晨曦的呼唤 把梦想种在心田 即使风雨也会来袭 信念终将带我前行 每一次跌倒再勇敢 都让我更懂得坚持 让我们在此共舞 与风一起旋转 生命如露珠般晶莹 在阳光下闪烁 大海的波涛呼唤着我 星空的微光指引我 在每个夜晚的思考中 寻找到生命的真谛 让我们在此共舞 与风一起旋转 生命如露珠般晶莹 在阳光下闪烁 让梦想再次起航 在风中荡漾 我愿在此刻停留 与自然同在一生 每一次呼吸都珍重 在大地的怀抱里 我找到真正的自己 在自然的歌声中 我与天地共鸣"}
+{"id": 22, "file_origin": "final_zh_test_multi.jsonl", "file_index": 22, "lyrics": "风起时又是这个季节 树影斑驳,梦在悄然盛开 我在回忆中寻觅你的脸 旧时光像陈酒,愈发醇厚 只要一瞬间,便足够相拥 时间飞逝,你却永在我心 星辰明灭欲诉浓情爱意 夜色如墨,秘密在月光里 阡陌之间,梦回千千遍 你我携手,踏遍这人间 花开花落,往事如烟 岁月无声,却写满诗篇 你那温柔的目光,穿透了时间 共舞在这生活的舞台,璀璨美梦 心跳加速,随你而动 爱如潮水,翻涌心底深处 星辰明灭欲诉浓情爱意 夜色如墨,秘密在月光里 阡陌之间,梦回千千遍 你我携手,踏遍这人间 回头望,心中有光 走过四季,爱未曾离去 就这样 forever, 彼此依偎 在这条路上,无所畏惧 星辰明灭欲诉浓情爱意 夜色如墨,秘密在月光里 阡陌之间,梦回千千遍 你我携手,踏遍这人间 在星空下,一起再出发 不怕风雨,只愿与你靠近 描绘那份爱,永不退色 我的未来因你而光彩 每一次呼吸都是为你而活 你是我心海的朝阳,永不沉落 在每一个角落,播撒甜蜜 和风细雨,与你相随 温暖的早晨,第一缕阳光 点亮了生活,与你同行 愿每个瞬间,都是美好回忆 告别遗憾,拥抱未来 如今愿与你,共创未来篇章 写下的诗篇,都是爱的誓言 让每颗星,都为你闪耀 你我携手,踏遍这人间 在这段旅程,心生感动 无论前方多远,永不分离 岁月长河,爱永不散 与你并肩行走,直到天涯海角"}
+{"id": 23, "file_origin": "final_zh_test_multi.jsonl", "file_index": 23, "lyrics": "在岁月的回忆中 那些欢声笑语依然清晰 晨曦下的阳光随风舞动 你我之间的秘密更显珍贵 时光不会倒流,任它飞逝 却在心中种下了温暖的种子 每一段旅程都镌刻在心 与你的笑容是我最美的篇章 让岁月的风轻轻教会我 如何珍惜每一个平凡的日子 漠然的夜空依旧闪烁 像是对我低语的星辰 在你离去的那一刻 留下的阴影,照亮我的回忆 多少个孤独的夜晚,我仰望 思念的情绪如潮水般涌动 每一段旅程都镌刻在心 与你的笑容是我最美的篇章 让岁月的风轻轻教会我 如何珍惜每一个平凡的日子 让我们编辑未来的每个篇章 不再害怕失去,拥抱希望 记忆的足迹伴着流星划过 带着你的温暖永远在我心间 每一段旅程都镌刻在心 与你的笑容是我最美的篇章 让岁月的风轻轻教会我 如何珍惜每一个平凡的日子 在岁月的海洋,我会静静守候 直到再次与你相遇的那天 再一次拥抱着,那个熟悉的你 让我在梦中继续编织着我们的故事 永不停止的爱的旋律,诠释着生活的意义 在这一刻,所有愿望都已成真 让我在岁月中,与你手牵手前行 穿越时间的阻隔,心中永远有你 无论前方有多少未知,我都将不离不弃"}
+{"id": 24, "file_origin": "final_zh_test_multi.jsonl", "file_index": 24, "lyrics": "阳光洒在小路上 脚步轻快而悠然 在那片熟悉的麦田 回忆像风筝一样飞 老树下的秋千摇 童年的笑声回荡 那一刻是自由 心灵的羁绊与飞翔 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前 岁月如歌在流淌 车窗外的风景变 虽然时光已经远去 心中依然不曾忘 那片熟悉的麦田 依然在我心中 回忆像风一样飞 轻轻拂过我的脸 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前 就让这旋律飞扬 在记忆的河流中 每个音符都是温暖 你在我心中永存 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前 在每一个寂静的夜晚 我都会悄悄想起 那些金色梦境 依旧在心中闪烁 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前 岁月如歌在流淌 车窗外的风景变 虽然时光已经远去 心中依然不曾忘 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前 在每个晨曦初露 我都会悄悄想起 那些金色梦境 依旧在心中闪烁 温暖的阳光洒满 我心中那片田野 曾经的梦在闪烁 似乎就在眼前"}
+{"id": 25, "file_origin": "final_zh_test_multi.jsonl", "file_index": 25, "lyrics": "在静谧的夜空下 月光透过枝叶间 思念悄然浮现 像风轻拂过脸庞 你留给我 那些动人的回忆 那段过往 在心底的牵绊 无论时光如何流转 心永远只向你倾斜 愿追逐那段青春 在每个梦里重现 灯光暗淡的街角 我们曾一同驻足 低语间的笑声 染上了满满温柔 那段欢愉 无可替代的瞬间 我愿意 永远铭记在心 无论时光如何流转 心永远只向你倾斜 愿追逐那段青春 在每个梦里重现 岁月如歌 旋律在耳边低吟 我相信 命运的安排会再相见 无论时光如何流转 心永远只向你倾斜 愿追逐那段青春 在每个梦里重现 在每个梦里重现 在每个梦里重现 在静谧的夜空下 我依然守候着 只愿你的未来 不再有孤独的伤痕 无论时光如何流转 我都会在这里等你 在这条漫长的路上 只为与你相会 愿追逐那段青春 在每个梦里重现 无论未来多变化 我心永远只在你身边 在每个梦里重现 在每个梦里重现 都不会忘记那曾经 温暖的时光 心只向你倾斜 一如当初的承诺 那段青春 会在梦里绽放 会在梦里绽放 会在梦里绽放 希望在未来 再次遇见温暖的你 再次遇见温暖的你"}
+{"id": 26, "file_origin": "final_zh_test_multi.jsonl", "file_index": 26, "lyrics": "在城市的灯光下 我追逐着夜的旋律 每一步都是心跳 梦想在心中盛开 这条路上不再孤单 有你在身旁心欢畅 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 时光在指尖滑落 回忆是最美的碎片 每一个瞬间绽放 印刻在我的脑海 这条路上不再孤单 有你在身旁心欢畅 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 不怕距离与时间 我们的爱炙热如焰 纵使前方路曲折 手握着手,心相连 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 这是我们的故事 汇聚千万个瞬息 你我共同谱写 人生最美的旋律 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 一起梦游星海 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 在这个美丽的晚上 我与你共舞翩然 永远不再孤单 心连心一起歌唱 不怕距离与时间 我们的爱炙热如焰 纵使前方路曲折 手握着手,心相连 让我们一起歌唱 在星空下舞动心房 无畏风雨与黑暗 只要与你并肩 一起梦游星海 在这个美丽的晚上 我与你共舞翩然 永远不再孤单 心连心一起歌唱 让我们一起歌唱"}
+{"id": 27, "file_origin": "final_zh_test_multi.jsonl", "file_index": 27, "lyrics": "在那遥远的天边, 月光洒下柔和的光辉, 我行走在记忆的路上, 回忆如潮水般涌来。 那些曾经的岁月, 如白云在天际飘散, 温暖的笑声回荡, 在心底深处不曾遗忘。 你说过要陪我走, 一路追寻梦想的火焰, 如今却只剩下我, 独自仰望那星空。 我在风中呼喊你的名字, 那星光指引着我。 纵然时光的河流, 冲刷了那些誓言, 我依然坚信,总会重逢, 在那梦中的彼岸。 日出时分的光辉, 仿佛映照你的笑颜, 在每一个晨曦的起点, 我都在心中默念。 思念如草原般辽阔, 每一寸土地有你的影子, 无论多远的距离, 你永远是我心中记忆。 未来的路有些茫然, 但我知道你在前方, 每一步都踏出希望, 心中爱永不消亡。 我在风中呼喊你的名字, 那星光指引着我。 纵然时光的河流, 冲刷了那些誓言, 我依然坚信,总会重逢, 在那梦中的彼岸。 每当黑夜降临, 我仰望那一轮明月, 它的光辉照亮前路, 让我不再孤单漂流。 我在风中呼喊你的名字, 那星光指引着我。 纵然时光的河流, 冲刷了那些誓言, 我依然坚信,总会重逢, 在那梦中的彼岸。 我在梦中追寻你的身影, 那永恒的信念牵引着我。 即便岁月荏苒, 爱意始终不会褪色, 我依然坚信,总会重逢, 在那光芒闪耀的未来。"}
+{"id": 28, "file_origin": "final_zh_test_multi.jsonl", "file_index": 28, "lyrics": "在城市的霓虹下, 每一个梦想在闪烁, 迎接每一个新的夜晚, 心中热血在迸发。 未来的路我们共走, 抬头仰望星海的波澜, 不再畏惧那些挫折, 微笑面对我的人生。 坚持着追寻的步伐, 从未想过要放弃, 这一刻绝不回头, 找到属于我的舞台。 就让音乐响起, 像火焰般燃烧, 在每一次呼吸中, 感受着心跳的节奏。 不怕黑夜的覆盖, 勇敢向前奔跑, 相信自己能飞得更高, 这就是我的信仰。 翻越每个高峰, 挑战所有未知的边界, 无畏的心从未动摇, 我时刻准备着出发。 每一次汗水闪耀, 都在浇灌着梦想的花朵, 坚定地迈出每一步, 与世界共舞,尽情释放。 拼搏的身影交织着, 每个瞬间都铭刻, 我不要停歇, 在这追梦的征途。 就让音乐响起, 像火焰般燃烧, 在每一次呼吸中, 感受着心跳的节奏。 不怕黑夜的覆盖, 勇敢向前奔跑, 相信自己能飞得更高, 这就是我的信仰。 放下那些抱怨, 绽放出最美的微笑, 无畏的灵魂,自由翱翔, 让心跳的节奏指引。 就让音乐响起, 像火焰般燃烧, 在每一次呼吸中, 感受着心跳的节奏。 不怕黑夜的覆盖, 勇敢向前奔跑, 相信自己能飞得更高, 这就是我的信仰。 我将在音乐中飞翔, 追逐那破晓的希望, 心灵的舞动永不停歇, 这就是我永恒的梦想。"}
+{"id": 29, "file_origin": "final_zh_test_multi.jsonl", "file_index": 29, "lyrics": "夜空下的星光,照亮回家的路 风的低语,诉说着故事 在这条小径,足迹渐远 每一个转角,都藏着回忆 走过的每一步,都是印记 心中的渴望,如潮水涌来 追寻的梦,在那遥远地 像晨曦般温暖,唤醒希望 即便跌倒,仍不放弃 爱如星辰,指引航程 寂静的夜晚,思绪万千 那里有一个,属于我的你 记忆的画卷,轻轻展开 每一抹色彩,都是心潮 走过的每一步,都是印记 心中的渴望,如潮水涌来 追寻的梦,在那遥远地 像晨曦般温暖,唤醒希望 即便跌倒,仍不放弃 爱如星辰,指引航程 每一页篇章,都有新篇 每一次相遇,都是缘分 让人生的旋律,回荡心田 即使风雨,依旧前行 追寻的梦,在那遥远地 像晨曦般温暖,唤醒希望 即便跌倒,仍不放弃 爱如星辰,指引航程 在心底深处,刻下你的名 即使时光,无情流淌 追寻的梦,在那遥远地 像晨曦般温暖,唤醒希望 即便跌倒,仍不放弃 爱如星辰,指引航程 在每个瞬间,与我相随 在记忆深处,永不孤单 指引着我的,航程在继续 如星辰般长明,永不消逝 在心中的梦,熠熠生辉"}
+{"id": 30, "file_origin": "final_zh_test_multi.jsonl", "file_index": 30, "lyrics": "在辽阔的海洋上 月光洒落成银沙 我倾听潮水低语 记忆涌现如浪潮 每一波浪都是思念 每一次回波如心跳 带我穿越时空的隧道 带我回到旧日的笑颜 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 行走在青翠的山巅 每一次攀登都不惧 仿佛能触碰云端 俯瞰大地的宽广 阳光穿透每一道缝隙 照亮我前行的方向 我在风中追逐希望 让心灵自由在飞扬 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 每一次相遇都是命运 每一段旅程都是传奇 让我与你携手共去 凝视未来的无尽星空 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 我将心寄于星光 让爱的旋律响起 在每一个梦里相聚 在每一个梦里相聚 在辽阔的海洋上 月光洒落成银沙 我倾听潮水低语 记忆涌现如浪潮 每一波浪都是思念 每一次回波如心跳 带我穿越时空的隧道 带我回到旧日的笑颜 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 行走在青翠的山巅 每一次攀登都不惧 仿佛能触碰云端 俯瞰大地的宽广 阳光穿透每一道缝隙 照亮我前行的方向 我在风中追逐希望 让心灵自由在飞扬 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 每一次相遇都是命运 每一段旅程都是传奇 让我与你携手共去 凝视未来的无尽星空 无论多远的距离 我依然在你身旁 就像海风轻吻着 无尽的天际与梦想 我将心寄于星光 让爱的旋律响起 在每一个梦里相聚 在每一个梦里相聚"}
+{"id": 31, "file_origin": "final_zh_test_multi.jsonl", "file_index": 31, "lyrics": "阳光洒在窗前, 每一缕光影映照着梦, 回忆在温暖中闪烁, 那些笑声仿佛在耳畔。 时间的河流轻轻流淌, 带走了忧伤和等待, 心中的火焰再次燃起, 与你共舞那片星空。 用力呼吸这份爱的甜, 在无尽的旅途中你我相伴, 就算风雨无法阻挡, 我会守护你直到永远。 每一步都在寻找未来, 心灵的火花在闪耀, 为你谱写爱的乐章, 让每个瞬间都变得美好。 时间的河流轻轻流淌, 带走了忧伤和等待, 心中的火焰再次燃起, 与你共舞那片星空。 用力呼吸这份爱的甜, 在无尽的旅途中你我相伴, 就算风雨无法阻挡, 我会守护你直到永远。 夜空洒下繁星点点, 梦想在心中交织成网, 你的笑容如晨曦般明亮, 让我勇敢面对一切挑战。 用力呼吸这份爱的甜, 在无尽的旅途中你我相伴, 就算风雨无法阻挡, 我会守护你直到永远。 给你我所有的梦想, 将我们的路铺成希望, 记得那份约定,无论何时, 我都会在你身旁。 未来的路我们一起走, 手牵手迎接每一个朝阳, 就让爱在这片天空蔓延, 永远伴随你的每一个早晚。 直到最后一刻我也不曾离开, 用一生的爱守护着你, 在每个瞬间都与你分享, 直到星河转动,我依然爱随你。 让我们一起去追逐那梦想, 无畏前路的风雨再难挡, 在爱的港湾我愿等待, 你回头看见我微笑的脸。"}
+{"id": 32, "file_origin": "final_zh_test_multi.jsonl", "file_index": 32, "lyrics": "黑夜像一块幕布, 回忆在星光里闪烁, 当时的我们手牵手, 无畏未来的荒芜与孤独。 每段旋律都在诉说, 那些温暖的感觉是如此真实, 岁月如歌轻轻流淌, 在心底掀起回忆的浪涛。 燃烧吧,心中的火焰, 在夜空下,绽放无限光芒, 就算再多的风雨,只要有你, 我依旧会在你身边守护你。 告别那片熟悉的街, 每一步都在追寻成长, 回忆的镜头在转动, 那年我们满怀希望的笑声。 每一颗星辰如梦似幻, 掩埋了无数的渴望与思念, 但在每个孤独的瞬间, 你是我心中的那颗明珠。 燃烧吧,心中的火焰, 在夜空下,绽放无限光芒, 就算再多的风雨,只要有你, 我依旧会在你身边守护你。 每当孤独袭来时, 我就听见你的声音, 在心底的那份熟悉, 如同潮水,涌入心间。 燃烧吧,心中的火焰, 在夜空下,绽放无限光芒, 就算再多的风雨,只要有你, 我依旧会在你身边守护你。 让我们共同铭记,那份纯粹, 在岁月的深处,我们迎接未来, 无论时光如何变迁, 那份爱依然牢牢锁在心间。 就在那夜空下,我依然呐喊, 不怕一切阻碍,时间的流转, 与你的回忆永不消散, 让我们把每个梦想,都一起实现。 那颗心灵的火焰,照耀未来, 我将守护着,直到永远, 在每次的回眸里, 我唯有你是我不变的信仰。 即便黑暗渐渐漫延, 我们的心依旧在闪烁, 燃烧着那个爱的信念, 永不放手,与你共赴明天。 燃烧吧,心中的火焰, 让爱之光引领我前行, 不再孤单,用这个夜晚, 书写出我们爱的乐章。"}
+{"id": 33, "file_origin": "final_zh_test_multi.jsonl", "file_index": 33, "lyrics": "记忆如风轻轻呼唤 那些年少的时光 在那条小路旁边 你对我轻声细语 岁月不能再倒流 留下了多少牵挂 我在每个黄昏时分 追寻你的印记 那是我心中的歌谣 在田野里回荡 如同那轻柔的风 让我怀念着过往 画面在眼前浮现 阳光洒满青草地 我们手牵手漫步 每一次都是新的开始 岁月不能再倒流 留下了多少牵挂 我在每个黄昏时分 追寻你的印记 那是我心中的歌谣 在田野里回荡 如同那轻柔的风 让我怀念着过往 如果可以再回去 我会紧握你的手 把每一刻铭记 在心底永不忘记 那是我心中的歌谣 在田野里回荡 如同那轻柔的风 让我怀念着过往 在每一个晨曦初露 你存在我的心上 岁月在时光流转 我依然带着这份情 在往后的每个瞬间 让幸福继续前行 那是我心中的歌谣 在岁月里回响 那些美好的回忆 是我永不疲惫的飞翔 在那片熟悉的田野 我相信会再相逢 每一首歌都在倾诉 谱写出爱的乐章 在回忆里永存的地方 我仍然在梦中追寻 愿所有的温暖常在 让我再一次感受你 在每个落日下的晚风"}
+{"id": 34, "file_origin": "final_zh_test_multi.jsonl", "file_index": 34, "lyrics": "在旧街角的咖啡馆 我喝着冰冷的咖啡 回忆如潮水般涌来 你曾的笑容依然清晰 夜风轻轻掠过 一切都了无痕迹 你知道我多想念 那些年的欢笑与泪水 即使时光不再 我依旧在等待 那条曾一起走过的路 如今落满了尘埃 阳光洒在陌生的脸庞 我的心却无法平静 不断回想往昔 时光如箭般匆匆 你知道我多想念 那些年的欢笑与泪水 即使时光不再 我依旧在等待 也许我们会重逢 在那海边的黄昏 握紧你的手 不再放开 你知道我多想念 那些年的欢笑与泪水 即使时光不再 我依旧在等待 看那星空下的月光 照亮彼此的心房 这一份情感无可替代 如同风在耳边轻语 永远铭刻在心间 直到我们再次相拥 在那个温暖的舞台 轻声唱起你我的歌 让回忆再次涌现 在这个秋天的黄昏 我依旧在这里 等着你,等着那一天 无论多么遥远 怀念永不会改变 我在这里等候着 直到时间的尽头 你的身影再次出现 让每一个音符再一次 萦绕在我的心怀 永不消散"}
+{"id": 35, "file_origin": "final_zh_test_multi.jsonl", "file_index": 35, "lyrics": "在那灿烂的晨光里 我伸出手,拥抱这希望 心中渴望,如风般自由 每一步坚定,迈向未来 回忆如星,闪烁在彼岸 那些曾经的光,照亮我心 我在追逐,梦想的途中 每一次坠落,都是重生 在爱的鼓舞中 我愿去探索,未知的远方 秋叶随风,飘摇在空中 那一瞬间,仿佛看见你 一起笑语,轻柔的呢喃 铭刻在心,永远不忘 每个梦,都是奔跑的力量 在每个心跳,交错的瞬间 我在追逐,梦想的途中 每一次坠落,都是重生 在爱的鼓舞中 我愿去探索,未知的远方 当夜幕降临,星光闪耀 所有的期待,如潮水涌来 我明白生命,何其珍贵 勇敢追逐,才能体会 我在追逐,梦想的途中 每一次坠落,都是重生 在爱的鼓舞中 我愿去探索,未知的远方 在那灿烂的晨光里 我将继续奔跑,直到永远 不畏前方,一同向前 心中有梦,勇敢表达 每一个瞬间,都是值得珍藏的 我在追逐,梦想的途中 铭记心中的那份期待 在绿意盎然的未来 让我们一起,飞向未来 在爱的鼓舞中,一起探索 那些沧桑过的岁月 都是我心中最美的乐章 在追逐中,编织我们的未来 梦想如星,永远闪耀 勇敢追逐,这份爱与信仰 每一步都踏实,向前不回头 我将继续,受到爱的庇佑 在那轻柔的晨光里 我最终抵达,心之所念"}
+{"id": 36, "file_origin": "final_zh_test_multi.jsonl", "file_index": 36, "lyrics": "在那静谧的夜晚 我坐在旧木椅上 听那风声如诉 唤醒心底的回忆 岁月如歌轻轻吟唱 那些点滴,温暖依然 我在寻找,那些逝去的光 每一个瞬间,不再重来 在小径旁,留下的足迹 是否还能,触摸到感动 长街尽头的灯光 映照着思念的深渊 我轻声呼唤着你 在梦中寻觅,难以忘却 时间的河流缓缓流淌 那些昔日,恍如隔世 我在寻找,那些逝去的光 每一个瞬间,不再重来 在小径旁,留下的足迹 是否还能,触摸到感动 在这无尽的孤独中 我想起了你的微笑 如同晨露般清新 温暖我的心灵 我在寻找,那些逝去的光 每一个瞬间,不再重来 在小径旁,留下的足迹 是否还能,触摸到感动 我在摇曳的烛光下 听那远处的吟唱 如同昔日的回声 逐渐消散,随风而去 在这漫长的思念里 我心依旧,怀念你的音容 在那风中轻声呼唤 能否再见,弥补遗憾 我想在岁月尽头 与你重逢,共叙往事 在这片温暖的夜空 我将继续,守候回忆 静静期待,那一天的到来 在心底深处,永不遗忘 我在寻找,那些逝去的光 每一个瞬间,铭刻心头 那一缕温暖的微风 是我们共同的记忆 在时光中盘旋不去 继续前行,带着爱的梦想 我将走过,岁月的每个角落"}
+{"id": 37, "file_origin": "final_zh_test_multi.jsonl", "file_index": 37, "lyrics": "在星空下漫步,仰望那片夜空 风轻轻撩动着我的思绪 月光洒下的温柔,映照你的微笑 心中涌动如潮,浓烈又深邃 时间仿佛在静止,只有你我 凝视彼此的眼眸,燃起的火花 你是我心底的秘密,无法言语 像那晨露般清新,晨光下漫舞 无论天涯海角,始终与你同行 爱是我们共同的旋律,轻声吟唱 那片花海中,梦与魂共舞 一阵甜蜜的花香,缠绕着我们 纸飞机带着心愿,随风飘远 那一刻,你的笑容,将我融化 牵手走过四季,岁月静好 一切的烦恼,在你怀中消散 你是我心底的秘密,无法言语 像那晨露般清新,晨光下漫舞 无论天涯海角,始终与你同行 爱是我们共同的旋律,轻声吟唱 有时我们会迷失,迷雾中徘徊 但只要有你相伴,心就不再孤单 无论前路如何,我都会紧握你的手 一路向前,勇敢追寻梦想的光 你是我心底的秘密,无法言语 像那晨露般清新,晨光下漫舞 无论天涯海角,始终与你同行 爱是我们共同的旋律,轻声吟唱 此刻,愿与你共度此生的每一瞬 在相爱的旋律里,愿永恒不息 让我们的未来,如此美丽"}
+{"id": 38, "file_origin": "final_zh_test_multi.jsonl", "file_index": 38, "lyrics": "灯火辉煌的街口 人潮如海,我在其中 每张陌生的面孔 却有似曾相识的温柔 生活是漫长的等待 在爱的河边,不知归期 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 每一段旅途,都是新生 在微弱的星光下,写下诗篇 那些欢笑,泪水交织 恍若昨日,依旧鲜明 生活是漫长的等待 在爱的河边,不知归期 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 趁这一刻,让我诉说 灵魂的深处,藏着梦想 未来的路,总会涌现 希望的光芒,是我指引 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 梦无法停驻,时间如风 让我在寂静中,寻找同伴 心中的歌谣,属于我们 伴随着温暖,飘散在天际 心中的歌谣,属于我们"}
+{"id": 39, "file_origin": "final_zh_test_multi.jsonl", "file_index": 39, "lyrics": "在这城市的霓虹下 我踩着过去的影子 每一步都在思念你 那年夏天的午后 你我并肩走过的街 轻声细语隐藏在风里 时间的沙漏,只剩颗粒 但愿我能回到从前 再次凝视你那双眼 我在梦里追逐着你 却始终还是无处可寻 哦,我的爱,能否再相见 在那无忧的岁月里 你的笑容如阳光 照亮我所有的迷惘 每个夜晚都在回忆 心中默念着你的名 就像星空渴望着月亮 无论多远都不曾放弃 这份情感的重量 像浪潮涌动着不息 多想紧握你的手 穿越繁星与时间的缝隙 我愿意为你守候 无论未来多么不可知 哦,我的爱,能否再相见 在那无忧的岁月里 你的笑容如阳光 照亮我所有的迷惘 当潮水退去,留下空白 是那颗心在孤独呐喊 倘若时光能倒流 我会紧紧抓住每一分每一秒 哦,我的爱,能否再相见 在那无忧的岁月里 你的笑容如阳光 照亮我所有的迷惘 那一抹温暖还在心间 我将继续追寻我们的远方 直到世界的尽头,直到你的身旁 在这城市的霓虹下 我依然更新着思念 期待着再次相见 哦,我的爱,能否再相见 在那无忧的岁月里 你的笑容如阳光 照亮我所有的迷惘 哦,我的爱,能否再相见 在那无忧的岁月里 你的笑容如阳光 照亮我所有的迷惘 在那无忧的岁月里 你曾是我心底的光芒 而现在 我依然在寻找这一切 每一颗星星都在闪烁 而我只想找到你 在亮起的瞬间 哦,我的爱"}
+{"id": 40, "file_origin": "final_zh_test_multi.jsonl", "file_index": 40, "lyrics": "秋风轻轻拂过脸颊 像你的手温暖而细腻 我在那片金色的田野 感受着天地间的亲切 那时我们还年轻 梦想在心中荡漾 我吟唱着时光的旋律 在阳光下追逐无尽的希望 愿我们的心灵 能够永远相连 哦,岁月如歌 每一个音符都是回忆 我们在这无尽的旅途中 相聚,相别,然再相遇 夜空下星辰闪耀 是你我曾共享的梦想 即使远方也不会改变 心中的那份宁静 我带着记忆前行 不畏惧风雨的洗礼 我吟唱着时光的旋律 在月光下追逐无尽的希望 愿我们的心灵 能够永远相连 哦,岁月如歌 每一个音符都是回忆 我们在这无尽的旅途中 相聚,相别,然再相遇 那些美好的瞬间 就像是闪烁的星光 在每个黑夜洒下温暖 指引我前行的方向 哦,岁月如歌 每一个音符都是回忆 我们在这无尽的旅途中 相聚,相别,然再相遇 每一个笑容都是光芒 洒落在心间的柔软 我会紧握这一切 直到久远的未来 在这旋律的海洋 我与你共舞的时光 是梦,是希望的翅膀 带我飞向远方 我会心存感激 你的陪伴如影随形 在每个清晨与夜晚 我都感受到芬芳 哦,岁月如歌 每一个音符都是回忆 我们在这无尽的旅途中 相聚,相别,然再相遇 在每次的相遇中 我会铭记你的笑颜 直到永远 我与你共舞的时光 哦,岁月如歌"}
+{"id": 41, "file_origin": "final_zh_test_multi.jsonl", "file_index": 41, "lyrics": "在那熟悉的街角, 时间仿佛停滞, 阳光洒在脸庞, 温暖心闷的忧伤。 往昔的记忆如潮水, 涌来又退去, 带不走那份思念, 只能在心中默念。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 每一片花瓣都有故事, 在风中悄然飘动, 而我只想留住, 那些与你的回忆。 回首的瞬间如此美, 心若依旧明亮, 就算世界再变化, 我依然的爱着你。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 岁月如歌, 但愿与你共舞, 在星空下轻声呢喃, 永远不分离。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 你是我心中的爱, 在每个夜里闪耀, 我的梦, 因你而灵动。 在那熟悉的街角。 又让我想着你。 尽管岁月无声。 我们的爱在歌唱, 飘荡在每个梦乡。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 就让我在这个瞬间, 与你拥抱成永恒。 在那熟悉的街角。 让我想着你。 尽管岁月无声。 我们的爱在歌唱。 喧闹与宁静之间, 愿与你共鸳鸯。 再一次的相遇, 如梦如幻的轻舞, 在年轮的交替下, 我们的爱不会淡去。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 你是那片海, 总能让我沉醉, 在每个梦里, 与你不再分离。 关注彼岸的花, 心中只有你在。 光阴的洗礼, 你是我坚持的爱。 每一个瞬间, 都如心跳声轻柔, 在那熟悉的街角。 另一个轮回, 你是我所有的期盼, 就让我再一次, 和你拥抱成永恒。 你是梦中的花, 绽放我心底的美, 纵然时光荏苒, 情感依旧如初。 在那熟悉的街角, 我会永远想着你。"}
+{"id": 42, "file_origin": "final_zh_test_multi.jsonl", "file_index": 42, "lyrics": "在感受风暴中, 我踏上这条路, 看见挣扎的灵魂, 仰望那无尽星空。 我不会再沉默, 拒绝被同流合污, 要让每个声音, 打破黑暗的束缚。 我们是活火焰, 燃烧在这寂静, 撕裂这无情城市。 要让未来跃入光明, 终将不再屈服。 站在这片废墟, 不再害怕孤独, 渴望自由的声音, 对抗那无情伤痛。 我们的心,如雷霆, 挣扎中不停奔腾, 要把这世界驯服, 创造属于我们的未来。 我们是活火焰, 燃烧在这寂静, 撕裂这无情城市。 要让未来跃入光明, 终将不再屈服。 我们一起跳跃, 跨越所有障碍, 让每个心跳, 激荡在这一刻。 我们是活火焰, 燃烧在这寂静, 撕裂这无情城市。 要让未来跃入光明, 终将不再屈服。 狂热的声音, 汇聚成我们的歌, 每一次呐喊, 击碎所有的枷锁。 奋力向前, 欲望如鲸鱼翻腾, 不屈的信念, 在烈火中重生。 呼喊跟随, 梦想亦燃起, 谋求理想, 这条路不再独行。 无畏的青春, 扑打着强烈的心跳, 在此时此刻, 我们是无敌的风潮。 前路的荆棘, 再也无法阻挡, 直到那光明, 照亮每个心灵。 呐喊的声音, 如潮水般涌来, 将所有的痛苦, 化作不屈的歌。 前行的脚步, 让希望再度崛起, 在这实现梦想, 我们是无畏的勇士。 再一次的呐喊, 燃烧在这星空, 一起创造奇迹, 这个时代的旋律。 风暴中狂舞, 拼搏中的坚韧, 我们永不退缩, 此刻铭记在心。 与风共舞,这就是我们的呼喊, 让每个音符点亮, 前路尽是光芒。 我们是活火焰, 燃烧在这寂静, 撕裂这无情城市。 要让未来跃入光明, 终将不再屈服。 在音符的海洋, 我们的灵魂飞扬。"}
+{"id": 43, "file_origin": "final_zh_test_multi.jsonl", "file_index": 43, "lyrics": "冲破黑暗的束缚, 我要展翅高飞, 不再害怕未知的前路, 我心中燃烧着梦想. 一路向前的勇气, 在心中不停激荡, 别停下脚步, 我的热血在呐喊. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 每一次的跌倒, 让我更加坚强, 不怕风雨的洗礼, 我仍旧微笑面对. 曾经的梦在心深处, 将我紧紧包围, 每一次我都相信, 明天会更加完美. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 无数个日夜的沉淀, 换来此刻的坚持, 迎接那灿烂晨光, 我将勇敢去追逐. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 我不会停止前行, 因为我心中有梦. 冲破黑暗的束缚, 我要展翅高飞, 不再害怕未知的前路, 我心中燃烧着梦想. 一路向前的勇气, 在心中不停激荡, 别停下脚步, 我的热血在呐喊. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 每一次的跌倒, 让我更加坚强, 不怕风雨的洗礼, 我仍旧微笑面对. 曾经的梦在心深处, 将我紧紧包围, 每一次我都相信, 明天会更加完美. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 无数个日夜的沉淀, 换来此刻的坚持, 迎接那灿烂晨光, 我将勇敢去追逐. 这就是我的归属, 无畏无惧的信仰, 在风中自由飞翔, 这是我最终的方向. 我不会停止前行, 因为我心中有梦."}
+{"id": 44, "file_origin": "final_zh_test_multi.jsonl", "file_index": 44, "lyrics": "在这个静谧的夜晚 我独坐窗前思绪飞扬 岁月如白驹过隙 在记忆里一道深情的伤痕 曾多少个日夜 回想着那段青葱岁月 我怀念曾经的辉煌 那些笑声欢歌的流光 但今我愿独自承受 往昔已无所谓,只剩回忆 在雨中我漫步无目的 一滴滴落下的是我的期许 渴望那些简单的快乐 如同童年般的阳光 而那些岁月已不再 我依然在这里等候 我怀念曾经的辉煌 那些笑声欢歌的流光 但今我愿独自承受 往昔已无所谓,只剩回忆 那段时间就像梦 随风而逝,再难以追寻 我怀念曾经的辉煌 那些笑声欢歌的流光 但今我愿独自承受 往昔已无所谓,只剩回忆 我依然在这里 用心守候那段故事 随着时光的流转 我最珍贵的财富 就是你在我心中 那段年少的梦 我再不想回首 只想向前走去 在未知的道路上 继续寻找我的明天 我怀念曾经的辉煌 那些笑声欢歌的流光 心中有片蓝天 继续在路上 无畏无惧只能前行 因为我曾追过梦 现在我留下的 是对明天的期待 我还会继续走下去 直到风雨变得平淡 直到天边的阳光 照耀我再次起航 永不停息的追梦之路 或许终会找到属于我的乐土 但我无怨无悔 只愿望天长地久 一路上与你常在共舞"}
+{"id": 45, "file_origin": "final_zh_test_multi.jsonl", "file_index": 45, "lyrics": "在星空下,心事逐渐清晰 往日的声音,像风一样轻盈 你曾许下的诺言,萦绕在我耳边 仿佛就在昨日,依依不舍的瞬间 我追寻着,那些微弱的光芒 渴望触碰,心底的温暖希望 在心中的旋律,跟随岁月的流转 爱恨交织中,寻找一个答案 无论时间多漫长,我都会陪伴你身旁 愿这份情感,永不消亡 回忆如潮水,弥漫在每个夜晚 闭上眼睛,听见心底的呼唤 你那微笑的影子,依尽在我的梦里 仿佛又听到,久违的旋律 我追寻着,永恒的承诺 渴望拥抱,你那熟悉的温柔 在心中的旋律,跟随岁月的流转 爱恨交织中,寻找一个答案 无论时间多漫长,我都会陪伴你身旁 愿这份情感,永不消亡 星空下的梦想,是我们共同的信仰 在时光流逝中,依旧闪耀着光辉 在心中的旋律,跟随岁月的流转 爱恨交织中,寻找一个答案 无论时间多漫长,我都会陪伴你身旁 愿这份情感,永不消亡 愿这份情感,永不消亡 在心中,梦里徘徊 与你诉说,终不会停歇 陪你到天边,直到星辰隐去 我的心,永远为你歌唱 在无尽的旅途,与你同在 牵手走过,风雨与阳光 无论多遥远,爱则长存 在时间的彼岸,我们的情未央 在心中,梦里徘徊 与你诉说,终不会停歇 陪你到天边,直到星辰隐去 我的心,永远为你歌唱 愿这份情感,永不消亡 愿这份情感,永不消亡 在心中,梦里徘徊 岁月如歌,回忆似烟 我们的爱,常在心间"}
+{"id": 46, "file_origin": "final_zh_test_multi.jsonl", "file_index": 46, "lyrics": "在闪烁的霓虹下 我们奔跑,心跳着 时光似水流过 每个瞬间都值得珍藏 在这城市的喧嚣里 你我相视,微笑依然 梦的轨迹映在夜空 如同星星般闪烁 握紧你的手 让时光停留 一路向前,无所畏惧 这是属于我们的旅程 别害怕,未来在召唤 勇敢去追逐,梦的远方 让青春燃烧,如光芒璀璨 在每个晨曦中迎接希望 回忆那年夏天的风 轻轻拂过脸庞,如此温暖 一起许下的愿望 会不会在某天盛放 放下所有的忧伤 让爱的力量带我们飞翔 心与心的碰撞 化作乐章,久久回荡 握紧你的手 让时光停留 一路向前,无所畏惧 这是属于我们的旅程 别害怕,未来在召唤 勇敢去追逐,梦的远方 让青春燃烧,如光芒璀璨 在每个晨曦中迎接希望 即使前路再遥远 有你在身旁,心不再孤单 每一次的呼喊 都在勇敢的心中回响 别害怕,未来在召唤 勇敢去追逐,梦的远方 让青春燃烧,如光芒璀璨 在每个晨曦中迎接希望 在你我心中奔流不息 每一个梦都不会消失 让我们再一次启程 在这无尽的旅途上 谱写属于我们的乐章 在夜空中闪耀着光芒 直到最后这一刻降临 那份爱会永远铭刻 在每一个闪烁的瞬间 成为不灭的信仰 在那片无垠的大海中 你我同行,心永不分离 继续书写未来的篇章 每一瞬间的珍贵 在爱的时光里重生 直到最后仍旧骄傲 在每道光辉中继续追寻 在每个悠长的梦里实现 让岁月见证我们的坚韧 在这旅途中感受存在 带着爱的勇气继续前行 直到未来最美的时光 和你在一起,永不言弃"}
+{"id": 47, "file_origin": "final_zh_test_multi.jsonl", "file_index": 47, "lyrics": "在这喧闹的城市里 每个人都有自己的梦 被潮流裹挟,随波逐流 多少人忘了初心 无止境的追逐,空虚的灵魂 在繁华背后,谁在沉沦 我在这个舞台上 用心去传达我的声音 不怕风雨的侵扰 只愿找到真实的自己 所以站起来,别再沉默 你的梦想值得被聆听 在这片土地上,发生奇迹 每一步,都是新的起点 背负着希望,飞翔向前 不仅仅是为了自己 为那些被遗忘的声音 我会勇敢,迎接挑战 不再后退,坚定信念 让每一个瞬间,都成为回忆 我在这个舞台上 用心去传达我的声音 不怕风雨的侵扰 只愿找到真实的自己 所以站起来,别再沉默 你的梦想值得被聆听 在这片土地上,发生奇迹 每一步,都是新的起点 回首这一路的坎坷 坚韧的心永不言弃 每一滴汗水都在闪光 那是我的骄傲与信仰 所以站起来,别再沉默 你的梦想值得被聆听 在这片土地上,发生奇迹 每一步,都是新的起点 让我们一起向前走 带着勇气,带着信仰 前新路上迎接阳光 未来会充满希望 因为我们要一起实现 带着彼此心中的梦想 在这个璀璨的舞台上 成为能照耀的星辰 让每个角落都为之沸腾 在这时代的洪流中 铸就我们不屈的灵魂 从未懦弱,我们在飞翔 将心中梦想播撒开 在每个街头,书写传奇 我的名字,将在心中永存 直到最后这一刻 为梦想而战,毫无惧怕 因为心中激情永不磨灭 迎接未来的挑战 让这每分每秒,都值得 让生命燃烧,至死不渝 这条路,因你而精彩 梦想的光辉,永远闪耀 在每个心中,代代相传 让我们在这条路上同心 书写属于我们的辉煌 在这样的世界,打破桎梏 做那永不言败的歌者 续写咆哮的篇章,直到最后 让这份热爱,永不消逝 无畏无惧,与梦同在 携手并肩,摸索方向 在这星空之下,永远追寻 直到那时,属于我们的辉煌"}
+{"id": 48, "file_origin": "final_zh_test_multi.jsonl", "file_index": 48, "lyrics": "记忆如潮水悄然涌来 光阴在指间滑落成沙 在那年夏天,我们的约定 隐藏在那片枫树林下 阳光透过树叶的缝隙 投下斑驳的光影,熠熠生辉 你的笑声,依然在耳畔 轻轻唤醒,沉睡的心灵 我们曾游历,朦胧的岁月 手握着手,感受生命 每一个瞬间,如此心动 在那片星空下,永不停息 秋风卷起,落叶纷飞 你的轮廓,愈加清晰 我在每个黄昏,默默凝视 你的背影,恍若一幅画 每当星空,映照着我们 心灵的对话,宛如呢喃 岁月静好,似水流年 你我的脸庞,写满了幸福 我们曾游历,朦胧的岁月 手握着手,感受生命 每一个瞬间,如此心动 在那片星空下,永不停息 而今灯火阑珊,夜色渐浓 我愿在时光隧道,追寻你的足迹 即便流年暗淡,也绝不惧怕 因为有你,便有了光明 我们曾游历,朦胧的岁月 手握着手,感受生命 每一个瞬间,如此心动 在那片星空下,永不停息 而今我将继续,带着你的笑 走在这条,未知的旅程 每一步,都有你的影子 把我温暖,伴我同行 岁月如歌,轻轻低吟 愿我们的梦,永远不散 在那炊烟袅袅中 寻找到爱的彼岸 不再孤单,歌声悠扬 愿与你,携手到老 在那片星空下 回眸轻笑,岁月如歌 在此时此刻,我愿与你 共享未来,心相依 直到时间的尽头"}
+{"id": 49, "file_origin": "final_zh_test_multi.jsonl", "file_index": 49, "lyrics": "在这个城市的角落里 我看见你微笑的样子 阳光洒在你的肩头 像是冬日的暖意 你说生活像一场旅行 每一个瞬间都值得珍惜 让我们跳舞在星空下 所有烦恼抛开吗 只想与你分享这一刻 心跳都是为了你 每次回头都想起过去 但未来更让人期待 握住你的手就感觉到了 生活的美好与梦 你说爱情像那朵花 需要阳光雨露去呵护 让我们跳舞在星空下 所有烦恼抛开吗 只想与你分享这一刻 心跳都是为了你 不再犹豫也不再怯懦 让梦想燃烧在夜空 勇敢追逐心中的光 你就是我此生的鼓舞 让我们跳舞在星空下 所有烦恼抛开吗 只想与你分享这一刻 心跳所有都是为了你 心跳都是为了你 心跳都是为了你 让我永远陪伴在你身边 心跳都是为了你 心跳都是为了你 让我永远陪伴在你身边 心跳都是为了你 你就是我唯一的梦 我愿永远守护着你 只想与你分享这一刻 让我们一起再次拥抱 在这个幸福的星空下 只想把你放在心底 心跳在每一瞬间继续 你就是我所有的意义 让这夜晚永不停止"}
diff --git a/eval_pipeline/mulan_seg/chunk_mulanT.sh b/eval_pipeline/mulan_seg/chunk_mulanT.sh
new file mode 100644
index 0000000000000000000000000000000000000000..779fee1f71a8c3cd34933a3e4cd7295b017c2b2e
--- /dev/null
+++ b/eval_pipeline/mulan_seg/chunk_mulanT.sh
@@ -0,0 +1,129 @@
+#!/usr/bin/env bash
+# source /root/miniconda3/etc/profile.d/conda.sh
+# conda activate mucodec
+
+# =====================================================
+# Task Configuration
+# Format: GPU_ID:INPUT_DIR
+# Note: If parallel processing is enabled, the GPU_ID here determines which GPU the task runs on.
+# Different tasks with different GPU_IDs can run in parallel.
+# =====================================================
+TASKS=(
+ "0:xxx/output_main_0.6b_50pct_5e-4_1.3"
+)
+
+# Allow command line arguments to override default tasks
+if [ $# -gt 0 ]; then
+ TASKS=("$@")
+fi
+
+# Base output directories
+PREPARE_BASE_DIR="./processed_metadata"
+EVAL_RESULTS_DIR="./eval_results"
+LOG_DIR="./logs_pipeline"
+
+mkdir -p "${PREPARE_BASE_DIR}"
+mkdir -p "${EVAL_RESULTS_DIR}"
+mkdir -p "${LOG_DIR}"
+
+# Define single task processing function
+process_task() {
+ local task_str=$1
+ IFS=':' read -r GPU_ID INPUT_DIR <<< "$task_str"
+ local TASK_NAME=$(basename "${INPUT_DIR}")
+ local LOG_FILE="${LOG_DIR}/task_${TASK_NAME}_gpu${GPU_ID}.log"
+
+ echo "[GPU ${GPU_ID}] Started task: ${TASK_NAME} (Log: ${LOG_FILE})"
+
+ (
+ echo "=========================================================="
+ echo "Processing Task:"
+ echo " GPU: ${GPU_ID}"
+ echo " DIR: ${INPUT_DIR}"
+ echo "=========================================================="
+
+ if [ ! -d "${INPUT_DIR}" ]; then
+ echo "[ERROR] Directory not found: ${INPUT_DIR}"
+ exit 1
+ fi
+
+ # 1. Generate Metadata
+ META_OUT_DIR="${PREPARE_BASE_DIR}/${TASK_NAME}"
+ echo "[Step 1] Preparing Metadata..."
+ python3 prepare_chunks.py \
+ --input_dir "${INPUT_DIR}" \
+ --output_dir "${META_OUT_DIR}"
+
+ if [ $? -ne 0 ]; then
+ echo "[ERROR] Step 1 failed."
+ exit 1
+ fi
+
+ # 2. Split audio
+ echo "[Step 2] Splitting Full Audio..."
+ python3 split_audio_by_tokens.py "${task_str}"
+
+ if [ $? -ne 0 ]; then
+ echo "[ERROR] Step 2 failed."
+ exit 1
+ fi
+
+ # 3. Evaluation
+ echo "[Step 3] Evaluating..."
+ META_FILES=$(find "${META_OUT_DIR}" -name "meta_*.jsonl")
+
+ if [ -z "${META_FILES}" ]; then
+ echo "[WARN] No valid metadata files found."
+ exit 0
+ fi
+
+ for META_FILE in ${META_FILES}; do
+ META_FILENAME=$(basename "${META_FILE}")
+ RESULT_FILE="${EVAL_RESULTS_DIR}/result_${TASK_NAME}_${META_FILENAME}"
+
+ echo " Evaluating Metadata: ${META_FILENAME}"
+
+ # GPU selection: eval_split_chunks.py builds its device as f"cuda:{args.gpu}",
+ # so we pass the physical GPU ID directly via --gpu and leave
+ # CUDA_VISIBLE_DEVICES unset. If the code only recognized cuda:0, we would
+ # instead export CUDA_VISIBLE_DEVICES=${GPU_ID} and pass --gpu 0 (a relative index).
+
+ python3 eval_split_chunks.py \
+ --metadata "${META_FILE}" \
+ --original_jsonl_dir "${INPUT_DIR}" \
+ --task_root_dir "${INPUT_DIR}" \
+ --output "${RESULT_FILE}" \
+ --gpu "${GPU_ID}"
+
+ if [ $? -ne 0 ]; then
+ echo "[ERROR] Evaluation failed for ${META_FILENAME}"
+ fi
+ done
+
+ echo "[SUCCESS] Task ${TASK_NAME} Finished."
+ ) > "${LOG_FILE}" 2>&1
+}
+
+# =====================================================
+# Main loop - parallel launch
+# =====================================================
+echo "Starting parallel processing..."
+echo "Logs are being saved to: ${LOG_DIR}"
+
+PIDS=()
+
+for task in "${TASKS[@]}"; do
+ process_task "$task" &
+ PID=$!
+ PIDS+=($PID)
+ echo "Launched PID $PID for task: $task"
+done
+
+# Wait for all tasks to complete
+echo "Waiting for all tasks to complete..."
+for pid in "${PIDS[@]}"; do
+ wait $pid
+done
+
+echo "All parallel tasks finished."
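The `GPU_ID:INPUT_DIR` task strings consumed by `process_task` are split with `IFS=':' read`; a minimal standalone sketch of that parsing step (the example path is made up):

```shell
#!/usr/bin/env bash
# Split a "GPU_ID:INPUT_DIR" task string, as process_task does above.
task_str="3:/data/output_main_0.6b"
IFS=':' read -r GPU_ID INPUT_DIR <<< "$task_str"
TASK_NAME=$(basename "${INPUT_DIR}")
echo "GPU=${GPU_ID} TASK=${TASK_NAME}"   # prints: GPU=3 TASK=output_main_0.6b
```

Since `INPUT_DIR` is the last variable in the `read`, it absorbs any remaining fields, so a path that itself contains colons is preserved intact.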
diff --git a/eval_pipeline/mulan_seg/eval_split_chunks.py b/eval_pipeline/mulan_seg/eval_split_chunks.py
new file mode 100644
index 0000000000000000000000000000000000000000..ae45a9f3abe5233471684c852d44c7a4017dc633
--- /dev/null
+++ b/eval_pipeline/mulan_seg/eval_split_chunks.py
@@ -0,0 +1,252 @@
+#!/usr/bin/env python3
+"""
+Evaluate Mulan-T score for SPLIT audio chunks (sliced from full audio).
+
+This script bridges the gap between:
+1. The metadata generated by `prepare_chunks.py` (which might FILTER OUT some chunks).
+2. The split audio files generated by `split_audio_by_tokens.py` (which includes ALL chunks to maintain sync).
+
+It matches them by re-scanning the original JSONL to find the 'global index' of each chunk,
+so we can pick the correct file from the `chunks/` folder.
+"""
+
+import argparse
+import json
+import os
+import sys
+import re
+from tqdm import tqdm
+import torch
+import numpy as np
+import librosa
+
+try:
+ from muq import MuQMuLan
+except ImportError:
+ # Add project root to path if needed
+ # sys.path.append("xxx/Muse/")
+ # from muq import MuQMuLan
+ raise ImportError("Please install MuQ or add it to Python path")
+
+def extract_audio_codes(text: str):
+ """Extract audio token ids from text -> [int, ...]."""
+ # NOTE: "<id>" is an assumed token format; adjust the pattern to the
+ # model's actual audio token markup.
+ return [int(x) for x in re.findall(r"<(\d+)>", text)]
+
+def extract_dsec(text: str):
+ # Support both dsec and desc
+ match = re.search(r"\[(?:dsec|desc):(.*?)]", text, re.DOTALL)
+ return match.group(1).strip() if match else None
+
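The `[dsec:...]` / `[desc:...]` extraction above can be demonstrated on its own; a small self-contained sketch with a made-up prompt string:

```python
import re

def extract_dsec(text):
    # Support both the 'dsec' and 'desc' spellings, matching the script above;
    # the match is non-greedy up to the first closing bracket.
    match = re.search(r"\[(?:dsec|desc):(.*?)]", text, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_dsec("intro [desc: upbeat electronic pop ] tokens"))  # upbeat electronic pop
print(extract_dsec("no tag here"))  # None
```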
+def get_chunk_mapping(original_jsonl_path):
+ """
+ Simulates the logic of BOTH scripts to build a map:
+ metadata_index -> (song_idx, sub_chunk_idx)
+
+ Returns:
+ list of dicts: [{'song_idx': 0, 'sub_idx': 2}, ...]
+ The i-th element corresponds to the i-th line in the FLAT jsonl/metadata file.
+ """
+ mapping = []
+
+ with open(original_jsonl_path, 'r', encoding='utf-8') as f:
+ for song_idx, line in enumerate(f):
+ if not line.strip(): continue
+ try:
+ data = json.loads(line)
+ except json.JSONDecodeError:
+ continue
+
+ messages = data.get("messages", [])
+
+ # Logic to track "Global Sub Index" (what split_audio_by_tokens.py uses)
+ global_sub_idx = 0
+
+ # State for prepare_chunks logic
+ current_prompt = None
+
+ for msg in messages:
+ role = msg.get("role")
+ content = msg.get("content", "")
+
+ if role == "user":
+ dsec = extract_dsec(content)
+ if dsec:
+ current_prompt = dsec
+
+ elif role == "assistant":
+ # Check if this turn actually has audio
+ # (split_audio_by_tokens extracts tokens directly)
+ tokens = extract_audio_codes(content)
+ if not tokens:
+ continue
+
+ # This is a valid audio chunk in the Full Audio.
+ # So split_audio_by_tokens.py will save it as {song_idx}_{global_sub_idx}.wav
+
+ # Now checks if prepare_chunks.py would include it
+ if current_prompt:
+ # Yes, this chunk is included in the evaluation set.
+ mapping.append({
+ "song_idx": song_idx,
+ "sub_idx": global_sub_idx,
+ "prompt": current_prompt
+ })
+
+ # Reset prompt as it's consumed
+ current_prompt = None
+
+ # Increment the global counter because this chunk exists in the audio file
+ global_sub_idx += 1
+
+ return mapping
+
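The index bookkeeping in `get_chunk_mapping` (every assistant chunk advances the global file index, but only prompted chunks enter the evaluation set) can be checked on a toy conversation. A minimal self-contained sketch, where the `<id>` token pattern and the message contents are illustrative assumptions rather than the repo's real data format:

```python
import re

def extract_codes(text):
    # "<id>" is an assumed audio-token pattern, for illustration only.
    return [int(x) for x in re.findall(r"<(\d+)>", text)]

def build_mapping(songs):
    """songs: list of message lists -> [(song_idx, sub_idx, prompt), ...]."""
    mapping = []
    for song_idx, messages in enumerate(songs):
        global_sub_idx = 0     # counts EVERY audio chunk (split script's view)
        current_prompt = None  # last seen [desc:...] prompt (prepare script's view)
        for msg in messages:
            if msg["role"] == "user":
                m = re.search(r"\[desc:(.*?)\]", msg["content"])
                if m:
                    current_prompt = m.group(1).strip()
            elif msg["role"] == "assistant" and extract_codes(msg["content"]):
                if current_prompt:  # only prompted chunks enter the eval set
                    mapping.append((song_idx, global_sub_idx, current_prompt))
                    current_prompt = None
                global_sub_idx += 1  # every audio chunk advances the file index
    return mapping

songs = [[
    {"role": "user", "content": "[desc:calm piano]"},
    {"role": "assistant", "content": "<1><2>"},
    {"role": "assistant", "content": "<3>"},  # unprompted: skipped but counted
    {"role": "user", "content": "[desc:rock]"},
    {"role": "assistant", "content": "<4>"},
]]
print(build_mapping(songs))  # [(0, 0, 'calm piano'), (0, 2, 'rock')]

# The split script names chunks {song_idx:06d}_{sub_idx:04d}.wav:
song_idx, sub_idx, _ = build_mapping(songs)[1]
print(f"{song_idx:06d}_{sub_idx:04d}.wav")  # 000000_0002.wav
```

Note that the unprompted middle chunk is excluded from the mapping yet still advances `global_sub_idx`, which is exactly why the evaluator must rebuild this mapping instead of enumerating wav files in order.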
+def main():
+ parser = argparse.ArgumentParser()
+ # meta_xxx.jsonl file generated by prepare_chunks.py
+ parser.add_argument("--metadata", required=True, help="Path to meta_xxx.jsonl")
+
+ # Directory containing the original JSONLs (needed to rebuild mapping)
+ # usually .../testdata/output_xxx
+ parser.add_argument("--original_jsonl_dir", required=True, help="Directory containing original task .jsonl files")
+
+ # Root of the task directory (where split_audio_by_tokens.py ran); each
+ # original JSONL has a matching subdirectory there whose 'chunks' folder
+ # holds the split wavs, e.g. .../output_5e-4_1.3_main
+ parser.add_argument("--task_root_dir", required=True, help="Root directory of the task (containing subdirs with wavs)")
+
+ parser.add_argument("--output", required=True, help="Path to save output results")
+ parser.add_argument("--model", default="MuQ-MuLan-large", help="MuQ-MuLan model name or path")
+ parser.add_argument("--gpu", type=int, default=0)
+ args = parser.parse_args()
+
+ device = f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu"
+ print(f"[INFO] Using device: {device}")
+
+ print(f"[INFO] Loading Mulan model from {args.model}...")
+ model = MuQMuLan.from_pretrained(args.model).to(device).eval()
+
+ # 1. Load Metadata
+ print(f"[INFO] Loading metadata from {args.metadata}...")
+ meta_items = []
+ with open(args.metadata, "r", encoding="utf-8") as f:
+ for line in f:
+ if line.strip():
+ meta_items.append(json.loads(line))
+
+ # 2. Rebuild Mapping
+ # The metadata items contain "original_jsonl": "filename.jsonl"
+ # We can group them by file and process.
+
+ # Group items by original jsonl file to avoid re-parsing the file many times
+ items_by_file = {}
+ for i, item in enumerate(meta_items):
+ fname = item["original_jsonl"]
+ if fname not in items_by_file:
+ items_by_file[fname] = []
+ items_by_file[fname].append((i, item)) # Store index to put back in order later
+
+ # Results list
+ results = [None] * len(meta_items)
+ all_scores = []
+
+ print("[INFO] Processing files and evaluating...")
+
+ for jsonl_name, grouped_items in items_by_file.items():
+ # Path to original jsonl
+ original_path = os.path.join(args.original_jsonl_dir, jsonl_name)
+ if not os.path.exists(original_path):
+ print(f"[WARN] Original jsonl not found: {original_path}. Skipping these items.")
+ continue
+
+ # Get the mapping for this file.
+ # get_chunk_mapping() replays the same linear scan that prepare_chunks.py
+ # performed, so the K-th entry in the mapping corresponds to the K-th
+ # metadata item for this file once items are sorted by flat_index.
+
+ # Sort grouped items by their flat_index to ensure alignment
+ grouped_items.sort(key=lambda x: x[1]["flat_index"])
+
+ # Generate the ground truth mapping from file structure
+ file_mapping = get_chunk_mapping(original_path)
+
+ if len(file_mapping) != len(grouped_items):
+ print(f"[ERROR] Mismatch in chunk count for {jsonl_name}!")
+ print(f" Metadata expects: {len(grouped_items)}")
+ print(f" Re-scan found: {len(file_mapping)}")
+ print(" This suggests logic mismatch. Skipping file to avoid bad data.")
+ continue
+
+ # Subdirectory for chunks
+ subdir_name = os.path.splitext(jsonl_name)[0]
+ chunks_dir = os.path.join(args.task_root_dir, subdir_name, "chunks")
+
+ for (original_idx, meta_item), map_item in zip(grouped_items, file_mapping):
+ # Sanity check: the prompt stored in metadata should match the one
+ # re-derived from the original file. Warn rather than abort, since an
+ # exact string comparison can fail on whitespace differences.
+ if meta_item["text_prompt"] != map_item["prompt"]:
+ print(f"[WARN] Prompt mismatch at flat_index {meta_item['flat_index']} in {jsonl_name}")
+
+ song_idx = map_item["song_idx"]
+ sub_idx = map_item["sub_idx"]
+
+ # Construct filename: {song_idx:06d}_{sub_idx:04d}.wav
+ wav_name = f"{song_idx:06d}_{sub_idx:04d}.wav"
+ wav_path = os.path.join(chunks_dir, wav_name)
+
+ if not os.path.exists(wav_path):
+ print(f"[WARN] Audio chunk missing: {wav_path}")
+ continue
+
+ # Evaluate
+ try:
+ text = meta_item["text_prompt"]
+
+ # MuQ-MuLan embeds audio and text separately:
+ # model(wavs=wav_tensor) for audio, model(texts=[text]) for text.
+
+ wav, _ = librosa.load(wav_path, sr=24000) # MuLan often uses 24k
+ wav_tensor = torch.tensor(wav).unsqueeze(0).to(device)
+
+ with torch.no_grad():
+ # Calculate embeddings separately
+ audio_emb = model(wavs=wav_tensor)
+ text_emb = model(texts=[text])
+ # Calculate similarity
+ sim = model.calc_similarity(audio_emb, text_emb).item()
+
+ score = sim
+
+ all_scores.append(score)
+
+ result_entry = {
+ "original_jsonl": jsonl_name,
+ "song_idx": song_idx,
+ "sub_idx": sub_idx,
+ "text_prompt": text,
+ "audio_path": wav_path,
+ "score": score
+ }
+ results[original_idx] = result_entry
+
+ except Exception as e:
+ print(f"[ERROR] Failed to eval {wav_path}: {e}")
+
+ # Save results
+ final_results = [r for r in results if r is not None]
+
+ with open(args.output, "w", encoding="utf-8") as f:
+ for r in final_results:
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
+
+ if all_scores:
+ avg_score = sum(all_scores) / len(all_scores)
+ print(f"\n[SUMMARY] Evaluated {len(all_scores)} chunks.")
+ print(f"[SUMMARY] Average Mulan-T Score: {avg_score:.4f}")
+ else:
+ print("[WARN] No scores computed.")
+
+if __name__ == "__main__":
+ main()
+
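The alignment step in the script above (group metadata rows by their source jsonl, then sort each group by `flat_index` so it matches the sequential re-scan) can be sketched in isolation with hypothetical data — the dict keys mirror the real metadata fields, the values are made up:

```python
# Hypothetical metadata rows, deliberately out of order.
meta_items = [
    {"original_jsonl": "a.jsonl", "flat_index": 1, "text_prompt": "p1"},
    {"original_jsonl": "b.jsonl", "flat_index": 0, "text_prompt": "p2"},
    {"original_jsonl": "a.jsonl", "flat_index": 0, "text_prompt": "p0"},
]

# Group by source file, remembering each row's position in the results list.
items_by_file = {}
for i, item in enumerate(meta_items):
    items_by_file.setdefault(item["original_jsonl"], []).append((i, item))

# Sort each group by flat_index so it lines up with the re-derived mapping.
for grouped in items_by_file.values():
    grouped.sort(key=lambda x: x[1]["flat_index"])

# a.jsonl's prompts are now in generation order, regardless of input order.
ordered_prompts = [it["text_prompt"] for _, it in items_by_file["a.jsonl"]]
print(ordered_prompts)  # -> ['p0', 'p1']
```

The stored index `i` is what lets the real script write each result back into `results[original_idx]`, preserving the original metadata order in the output file.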
diff --git a/eval_pipeline/mulan_seg/prepare_chunks.py b/eval_pipeline/mulan_seg/prepare_chunks.py
new file mode 100644
index 0000000000000000000000000000000000000000..ec9828399b626f596f9e8572a93553bb4dec40e2
--- /dev/null
+++ b/eval_pipeline/mulan_seg/prepare_chunks.py
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Prepare chunks for decode.py
+1. Reads raw conversation jsonl
+2. Extracts each section's audio tokens and its [dsec:...]/[desc:...] prompt
+3. Writes a flat jsonl file where each line is just the audio-token string (for decode.py to consume)
+4. Saves a metadata map so we know which wav corresponds to which section/prompt.
+"""
+
+import os
+import re
+import json
+import argparse
+from tqdm import tqdm
+
+def extract_audio_codes_str(text: str):
+ """Find all audio-token tags and concatenate them into a single string.
+
+ decode.py needs the tags verbatim (it re-parses them itself), so we keep
+ the full tag text rather than just the numeric ids.
+ NOTE: the exact tag pattern was lost when this file was extracted; the
+ regex below assumes <|audio_N|>-style tags -- adjust it to match the
+ tokenizer's actual format.
+ """
+ codes = re.findall(r"<\|audio_\d+\|>", text) # assumed tag format
+ return "".join(codes)
+
+def extract_dsec(text: str):
+ # Match [dsec:...] or the [desc:...] variant. The capture is non-greedy
+ # so multiple bracketed segments on one line don't over-match, and
+ # re.DOTALL lets the description span newlines.
+ match = re.search(r"\[(dsec|desc):(.*?)]", text, re.DOTALL)
+ return match.group(2).strip() if match else None
+
+def extract_section_name(text: str):
+ # Match [Section][dsec:...] or [Section][desc:...]
+ match = re.search(r"\[(.*?)\]\[(?:dsec|desc):", text)
+ if match:
+ return match.group(1).strip()
+ match = re.match(r"^\[(.*?)\]", text)
+ if match:
+ name = match.group(1).strip()
+ if name not in ["dsec", "desc"]: return name
+ return "Unknown"
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--input_dir", required=True, help="Dir containing raw .jsonl files")
+ parser.add_argument("--output_dir", required=True, help="Where to save processed jsonl for decoding")
+ args = parser.parse_args()
+
+ os.makedirs(args.output_dir, exist_ok=True)
+
+ # Filter out files starting with "mulan_" to avoid processing result files
+ jsonl_files = [f for f in os.listdir(args.input_dir) if f.endswith(".jsonl") and not f.startswith("mulan_")]
+
+ for jsonl_file in jsonl_files:
+ input_path = os.path.join(args.input_dir, jsonl_file)
+ # decode.py consumes a directory of jsonl files, so the generated flat
+ # jsonl is written directly into args.output_dir (one flat file per
+ # input task file).
+
+ print(f"Processing {input_path}...")
+
+ flat_lines = []
+ metadata = []
+
+ with open(input_path, 'r', encoding='utf-8') as f:
+ for song_idx, line in enumerate(f):
+ if not line.strip(): continue
+ try:
+ data = json.loads(line)
+ except json.JSONDecodeError:
+ continue
+
+ messages = data.get("messages", [])
+ current_prompt = None
+ current_section = "Start"
+
+ for msg in messages:
+ role = msg.get("role")
+ content = msg.get("content", "")
+
+ if role == "user":
+ dsec = extract_dsec(content)
+ if dsec:
+ current_prompt = dsec
+ current_section = extract_section_name(content)
+
+ elif role == "assistant":
+ audio_str = extract_audio_codes_str(content)
+ if not audio_str:
+ continue
+
+ if not current_prompt:
+ continue
+
+ # This line will be fed to decode.py
+ # It just needs to contain the audio tokens.
+ flat_lines.append(audio_str)
+
+ # Metadata to track what this line is
+ metadata.append({
+ "song_idx": song_idx,
+ "section_name": current_section,
+ "text_prompt": current_prompt,
+ "flat_index": len(flat_lines) - 1, # The line number (0-based) in the new jsonl
+ "original_jsonl": jsonl_file
+ })
+
+ # Reset for next section
+ current_prompt = None
+ current_section = "Unknown"
+
+ # Write the "flat" jsonl file
+ # Name it specifically so we can identify it later
+ flat_jsonl_name = f"flat_{jsonl_file}"
+ flat_jsonl_path = os.path.join(args.output_dir, flat_jsonl_name)
+
+ with open(flat_jsonl_path, 'w', encoding='utf-8') as f_out:
+ for l in flat_lines:
+ f_out.write(l + "\n")
+
+ # Write metadata mapping
+ meta_name = f"meta_{jsonl_file}"
+ meta_path = os.path.join(args.output_dir, meta_name)
+ with open(meta_path, 'w', encoding='utf-8') as f_meta:
+ for m in metadata:
+ f_meta.write(json.dumps(m, ensure_ascii=False) + "\n")
+
+ print(f" -> Generated {len(flat_lines)} chunks. Saved to {flat_jsonl_path}")
+ print(f" -> Metadata saved to {meta_path}")
+
+if __name__ == "__main__":
+ main()
+
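The script below cuts decoded wavs by token counts, using the fixed rates 25 tokens per second and 48 kHz audio. The core index arithmetic can be sketched as a standalone helper (`token_span_to_samples` is illustrative, not a function from the patch):

```python
SAMPLE_RATE = 48000   # Hz, matching the constant in split_audio_by_tokens.py
TOKEN_RATE = 25.0     # audio tokens per second of generated audio

def token_span_to_samples(start_token: int, num_tokens: int) -> tuple:
    """Map a token span [start_token, start_token + num_tokens) to
    waveform sample indices at the fixed token and sample rates."""
    start = int(start_token / TOKEN_RATE * SAMPLE_RATE)
    end = int((start_token + num_tokens) / TOKEN_RATE * SAMPLE_RATE)
    return start, end

# A 250-token section starting at token 500 covers seconds 20-30,
# i.e. samples 960000-1440000 at 48 kHz.
print(token_span_to_samples(500, 250))  # -> (960000, 1440000)
```

With these rates, one token is exactly 1920 samples, so cut points land on integer sample indices as long as token counts are integers.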
diff --git a/eval_pipeline/mulan_seg/split_audio_by_tokens.py b/eval_pipeline/mulan_seg/split_audio_by_tokens.py
new file mode 100644
index 0000000000000000000000000000000000000000..578283c0912f6679d20fd0e9644900dde3e0193d
--- /dev/null
+++ b/eval_pipeline/mulan_seg/split_audio_by_tokens.py
@@ -0,0 +1,329 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+"""
+Split full audio files into chunks based on token counts from jsonl files.
+Configuration is similar to run_chunk_decode_v2_2.sh tasks.
+
+Usage:
+ python split_audio_by_tokens.py "TASK_STRING" "TASK_STRING" ...
+
+Task String Format:
+ GPU_ID:JSONL_DIR
+ (GPU_ID is ignored here but kept for compatibility with the config format)
+
+Logic:
+ 1. Read the jsonl file to get token counts for each segment.
+ 2. Calculate start and end time for each segment (25 tokens = 1 second).
+ 3. Load the corresponding FULL audio file.
+ - The full audio is expected at: JSONL_DIR/subdir_name/{line_idx:06d}.wav
+ (e.g. /.../output_xxx/generate_multi.../000000.wav)
+ 4. Slice the audio and save chunks as {song_idx:06d}_{sub_idx:04d}.wav in a new 'chunks' subdirectory.
+"""
+
+import os
+import sys
+import re
+import json
+import argparse
+import torchaudio
+import torch
+from tqdm import tqdm
+
+SAMPLE_RATE = 48000
+TOKEN_RATE = 25.0
+
+def extract_audio_codes(text: str):
+ """Extract audio-token ids -> [int, ...].
+
+ NOTE: the exact tag pattern was lost when this file was extracted; the
+ regex below assumes <|audio_N|>-style tags -- adjust it to match the
+ tokenizer's actual format.
+ """
+ return [int(x) for x in re.findall(r"<\|audio_(\d+)\|>", text)]
+
+def process_task(task_str):
+ parts = task_str.split(":", 1)
+ if len(parts) != 2:
+ print(f"[WARN] Invalid task format: {task_str}")
+ return
+
+ # input_dir contains the jsonl files
+ input_dir = parts[1]
+
+ if not os.path.isdir(input_dir):
+ print(f"[WARN] Directory not found: {input_dir}")
+ return
+
+ print(f"\n[INFO] Processing directory: {input_dir}")
+
+ # Find all jsonl files
+ jsonl_files = sorted([f for f in os.listdir(input_dir) if f.endswith(".jsonl")])
+
+ if not jsonl_files:
+ print(f"[WARN] No jsonl files found in {input_dir}")
+ return
+
+ for jsonl_name in jsonl_files:
+ jsonl_path = os.path.join(input_dir, jsonl_name)
+ # Assuming folder structure:
+ # input_dir/
+ # task_name.jsonl
+ # task_name/ (contains 000000.wav - the full audio)
+
+ subdir_name = os.path.splitext(jsonl_name)[0]
+ full_audio_dir = os.path.join(input_dir, subdir_name)
+
+ # Chunks are written to a 'chunks' folder inside full_audio_dir
+ output_dir = os.path.join(full_audio_dir, "chunks")
+ os.makedirs(output_dir, exist_ok=True)
+
+ print(f" -> Processing {jsonl_name}")
+ print(f" Full Audio Dir: {full_audio_dir}")
+ print(f" Output Dir: {output_dir}")
+
+ with open(jsonl_path, 'r', encoding='utf-8') as f:
+ lines = f.readlines()
+
+ # Filter empty lines
+ lines = [line.strip() for line in lines if line.strip()]
+
+ for idx, line in enumerate(tqdm(lines, desc="Splitting")):
+ # Construct expected full audio filename for this line
+ # Assuming standard naming: {idx:06d}.wav
+ full_audio_filename = f"{idx:06d}.wav"
+ full_audio_path = os.path.join(full_audio_dir, full_audio_filename)
+
+ if not os.path.isfile(full_audio_path):
+ # Optionally warn, but that might spam stdout
+ # print(f"[SKIP] Audio not found: {full_audio_path}")
+ continue
+
+ # Load full audio for this specific line/song
+ try:
+ waveform, sr = torchaudio.load(full_audio_path)
+ except Exception as e:
+ print(f"[ERROR] Failed to load audio {full_audio_path}: {e}")
+ continue
+
+ if sr != SAMPLE_RATE:
+ # Resample if necessary
+ resampler = torchaudio.transforms.Resample(sr, SAMPLE_RATE)
+ waveform = resampler(waveform)
+ sr = SAMPLE_RATE
+
+ # Extract tokens to calculate segment durations
+
+ # If it's a JSON line (raw format), we need to extract assistant content
+ if line.startswith("{"):
+ try:
+ data = json.loads(line)
+ # Extract text from assistant messages
+ text_content = ""
+ for msg in data.get("messages", []):
+ if msg.get("role") == "assistant":
+ text_content += msg.get("content", "")
+
+ # Two possible layouts:
+ # A) One jsonl line = one song, decoded to a single wav that
+ # contains several sections; that wav is split into
+ # sub-segments based on the token counts within the line.
+ # B) Multiple jsonl lines, one wav per line (000000.wav,
+ # 000001.wav, ...); each wav is split independently.
+ # Either way, the token counts in THIS line determine the cut
+ # points for THIS line's wav.
+
+ tokens = extract_audio_codes(text_content)
+
+ except json.JSONDecodeError:
+ # Maybe it's not json, try direct extraction
+ tokens = extract_audio_codes(line)
+ else:
+ tokens = extract_audio_codes(line)
+
+ if not tokens:
+ print(f"[WARN] No tokens found in line {idx}")
+ continue
+
+ # If we are splitting ONE file (000000.wav) based on sub-parts,
+ # we need to know if the jsonl represents the structure of that one file.
+ # Assuming the standard use case:
+ # The jsonl has 1 line (the song). That line has multiple