---
title: Muse Space
emoji: 🎡
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
---

# Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

<p align="center">
  πŸ“„ <a href="https://arxiv.org/abs/2601.03973">Paper</a> β€’ πŸ“Š <a href="https://huggingface.co/datasets/bolshyC/Muse">Dataset</a> β€’ πŸ€– <a href="https://huggingface.co/bolshyC/models">Model</a> β€’ πŸ“š <a href="#citation">Citation</a>
</p>

This is the official repository for "Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control". It provides the Muse model, training and inference scripts, pretrained checkpoints, and evaluation pipelines.

## News and Updates

* **2026.01.11 πŸ”₯**: We are excited to announce that all datasets and models are now fully open-sourced! 🎢 The complete training dataset (116k songs), pretrained model weights, training and evaluation code, and data pipeline are publicly available.

## Installation

**Requirements**: Python 3.10.

To set up the environment for Muse:

- **For training**: Install the training framework:
  ```bash
  pip install ms-swift -U
  ```
- **For inference**: Install vLLM:
  ```bash
  pip install vllm
  ```
- **For audio encoding/decoding**: Some dependencies (e.g., `av`) require system-level packages. On Ubuntu/Debian, install FFmpeg 4.4+ first:
  ```bash
  sudo apt-get update
  sudo apt-get install -y software-properties-common
  sudo add-apt-repository ppa:savoury1/ffmpeg4 -y
  sudo apt-get update
  sudo apt-get install -y pkg-config ffmpeg libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
  ```
  We recommend creating a new conda environment with Python 3.10. **Note**: Since `omegaconf==2.0.6` is required and has compatibility issues with pip 24.1+, you need to downgrade pip first:
  ```bash
  pip install "pip<24.1"
  ```
  Then install dependencies:
  ```bash
  pip install --default-timeout=1000 -r requirements_mucodec.txt
  ```
  For more details, please refer to the [MuCodec](https://github.com/tencent-ailab/MuCodec) official repository.

- **For data pipeline and evaluation**: If you need to run data processing scripts (lyrics generation, metadata processing) or evaluation scripts, install additional dependencies:
  ```bash
  pip install -r requirements_data_eval.txt
  ```
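The installation above pins two version constraints: Python 3.10 and pip below 24.1 (for `omegaconf==2.0.6`). The following is a minimal, hypothetical self-check, not part of the Muse repository, that mirrors those constraints; the function name and the placeholder pip version string are illustrative assumptions.

```python
import sys


def env_ok(py_version=sys.version_info, pip_version="23.3.1"):
    """Check the constraints stated above: Python 3.10 and pip < 24.1.

    `pip_version` is assumed to be a plain "major.minor[.patch]" string;
    the default here is only an illustrative placeholder, not a recommendation.
    Returns a (python_ok, pip_ok) pair of booleans.
    """
    major, minor = (int(p) for p in pip_version.split(".")[:2])
    python_ok = (py_version[0], py_version[1]) == (3, 10)
    pip_ok = (major, minor) < (24, 1)
    return python_ok, pip_ok
```

For instance, `env_ok(pip_version="24.1")` reports the pip constraint as unmet, signalling that `pip install "pip<24.1"` is needed before installing the MuCodec requirements.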

## Repository Structure

This repository contains the following main directories:

- **`train/`**: Training scripts and utilities for fine-tuning the Muse model. See [`train/README.md`](train/README.md) for details.
- **`infer/`**: Inference scripts for generating music with the Muse model. See [`infer/README.md`](infer/README.md) for details.
- **`eval_pipeline/`**: Evaluation scripts for assessing model performance (Mulan-T, PER, AudioBox, SongEval, etc.).
- **`data_pipeline/`**: Scripts for building and processing training data, including lyrics generation, metadata processing, and music generation utilities.

## Model Architecture

<p align="center">
  <img src="assets/intro.jpg" width="800"/>
</p>

## Acknowledgments

We thank [Qwen3](https://github.com/QwenLM/Qwen3) for providing the base language model, [ms-swift](https://github.com/modelscope/ms-swift) for the training framework, and [MuCodec](https://github.com/tencent-ailab/MuCodec) for discrete audio tokenization.

## Citation

If you find our work useful, please cite our paper:

```bibtex
@article{jiang2026muse,
  title={Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control},
  author={Jiang, Changhao and Chen, Jiahao and Xiang, Zhenghao and Yang, Zhixiong and Wang, Hanchen and Zhuang, Jiabao and Che, Xinmeng and Sun, Jiajun and Li, Hui and Cao, Yifei and others},
  journal={arXiv preprint arXiv:2601.03973},
  year={2026}
}
```