---
language: zh
license: mit
tags:
- maimai
- music-generation
- chart-generation
- transformer
- audio-conditioned
- pytorch
pipeline_tag: text-generation
---

# MaiGenerator (maiChartGen)

An automatic chart generation model for maimai, built on EnCodec audio encoding and a Transformer.

Input: audio plus BPM, difficulty, and level → Output: a playable maimai chart (maidata.txt).

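## Model overview

- **Architecture**: Encoder-decoder Transformer (4 encoder + 8 decoder layers), model dimension 512, 8 attention heads
- **Decoder**: the first 6 layers use a shared FFN; the last 2 layers use MoE (4 experts × difficulty routing)
- **Core design**: time-aligned RoPE (a chart position = the audio frame index derived from the BPM)
- **Inference**: incremental decoding with a KV cache; float16 is supported for faster inference
- **Output**: standard maidata.txt format, which can be imported directly into Simai/maimai simulators
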
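The arithmetic behind time-aligned positions can be illustrated with a short sketch. It converts a chart position measured in beats into an audio frame index, assuming the 75 Hz EnCodec frame rate stated in this card, a constant BPM, and no audio offset. The helper name `beat_to_frame` and the rounding rule are assumptions made for illustration; they are not taken from the repository.

```python
FRAME_RATE_HZ = 75.0  # EnCodec frame rate stated in this card


def beat_to_frame(beat: float, bpm: float, frame_rate: float = FRAME_RATE_HZ) -> int:
    """Map a chart position in beats to the nearest audio frame index.

    Hypothetical helper: assumes a constant BPM and no audio offset.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    seconds = beat * 60.0 / bpm  # one beat lasts 60 / bpm seconds
    return round(seconds * frame_rate)


# At 150 BPM one beat lasts 0.4 s, which is 30 frames at 75 Hz.
print(beat_to_frame(1.0, bpm=150))  # 30
print(beat_to_frame(4.0, bpm=150))  # 120
```

Using the frame index as the rotary position would let a chart token and the audio frame it coincides with share the same position value, which is presumably what "time-aligned" refers to here.
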
## Quick start

### Install dependencies

```bash
pip install torch torchaudio encodec soundfile tqdm numpy
pip install huggingface_hub
```

### Download the model and run inference

```python
import subprocess
import sys

from huggingface_hub import hf_hub_download

# Download the model weights from the Hugging Face Hub
model_path = hf_hub_download(repo_id="Goldgom/maiChartGen", filename="best.pt")

# Run inference; inference.py must be in the current working directory
subprocess.run([
    sys.executable, "inference.py",
    "--checkpoint", model_path,
    "--audio", "your_track.mp3",
    "--bpm", "270", "--diff", "MASTER", "--level", "13.5",
    "--temperature", "1.0", "--top-k", "500", "--top-p", "0.95",
    "--density-bias", "1.0", "--target-notes", "650",
    "--config-density-bias", "0.4", "--hold-bias", "0.2", "--slide-bias", "0.8",
    "--break-penalty", "2.5", "--max-break-ratio", "0.06",
    "--output", "maidata.txt",
], check=True)  # raise CalledProcessError if inference fails
```

### Command-line inference

```bash
# Basic usage
python inference.py --checkpoint best.pt --audio track.mp3 \
    --bpm 150 --diff MASTER --level 12.0 --output maidata.txt

# High-quality MASTER chart
python inference.py --checkpoint best.pt --audio track.mp3 \
    --bpm 270 --diff MASTER --level 13.5 \
    --temperature 1.0 --top-k 500 --top-p 0.95 \
    --density-bias 1.0 --target-notes 650 \
    --config-density-bias 0.4 --hold-bias 0.2 --slide-bias 0.8 \
    --break-penalty 2.5 --max-break-ratio 0.06 \
    --output maidata.txt

# High-performance inference (float16 + torch.compile)
python inference.py --checkpoint best.pt --audio track.mp3 \
    --bpm 150 --diff MASTER --level 12.0 \
    --precision float16 --compile --output maidata.txt
```

### Backend API usage

```python
import torch

from Tokenizer.MaiTrackTokenizer import MaiTrackTokenizer
from inference import generate_chart
# MaiGenerator (the model class) must also be imported from the repository;
# its module path is not given in this card.

# Load the model
device = torch.device("cuda")
# weights_only=False unpickles arbitrary objects: only load checkpoints you trust
ckpt = torch.load("best.pt", map_location=device, weights_only=False)
config = ckpt["config"]
model = MaiGenerator(
    d_model=config.get("d_model", 512),
    enc_layers=config.get("enc_layers", 4),
    dec_layers=config.get("dec_layers", 8),
    heads=config.get("heads", 8),
    d_ff=config.get("d_ff", 2048),
).to(device)
# strict=False skips missing or unexpected keys instead of raising
model.load_state_dict(ckpt["model_state"], strict=False)
model.eval()

# Tokenize the audio
audio_tok = MaiTrackTokenizer(n_layers=2, device="cuda")
audio_tokens = audio_tok.encode("track.mp3")

# Generate the chart (same settings as the MASTER command-line example)
chart_tokens = generate_chart(
    model, torch.tensor([audio_tokens]),
    bpm=270, difficulty=3, level_value=13.5,
    temperature=1.0, top_k=500, top_p=0.95,
    density_bias=1.0, target_notes=650,
    config_density_bias=0.4,
    type_biases={"hold": 0.2, "slide": 0.8},
    break_penalty=2.5, max_break_ratio=0.06,
)
```

## Inference parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--temperature` | 0.8 | Sampling temperature |
| `--top-k` | 50 | Top-K filtering |
| `--top-p` | 0.95 | Nucleus (top-p) filtering |
| `--density-bias` | 0.0 | Density guidance strength |
| `--target-notes` | auto | Target note count |
| `--hold-bias` | 0.0 | HOLD bias |
| `--slide-bias` | 0.0 | SLIDE bias |
| `--break-penalty` | 1.5 | BREAK ratio penalty |
| `--max-break-ratio` | 0.08 | Upper limit on the BREAK ratio |
| `--precision` | float32 | Numeric precision (float16 and bf16 are also available) |

For the full list of parameters, see the [README](https://github.com/Goldgom/maiGenerator) or run `python inference.py --help`.

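To make the interaction between `--temperature`, `--top-k`, and `--top-p` concrete, here is a minimal pure-Python sketch of the standard formulation: scale the logits by the temperature, keep the `top_k` best tokens, then keep the smallest prefix whose cumulative probability reaches `top_p`. The function names are hypothetical, and how `inference.py` combines these steps internally is not described in this card, so treat this as an illustration of the general technique rather than the repository's code.

```python
import math
import random


def filter_top_k_top_p(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top_k = max(1, min(top_k, len(scaled)))
    # Top-k: keep the indices of the k highest-scoring tokens, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the maximum for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, w in zip(order, weights):
        p = w / total
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {idx: p / norm for idx, p in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_top_k_top_p(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


dist = filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(sorted(dist))  # [0, 1]
```

In the small example, top-k drops token 3, and top-p then drops token 2 because tokens 0 and 1 already cover more than 90% of the remaining probability mass.
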
## Training data

- 1,730 songs, ~5,000 chart samples
- Difficulty range: BASIC to ReMASTER (levels 1 to 15)
- Chart types: SD / DX / FULLTOUCH

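## Technical details

- **Audio processing**: EnCodec 24 kHz, 2-layer RVQ, 75 Hz frame rate
- **Tokenizer**: 256 base tokens + Config Tokens (a single token encodes a complete note)
- **Constrained decoding**: hard grammar state machine (NORMAL → SIM → DUR → SLIDE_BODY)
- **Sampling guidance**: density guidance + type biases + BREAK ratio control
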
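Grammar-constrained decoding of this kind is usually implemented by masking the logits of tokens that are illegal in the current state before sampling. The sketch below shows the pattern with the four state names from this card. The token classes, the allowed sets, and the transition table are all hypothetical and chosen only for illustration; the actual grammar is defined in the repository.

```python
import math

# Hypothetical token classes allowed in each decoder state (illustration only).
ALLOWED = {
    "NORMAL": {"TAP", "HOLD", "SLIDE", "SIM", "REST"},
    "SIM": {"TAP", "HOLD", "SLIDE"},
    "DUR": {"DURATION"},
    "SLIDE_BODY": {"PATH", "DURATION"},
}

# Hypothetical transitions: (state, emitted class) -> next state.
# Pairs that are not listed keep the current state.
NEXT = {
    ("NORMAL", "SIM"): "SIM",
    ("NORMAL", "HOLD"): "DUR",
    ("NORMAL", "SLIDE"): "SLIDE_BODY",
    ("SIM", "TAP"): "NORMAL",
    ("SIM", "HOLD"): "DUR",
    ("SIM", "SLIDE"): "SLIDE_BODY",
    ("DUR", "DURATION"): "NORMAL",
    ("SLIDE_BODY", "DURATION"): "NORMAL",
}


def mask_logits(logits, classes, state):
    """Set the logit of every token whose class is illegal in `state` to -inf."""
    allowed = ALLOWED[state]
    return [x if c in allowed else -math.inf for x, c in zip(logits, classes)]


def step(state, emitted_class):
    """Advance the state machine after a token of `emitted_class` was emitted."""
    if emitted_class not in ALLOWED[state]:
        raise ValueError(f"{emitted_class} is not allowed in state {state}")
    return NEXT.get((state, emitted_class), state)


classes = ["TAP", "HOLD", "DURATION", "PATH"]  # class of each token id
logits = [0.1, 0.5, 2.0, 1.0]

print(mask_logits(logits, classes, "NORMAL"))  # [0.1, 0.5, -inf, -inf]
state = step("NORMAL", "HOLD")                 # a HOLD must be followed by a duration
print(state)                                   # DUR
print(mask_logits(logits, classes, state))     # [-inf, -inf, 2.0, -inf]
```

Because the mask is applied before sampling, a "hard" constraint of this form can never emit a syntactically invalid token, regardless of temperature or bias settings.
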
## Code repository

https://github.com/Goldgom/maiGenerator

## License

MIT