
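Field	Type	Required	Default	Description
`model`	string	No	auto	Model ID (obtain from `/v1/models`)
`messages`	array	Yes	-	List of chat messages; see Input modes
`stream`	boolean	No	`false`	Whether to stream the response; see Streaming responses
`audio_config`	object	No	`null`	Audio generation settings; see below
`temperature`	float	No	`0.85`	LM sampling temperature
`top_p`	float	No	`0.9`	LM nucleus sampling threshold
`seed`	int | string	No	`null`	Random seed. When `batch_size > 1`, several seeds can be given as a comma-separated string, e.g. `"42,123,456"`
`lyrics`	string	No	`""`	Lyrics passed directly (takes precedence over lyrics parsed from messages); the messages text is then used as the prompt
`sample_mode`	boolean	No	`false`	Enables LLM sample mode: the messages text is used as the sample_query and the LLM generates the prompt and lyrics
`thinking`	boolean	No	`false`	Whether to enable LLM thinking mode (deeper reasoning)
`use_format`	boolean	No	`false`	When the user supplies a prompt or lyrics, whether to format and enhance them with the LLM first
`use_cot_caption`	boolean	No	`true`	Whether to rewrite/enhance the music description via CoT
`use_cot_language`	boolean	No	`true`	Whether to detect the lyrics language automatically via CoT
`guidance_scale`	float	No	`7.0`	Classifier-free guidance scale
`batch_size`	int	No	`1`	Number of audio clips to generate
`task_type`	string	No	`"text2music"`	Task type; see Audio input
`repainting_start`	float	No	`0.0`	Start of the repaint region (seconds)
`repainting_end`	float	No	`null`	End of the repaint region (seconds)
`audio_cover_strength`	float	No	`1.0`	Cover strength (0.0 to 1.0)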
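
The `seed` field accepts either a single integer or a comma-separated string when `batch_size > 1`. A minimal sketch of building that value follows; `format_seed` is a hypothetical helper name, and the one-seed-per-clip check is an assumption, since the table does not say how the server treats a mismatched count.

```python
from typing import Optional, Sequence, Union


def format_seed(seeds: Optional[Sequence[int]], batch_size: int = 1) -> Union[int, str, None]:
    """Build a value for the `seed` request field (hypothetical helper)."""
    if not seeds:
        return None  # matches the documented default of null
    if len(seeds) == 1:
        return int(seeds[0])
    # Assumption: one seed per generated clip. How the server treats a
    # mismatched count is not documented, so this sketch refuses to guess.
    if len(seeds) != batch_size:
        raise ValueError(f"expected {batch_size} seeds, got {len(seeds)}")
    return ",".join(str(int(s)) for s in seeds)


payload = {
    "messages": [{"role": "user", "content": "<prompt>Lo-fi hip hop beat</prompt>"}],
    "batch_size": 3,
    "seed": format_seed([42, 123, 456], batch_size=3),
}
print(payload["seed"])  # 42,123,456
```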

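audio_config object

Field	Type	Default	Description
`duration`	float	`null`	Audio length in seconds; decided by the LM when omitted
`bpm`	integer	`null`	Beats per minute; decided by the LM when omitted
`vocal_language`	string	`"en"`	Lyrics language code (e.g. `"zh"`, `"en"`, `"ja"`)
`instrumental`	boolean	`null`	Whether the track is purely instrumental (no vocals). When omitted, inferred from the lyrics
`format`	string	`"mp3"`	Output audio format
`key_scale`	string	`null`	Key (e.g. `"C major"`)
`time_signature`	string	`null`	Time signature (e.g. `"4/4"`)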
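
Because most `audio_config` fields fall back to the LM or a default when omitted, a client can send only the values it actually wants to pin. The sketch below shows one way to do that; `build_audio_config` and its parameter names are hypothetical, chosen for illustration.

```python
from typing import Any, Dict, Optional


def build_audio_config(
    duration: Optional[float] = None,
    bpm: Optional[int] = None,
    vocal_language: Optional[str] = None,
    instrumental: Optional[bool] = None,
    audio_format: Optional[str] = None,
    key_scale: Optional[str] = None,
    time_signature: Optional[str] = None,
) -> Dict[str, Any]:
    """Return an `audio_config` dict containing only the fields that were set."""
    fields = {
        "duration": duration,
        "bpm": bpm,
        "vocal_language": vocal_language,
        "instrumental": instrumental,
        "format": audio_format,
        "key_scale": key_scale,
        "time_signature": time_signature,
    }
    # Dropping None values leaves those fields to the documented defaults.
    return {name: value for name, value in fields.items() if value is not None}


config = build_audio_config(duration=30, bpm=128, key_scale="C major")
print(config)
```

Note that `instrumental=False` is kept in the output, since only `None` counts as "not set".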

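The meaning of the messages text depends on the mode:

lyrics is set → messages text = prompt (the music description)

sample_mode: true is set → messages text = sample_query (the LLM generates everything)

Neither is set → auto-detect: tagged input uses tag mode, lyrics-like input uses lyrics mode, anything else uses sample mode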
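
The three rules above can be sketched as a small client-side check, useful for logging how a request will be read before it is sent. `messages_text_role` is a hypothetical name, and checking `lyrics` before `sample_mode` is an assumption: the rules list it first but do not state which wins when both are set.

```python
from typing import Any, Dict


def messages_text_role(request: Dict[str, Any]) -> str:
    """Say how the messages text will be interpreted for a request body."""
    # Assumption: `lyrics` is checked first when both fields are present.
    if request.get("lyrics"):
        return "prompt"
    if request.get("sample_mode"):
        return "sample_query"
    return "auto-detect"


print(messages_text_role({"lyrics": "[Verse 1]\nHello"}))  # prompt
print(messages_text_role({"sample_mode": True}))           # sample_query
print(messages_text_role({}))                              # auto-detect
```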

messages format

Two formats are supported: plain text and multimodal (text + audio).

Plain text:

{
  "messages": [
    {"role": "user", "content": "Your input text"}
  ]
}

Multimodal (with audio input):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Cover this song"},
        {
          "type": "input_audio",
          "input_audio": {
            "data": "<base64 audio data>",
            "format": "mp3"
          }
        }
      ]
    }
  ]
}
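
A multimodal message like the one above can be assembled from raw audio bytes with the standard `base64` module. This is a sketch only; `build_audio_message` is a hypothetical helper, and the byte string below is a stand-in for a real file.

```python
import base64
from typing import Any, Dict, List


def build_audio_message(text: str, audio_bytes: bytes, audio_format: str = "mp3") -> Dict[str, Any]:
    """Return a user message holding a text part and one input_audio part."""
    content: List[Dict[str, Any]] = [
        {"type": "text", "text": text},
        {
            "type": "input_audio",
            "input_audio": {
                "data": base64.b64encode(audio_bytes).decode("ascii"),
                "format": audio_format,
            },
        },
    ]
    return {"role": "user", "content": content}


# Stand-in bytes; in practice read them from disk, e.g. open(path, "rb").read()
message = build_audio_message("Cover this song", b"ID3\x04fake-audio")
print(message["content"][1]["input_audio"]["format"])  # mp3
```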

Non-streaming response (`stream: false`)

{
  "id": "chatcmpl-a1b2c3d4e5f6g7h8",
  "object": "chat.completion",
  "created": 1706688000,
  "model": "acemusic/acestep-v15-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "## Metadata\n**Caption:** Upbeat pop song...\n**BPM:** 120\n**Duration:** 30s\n**Key:** C major\n\n## Lyrics\n[Verse 1]\nHello world...",
        "audio": [
          {
            "type": "audio_url",
            "audio_url": {
              "url": "data:audio/mpeg;base64,SUQzBAAAAAAAI1RTU0UAAAA..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 100,
    "total_tokens": 110
  }
}

Response fields:

Field	Description
`choices[0].message.content`	Text produced by the LM, containing Metadata (Caption/BPM/Duration/Key/Time Signature/Language) and Lyrics. If the LM was not involved, this is `"Music generated successfully."`
`choices[0].message.audio`	Array of audio items; each has a `type` (`"audio_url"`) and an `audio_url.url` (a Base64 Data URL of the form `data:audio/mpeg;base64,...`)
`choices[0].finish_reason`	`"stop"` indicates normal completion
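
Since `content` carries the metadata as Markdown-style `**Key:** value` lines, a client may want it as a dictionary. The sketch below assumes the layout shown in the sample response (a `## Metadata` block followed by `## Lyrics`); the exact layout is not specified beyond that sample, and `parse_content` is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "**BPM:** 120"
_FIELD = re.compile(r"^\*\*(.+?):\*\*\s*(.*)$")


def parse_content(content: str) -> Tuple[Dict[str, str], str]:
    """Split the assistant text into a metadata dict and the lyrics body."""
    head, _, lyrics = content.partition("## Lyrics")
    metadata: Dict[str, str] = {}
    for line in head.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            metadata[match.group(1)] = match.group(2)
    return metadata, lyrics.strip()


sample = (
    "## Metadata\n**Caption:** Upbeat pop song...\n**BPM:** 120\n"
    "**Duration:** 30s\n**Key:** C major\n\n## Lyrics\n[Verse 1]\nHello world..."
)
metadata, lyrics = parse_content(sample)
print(metadata["BPM"], metadata["Key"])  # 120 C major
```

When the LM was not involved and `content` is just `"Music generated successfully."`, the function returns an empty dict and an empty lyrics string.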

Audio decoding format:

The value of audio_url.url is a Data URL: data:audio/mpeg;base64,<base64_data>

The client extracts the base64 part and decodes it to obtain the MP3 file:

import base64

url = response["choices"][0]["message"]["audio"][0]["audio_url"]["url"]
# Strip the "data:audio/mpeg;base64," prefix
b64_data = url.split(",", 1)[1]
audio_bytes = base64.b64decode(b64_data)

with open("output.mp3", "wb") as f:
    f.write(audio_bytes)

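const url = response.choices[0].message.audio[0].audio_url.url;
// Strip the "data:audio/mpeg;base64," prefix
const b64Data = url.split(",")[1];
// atob returns a binary string; convert it to raw bytes
const audioBytes = Uint8Array.from(atob(b64Data), (c) => c.charCodeAt(0));
// Or use the Data URL directly as an <audio> source
const audio = new Audio(url);
audio.play();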
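
The snippets above assume the prefix is always `data:audio/mpeg;base64,`. If `audio_config.format` is set to something other than `"mp3"`, the MIME type in the URL may differ; that is an assumption, since only the MP3 case is documented. A slightly more defensive sketch reads the MIME type from the URL instead of hard-coding it (`decode_data_url` is a hypothetical helper):

```python
import base64
from typing import Tuple


def decode_data_url(url: str) -> Tuple[str, bytes]:
    """Return (mime_type, raw_bytes) for a base64 Data URL."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 Data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


mime_type, audio_bytes = decode_data_url("data:audio/mpeg;base64,SUQzBAAA")
print(mime_type, audio_bytes[:3])  # audio/mpeg b'ID3'
```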

2. List models

GET /v1/models

Returns information about the available models.

Response

{
  "data": [
    {
      "id": "acemusic/acestep-v15-turbo",
      "name": "ACE-Step",
      "created": 1706688000,
      "description": "High-performance text-to-music generation model. Supports multiple styles, lyrics input, and various audio durations.",
      "input_modalities": ["text", "audio"],
      "output_modalities": ["audio", "text"],
      "context_length": 4096,
      "pricing": {"prompt": "0", "completion": "0", "request": "0"},
      "supported_sampling_parameters": ["temperature", "top_p"]
    }
  ]
}
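
The `id` values from this listing are what the `model` request field expects. As an illustrative sketch, the hypothetical helper `audio_model_ids` filters a parsed response down to models that declare audio output; the inline `listing` is a trimmed copy of the response above, not live data.

```python
from typing import Any, Dict, List


def audio_model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Return IDs of listed models that declare audio output."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if "audio" in model.get("output_modalities", [])
    ]


listing = {
    "data": [
        {
            "id": "acemusic/acestep-v15-turbo",
            "input_modalities": ["text", "audio"],
            "output_modalities": ["audio", "text"],
        }
    ]
}
print(audio_model_ids(listing))  # ['acemusic/acestep-v15-turbo']
```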

3. Health check

GET /health

Response

{
  "status": "ok",
  "service": "ACE-Step OpenRouter API",
  "version": "1.0"
}
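
A client that starts alongside the server may want to wait for `"status": "ok"` before sending work. The sketch below keeps the HTTP call out of the loop by accepting any callable that returns the parsed JSON; `wait_until_healthy` and `fetch_health` are hypothetical names. Whether `/health` reports something other than `ok` while the model is still loading is not documented, so the loop simply treats anything else (including exceptions) as "not ready yet".

```python
import time
from typing import Any, Callable, Dict


def wait_until_healthy(
    fetch_health: Callable[[], Dict[str, Any]],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll a health probe until it reports status "ok" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch_health().get("status") == "ok":
                return True
        except Exception:
            pass  # treat connection errors like "not ready yet"
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Fake probe standing in for an HTTP GET to /health: two empty replies, then ok.
replies = iter([{}, {}, {"status": "ok", "version": "1.0"}])
print(wait_until_healthy(lambda: next(replies), attempts=5, delay_seconds=0))  # True
```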

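Input modes

The system picks an input mode automatically from the content of the last user message in messages. The mode can also be set explicitly with the lyrics or sample_mode field.

Mode 1: Tag mode (recommended)

Use the <prompt> and <lyrics> tags to state the music description and the lyrics explicitly: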

{
  "messages": [
    {
      "role": "user",
      "content": "<prompt>A gentle acoustic ballad in C major, female vocal</prompt>\n<lyrics>[Verse 1]\nSunlight through the window\nA brand new day begins\n\n[Chorus]\nWe are the dreamers\nWe are the light</lyrics>"
    }
  ],
  "audio_config": {
    "duration": 30,
    "vocal_language": "en"
  }
}

<prompt>...</prompt>: description of the musical style or scene (i.e. the caption)
<lyrics>...</lyrics>: the lyrics
Either tag may be supplied on its own
When use_format: true, the LLM automatically enhances the prompt and lyrics
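
Building the tagged string by hand is error-prone once lyrics span many lines. A minimal sketch of a helper that wraps whichever parts are present (`tagged_content` is a hypothetical name; joining the two tags with a newline mirrors the request example above):

```python
from typing import Optional


def tagged_content(prompt: Optional[str] = None, lyrics: Optional[str] = None) -> str:
    """Wrap a prompt and/or lyrics in the <prompt> and <lyrics> tags."""
    parts = []
    if prompt:
        parts.append(f"<prompt>{prompt}</prompt>")
    if lyrics:
        parts.append(f"<lyrics>{lyrics}</lyrics>")
    if not parts:
        raise ValueError("provide a prompt, lyrics, or both")
    return "\n".join(parts)


content = tagged_content(
    prompt="A gentle acoustic ballad in C major, female vocal",
    lyrics="[Verse 1]\nSunlight through the window",
)
print(content)
```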

Mode 2: Natural-language mode (sample mode)

Describe the music you want in natural language; the system uses the LLM to generate the prompt and lyrics:

{
  "messages": [
    {"role": "user", "content": "帮我生成一首欢快的中文流行歌曲，关于夏天和旅行"}
  ],
  "sample_mode": true,
  "audio_config": {
    "vocal_language": "zh"
  }
}

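Trigger: sample_mode: true, or automatically when the message content contains no tags and does not look like lyrics.

Mode 3: Lyrics-only mode

Pass lyrics with structural markers directly; the system recognizes them automatically: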

{
  "messages": [
    {
      "role": "user",
      "content": "[Verse 1]\nWalking down the street\nFeeling the beat\n\n[Chorus]\nDance with me tonight\nUnder the moonlight"
    }
  ],
  "audio_config": {"duration": 30}
}

Trigger: the message content contains markers such as [Verse] or [Chorus], or has a multi-line structure of short lines.
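
To predict which mode an untagged message will fall into, a client can approximate the auto-detection order (tags, then lyrics, then sample mode). The sketch below is a rough guess at that logic, not the server's implementation: the marker names beyond Verse and Chorus and the "at least 4 lines of at most 60 characters" threshold are assumptions, and `guess_input_mode` is a hypothetical name.

```python
import re

_TAG = re.compile(r"<(prompt|lyrics)>", re.IGNORECASE)
# Assumption: only [Verse] and [Chorus] are named in the text; the rest are guesses.
_SECTION = re.compile(r"\[(verse|chorus|bridge|intro|outro|drop)[^\]]*\]", re.IGNORECASE)


def guess_input_mode(text: str) -> str:
    """Approximate the documented auto-detection order for a user message."""
    if _TAG.search(text):
        return "tag"
    if _SECTION.search(text):
        return "lyrics"
    lines = [line for line in text.splitlines() if line.strip()]
    # Assumed thresholds: the real cut-offs are not documented.
    if len(lines) >= 4 and all(len(line) <= 60 for line in lines):
        return "lyrics"
    return "sample"


print(guess_input_mode("<prompt>Lo-fi hip hop beat</prompt>"))        # tag
print(guess_input_mode("[Verse 1]\nWalking down the street"))         # lyrics
print(guess_input_mode("Generate a happy birthday song"))             # sample
```

When the outcome matters, setting `lyrics` or `sample_mode` explicitly avoids relying on any heuristic.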

Mode 4: Separate lyrics and prompt

Pass the lyrics through the lyrics field; the messages text is then used as the prompt:

{
  "messages": [
    {"role": "user", "content": "Energetic EDM with heavy bass drops"}
  ],
  "lyrics": "[Verse 1]\nFeel the rhythm in your soul\nLet the music take control\n\n[Drop]\n(instrumental break)",
  "audio_config": {
    "bpm": 128,
    "duration": 60
  }
}

Instrumental mode

Set audio_config.instrumental: true:

{
  "messages": [
    {"role": "user", "content": "<prompt>Epic orchestral cinematic score, dramatic and powerful</prompt>"}
  ],
  "audio_config": {
    "instrumental": true,
    "duration": 30
  }
}

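Audio input

Audio files (base64-encoded) can be passed through multimodal messages for tasks such as cover and repaint.

task_type values

task_type	Description	Audio input
`text2music`	Text-to-music generation (default)	Optional (used as reference)
`cover`	Cover / style transfer	Requires src_audio
`repaint`	Partial repainting	Requires src_audio
`lego`	Audio stitching	Requires src_audio
`extract`	Audio extraction	Requires src_audio
`complete`	Audio continuation	Requires src_audio

Audio routing rules

Multiple input_audio blocks are routed to different parameters in order (similar to uploading several images):

task_type	audio[0]	audio[1]
`text2music`	reference_audio (style reference)	-
`cover/repaint/lego/extract/complete`	src_audio (audio to edit)	reference_audio (optional style reference)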
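
The routing table can be read as a positional mapping from `input_audio` blocks to parameter names. A minimal sketch of that mapping, useful for checking a request before sending it (`route_audio` and `EDIT_TASKS` are hypothetical names; ignoring surplus blocks is an assumption, since the table only covers the first two positions):

```python
from typing import Dict, List

EDIT_TASKS = {"cover", "repaint", "lego", "extract", "complete"}


def route_audio(task_type: str, audio_blocks: List[str]) -> Dict[str, str]:
    """Map input_audio payloads, in order, to the parameter names in the table."""
    if task_type == "text2music":
        names = ["reference_audio"]
    elif task_type in EDIT_TASKS:
        if not audio_blocks:
            raise ValueError(f"{task_type} requires src_audio")
        names = ["src_audio", "reference_audio"]
    else:
        raise ValueError(f"unknown task_type: {task_type}")
    # zip stops at the shorter sequence; extra blocks are ignored in this sketch
    return dict(zip(names, audio_blocks))


print(route_audio("cover", ["<song b64>", "<style b64>"]))
print(route_audio("text2music", ["<style b64>"]))
```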

Audio input examples

Cover task:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "<prompt>Jazz style cover with saxophone</prompt>"},
        {
          "type": "input_audio",
          "input_audio": {"data": "<base64 source audio>", "format": "mp3"}
        }
      ]
    }
  ],
  "task_type": "cover",
  "audio_cover_strength": 0.8,
  "audio_config": {"duration": 30}
}

Repaint task (partial repainting):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "<prompt>Replace with guitar solo</prompt>"},
        {
          "type": "input_audio",
          "input_audio": {"data": "<base64 source audio>", "format": "mp3"}
        }
      ]
    }
  ],
  "task_type": "repaint",
  "repainting_start": 10.0,
  "repainting_end": 20.0,
  "audio_config": {"duration": 30}
}

Streaming responses

Set "stream": true to enable streaming over SSE (Server-Sent Events).

Event format

Each event starts with data: followed by JSON, and ends with a blank line (\n\n):

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{...},"finish_reason":null}]}

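Stream event order

Stage	delta content	Description
1. Initialization	`{"role":"assistant","content":""}`	Connection established
2. LM content	`{"content":"\n\n## Metadata\n..."}`	Metadata and lyrics, pushed when the LM is involved
3. Heartbeat	`{"content":"."}`	Sent every 2 seconds during audio generation to keep the connection alive
4. Audio data	`{"audio":[{"type":"audio_url","audio_url":{"url":"data:..."}}]}`	Base64-encoded audio
5. Finish	`finish_reason: "stop"`	Generation complete
6. Termination	`data: [DONE]`	End-of-stream marker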
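
The stages in the table can be told apart from the shape of each event alone. The sketch below labels a single SSE line accordingly; `classify_event` and the stage labels are hypothetical names. One caveat: a heartbeat is recognized purely by its content being `"."`, so an LM chunk consisting of only a period would be mislabeled.

```python
import json
from typing import Optional


def classify_event(line: str) -> Optional[str]:
    """Name the stage of one SSE line, following the table above."""
    if not line.startswith("data: "):
        return None
    body = line[len("data: "):]
    if body == "[DONE]":
        return "done"
    choice = json.loads(body)["choices"][0]
    delta = choice.get("delta", {})
    if choice.get("finish_reason") == "stop":
        return "finish"
    if "audio" in delta:
        return "audio"
    if "role" in delta:
        return "init"
    if delta.get("content") == ".":
        return "heartbeat"
    if delta.get("content"):
        return "lm_content"
    return None


event = 'data: {"choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}'
print(classify_event(event))           # heartbeat
print(classify_event("data: [DONE]"))  # done
```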

Streaming response example

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{"content":"\n\n## Metadata\n**Caption:** Upbeat pop\n**BPM:** 120"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{"audio":[{"type":"audio_url","audio_url":{"url":"data:audio/mpeg;base64,..."}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1706688000,"model":"acemusic/acestep-v15-turbo","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Handling streaming responses on the client

import base64
import json

import httpx

payload = {
    "messages": [{"role": "user", "content": "Generate a light, upbeat guitar piece"}],
    "sample_mode": True,
    "stream": True,
    "audio_config": {"instrumental": True},
}

# timeout=None: generation can take far longer than httpx's default 5-second timeout
with httpx.stream("POST", "http://127.0.0.1:8002/v1/chat/completions",
                  json=payload, timeout=None) as response:
    response.raise_for_status()
    content_parts = []
    audio_url = None

    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        if line == "data: [DONE]":
            break

        chunk = json.loads(line[6:])
        delta = chunk["choices"][0]["delta"]

        # Skip the "." heartbeats so they do not pollute the collected text
        if delta.get("content") and delta["content"] != ".":
            content_parts.append(delta["content"])

        if delta.get("audio"):
            audio_url = delta["audio"][0]["audio_url"]["url"]

        if chunk["choices"][0].get("finish_reason") == "stop":
            print("Generation complete!")

    print("Content:", "".join(content_parts))
    if audio_url:
        # Strip the Data URL prefix and decode the base64 payload
        b64_data = audio_url.split(",", 1)[1]
        with open("output.mp3", "wb") as f:
            f.write(base64.b64decode(b64_data))

const response = await fetch("http://127.0.0.1:8002/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Generate a light, upbeat guitar piece" }],
    sample_mode: true,
    stream: true,
    audio_config: { instrumental: true }
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let audioUrl = null;
let content = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // An SSE line (especially the large audio event) can span several network
  // chunks, so buffer the text and parse only complete lines.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;

    const chunk = JSON.parse(line.slice(6));
    const delta = chunk.choices[0].delta;

    // Skip the "." heartbeats sent while audio is being generated
    if (delta.content && delta.content !== ".") content += delta.content;
    if (delta.audio) audioUrl = delta.audio[0].audio_url.url;
  }
}

// audioUrl can be used directly as <audio src="...">

Complete examples

Example 1: Natural-language generation (simplest usage)

curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "一首温柔的中文民谣，关于故乡和回忆"}
    ],
    "sample_mode": true,
    "audio_config": {"vocal_language": "zh"}
  }'

Example 2: Tag mode with explicit parameters

curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "<prompt>Energetic EDM track with heavy bass drops and synth leads</prompt><lyrics>[Verse 1]\nFeel the rhythm in your soul\nLet the music take control\n\n[Drop]\n(instrumental break)</lyrics>"
      }
    ],
    "audio_config": {
      "bpm": 128,
      "duration": 60,
      "vocal_language": "en"
    }
  }'

Example 3: Instrumental only, with LM enhancement disabled

curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "<prompt>Peaceful piano solo, slow tempo, jazz harmony</prompt>"
      }
    ],
    "use_cot_caption": false,
    "audio_config": {
      "instrumental": true,
      "duration": 45
    }
  }'

Example 4: Streaming request

curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "messages": [
      {"role": "user", "content": "Generate a happy birthday song"}
    ],
    "sample_mode": true,
    "stream": true
  }'

Example 5: Batch generation with multiple seeds

curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "<prompt>Lo-fi hip hop beat</prompt>"}
    ],
    "batch_size": 3,
    "seed": "42,123,456",
    "audio_config": {
      "instrumental": true,
      "duration": 30
    }
  }'
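
A batch request like this one presumably returns several entries in `choices[0].message.audio`, one per clip; that is an assumption, since the response section only shows a single-item array. Under that assumption, a sketch for writing every clip to disk (`save_all_audio` and the `clip_N.mp3` naming are hypothetical, and `fake` is a stand-in response with tiny payloads):

```python
import base64
import os
import tempfile
from typing import Any, Dict, List


def save_all_audio(response: Dict[str, Any], directory: str, stem: str = "clip") -> List[str]:
    """Decode every audio item in a non-streaming response and write it to disk."""
    paths = []
    for index, item in enumerate(response["choices"][0]["message"].get("audio", [])):
        b64_data = item["audio_url"]["url"].split(",", 1)[1]
        path = os.path.join(directory, f"{stem}_{index}.mp3")
        with open(path, "wb") as f:
            f.write(base64.b64decode(b64_data))
        paths.append(path)
    return paths


fake = {"choices": [{"message": {"audio": [
    {"type": "audio_url", "audio_url": {"url": "data:audio/mpeg;base64,SUQz"}},
    {"type": "audio_url", "audio_url": {"url": "data:audio/mpeg;base64,SUQz"}},
]}}]}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_all_audio(fake, tmp)
    print([os.path.basename(p) for p in saved])  # ['clip_0.mp3', 'clip_1.mp3']
```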

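Error codes

HTTP status	Description
400	Malformed request or no valid input
401	API key missing or invalid
429	Service busy; the queue is full
500	Internal error during music generation
503	Model not yet initialized
504	Generation timed out

Error response format:

{
  "detail": "Error description"
}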
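
On the client side, the status code and the `detail` field together are enough to decide whether a retry makes sense. The sketch below treats 429 and 503 as retryable; that policy is an assumption rather than documented behavior, and `describe_error` and `RETRYABLE` are hypothetical names.

```python
import json
from typing import Tuple

# Assumption: "queue full" and "model not initialized" are transient conditions.
RETRYABLE = {429, 503}


def describe_error(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (should_retry, message) for a non-2xx response."""
    try:
        parsed = json.loads(body)
        detail = parsed.get("detail") if isinstance(parsed, dict) else None
    except ValueError:
        detail = None  # body was not JSON; fall back to the raw text
    message = str(detail) if detail else (body or f"HTTP {status_code}")
    return status_code in RETRYABLE, message


print(describe_error(429, '{"detail": "Queue full"}'))  # (True, 'Queue full')
print(describe_error(400, "oops"))                      # (False, 'oops')
```

A 504 is left out of the retry set here because repeating a request that already timed out may simply time out again; callers with shorter prompts or durations may choose otherwise.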

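Environment variables

The following environment variables configure the server (for operators):

Variable	Default	Description
`OPENROUTER_API_KEY`	none	API authentication key
`OPENROUTER_HOST`	`127.0.0.1`	Listen address
`OPENROUTER_PORT`	`8002`	Listen port
`ACESTEP_CONFIG_PATH`	`acestep-v15-turbo`	DiT model config path
`ACESTEP_DEVICE`	`auto`	Inference device
`ACESTEP_LM_MODEL_PATH`	`acestep-5Hz-lm-0.6B`	LLM model path
`ACESTEP_LM_BACKEND`	`vllm`	LLM inference backend
`ACESTEP_QUEUE_MAXSIZE`	`200`	Maximum task queue size
`ACESTEP_GENERATION_TIMEOUT`	`600`	Timeout for non-streaming requests (seconds)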
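
For deployment scripts, it can help to resolve a few of these variables with the same defaults as the table. The sketch below mirrors the table only; it is not the server's own configuration code, and `ServerSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    api_key: Optional[str]
    host: str
    port: int
    queue_maxsize: int
    generation_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read a subset of the documented variables, applying the table's defaults."""
    return ServerSettings(
        api_key=env.get("OPENROUTER_API_KEY") or None,
        host=env.get("OPENROUTER_HOST", "127.0.0.1"),
        port=int(env.get("OPENROUTER_PORT", "8002")),
        queue_maxsize=int(env.get("ACESTEP_QUEUE_MAXSIZE", "200")),
        generation_timeout=int(env.get("ACESTEP_GENERATION_TIMEOUT", "600")),
    )


settings = load_settings({"OPENROUTER_PORT": "9000"})
print(settings.host, settings.port)  # 127.0.0.1 9000
```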

ACE-Step OpenRouter API Documentation

Table of contents

Authentication

Endpoints

1. Generate music

Request parameters