# ACE-Step OpenRouter API Documentation

> OpenAI Chat Completions-compatible API for AI music generation

**Base URL:** `http://{host}:{port}` (default `http://127.0.0.1:8002`)

---

## Table of Contents

- [Authentication](#authentication)
- [Endpoints](#endpoints)
  - [POST /v1/chat/completions - Generate Music](#1-generate-music)
  - [GET /v1/models - List Models](#2-list-models)
  - [GET /health - Health Check](#3-health-check)
- [Input Modes](#input-modes)
- [Audio Input](#audio-input)
- [Streaming Responses](#streaming-responses)
- [Examples](#examples)
- [Error Codes](#error-codes)

---

## Authentication

If the server is configured with an API key (via the `OPENROUTER_API_KEY` environment variable or the `--api-key` CLI flag), all requests must include the following header:

```
Authorization: Bearer <your-api-key>
```

No authentication is required when no API key is configured.

---

## Endpoints

### 1. Generate Music

**POST** `/v1/chat/completions`

Generates music from chat messages and returns audio data along with LM-generated metadata.

#### Request Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | No | auto | Model ID (obtain from `/v1/models`) |
| `messages` | array | **Yes** | - | Chat message list. See [Input Modes](#input-modes) |
| `stream` | boolean | No | `false` | Enable streaming response. See [Streaming Responses](#streaming-responses) |
| `audio_config` | object | No | `null` | Audio generation configuration. See below |
| `temperature` | float | No | `0.85` | LM sampling temperature |
| `top_p` | float | No | `0.9` | LM nucleus sampling parameter |
| `seed` | int \| string | No | `null` | Random seed. When `batch_size > 1`, use comma-separated values, e.g. `"42,123,456"` |
| `lyrics` | string | No | `""` | Lyrics passed directly (takes priority over lyrics parsed from messages). When set, the messages text becomes the prompt |
| `sample_mode` | boolean | No | `false` | Enable LLM sample mode. The messages text becomes the `sample_query` for the LLM to auto-generate the prompt and lyrics |
| `thinking` | boolean | No | `false` | Enable LLM thinking mode for deeper reasoning |
| `use_format` | boolean | No | `false` | When the user provides a prompt/lyrics, enhance them via LLM formatting |
| `use_cot_caption` | boolean | No | `true` | Rewrite/enhance the music description via Chain-of-Thought |
| `use_cot_language` | boolean | No | `true` | Auto-detect the vocal language via Chain-of-Thought |
| `guidance_scale` | float | No | `7.0` | Classifier-free guidance scale |
| `batch_size` | int | No | `1` | Number of audio samples to generate |
| `task_type` | string | No | `"text2music"` | Task type. See [Audio Input](#audio-input) |
| `repainting_start` | float | No | `0.0` | Repaint region start position (seconds) |
| `repainting_end` | float | No | `null` | Repaint region end position (seconds) |
| `audio_cover_strength` | float | No | `1.0` | Cover strength (0.0~1.0) |

#### audio_config Object

| Field | Type | Default | Description |
|---|---|---|---|
| `duration` | float | `null` | Audio duration in seconds. If omitted, determined automatically by the LM |
| `bpm` | integer | `null` | Beats per minute. If omitted, determined automatically by the LM |
| `vocal_language` | string | `"en"` | Vocal language code (e.g. `"zh"`, `"en"`, `"ja"`) |
| `instrumental` | boolean | `null` | Whether to generate instrumental-only audio (no vocals). If omitted, auto-determined from the lyrics |
| `format` | string | `"mp3"` | Output audio format |
| `key_scale` | string | `null` | Musical key (e.g. `"C major"`) |
| `time_signature` | string | `null` | Time signature (e.g. `"4/4"`) |

> **The meaning of the messages text depends on the mode:**
> - If `lyrics` is set → messages text = prompt (music description)
> - If `sample_mode: true` is set → messages text = sample_query (let the LLM generate everything)
> - If neither is set → auto-detect: tags → tag mode, lyrics-like text → lyrics mode, otherwise → sample mode

#### messages Format

Supports both plain-text and multimodal (text + audio) formats:

**Plain text:**

```json
{
  "messages": [
    {"role": "user", "content": "Your input content"}
  ]
}
```

**Multimodal (with audio input):**

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Cover this song"},
        {
          "type": "input_audio",
          "input_audio": {
            "data": "<base64-encoded audio>",
            "format": "mp3"
          }
        }
      ]
    }
  ]
}
```

---

#### Non-Streaming Response (`stream: false`)

```json
{
  "id": "chatcmpl-a1b2c3d4e5f6g7h8",
  "object": "chat.completion",
  "created": 1706688000,
  "model": "acemusic/acestep-v15-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "## Metadata\n**Caption:** Upbeat pop song...\n**BPM:** 120\n**Duration:** 30s\n**Key:** C major\n\n## Lyrics\n[Verse 1]\nHello world...",
        "audio": [
          {
            "type": "audio_url",
            "audio_url": {
              "url": "data:audio/mpeg;base64,SUQzBAAAAAAAI1RTU0UAAAA..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 100,
    "total_tokens": 110
  }
}
```

**Response Fields:**

| Field | Description |
|---|---|
| `choices[0].message.content` | Text generated by the LM, including Metadata (Caption/BPM/Duration/Key/Time Signature/Language) and Lyrics. Returns `"Music generated successfully."` if the LM was not involved |
| `choices[0].message.audio` | Audio data array. Each item contains `type` (`"audio_url"`) and `audio_url.url` (a Base64 Data URL in the format `data:audio/mpeg;base64,...`) |
| `choices[0].finish_reason` | `"stop"` indicates normal completion |

**Decoding Audio:**

The `audio_url.url` value is a Data URL: `data:audio/mpeg;base64,<base64 data>`. Extract the base64 portion after the comma and decode it to get the MP3 file:

```python
import base64

url = response["choices"][0]["message"]["audio"][0]["audio_url"]["url"]
# Strip the "data:audio/mpeg;base64," prefix
b64_data = url.split(",", 1)[1]
audio_bytes = base64.b64decode(b64_data)
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)
```

```javascript
const url = response.choices[0].message.audio[0].audio_url.url;
const b64Data = url.split(",")[1];
// atob() yields a binary string; convert it to actual bytes
const audioBytes = Uint8Array.from(atob(b64Data), (c) => c.charCodeAt(0));
// Or use the Data URL directly as the src of an <audio> element
```
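Putting the pieces above together, a minimal request body can be assembled as follows. This is a sketch, not part of the server code: the `build_payload` helper and `BASE_URL` constant are names chosen here for illustration, and the commented-out HTTP call assumes a server running at the default base URL with an API key configured.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8002"  # default base URL from above

def build_payload(prompt, duration=None):
    """Assemble a minimal non-streaming /v1/chat/completions body.

    `prompt` becomes the messages text; `duration` (seconds) goes into
    audio_config, otherwise the LM picks the duration automatically.
    """
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    if duration is not None:
        payload["audio_config"] = {"duration": duration, "format": "mp3"}
    return payload

payload = build_payload("Upbeat synth-pop with bright plucks", duration=30.0)

# Uncomment to call a running server:
# req = urllib.request.Request(
#     f"{BASE_URL}/v1/chat/completions",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={
#         "Content-Type": "application/json",
#         "Authorization": "Bearer <your-api-key>",  # only if a key is configured
#     },
# )
# response = json.load(urllib.request.urlopen(req))
```

The non-streaming response can then be decoded with the Python snippet in "Decoding Audio" above.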
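For the multimodal format, the `input_audio.data` field carries the source audio as base64. A small sketch of wrapping local audio bytes into a content part (the `audio_content_part` helper is a name invented here, not part of the API):

```python
import base64

def audio_content_part(audio_bytes, fmt="mp3"):
    """Wrap raw audio bytes as an `input_audio` content part."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": fmt,
        },
    }

# Combine with a text instruction, e.g. for a cover task:
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Cover this song"},
        # Placeholder bytes for illustration; read from a real file in practice
        audio_content_part(b"\x00\x01\x02"),
    ],
}
```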
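The `choices[0].message.content` text follows the `**Field:** value` layout shown in the sample response above, so the metadata can be pulled out with a simple regex. A sketch, assuming the field format stays as in that sample (`parse_metadata` is a helper defined here, not provided by the API):

```python
import re

# Sample content string, copied from the non-streaming response example
SAMPLE_CONTENT = (
    "## Metadata\n**Caption:** Upbeat pop song...\n**BPM:** 120\n"
    "**Duration:** 30s\n**Key:** C major\n\n## Lyrics\n[Verse 1]\nHello world..."
)

def parse_metadata(content):
    """Collect the `**Field:** value` pairs from the content text."""
    meta = {}
    for field, value in re.findall(r"\*\*(.+?):\*\* (.+)", content):
        meta[field] = value.strip()
    return meta

meta = parse_metadata(SAMPLE_CONTENT)
# meta["BPM"] == "120", meta["Key"] == "C major"
```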