Update README.md
<p>

<p align="center">
Kimi-Audio-7B-Instruct <a href="https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct">🤗</a> | 📑 <a href="https://raw.githubusercontent.com/MoonshotAI/Kimi-Audio/main/assets/kimia_report.pdf">Paper</a>
</p>

## Introduction

We present Kimi-Audio, an open-source audio foundation model excelling in **audio understanding, generation, and conversation**. This repository hosts the model checkpoints for Kimi-Audio-7B-Instruct.

Kimi-Audio is designed as a universal audio foundation model capable of handling a wide variety of audio processing tasks within a single unified framework. Key features include:

For more details, please refer to our [GitHub Repository](https://github.com/MoonshotAI/Kimi-Audio).
## Requirements

We recommend building a Docker image to run inference. After cloning the inference code, you can build the image with the `docker build` command:

```bash
git clone https://github.com/MoonshotAI/Kimi-Audio
cd Kimi-Audio
docker build -t kimi-audio:v0.1 .
```
Alternatively, you can use our pre-built image:

```bash
docker pull moonshotai/kimi-audio:v0.1
```
Or, you can install the requirements directly:

```bash
pip install -r requirements.txt
```

You may refer to the Dockerfile in case of any environment issues.
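If you take the Docker route, inference is then run inside the container. A minimal invocation might look like the following sketch; the mount point, working directory, and shell entrypoint are illustrative assumptions, and only the image tag comes from the build step above:

```shell
# Illustrative sketch: expose the GPU and mount the checkout into the container.
# The /workspace path and the bash entrypoint are assumptions, not taken from
# the repository; --gpus requires the NVIDIA Container Toolkit on the host.
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace -w /workspace \
    kimi-audio:v0.1 bash
```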
## Quickstart

This example demonstrates basic usage for generating text from audio (ASR) and generating both text and speech in a conversational turn using the `Kimi-Audio-7B-Instruct` model.
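As a sketch of what such a call looks like, the snippet below builds the message list for an ASR request. The message schema (`role`, `message_type`, `content`) and the `kimia_infer` import shown in the comments follow the example code in the GitHub repository, but treat them as assumptions here; the repository's Quickstart is authoritative.

```python
# Build the conversation payload for an ASR (speech-to-text) request.
# The message schema below mirrors the repository's example code and is an
# assumption in this sketch, not a verified interface.

def build_asr_messages(audio_path, prompt="Please transcribe the following audio:"):
    """Return a single-turn conversation asking the model to transcribe audio."""
    return [
        {"role": "user", "message_type": "text", "content": prompt},
        {"role": "user", "message_type": "audio", "content": audio_path},
    ]

messages = build_asr_messages("test_audios/asr_example.wav")

# The actual inference call (requires the cloned repo, the downloaded
# checkpoint, and a GPU) would then look roughly like:
#   from kimia_infer.api.kimia import KimiAudio
#   model = KimiAudio(model_path="moonshotai/Kimi-Audio-7B-Instruct")
#   _, text = model.generate(messages, output_type="text")
```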