Skill Pilot Media Server

An all-in-one media server for the Skill Pilot AI Agent (https://skill-pilot.ai). It includes ComfyUI for image and video generation, IndexTTS for text-to-speech, MuseTalk for talking-head video and live avatars, and SongBloom for music creation.

Included Models and Tools

| Tool | Capabilities |
| --- | --- |
| ComfyUI | Image generation, video generation, and talking video via Z-Image and Wan2.2 |
| IndexTTS | High-quality text-to-speech synthesis |
| MuseTalk | Lip-sync video generation and real-time live avatar |
| SongBloom | AI music and song creation |

How to Use

docker pull skillpilot/media-server:latest
docker run --gpus all \
  -p 18188:8188 -p 17860:7860 -p 18080:8080 \
  -p 13478:3478 -p 13478:3478/udp \
  -p 15349:5349 -p 15349:5349/udp \
  skillpilot/media-server:latest
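If the default host ports clash with something already running, only the host side of each `-p` mapping needs to change; the container-side ports (right of each colon) must stay fixed. A minimal sketch, assuming the services behind 8188/7860/8080 match the tools listed above (the port-role comments are assumptions, and `RUN` is an illustrative guard, not part of the image):

```shell
# Parameterized variant of the run command above. The command is only
# printed unless RUN=1, so you can review it before launching.
COMFYUI_PORT=18188   # host port for ComfyUI (container 8188)
WEB_PORT=17860       # host port for the web UI (container 7860, assumption)
API_PORT=18080       # host port for the service API (container 8080, assumption)
CMD="docker run --gpus all -p ${COMFYUI_PORT}:8188 -p ${WEB_PORT}:7860 -p ${API_PORT}:8080 -p 13478:3478 -p 13478:3478/udp -p 15349:5349 -p 15349:5349/udp skillpilot/media-server:latest"
if [ "${RUN:-0}" = "1" ]; then
  eval "$CMD"        # actually start the container
else
  echo "$CMD"        # dry run: print the command only
fi
```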

Once the container is running, it automatically downloads the models with the command below:

huggingface-cli download skill-pilot/media-mcp --local-dir /home/ubuntu/workspace/models

Use the command below to check download progress inside the container:

tmux attach -t download -r
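If the automatic download is interrupted, the same command can be re-run manually inside the container; `huggingface-cli download` should resume partially downloaded files rather than start over. A hedged sketch (`RESUME` is an illustrative guard, not part of the image):

```shell
# Re-run the model download manually. Set RESUME=1 inside the container,
# where huggingface-cli is installed; otherwise the command is skipped.
MODEL_DIR=/home/ubuntu/workspace/models
if [ "${RESUME:-0}" = "1" ]; then
  huggingface-cli download skill-pilot/media-mcp --local-dir "$MODEL_DIR"
else
  echo "set RESUME=1 to start the download into $MODEL_DIR"
fi
```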

The services will not be fully usable until the download completes. The models are stored in the /home/ubuntu/workspace/models directory inside the container.
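A quick way to check how much has been fetched so far is to look at the size of that directory. A minimal sketch (the path matches the model directory named above):

```shell
# Report the total size of the model directory, if it exists yet.
MODEL_DIR="${MODEL_DIR:-/home/ubuntu/workspace/models}"
if [ -d "$MODEL_DIR" ]; then
  du -sh "$MODEL_DIR"    # total size downloaded so far
else
  echo "no models yet: $MODEL_DIR does not exist"
fi
```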

For instructions on using the media MCP server, see https://skill-pilot.ai.

The image is built on runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04, ensuring compatibility with a wide range of NVIDIA GPUs and CUDA versions.

Tested on an NVIDIA RTX 2060 GPU (12 GB VRAM): the MuseTalk live avatar achieves real-time performance at 12 FPS.

Supports local GPU acceleration with NVIDIA drivers on the host machine or Runpod Cloud environments.

Free community support — no account required.

Join our Discord server for help, setup tips, and updates. Find the invite link at https://skill-pilot.ai.

License

Free and open source (https://skill-pilot.ai), released under the MIT License.

All models included in this repository are licensed under their respective licenses. Please refer to the individual model documentation for details.
