	# How to run

	```bash
	git clone https://github.com/vllm-project/vllm-omni.git
	cd vllm-omni
	```

	```bash
	DOCKER_BUILDKIT=1 docker build \
	-f docker/Dockerfile.cuda \
	--build-arg BASE_IMAGE=vllm/vllm-openai:latest \
	-t vllm-omni-custom:latest \
	.
	```
	The build above targets NVIDIA GPUs (it uses `docker/Dockerfile.cuda`).

	```bash
	# Remove any existing container named "vllm" so the name can be reused
	docker rm -f vllm

	docker run -d \
	--name vllm \
	--gpus all \
	--ipc=host \
	-p 8000:8000 \
	-v ~/.cache/huggingface:/root/.cache/huggingface \
	-e HF_TOKEN="$HF_TOKEN" \
	-e CUDA_VISIBLE_DEVICES=0 \
	--entrypoint /bin/bash \
	vllm-omni-custom:latest \
	-lc 'pip install --no-cache-dir "vllm[audio]" torchdiffeq && \
	vllm serve minchyeom/StarVoice \
	--omni \
	--served-model-name starlette \
	--dtype float32 \
	--max-model-len 32768 \
	--gpu-memory-utilization 0.5 \
	--trust-remote-code \
	--host 0.0.0.0 \
	--port 8000'
	```
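The container installs Python packages before it launches the server, so port 8000 will not answer right away. Below is a minimal sketch of a readiness check. `wait_for_server` is a hypothetical helper, not part of this repository, and it assumes the server exposes the standard vLLM OpenAI-compatible `/v1/models` route.

```shell
# Hypothetical helper (not part of the original README): poll a URL until it
# answers with a successful HTTP status, or give up after a number of attempts.
# Usage: wait_for_server URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_server() {
  local url="$1"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each probe
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "server is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "server did not respond after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_server http://localhost:8000/v1/models` polls every 5 seconds for up to 60 attempts. If it times out, `docker logs -f vllm` shows what the container is doing.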
	Set `--gpu-memory-utilization` (the fraction of GPU memory vLLM is allowed to use, `0.5` in the command above) according to your VRAM budget. I tested this on a single RTX 5090.
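As a rough starting point, the flag value can be derived from how much VRAM you want to give the server. The sketch below is a hypothetical helper, not part of this repository. The 16 GiB budget is only an illustrative number, and 32 GiB is the nominal memory of an RTX 5090.

```shell
# Hypothetical helper (not part of the original README): turn a VRAM budget
# into a value for --gpu-memory-utilization.
# Usage: gpu_util_fraction BUDGET_GIB TOTAL_GIB
gpu_util_fraction() {
  local budget_gib="$1"
  local total_gib="$2"
  # The flag is a fraction of total GPU memory, so divide budget by total
  awk -v b="$budget_gib" -v t="$total_gib" 'BEGIN { printf "%.2f\n", b / t }'
}

# Illustrative numbers: give the server 16 GiB on a 32 GiB card
gpu_util_fraction 16 32   # prints 0.50
```

The fraction covers everything vLLM allocates on the GPU (weights, activations, and KV cache), so raise it if the server fails to start with an out-of-memory error.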