# WhisperLive-TensorRT
We have only tested the TensorRT backend in Docker, so we recommend using Docker for a smooth TensorRT backend setup.

**Note**: We use `tensorrt_llm==0.18.2`.
## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Run WhisperLive TensorRT in docker:
```bash
docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt
docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt
```
## Whisper TensorRT Engine
- We build `small.en` and `small` (multilingual) TensorRT engines in the examples below. The script logs the path of the directory containing the Whisper TensorRT engine; we need that model path to run the server.
```bash
# convert small.en
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en        # float16
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int8   # int8 weight-only quantization
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int4   # int4 weight-only quantization
# convert small multilingual model
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small
```
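The engine directory names appear to follow a simple pattern, judging from the example paths used with the server below (e.g. `whisper_small_en_float16`): dots in the model name become underscores, and the precision is appended. A minimal sketch of that pattern, assuming `float16` is the default precision; this is inferred from the example paths, not a documented contract of `build_whisper_tensorrt.sh`:

```shell
# Inferred naming pattern for the built engine directory:
# dots in the model name -> underscores, then "_<precision>" appended.
model="small.en"
precision="float16"
engine_dir="whisper_${model//./_}_${precision}"
echo "$engine_dir"   # whisper_small_en_float16
```

If the script's output path ever differs, prefer the path it logs at the end of the build over this reconstruction.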
## Run WhisperLive Server with TensorRT Backend
```bash
# Run English-only model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en_float16"

# Run multilingual model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
    --trt_multilingual
```
By default the TensorRT backend uses a C++ session. To use a Python session instead, pass `--trt_py_session` to `run_server.py`:
```bash
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
    --trt_py_session
```