# WhisperLive-TensorRT

We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.

**Note**: We use `tensorrt_llm==0.18.2`.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Run WhisperLive TensorRT in docker
```bash
docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt
docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt
```

## Whisper TensorRT Engine
- The commands below build the `small.en` English-only engine and the `small` multilingual engine as examples. The build script logs the path of the directory containing the Whisper TensorRT engine; pass that path as the model path when running the server.
```bash
# convert small.en
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en       # float16
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int8  # int8 weight-only quantization
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en int4  # int4 weight-only quantization

# convert small multilingual model
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small
```

## Run WhisperLive Server with TensorRT Backend
```bash
# Run English only model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en_float16"

# Run Multilingual model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
    --trt_multilingual
```

By default, the TensorRT backend uses a C++ session. To use a Python session instead, pass `--trt_py_session` to `run_server.py`:
```bash
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_float16" \
    --trt_py_session
```
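## Connecting a Client

Once the server is up, any WhisperLive client can stream audio to it on port 9090. The snippet below is a minimal sketch using the `TranscriptionClient` from the WhisperLive Python package; the exact constructor arguments can vary between versions, so treat it as illustrative rather than definitive, and the audio path is a placeholder.

```python
from whisper_live.client import TranscriptionClient

# Connect to the WhisperLive server started above.
client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",        # source language; relevant when using a multilingual engine
    translate=False,  # set True to translate the transcription to English
)

# Transcribe a local audio file (placeholder path); calling
# client() with no argument streams from the microphone instead.
client("path/to/audio.wav")
```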