# Using TEI on AMD Instinct GPUs (ROCm)

Text Embeddings Inference (TEI) supports AMD Instinct GPUs (MI200 and MI300 series) through [ROCm](https://rocm.docs.amd.com/).

## Prerequisites

- AMD Instinct GPU (MI200 or MI300 series) with ROCm drivers installed on the host
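
As a quick pre-flight check, the sketch below tests whether the two device nodes that the Docker commands in this guide pass through (`/dev/kfd` and `/dev/dri`) exist on the host. The function name `check_rocm_devices` is hypothetical, and a present device node does not by itself prove that the ROCm driver stack is healthy.

```python
import os

# Device nodes passed to the container with --device in this guide.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def check_rocm_devices(paths=ROCM_DEVICE_NODES):
    """Return a {path: exists} mapping for the given device nodes."""
    return {path: os.path.exists(path) for path in paths}


if __name__ == "__main__":
    for path, present in check_rocm_devices().items():
        print(f"{path}: {'found' if present else 'missing'}")
```

If either path is reported missing, the `docker run` commands in this guide will likely fail when they try to attach the device.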
## Option A: Docker (recommended)

The easiest way to run TEI on AMD GPUs is with the pre-built Docker image:

```shell
model=BAAI/bge-base-en-v1.5
volume=$PWD/data # share a volume to avoid re-downloading weights

docker run \
    --device /dev/kfd --device /dev/dri \
    --group-add video \
    --ipc=host \
    -p 8080:80 \
    -v "$volume":/data \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:rocm-latest \
    --model-id "$model" --dtype bfloat16
```

The `-p 8080:80` flag publishes the container's port 80 on host port 8080. Once the server is ready, test it with an embed request:

```shell
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
```
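The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server is reachable at `http://localhost:8080` and that `/embed` returns a JSON list with one embedding vector per input. The helper names `build_embed_request` and `embed` are illustrative and not part of TEI.

```python
import json
import urllib.request

TEI_URL = "http://localhost:8080"  # assumption: host port published by -p 8080:80


def build_embed_request(text, base_url=TEI_URL):
    """Build the POST request equivalent to the curl command."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, base_url=TEI_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running server):
# vectors = embed("What is Deep Learning?")
# print(len(vectors), len(vectors[0]))
```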
---

## Option B: Manual setup from source

If you prefer to build from source, use AMD's official ROCm PyTorch image as the base environment.

### Step 1: Start the container

```shell
docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --shm-size 8g \
    -v "$PWD":/workspace \
    rocm/pytorch:latest bash
```

Inside the container, clone the TEI repository (or mount it via `-v`) and run the remaining steps from the repo root.
### Step 2: Install Rust

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
```
### Step 3: Install Python dependencies

PyTorch is already provided by the container image, so install the remaining dependencies without pulling a new torch:

```shell
pip install --no-deps -r backends/python/server/requirements-amd.txt
pip install safetensors opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio-reflection \
    grpc-interceptor einops packaging
```
### Step 4: Generate protobuf stubs

```shell
# Install the protobuf/gRPC code generators
pip install grpcio-tools==1.62.2 mypy-protobuf==3.6.0 types-protobuf

# Generate the Python, gRPC, and type stubs from embed.proto
mkdir -p backends/python/server/text_embeddings_server/pb
python -m grpc_tools.protoc \
    -I backends/proto \
    --python_out=backends/python/server/text_embeddings_server/pb \
    --grpc_python_out=backends/python/server/text_embeddings_server/pb \
    --mypy_out=backends/python/server/text_embeddings_server/pb \
    backends/proto/embed.proto

# Fix relative imports in generated files
find backends/python/server/text_embeddings_server/pb/ -name "*.py" \
    -exec sed -i 's/^\(import.*pb2\)/from . \1/g' {} \;

# Mark the output directory as a Python package
touch backends/python/server/text_embeddings_server/pb/__init__.py
```
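To see what the `sed` expression does, the sketch below applies an equivalent regular expression in Python to a sample string. The sample mimics the absolute import that `grpc_tools.protoc` typically writes into the generated gRPC module; the exact generated text may differ between versions, so treat it as an illustration only.

```python
import re

# Python equivalent of: sed 's/^\(import.*pb2\)/from . \1/g'
IMPORT_PATTERN = re.compile(r"^(import.*pb2)", flags=re.MULTILINE)


def make_imports_relative(source):
    """Prefix top-level 'import ..._pb2' lines with 'from . '."""
    return IMPORT_PATTERN.sub(r"from . \1", source)


# Sample input (assumed shape of a generated gRPC stub).
sample = "import grpc\nimport embed_pb2 as embed__pb2\n"
print(make_imports_relative(sample))
```

Only lines that start with `import` and mention a `pb2` module are rewritten, so `import grpc` is left alone. Without the rewrite, the generated modules would look for `embed_pb2` as a top-level module, which fails once the files live inside the `text_embeddings_server.pb` package.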
### Step 5: Install the Python server package

```shell
pip install -e backends/python/server
```

### Step 6: Build the Rust router

```shell
cargo build --release \
    --no-default-features \
    --features python,http \
    --bin text-embeddings-router
```
### Step 7: Launch TEI

```shell
model=BAAI/bge-base-en-v1.5
./target/release/text-embeddings-router --model-id "$model" --dtype bfloat16 --port 8080
```

Once the server is ready, you can test it with a simple embed request. The `docker run` command in Step 1 does not publish any container ports, so send the request from inside the container (for example, from a second shell opened with `docker exec`):

```shell
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
```
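Loading the model can take a while, so a script that launches TEI may want to wait before sending requests. The sketch below polls a health endpoint until it answers. It assumes the router exposes `GET /health` and returns HTTP 200 when ready; check this against your TEI version. `http_probe`, `wait_until_ready`, and the `probe` parameter are hypothetical names.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET to url succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url="http://localhost:8080/health", probe=http_probe,
                     attempts=60, delay=1.0):
    """Poll until probe(url) is True; return False after `attempts` tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop easy to test without a running server.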
## Verifying GPU detection

After launch, you should see a log line confirming that ROCm was detected:

```
INFO text_embeddings_server::utils::device: ROCm / HIP version: X.Y.Z
```

You can also verify from Python. ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, so the usual CUDA availability check applies:

```python
import torch

print(torch.cuda.is_available())  # True when the GPU is visible
print(torch.version.hip)          # e.g. 6.2.12345-...
```
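For a slightly more informative check, the sketch below also lists the visible devices. It relies only on standard PyTorch calls (`torch.cuda.device_count`, `torch.cuda.get_device_name`) and returns a message instead of failing when PyTorch is not installed. The function name `rocm_report` is illustrative.

```python
import importlib.util


def rocm_report():
    """Return a short, human-readable summary of GPU visibility."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    lines = [
        f"GPU available: {torch.cuda.is_available()}",
        # The hip attribute is None on builds without ROCm support.
        f"HIP version: {getattr(torch.version, 'hip', None)}",
    ]
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            lines.append(f"Device {index}: {torch.cuda.get_device_name(index)}")
    return "\n".join(lines)


print(rocm_report())
```

A HIP version of `None` together with `GPU available: False` usually points to a PyTorch build without ROCm support rather than a driver problem.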
## Notes

This is a work in progress. More model support and optimized operations for AMD GPUs are coming soon.