# Using TEI on AMD Instinct GPUs (ROCm)
Text Embeddings Inference supports AMD Instinct GPUs (MI200, MI300 series) using [ROCm](https://rocm.docs.amd.com/).
## Prerequisites
- AMD Instinct GPU (MI200, MI300 series) with ROCm drivers on the host
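
Before starting a container, it can help to confirm that the host exposes the device nodes that the Docker commands in this guide pass through (`/dev/kfd` and `/dev/dri`). The following is a minimal sketch; the helper name `rocm_device_nodes` is hypothetical, and the presence of these nodes is only a rough indicator that the ROCm driver is loaded, not a full driver check.

```python
import os

# Device nodes that the Docker commands in this guide pass to the container.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def rocm_device_nodes(paths=ROCM_DEVICE_NODES):
    """Return a {path: exists} mapping for each expected device node."""
    return {path: os.path.exists(path) for path in paths}


if __name__ == "__main__":
    for path, present in rocm_device_nodes().items():
        print(f"{path}: {'found' if present else 'missing'}")
```

If either node is reported as missing, the `--device` flags used below will likely fail.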
## Option A: Docker (recommended)
The easiest way to run TEI on AMD GPUs is with the pre-built Docker image:
```shell
model=BAAI/bge-base-en-v1.5
volume=$PWD/data # share a volume to avoid re-downloading weights
docker run \
    --device /dev/kfd --device /dev/dri \
    --group-add video \
    --ipc=host \
    -p 8080:80 \
    -v "$volume:/data" \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:rocm-latest \
    --model-id "$model" --dtype bfloat16
```
Then test it:
```shell
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
```
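
The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: the function names `build_embed_request` and `embed` are hypothetical, and the comment about the response shape (one list of floats per input) is an assumption to check against your TEI version.

```python
import json
import urllib.request


def build_embed_request(text, base_url="http://localhost:8080"):
    """Build the same POST request as the curl example."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, base_url="http://localhost:8080", timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    try:
        vectors = embed("What is Deep Learning?")
    except OSError as exc:  # e.g. connection refused while the server is down
        print(f"request failed: {exc}")
    else:
        # Assumption: the response is a list with one embedding per input.
        print(len(vectors), "embedding(s), dimension", len(vectors[0]))
```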
---
## Option B: Manual setup from source
If you prefer to build from source, use AMD's official ROCm PyTorch image as the base environment.
## Step 1: Start the container
```shell
docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --shm-size 8g \
    -v $PWD:/workspace \
    rocm/pytorch:latest bash
```
Inside the container, clone the TEI repository, or use the checkout mounted at `/workspace` by the `-v` flag above if you started the container from the repo directory. Run the remaining steps from the repo root.
## Step 2: Install Rust
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
```
## Step 3: Install Python dependencies
PyTorch is already provided by the container image, so install the remaining dependencies without pulling a new torch:
```shell
pip install --no-deps -r backends/python/server/requirements-amd.txt
pip install safetensors opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio-reflection \
    grpc-interceptor einops packaging
```
## Step 4: Generate protobuf stubs
```shell
pip install grpcio-tools==1.62.2 mypy-protobuf==3.6.0 types-protobuf
mkdir -p backends/python/server/text_embeddings_server/pb
python -m grpc_tools.protoc \
    -I backends/proto \
    --python_out=backends/python/server/text_embeddings_server/pb \
    --grpc_python_out=backends/python/server/text_embeddings_server/pb \
    --mypy_out=backends/python/server/text_embeddings_server/pb \
    backends/proto/embed.proto
# Fix relative imports in generated files
find backends/python/server/text_embeddings_server/pb/ -name "*.py" \
    -exec sed -i 's/^\(import.*pb2\)/from . \1/g' {} \;
touch backends/python/server/text_embeddings_server/pb/__init__.py
```
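
To make the effect of the `sed` step concrete, here is a sketch of the same substitution in Python. The sample input line is an assumption about what the generated gRPC stub typically contains; the function name `fix_relative_imports` is hypothetical.

```python
import re

# Same pattern as the sed command, written in Python's regex syntax:
# a line starting with "import" and containing "pb2" gets a "from . " prefix.
PATTERN = re.compile(r"^(import.*pb2)", flags=re.MULTILINE)


def fix_relative_imports(source):
    """Rewrite absolute imports of generated pb2 modules as relative imports."""
    return PATTERN.sub(r"from . \1", source)


sample = "import grpc\nimport embed_pb2 as embed__pb2\n"
print(fix_relative_imports(sample))
```

Only the `pb2` import is rewritten, so that the generated modules can find each other once they live inside the `pb` package. Lines such as `import grpc` are left untouched, and already-relative imports do not match the pattern, so the rewrite is safe to repeat.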
## Step 5: Install the Python server package
```shell
pip install -e backends/python/server
```
## Step 6: Build the Rust router
```shell
cargo build --release \
    --no-default-features \
    --features python,http \
    --bin text-embeddings-router
```
## Step 7: Launch TEI
```shell
model=BAAI/bge-base-en-v1.5
./target/release/text-embeddings-router --model-id $model --dtype bfloat16 --port 8080
```
Once the server is ready, you can test it with a simple embed request:
```shell
curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
```
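
Because the model has to be downloaded and loaded before the server answers, a script may need to wait for readiness before sending requests. The sketch below polls a caller-supplied probe until it succeeds or a timeout elapses. The names `http_ok` and `wait_until_ready` are hypothetical, and the `/health` route mentioned in the usage note is an assumption to verify against your TEI version.

```python
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False


def wait_until_ready(probe, timeout=300.0, interval=2.0):
    """Call `probe()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A possible call, assuming the server from Step 7 and a `/health` route, would be `wait_until_ready(lambda: http_ok("http://localhost:8080/health"))`.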
## Verifying GPU detection
After launch you should see a log line confirming ROCm was detected:
```
INFO text_embeddings_server::utils::device: ROCm / HIP version: X.Y.Z
```
You can also verify from Python:
```python
import torch
print(torch.cuda.is_available()) # True
print(torch.version.hip) # e.g. 6.2.12345-...
```
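
For scripts that should not crash when PyTorch is missing or was built without ROCm, the check above can be wrapped in a small helper. This is a sketch; the function name `rocm_status` and the dictionary keys are hypothetical.

```python
def rocm_status():
    """Summarize what PyTorch reports about the GPU backend."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    # torch.version.hip is None on builds without ROCm/HIP support.
    hip_version = getattr(torch.version, "hip", None)
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "hip_version": hip_version,
        "gpu_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    print(rocm_status())
```

On a working setup you would expect `gpu_available` to be true and `hip_version` to be a version string rather than `None`.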
## Notes
ROCm support in TEI is a work in progress. Support for more models and optimized operations for AMD GPUs are planned.
