
Using TEI on AMD Instinct GPUs (ROCm)

Text Embeddings Inference supports AMD Instinct GPUs (MI200, MI300 series) using ROCm.

Prerequisites

An AMD Instinct GPU (MI200 or MI300 series) with the ROCm drivers installed on the host
Docker, which both options below use to run TEI

Option A: Docker (recommended)

The easiest way to run TEI on AMD GPUs is with the pre-built Docker image:

model=BAAI/bge-base-en-v1.5
volume=$PWD/data  # share a volume to avoid re-downloading weights

docker run \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  --ipc=host \
  -p 8080:80 \
  -v $volume:/data \
  --pull always \
  ghcr.io/huggingface/text-embeddings-inference:rocm-latest \
  --model-id $model --dtype bfloat16
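You may want to start the container from a Python script instead of a shell. In that case, the same `docker run` invocation can be assembled as an argument list. This is a minimal sketch. `build_tei_rocm_cmd` is a hypothetical helper, and every flag is copied from the command above. The sketch only prints the command and does not run it.

```python
import os
import shlex

# Hypothetical helper: rebuilds the `docker run` command shown above as an argv list.
def build_tei_rocm_cmd(model, volume, port=8080, dtype="bfloat16",
                       image="ghcr.io/huggingface/text-embeddings-inference:rocm-latest"):
    return [
        "docker", "run",
        "--device", "/dev/kfd", "--device", "/dev/dri",  # ROCm compute and DRM render device nodes
        "--group-add", "video",                           # group that typically owns the GPU device nodes
        "--ipc=host",
        "-p", f"{port}:80",                               # host port -> TEI's port 80 inside the container
        "-v", f"{volume}:/data",                          # persist downloaded weights across runs
        "--pull", "always",
        image,
        "--model-id", model, "--dtype", dtype,            # arguments passed through to TEI
    ]

if __name__ == "__main__":
    cmd = build_tei_rocm_cmd("BAAI/bge-base-en-v1.5", os.path.join(os.getcwd(), "data"))
    print(shlex.join(cmd))
    # To actually start the container: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when the model ID or volume path contains unusual characters.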

Then test it:

curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
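The same request can be sent from Python using only the standard library. This minimal sketch mirrors the curl call above. `embed_request` and `embed` are hypothetical helper names. The code assumes that the response body is a JSON array containing one embedding vector per input.

```python
import json
import urllib.request

def embed_request(text, url="http://localhost:8080/embed"):
    """Build a POST request equivalent to the curl command above."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

def embed(text, url="http://localhost:8080/embed", timeout=30.0):
    """Send the request and decode the JSON response (assumed: one vector per input)."""
    with urllib.request.urlopen(embed_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `embed("What is Deep Learning?")` should return a list that holds a single vector.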

Option B: Manual setup from source

If you prefer to build from source, use AMD's official ROCm PyTorch image as the base environment.

Step 1: Start the container

docker run -it --device=/dev/kfd --device=/dev/dri \
  --group-add video --shm-size 8g \
  -v $PWD:/workspace \
  rocm/pytorch:latest bash

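If you started the container from the root of a TEI checkout, the repository is already mounted at /workspace (run cd /workspace). Otherwise, clone the TEI repository inside the container. Run the remaining steps from the repository root.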

Step 2: Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
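Step 6 fails if `cargo` and `rustc` are not on the `PATH`. A quick way to confirm the install is a check like the one below. It is a minimal sketch, and `rust_toolchain_status` is a hypothetical helper. The check only calls each tool's `--version` flag.

```python
import shutil
import subprocess

def rust_toolchain_status(tools=("cargo", "rustc")):
    """Map each tool to its `--version` output, or None if it is not on PATH."""
    status = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            status[tool] = None
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        status[tool] = result.stdout.strip() or None
    return status

if __name__ == "__main__":
    for tool, version in rust_toolchain_status().items():
        # A missing tool usually means the cargo env script was not sourced in this shell.
        print(f"{tool}: {version or 'not found'}")
```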

Step 3: Install Python dependencies

The container image already ships PyTorch. Install the AMD requirements file with --no-deps so the bundled torch is not replaced, then install the remaining packages explicitly:

pip install --no-deps -r backends/python/server/requirements-amd.txt
pip install safetensors opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio-reflection \
    grpc-interceptor einops packaging
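Before moving on, you can confirm that these packages are importable. The sketch below maps each pip package to its import name as an assumption: opentelemetry-api becomes `opentelemetry`, grpcio-reflection becomes `grpc_reflection`, and grpc-interceptor becomes `grpc_interceptor`. It then asks `importlib` whether each module can be found, without importing the top-level modules themselves.

```python
import importlib.util

# Assumed import names for the packages installed in this step, plus torch from the image.
REQUIRED_MODULES = [
    "safetensors",
    "opentelemetry",
    "opentelemetry.sdk",
    "opentelemetry.exporter.otlp.proto.grpc",
    "grpc_reflection",
    "grpc_interceptor",
    "einops",
    "packaging",
    "torch",
]

def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:  # raised when a parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing

if __name__ == "__main__":
    print(missing_modules(REQUIRED_MODULES) or "all required modules found")
```

For dotted names, `find_spec` imports the parent packages, so a broken parent package is also reported as missing here.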

Step 4: Generate protobuf stubs

pip install grpcio-tools==1.62.2 mypy-protobuf==3.6.0 types-protobuf

mkdir -p backends/python/server/text_embeddings_server/pb

python -m grpc_tools.protoc \
    -I backends/proto \
    --python_out=backends/python/server/text_embeddings_server/pb \
    --grpc_python_out=backends/python/server/text_embeddings_server/pb \
    --mypy_out=backends/python/server/text_embeddings_server/pb \
    backends/proto/embed.proto

# Rewrite the generated absolute pb2 imports as package-relative imports
find backends/python/server/text_embeddings_server/pb/ -name "*.py" \
    -exec sed -i 's/^\(import.*pb2\)/from . \1/g' {} \;

touch backends/python/server/text_embeddings_server/pb/__init__.py
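The sed command turns a generated line such as `import embed_pb2 as embed__pb2` into `from . import embed_pb2 as embed__pb2`. Without that change, the stubs could not find each other once they live inside the `pb` package. The following Python sketch applies the same substitution, which can help on systems where `sed -i` behaves differently. `make_relative` and `fix_pb_dir` are hypothetical helper names.

```python
import re
from pathlib import Path

# Same pattern as the sed expression: a line starting with `import` that mentions `pb2`.
_ABSOLUTE_PB2_IMPORT = re.compile(r"^(import.*pb2)", re.MULTILINE)

def make_relative(source):
    """Prefix matching import lines with `from . ` to make them package-relative."""
    return _ABSOLUTE_PB2_IMPORT.sub(r"from . \1", source)

def fix_pb_dir(pb_dir="backends/python/server/text_embeddings_server/pb"):
    """Apply the rewrite to every .py file under pb_dir, like the find/sed pipeline."""
    for path in Path(pb_dir).rglob("*.py"):
        path.write_text(make_relative(path.read_text()))
```

The rewrite is idempotent because a rewritten line starts with `from` and no longer matches `^import`. Running it twice, with either sed or Python, is therefore harmless.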

Step 5: Install the Python server package

pip install -e backends/python/server

Step 6: Build the Rust router

cargo build --release \
    --no-default-features \
    --features python,http \
    --bin text-embeddings-router

Step 7: Launch TEI

model=BAAI/bge-base-en-v1.5

./target/release/text-embeddings-router --model-id $model --dtype bfloat16 --port 8080
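On first start, the model weights have to be downloaded and loaded, so the server may not answer right away. This minimal sketch polls until the server responds. It assumes the router exposes a `GET /health` route that returns HTTP 200 when ready; check your build's API docs if it does not. `wait_until_ready` is a hypothetical helper, and its `probe` parameter lets you substitute a different check.

```python
import http.client
import time
import urllib.request

def wait_until_ready(url="http://localhost:8080/health", timeout=600.0,
                     interval=5.0, probe=None):
    """Poll until probe() is true or `timeout` seconds elapse; return whether it succeeded."""
    def default_probe():
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                return resp.status == 200
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP error statuses all mean "not ready yet".
            return False

    check = probe or default_probe
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`time.monotonic` is used for the deadline so that changes to the system clock cannot shorten or extend the wait.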

Once the server is ready, you can test it with a simple embed request:

curl http://localhost:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
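The `inputs` field can also carry a list of strings, which embeds several texts in one request. This sketch builds such a batch body and sanity-checks a response. It assumes the response is a JSON array with one vector per input, in input order. `batch_body`, `parse_embeddings` and `l2_norms` are hypothetical helpers, and the norms are only reported, not enforced.

```python
import json
import math

def batch_body(texts):
    """JSON body for embedding several inputs in one request."""
    return json.dumps({"inputs": list(texts)}).encode("utf-8")

def parse_embeddings(raw, expected_count):
    """Decode a response and check the vector count and that all vectors share a dimension."""
    vectors = json.loads(raw)
    if len(vectors) != expected_count:
        raise ValueError(f"expected {expected_count} vectors, got {len(vectors)}")
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return vectors, dims.pop()

def l2_norms(vectors):
    """Euclidean length of each vector (useful to see whether outputs are normalized)."""
    return [math.sqrt(sum(x * x for x in v)) for v in vectors]
```

To try it, send `batch_body(["What is Deep Learning?", "What is ROCm?"])` to `/embed` with the same `Content-Type` header as the curl command. Then pass the raw response bytes to `parse_embeddings(raw, 2)`.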

Verifying GPU detection

After launch, the logs should include a line confirming that ROCm was detected:

INFO text_embeddings_server::utils::device: ROCm / HIP version: X.Y.Z
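In scripted setups, you can search the captured router output for that line instead of reading it by eye. This minimal sketch matches only the `ROCm / HIP version:` text shown above. It also strips ANSI color codes, in case the log output is colorized. `find_hip_version` is a hypothetical helper.

```python
import re

_ANSI = re.compile(r"\x1b\[[0-9;]*m")                   # terminal color escape sequences
_HIP_LINE = re.compile(r"ROCm / HIP version:\s*(\S+)")

def find_hip_version(log_lines):
    """Return the reported HIP version from an iterable of log lines, or None."""
    for line in log_lines:
        match = _HIP_LINE.search(_ANSI.sub("", line))
        if match:
            return match.group(1)
    return None
```

For example, you could redirect the router's output to a file and pass the open file object, since iterating over a file yields its lines.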

You can also verify from Python:

import torch
print(torch.cuda.is_available())  # True
print(torch.version.hip)          # e.g. 6.2.12345-...
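The two prints above can be wrapped in a check that also tells a ROCm build apart from a CUDA or CPU-only build. `torch.version.hip` is None on non-ROCm builds, and ROCm builds expose AMD GPUs through the `torch.cuda` API. The sketch also lists the visible device names and degrades gracefully when torch is not installed. `rocm_status` is a hypothetical helper.

```python
import importlib

def rocm_status():
    """Summarize whether torch is a ROCm build and which GPUs it can see."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch": None, "gpu_available": False, "hip": None, "devices": []}
    hip = getattr(torch.version, "hip", None)   # None on CUDA or CPU-only builds
    available = torch.cuda.is_available()        # ROCm builds reuse the torch.cuda namespace
    devices = ([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
               if available else [])
    return {"torch": str(torch.__version__), "gpu_available": available,
            "hip": hip, "devices": devices}

if __name__ == "__main__":
    print(rocm_status())
```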

Notes

AMD GPU support is a work in progress. Support for more models and optimized operations for AMD GPUs is planned.
