	# Using TEI on AMD Instinct GPUs (ROCm)

	Text Embeddings Inference supports AMD Instinct GPUs (MI200, MI300 series) using [ROCm](https://rocm.docs.amd.com/).

	## Prerequisites

	- AMD Instinct GPU (MI200, MI300 series) with ROCm drivers installed on the host
	- Docker (both options below run TEI inside a container)

	## Option A: Docker (recommended)

	The easiest way to run TEI on AMD GPUs is with the pre-built Docker image:

	```shell
	model=BAAI/bge-base-en-v1.5
	volume=$PWD/data # share a volume to avoid re-downloading weights

	docker run \
	--device /dev/kfd --device /dev/dri \
	--group-add video \
	--ipc=host \
	-p 8080:80 \
	-v $volume:/data \
	--pull always \
	ghcr.io/huggingface/text-embeddings-inference:rocm-latest \
	--model-id $model --dtype bfloat16
	```
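If you launch the container from a Python script rather than a shell, you can assemble the same command as an argument list. The sketch below uses a hypothetical helper (`build_tei_rocm_command` is not part of TEI). Its comments summarize what each flag in the command above does.

```python
import os
import shlex


def build_tei_rocm_command(model, volume, host_port=8080,
                           image="ghcr.io/huggingface/text-embeddings-inference:rocm-latest",
                           dtype="bfloat16"):
    """Return the `docker run` argv from the example above (hypothetical helper)."""
    return [
        "docker", "run",
        # /dev/kfd is the ROCm compute interface; /dev/dri holds the GPU render nodes
        "--device", "/dev/kfd", "--device", "/dev/dri",
        # the `video` group typically owns those device nodes on the host
        "--group-add", "video",
        # share the host IPC namespace (and its shared memory) with the container
        "--ipc=host",
        # the server listens on port 80 inside the container
        "-p", f"{host_port}:80",
        # keep downloaded weights on the host so restarts do not re-download them
        "-v", f"{volume}:/data",
        # fetch the newest image for this tag before starting
        "--pull", "always",
        image,
        # everything after the image name is passed to the TEI router
        "--model-id", model, "--dtype", dtype,
    ]


if __name__ == "__main__":
    cmd = build_tei_rocm_command("BAAI/bge-base-en-v1.5", os.path.join(os.getcwd(), "data"))
    print(shlex.join(cmd))
    # To actually start the container: subprocess.run(cmd, check=True)
```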

	Then test it:

	```shell
	curl http://localhost:8080/embed \
	-X POST \
	-H 'Content-Type: application/json' \
	-d '{"inputs": "What is Deep Learning?"}'
	```
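You can send the same request from Python using only the standard library. This is a minimal sketch: `build_embed_request` and `embed` are illustrative names, and it assumes the response is a JSON array with one embedding vector per input.

```python
import json
import urllib.request


def build_embed_request(base_url, inputs):
    """Build a POST /embed request equivalent to the curl call above."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url, inputs, timeout=30.0):
    """Send the request and return the decoded JSON (assumed: a list of vectors)."""
    with urllib.request.urlopen(build_embed_request(base_url, inputs), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the container from the previous step running:
# vectors = embed("http://localhost:8080", "What is Deep Learning?")
# print(len(vectors), len(vectors[0]))
```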

	---

	## Option B: Manual setup from source

	If you prefer to build from source, use AMD's official ROCm PyTorch image as the base environment.

	### Step 1: Start the container

	```shell
	docker run -it --device=/dev/kfd --device=/dev/dri \
	--group-add video --shm-size 8g \
	-v $PWD:/workspace \
	rocm/pytorch:latest bash
	```

	If you started the container from a TEI checkout, the repository is already mounted at `/workspace`. Otherwise, clone it inside the container (or mount it with `-v`). Run the remaining steps from the repository root.
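Before going further, you can confirm that the `--device` flags from Step 1 actually exposed the GPU inside the container. This hypothetical check only looks for the device nodes. It does not prove that ROCm can use them.

```python
import os


def missing_rocm_devices(dev_root="/dev"):
    """Return the ROCm device paths that are not visible under dev_root."""
    missing = []
    # /dev/kfd: ROCm compute interface, passed in with --device=/dev/kfd
    kfd = os.path.join(dev_root, "kfd")
    if not os.path.exists(kfd):
        missing.append(kfd)
    # /dev/dri: should contain at least one render node (renderD128, ...)
    dri = os.path.join(dev_root, "dri")
    try:
        has_render_node = any(name.startswith("renderD") for name in os.listdir(dri))
    except OSError:  # directory missing or not readable
        has_render_node = False
    if not has_render_node:
        missing.append(dri)
    return missing


if __name__ == "__main__":
    missing = missing_rocm_devices()
    if missing:
        print("not visible in this container:", ", ".join(missing))
    else:
        print("ROCm device nodes are present")
```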

	### Step 2: Install Rust

	```shell
	curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
	source "$HOME/.cargo/env"
	```

	### Step 3: Install Python dependencies

	PyTorch is already provided by the container image, so install the remaining dependencies without letting pip pull in a different torch build:

	```shell
	pip install --no-deps -r backends/python/server/requirements-amd.txt
	pip install safetensors opentelemetry-api opentelemetry-sdk \
	opentelemetry-exporter-otlp-proto-grpc grpcio-reflection \
	grpc-interceptor einops packaging
	```
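Because `--no-deps` disables dependency resolution, pip will not warn you if something is missing. A quick check with `importlib.metadata` can catch that. The list below is an assumption for illustration: it covers torch and the packages named in the second command only, not the contents of `requirements-amd.txt`.

```python
from importlib import metadata

# torch (from the base image) plus the packages installed explicitly above.
REQUIRED = [
    "torch", "safetensors", "opentelemetry-api", "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-grpc", "grpcio-reflection",
    "grpc-interceptor", "einops", "packaging",
]


def missing_distributions(names):
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_distributions(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```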

	### Step 4: Generate protobuf stubs

	```shell
	pip install grpcio-tools==1.62.2 mypy-protobuf==3.6.0 types-protobuf

	mkdir -p backends/python/server/text_embeddings_server/pb

	python -m grpc_tools.protoc \
	-I backends/proto \
	--python_out=backends/python/server/text_embeddings_server/pb \
	--grpc_python_out=backends/python/server/text_embeddings_server/pb \
	--mypy_out=backends/python/server/text_embeddings_server/pb \
	backends/proto/embed.proto

	# Rewrite `import embed_pb2 ...` as `from . import embed_pb2 ...` in the generated files
	find backends/python/server/text_embeddings_server/pb/ -name "*.py" \
	-exec sed -i 's/^\(import.*pb2\)/from . \1/g' {} \;

	touch backends/python/server/text_embeddings_server/pb/__init__.py
	```
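For reference, the `sed` call rewrites absolute imports of the generated modules (for example `import embed_pb2 as embed__pb2` in `embed_pb2_grpc.py`) as package-relative ones, so they resolve inside `text_embeddings_server.pb`. The Python equivalent below only illustrates the substitution. `fix_pb2_imports` and `fix_pb2_package` are hypothetical names.

```python
import re
from pathlib import Path

# Same pattern as the sed expression: a line that starts with `import` and contains `pb2`.
_PB2_IMPORT = re.compile(r"^(import.*pb2)", re.MULTILINE)


def fix_pb2_imports(source):
    """Turn `import embed_pb2 as embed__pb2` into `from . import embed_pb2 as embed__pb2`."""
    return _PB2_IMPORT.sub(r"from . \1", source)


def fix_pb2_package(pb_dir):
    """Rewrite every .py file in pb_dir in place, mirroring the find/sed step."""
    for path in Path(pb_dir).glob("*.py"):
        text = path.read_text()
        fixed = fix_pb2_imports(text)
        if fixed != text:
            path.write_text(fixed)
```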

	### Step 5: Install the Python server package

	```shell
	pip install -e backends/python/server
	```

	### Step 6: Build the Rust router

	```shell
	cargo build --release \
	--no-default-features \
	--features python,http \
	--bin text-embeddings-router
	```

	### Step 7: Launch TEI

	```shell
	model=BAAI/bge-base-en-v1.5

	./target/release/text-embeddings-router --model-id $model --dtype bfloat16 --port 8080
	```
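Downloading and loading the model can take a while on first start. If you script the launch, you can poll until the server answers before sending requests. This sketch assumes the router's `GET /health` route returns HTTP 200 once the model is loaded. `http_ok` and `wait_until_ready` are illustrative helpers, and the clock and sleep functions are injectable so you can test the loop without a server.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (OSError subclasses), refused connections, timeouts
        return False


def wait_until_ready(probe, timeout=600.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout  # generous default to allow first-time weight downloads
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage, assuming the /health route on the port used above:
# if not wait_until_ready(lambda: http_ok("http://localhost:8080/health")):
#     raise SystemExit("TEI did not become ready in time")
```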

	Once the server is ready, you can test it with a simple embed request:

	```shell
	curl http://localhost:8080/embed \
	-X POST \
	-H 'Content-Type: application/json' \
	-d '{"inputs": "What is Deep Learning?"}'
	```

	## Verifying GPU detection

	After launch, the logs should include a line confirming that ROCm was detected:

	```
	INFO text_embeddings_server::utils::device: ROCm / HIP version: X.Y.Z
	```
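If you capture the router's output (for example by redirecting it to a file), a small check can confirm that the detection line is present. The pattern below is based on the example line above and is an assumption. Adjust it if your TEI version words the message differently.

```python
import re

# Based on the example log line above; the exact wording may differ between versions.
_ROCM_LINE = re.compile(r"ROCm / HIP version: (\S+)")


def detected_hip_version(log_text):
    """Return the version string from the first ROCm detection line, or None."""
    match = _ROCM_LINE.search(log_text)
    return match.group(1) if match else None
```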

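	You can also verify from Python. ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, so the usual CUDA checks apply: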

	```python
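	import torch
	print(torch.cuda.is_available())      # True when an AMD GPU is visible to PyTorch
	print(torch.version.hip)              # e.g. 6.2.12345-...; None on non-ROCm builds
	print(torch.cuda.get_device_name(0))  # name of the first GPU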
	```
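`torch.version.hip` is `None` on CUDA and CPU builds of PyTorch, so checking it is a quick way to notice that a non-ROCm wheel has replaced the one from the base image. The sketch below shows that check. `torch_backend` is a hypothetical helper that takes the module as an argument so it can be tested without torch installed.

```python
def torch_backend(torch_module):
    """Classify a torch build as 'rocm', 'cuda' or 'cpu' from its version metadata."""
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda; CPU builds set neither.
    if getattr(torch_module.version, "hip", None):
        return "rocm"
    if getattr(torch_module.version, "cuda", None):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(torch_backend(torch), torch.cuda.is_available())
```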

	## Notes

	This is a work in progress: support for more models and optimized operations for AMD GPUs is planned.
