add tei back to readme

#8
by bowang0911 - opened
Files changed (1)
  1. README.md +51 -0
README.md CHANGED
@@ -134,6 +134,57 @@ packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)
  </details>

+ <details>
+ <summary>Using Text Embeddings Inference (TEI)</summary>
+
+ > [!NOTE]
+ > Text Embeddings Inference v1.9.0 will be released as stable soon; in the meantime,
+ > feel free to use the latest containers, or pin a specific build via its SHA ``.
+
+ > [!IMPORTANT]
+ > Currently, only int8-quantized embeddings are available via TEI. Remember to use
+ > cosine similarity with the unnormalized int8 embeddings.
+
+ - CPU w/ Candle:
+
+ ```bash
+ docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id perplexity-ai/pplx-embed-v1-0.6B --auto-truncate
+ ```
+
+ - CPU w/ ORT (ONNX Runtime):
+
+ ```bash
+ docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id onnx-community/pplx-embed-v1-0.6B --auto-truncate
+ ```
+
+ - GPU w/ CUDA:
+
+ ```bash
+ docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id perplexity-ai/pplx-embed-v1-0.6B --auto-truncate
+ ```
+
+ > Alternatively, when running on CUDA you can use an architecture / compute capability
+ > specific container (e.g. `ampere-latest`) instead of `cuda-latest`: the latter bundles
+ > binaries for Turing, Ampere, and Hopper, so a dedicated container is lighter.
+
+ You can then send requests to the `/embed` endpoint via cURL:
+
+ ```bash
+ curl http://0.0.0.0:8080/embed \
+     -H "Content-Type: application/json" \
+     -d '{
+         "inputs": [
+             "Scientists explore the universe driven by curiosity.",
+             "Children learn through curious exploration.",
+             "Historical discoveries began with curious questions.",
+             "Animals use curiosity to adapt and survive.",
+             "Philosophy examines the nature of curiosity."
+         ],
+         "normalize": false
+     }'
+ ```
+
+ </details>
+
  ## Technical Details

  For comprehensive technical details and evaluation results, see our paper on arXiv: https://arxiv.org/abs/2602.11151.
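Since the added section notes that TEI currently serves unnormalized int8 embeddings and that cosine similarity should be used with them, here is a minimal client-side sketch of that comparison step. The vectors below are made-up placeholders, not real `/embed` output, and the helper name `cosine_similarity` is our own, not part of TEI:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two (possibly unnormalized) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder int8-range vectors standing in for TEI's /embed response values.
emb_a = [12, -37, 88, -5]
emb_b = [10, -40, 90, -3]

score = cosine_similarity(emb_a, emb_b)
print(score)
```

Because cosine similarity divides by both norms, it is insensitive to the missing normalization, which is why it is the right choice for the unnormalized int8 vectors a TEI deployment returns here.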