  1. README.md +47 -0
README.md CHANGED
@@ -134,6 +134,53 @@ packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)

</details>

<details>
<summary>Using Text Embeddings Inference (TEI)</summary>

> [!IMPORTANT]
> Currently, only int8-quantized embeddings are available via TEI. Since these int8 embeddings are unnormalized, compare them with cosine similarity rather than a plain dot product.
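
The following is a minimal, pure-Python sketch of the cosine similarity recommended in the note, applied to int8-valued vectors. The function name and toy vectors are hypothetical. The sketch does not use TEI or any API from this repository.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Cosine similarity between two (possibly unnormalized) embedding vectors.

    Python integers never overflow, so int8-valued inputs are multiplied and
    summed exactly before the final floating-point division.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors: b points in the same direction as a but is twice as long.
a = [3, 4]
b = [6, 8]
print(sum(x * y for x, y in zip(a, b)))  # raw dot product: 50, grows with vector length
print(cosine_similarity(a, b))  # 1.0, because vector length is divided out
```

The raw dot product changes with vector magnitude, so it does not substitute for cosine similarity on unnormalized embeddings. If you port this to NumPy, cast int8 arrays to a wider dtype (for example `astype(np.float32)`) before taking dot products, because arithmetic carried out in int8 can overflow.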

- CPU w/ Candle:

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id perplexity-ai/pplx-embed-v1-0.6B --auto-truncate
  ```

- CPU w/ ORT (ONNX Runtime):

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id onnx-community/pplx-embed-v1-0.6B --auto-truncate
  ```

- GPU w/ CUDA:

  ```bash
  docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id perplexity-ai/pplx-embed-v1-0.6B --auto-truncate
  ```

> Alternatively, when running on CUDA, you can use the container built for your GPU's architecture
> (compute capability) instead of `cuda-latest`. Because `cuda-latest` bundles the binaries for Turing,
> Ampere, and Hopper, a dedicated container such as `ampere-latest` is lighter.

Then send requests to the `/embed` endpoint, for example with cURL:

```bash
curl http://0.0.0.0:8080/embed \
    -H "Content-Type: application/json" \
    -d '{
        "inputs": [
            "Scientists explore the universe driven by curiosity.",
            "Children learn through curious exploration.",
            "Historical discoveries began with curious questions.",
            "Animals use curiosity to adapt and survive.",
            "Philosophy examines the nature of curiosity."
        ],
        "normalize": false
    }'
```
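
You can also send the same request from Python. The following is a minimal sketch that uses only the standard library (`urllib.request`, `json`). It mirrors the cURL call, including `"normalize": false`. The helper names are hypothetical. The sketch assumes that the server is reachable at `http://0.0.0.0:8080` and that `/embed` responds with a JSON array containing one list of numbers per input.

```python
import json
import urllib.request
from typing import List, Sequence

# Assumption: a TEI container started with one of the `docker run` commands
# in this section is listening on host port 8080.
TEI_EMBED_URL = "http://0.0.0.0:8080/embed"


def build_embed_payload(texts: Sequence[str], normalize: bool = False) -> bytes:
    """Encode the same JSON body as the cURL example (`inputs` + `normalize`)."""
    return json.dumps({"inputs": list(texts), "normalize": normalize}).encode("utf-8")


def parse_embed_response(body: bytes, expected: int) -> List[List[float]]:
    """Decode a JSON array holding one embedding (list of numbers) per input."""
    embeddings = json.loads(body)
    if not isinstance(embeddings, list) or len(embeddings) != expected:
        raise ValueError("unexpected /embed response shape")
    return embeddings


def embed(texts: Sequence[str], url: str = TEI_EMBED_URL, timeout: float = 30.0) -> List[List[float]]:
    """POST `texts` to a running TEI server and return one embedding per text."""
    request = urllib.request.Request(
        url,
        data=build_embed_payload(texts, normalize=False),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embed_response(response.read(), expected=len(texts))
```

With the server running, calling `embed(...)` on the five example sentences returns five vectors. You can then compare these vectors with cosine similarity, as the note at the top of this section recommends.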

</details>


## Technical Details