Commit 964a2aa (parent: f862230): Update README.md (#9)

README.md (changed):
@@ -134,6 +134,58 @@ packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)

</details>
<details>
<summary>Using Text Embeddings Inference (TEI)</summary>

> [!NOTE]
> Text Embeddings Inference v1.9.2+ is required.

> [!IMPORTANT]
> Currently, only int8-quantized embeddings are available via TEI. Remember to use cosine similarity with the unnormalized int8 embeddings.
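Because the int8 embeddings come back unnormalized, a raw dot product is not a cosine score. As a minimal NumPy sketch (not part of this repo; the function name is our own), cast to float before normalizing:

```python
import numpy as np

def int8_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity for unnormalized int8 embeddings.

    Cast to float32 before the dot product: accumulating
    in int8 overflows almost immediately.
    """
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```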
- CPU w/ Candle:

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32
  ```

- CPU w/ ORT (ONNX Runtime):

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id onnx-community/pplx-embed-v1-4B --dtype float32
  ```

- GPU w/ CUDA:

  ```bash
  docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32
  ```

> If you hit OOM during warmup, lower `--max-batch-tokens` and `--max-client-batch-size`. Set `--max-batch-tokens` to max_sequence_length × batch_size (e.g., 2048 tokens × 8 sequences = 16384).

> Alternatively, when running on CUDA you can use an architecture-specific (compute-capability-specific) container instead of `cuda-1.9`: the generic image bundles binaries for Turing, Ampere, Hopper, and Blackwell, so a dedicated container such as `ampere-1.9` is lighter.
You can then send requests to its `/embed` endpoint via cURL:

```bash
curl http://0.0.0.0:8080/embed \
    -H "Content-Type: application/json" \
    -d '{
        "inputs": [
            "Scientists explore the universe driven by curiosity.",
            "Children learn through curious exploration.",
            "Historical discoveries began with curious questions.",
            "Animals use curiosity to adapt and survive.",
            "Philosophy examines the nature of curiosity."
        ],
        "normalize": false
    }'
```
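The same request can be sent from Python; here is a minimal sketch using `requests` (illustrative client code, assuming a TEI container launched as shown above is listening on port 8080 and `/embed` returns one JSON vector per input):

```python
import numpy as np
import requests

# Assumes a TEI server from the commands above, listening on port 8080.
response = requests.post(
    "http://0.0.0.0:8080/embed",
    json={
        "inputs": [
            "Scientists explore the universe driven by curiosity.",
            "Children learn through curious exploration.",
        ],
        "normalize": False,
    },
    timeout=60,
)
response.raise_for_status()

# /embed returns one embedding per input string.
embeddings = np.asarray(response.json(), dtype=np.float32)

# Score the pair with cosine similarity (see the sketch above),
# since the int8 embeddings come back unnormalized.
a, b = embeddings
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```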
</details>

## Technical Details

For comprehensive technical details and evaluation results, see our paper on arXiv: https://arxiv.org/abs/2602.11151.