---
license: apache-2.0
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-similarity
- mteb
- sentence-transformers
language:
- multilingual
---
# pplx-embed-1: Diffusion-LM for Dense and Contextual Retrieval
pplx-embed-1 and pplx-embed-1-context are state-of-the-art text embedding models optimized for real-world, web-scale retrieval tasks.
- Use `pplx-embed-1` for independent text embedding (queries, documents, semantic search)
- Use `pplx-embed-1-context` for document chunks in RAG systems where surrounding context matters

`pplx-embed-1` and `pplx-embed-1-context` natively produce unnormalized int8-quantized embeddings. Make sure to compare them via cosine similarity.
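Because the embeddings are unnormalized int8 vectors, cosine similarity should be computed after upcasting to a wider type. A minimal NumPy sketch (the vectors here are toy values, not real model outputs):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Upcast int8 before the dot product so the accumulation cannot overflow.
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = np.array([12, -7, 33], dtype=np.int8)  # toy int8 "query" embedding
d = np.array([10, -5, 40], dtype=np.int8)  # toy int8 "document" embedding
score = cosine_sim(q, d)
```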
## Models
| Model | Dimensions | Context | MRL | Quantization | Instruction | Pooling |
|---|---|---|---|---|---|---|
| pplx-embed-1-0.6B | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| pplx-embed-1-4B | 2560 | 32K | Yes | INT8/BINARY | No | Mean |
| pplx-embed-1-context-0.6B | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| pplx-embed-1-context-4B | 2560 | 32K | Yes | INT8/BINARY | No | Mean |
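Since all models support MRL (Matryoshka Representation Learning), embeddings can be truncated to a smaller dimension to reduce index size. A minimal sketch, assuming MRL truncation means keeping the leading components (the 256-dim cut-off is an illustrative choice, not an official recommendation):

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    # MRL-trained embeddings pack the most informative components first,
    # so truncation is simply taking the leading `dim` values.
    return embedding[:dim]

# Toy stand-in for a 2560-dim int8 embedding from pplx-embed-1-4B.
full = np.random.default_rng(0).integers(-127, 128, size=2560).astype(np.int8)
small = truncate_mrl(full, 256)  # 10x smaller index footprint
```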
All models are built at Perplexity AI on Qwen3 with diffusion continued pre-training.
Many modern embedding models rely on instruction tuning, where users prepend an instruction string to the text being embedded. This can yield a 2%-3% lift on benchmarks, but it also introduces prompt-selection overhead and can make indexing pipelines brittle (small instruction changes can shift embedding space). We deliberately avoid this requirement: you can embed the text you want to index directly, without having to choose or maintain an instruction prefix.
## Usage

### Via API

```bash
curl -X POST https://api.perplexity.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Scientists explore the universe driven by curiosity.",
      "Children learn through curious exploration.",
      "Historical discoveries began with curious questions.",
      "Animals use curiosity to adapt and survive.",
      "Philosophy examines the nature of curiosity."
    ],
    "model": "pplx-embed-1-4B"
  }'
```
### Using SentenceTransformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "perplexity-ai/pplx-embed-1-4B",
    trust_remote_code=True,
)

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

embeddings = model.encode(texts)  # Shape: (5, 2560), quantized to int8
embeddings = model.encode(texts, quantization="binary")  # Shape: (5, 2560), quantized to binary
```
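With binary quantization, each dimension collapses to a single bit and similarity is typically measured via Hamming distance rather than cosine. A self-contained sketch on bit-packed vectors (the `np.packbits` packing convention here is an assumption for illustration; check the model's actual binary output format):

```python
import numpy as np

def hamming_similarity(a_packed: np.ndarray, b_packed: np.ndarray) -> float:
    # a_packed / b_packed: bit-packed binary embeddings (uint8 arrays).
    # XOR marks differing bits; unpack and sum to count them, then
    # normalize the distance into a similarity in [0, 1].
    differing = int(np.unpackbits(np.bitwise_xor(a_packed, b_packed)).sum())
    return 1.0 - differing / (a_packed.size * 8)

# Two toy 8-bit binary embeddings that differ in exactly one position.
bits_a = np.packbits(np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8))
bits_b = np.packbits(np.array([1, 0, 1, 0, 0, 0, 1, 0], dtype=np.uint8))
```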
### Using Text Embeddings Inference (TEI)

Text Embeddings Inference v1.9.0 will be released as stable soon; in the meantime, feel free to use the latest containers, or pin a specific build via its SHA ``.

Currently, only int8-quantized embeddings are available via TEI. Remember to use cosine similarity with the unnormalized int8 embeddings.
- CPU w/ Candle:

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id perplexity-ai/pplx-embed-1-4B --auto-truncate
  ```

- CPU w/ ORT (ONNX Runtime):

  ```bash
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id onnx-community/pplx-embed-1-4B --auto-truncate
  ```

- GPU w/ CUDA:

  ```bash
  docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id perplexity-ai/pplx-embed-1-4B --auto-truncate
  ```
Alternatively, when running on CUDA you can use an architecture / compute-capability-specific container instead of `cuda-latest`: `cuda-latest` bundles binaries for Turing, Ampere, and Hopper, so a dedicated container such as `ampere-latest` will be lighter.
You can then send requests to the `/embed` endpoint via cURL:

```bash
curl http://0.0.0.0:8080/embed \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "Scientists explore the universe driven by curiosity.",
      "Children learn through curious exploration.",
      "Historical discoveries began with curious questions.",
      "Animals use curiosity to adapt and survive.",
      "Philosophy examines the nature of curiosity."
    ],
    "normalize": false
  }'
```
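A small Python client for the same endpoint might look like the following sketch. It assumes the `requests` package is installed and mirrors the request/response shape of the cURL example; `TEI_URL` is a placeholder for your own deployment:

```python
import numpy as np
import requests

TEI_URL = "http://0.0.0.0:8080/embed"  # placeholder; point at your TEI container

def build_request(texts):
    # Mirror the cURL body: raw inputs, with normalization disabled so the
    # unnormalized embeddings come back as-is (compare them via cosine).
    return {"inputs": list(texts), "normalize": False}

def embed(texts, url=TEI_URL):
    resp = requests.post(url, json=build_request(texts))
    resp.raise_for_status()
    # TEI's /embed returns a JSON array with one embedding per input text.
    return np.asarray(resp.json(), dtype=np.float32)
```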
## Technical Details
For comprehensive technical details and evaluation results, see our paper on arXiv.
## Contact
- Website: https://perplexity.ai
- API Support: api-support@perplexity.ai
