pplx-embed-context-v1-0.6b-mlx
MLX conversion of perplexity-ai/pplx-embed-context-v1-0.6b for Apple Silicon.
This is a contextual embedding model: it takes a list of documents, where each document is a list of text chunks, and returns one embedding matrix per document, with one row per chunk.
Important Loading Note
This artifact cannot be loaded with vanilla mlx_lm.load(), because MLX-LM does not
natively support Perplexity's custom bidirectional_pplx_qwen3 model type. The repository
therefore bundles a small pplx_mlx_convert loader package; use it as shown in the Usage section below.
Source Code
Conversion and validation code lives at https://github.com/thehumanworks/pplx-mlx.
Install
pip install mlx mlx-lm transformers huggingface_hub numpy
Usage
import sys

from huggingface_hub import snapshot_download

# Download the artifact and put its bundled loader package on sys.path.
repo_path = snapshot_download("agentmish/pplx-embed-context-v1-0.6b-mlx")
sys.path.insert(0, repo_path)

from pplx_mlx_convert import load_embedder

embedder = load_embedder(repo_path)

# Each document is a list of chunks; chunks are embedded in document context.
doc_chunks = [
    [
        "Curiosity begins in childhood with endless questions about the world.",
        "As we grow, curiosity drives us to explore new ideas.",
        "Scientific breakthroughs often start with a curious question.",
    ],
    [
        "The curiosity rover explores Mars searching for ancient life.",
        "Each discovery on Mars sparks new questions about the universe.",
    ],
]

# One embedding matrix per document, one row per chunk.
embeddings = embedder.encode(doc_chunks)
print(embeddings[0].shape)  # (3, 1024)
print(embeddings[1].shape)  # (2, 1024)
print(embeddings[0].dtype)  # int8
By default, the model produces unnormalized int8 embeddings; compare them with cosine
similarity. embedder.encode(..., quantization="none") returns float32 pooled embeddings,
and embedder.encode(..., quantization="binary") returns binary tanh embeddings.
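For comparing two chunk embeddings, a minimal cosine-similarity helper could look like the sketch below (the helper name is illustrative, and it assumes the arrays returned by encode are convertible with np.asarray):

import numpy as np

def cosine_sim(a, b):
    # Cast int8 rows to float32 before taking the dot product.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare the first chunk of each document from the Usage example.
print(cosine_sim(embeddings[0][0], embeddings[1][0]))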
Conversion Details
- Source model: perplexity-ai/pplx-embed-context-v1-0.6b
- Source revision: see conversion.json
- Converted dtype: bfloat16
- Embedding dimension: 1024
- Output root expected by this workspace: artifacts/mlx/pplx-embed-context-v1-0.6b
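To see which upstream revision a given snapshot was converted from, you can read conversion.json from the downloaded repository. This sketch reuses repo_path from the Usage section and simply prints the whole file, since the exact key names inside it may vary:

import json
import os

with open(os.path.join(repo_path, "conversion.json")) as f:
    conversion_meta = json.load(f)
print(conversion_meta)  # source revision and other conversion metadata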
Validation
Local MLX smoke validation passed, producing finite raw float embeddings and int8 contextual
outputs with shapes [[2, 1024], [1, 1024]].
Compared against the original Transformers remote-code float32 model on sample contextual inputs:
- cosine similarities: 0.9995409, 0.9995418, 0.9997643
- int8 deltas: max absolute delta 2; mean absolute deltas 0.209 and 0.145
The MLX artifact is bfloat16 while the reference path used float32, so int8 values are not expected to be bit-identical.
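If you want to run a similar parity check yourself, one approach is to save float32 embeddings from the reference model and compare them row-wise against the MLX float32 output. In this sketch, the reference_embeddings.npy file name is hypothetical, and doc_chunks and embedder come from the Usage section:

import numpy as np

# Hypothetical reference: float32 embeddings saved from the original
# Transformers remote-code model for the same first document.
ref = np.load("reference_embeddings.npy")

mlx_out = np.asarray(embedder.encode(doc_chunks, quantization="none")[0],
                     dtype=np.float32)

# Per-chunk (row-wise) cosine similarity between MLX and reference rows.
num = (mlx_out * ref).sum(axis=1)
den = np.linalg.norm(mlx_out, axis=1) * np.linalg.norm(ref, axis=1)
print(num / den)  # values near 1.0 indicate close agreement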
License
The source model is MIT licensed. This conversion preserves the MIT license.