pplx-embed-context-v1-0.6b-mlx

MLX conversion of perplexity-ai/pplx-embed-context-v1-0.6b for Apple Silicon.

This is a contextual embedding model. It takes a list of documents, where each document is a list of chunk strings, and returns one embedding matrix per document (one row per chunk).

Important Loading Note

This artifact cannot be loaded with vanilla mlx_lm.load(), because MLX-LM does not natively support Perplexity's custom bidirectional_pplx_qwen3 model type. Instead, the repository includes a small pplx_mlx_convert loader package for this artifact.

Source Code

Conversion and validation code lives in https://github.com/thehumanworks/pplx-mlx.

Install

pip install mlx mlx-lm transformers huggingface_hub numpy

Usage

import sys
from huggingface_hub import snapshot_download

repo_path = snapshot_download("agentmish/pplx-embed-context-v1-0.6b-mlx")
sys.path.insert(0, repo_path)

from pplx_mlx_convert import load_embedder

embedder = load_embedder(repo_path)
doc_chunks = [
    [
        "Curiosity begins in childhood with endless questions about the world.",
        "As we grow, curiosity drives us to explore new ideas.",
        "Scientific breakthroughs often start with a curious question.",
    ],
    [
        "The Curiosity rover explores Mars searching for ancient life.",
        "Each discovery on Mars sparks new questions about the universe.",
    ],
]

embeddings = embedder.encode(doc_chunks)
print(embeddings[0].shape)  # (3, 1024)
print(embeddings[1].shape)  # (2, 1024)
print(embeddings[0].dtype)  # int8

By default, the model produces unnormalized int8 embeddings; use cosine similarity to compare them. embedder.encode(..., quantization="none") returns float32 pooled embeddings, and embedder.encode(..., quantization="binary") returns binary tanh embeddings.
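Since the int8 outputs are unnormalized, cosine similarity must normalize before comparing. A minimal sketch of pairwise cosine similarity between two documents' chunk embeddings, using synthetic int8 arrays in place of real embedder.encode output (shapes match the example above):

```python
import numpy as np

# Synthetic stand-ins for embedder.encode output: int8 chunk embeddings
# of shape (num_chunks, 1024), as in the usage example above.
rng = np.random.default_rng(0)
doc_a = rng.integers(-127, 128, size=(3, 1024), dtype=np.int8)
doc_b = rng.integers(-127, 128, size=(2, 1024), dtype=np.int8)

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of two int8 embedding matrices."""
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sims = cosine_similarity_matrix(doc_a, doc_b)
print(sims.shape)  # (3, 2): one similarity per (chunk_a, chunk_b) pair
```

The cast to float32 before normalization avoids int8 overflow in the dot products.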

Conversion Details

  • Source model: perplexity-ai/pplx-embed-context-v1-0.6b
  • Source revision: see conversion.json
  • Converted dtype: bfloat16
  • Embedding dimension: 1024
  • Output root expected by this workspace: artifacts/mlx/pplx-embed-context-v1-0.6b

Validation

Local MLX smoke validation passed with finite raw float embeddings and int8 contextual output shapes [[2, 1024], [1, 1024]].

Compared against the original Transformers remote-code float32 model on sample contextual inputs:

  • cosine similarities: 0.9995409, 0.9995418, 0.9997643
  • int8 delta: max absolute int8 delta 2; mean absolute int8 deltas of 0.209 and 0.145 (per document)

The MLX artifact is bfloat16 while the reference path used float32, so int8 values are not expected to be bit-identical.
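The delta metrics above can be reproduced with plain elementwise comparisons. A minimal sketch, using small synthetic int8 arrays in place of the actual MLX and Transformers reference outputs (the widened dtype avoids int8 overflow when subtracting):

```python
import numpy as np

# Hypothetical int8 outputs from the MLX artifact and the float32 reference;
# tiny synthetic arrays stand in for real (num_chunks, 1024) embeddings.
mlx_int8 = np.array([[10, -3, 7], [0, 5, -2]], dtype=np.int8)
ref_int8 = np.array([[11, -3, 6], [0, 4, -2]], dtype=np.int8)

# Widen to int16 before subtracting so the difference cannot wrap around.
delta = np.abs(mlx_int8.astype(np.int16) - ref_int8.astype(np.int16))
print(delta.max())   # 1  (max absolute int8 delta)
print(delta.mean())  # 0.5 (mean absolute int8 delta)
```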

License

The source model is MIT licensed. This conversion preserves the MIT license.
