PhraseBERT ONNX

ONNX export of whaleloops/phrase-bert for lightweight inference using ONNX Runtime — no PyTorch or Transformers required.

Model Details

  • Base model: whaleloops/phrase-bert (BERT-base, 12 layers, 768 hidden dim)
  • Pooling: Mean pooling (attention-mask weighted)
  • Format: ONNX
  • Size: ~416 MB

Usage

Install dependencies (no torch/transformers needed; huggingface_hub is used only to download the files):

pip install onnxruntime tokenizers numpy huggingface_hub

Download and run

import numpy as np
import onnxruntime as ort
from huggingface_hub import snapshot_download
from tokenizers import Tokenizer

# Download the model
model_dir = snapshot_download("langminer/phrase-bert-onnx")

# Load model and tokenizer
session = ort.InferenceSession(f"{model_dir}/model.onnx", providers=["CPUExecutionProvider"])
tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")
tokenizer.enable_padding(pad_id=0, pad_token="[PAD]")
tokenizer.enable_truncation(max_length=512)

# Encode phrases
phrases = ["play an active role", "participate actively", "machine learning"]
encodings = tokenizer.encode_batch(phrases)

input_ids = np.array([e.ids for e in encodings], dtype=np.int64)
attention_mask = np.array([e.attention_mask for e in encodings], dtype=np.int64)
token_type_ids = np.array([e.type_ids for e in encodings], dtype=np.int64)

# Run inference
outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "token_type_ids": token_type_ids,
})
token_embeddings = outputs[0]  # (batch, seq_len, 768)

# Mean pooling
mask = attention_mask[:, :, np.newaxis].astype(np.float32)
embeddings = np.sum(token_embeddings * mask, axis=1) / np.sum(mask, axis=1)

print(embeddings.shape)  # (3, 768)
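To compare phrases, a common next step is pairwise cosine similarity over the pooled embeddings. A minimal sketch (the `cosine_similarity` helper and the toy 2-dimensional vectors are illustrative; the real embeddings are 768-dimensional):

```python
import numpy as np

def cosine_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for a (batch, dim) embedding matrix."""
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normalized @ normalized.T

# Toy unit-norm vectors standing in for the pooled embeddings above
toy = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
sim = cosine_similarity(toy)
print(sim[0, 1])  # 0.8 — similar pair
print(sim[0, 2])  # 0.0 — orthogonal pair
```

Applied to the `embeddings` array computed above, `sim[0, 1]` (the paraphrase pair "play an active role" / "participate actively") should score noticeably higher than `sim[0, 2]` (the unrelated "machine learning").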

Citation

@inproceedings{wang2021phrase,
  title={Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration},
  author={Wang, Shufan and Thompson, Laure and Iyyer, Mohit},
  booktitle={EMNLP},
  year={2021}
}