Use from the sentence-transformers library
from sentence_transformers import CrossEncoder

model = CrossEncoder("AronowLab/BOND-reranker")

query = "Which planet is known as the Red Planet?"
passages = [
	"Venus is often called Earth's twin because of its similar size and proximity.",
	"Mars, known for its reddish appearance, is often referred to as the Red Planet.",
	"Jupiter, the largest planet in our solar system, has a prominent red spot.",
	"Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)

BOND-reranker

A cross-encoder reranker model fine-tuned for biomedical ontology entity normalization, designed to work with the BOND (Biomedical Ontology Neural Disambiguation) system.

Model Description

This model is a cross-encoder reranker trained to improve the accuracy of entity normalization by re-ranking candidate ontology terms retrieved by BOND's initial retrieval stage. It takes a query-candidate pair and outputs a relevance score.

Training Framework: Sentence Transformers with cross-encoder architecture

Model Architecture

  • Type: Cross-Encoder
  • Framework: Sentence Transformers
  • Max Sequence Length: 512 tokens
  • Output: Single relevance score per query-candidate pair
  • Parameters: ~110M (based on BiomedBERT-base)
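The query and the candidate share the 512-token budget listed above, so long inputs have to be truncated before the pair is scored. The sketch below makes some assumptions. It assumes a BERT-style `[CLS] query [SEP] candidate [SEP]` layout. It also assumes the `longest_first` strategy, which trims the longer side one token at a time and is what Hugging Face tokenizers apply to pairs with `truncation=True`. Plain strings stand in for WordPiece tokens, and `truncate_pair` and `pack_pair` are hypothetical helper names.

```python
# Minimal sketch of "longest_first" pair truncation, assuming a BERT-style
# [CLS] query [SEP] candidate [SEP] layout (3 special tokens).
# Plain strings stand in for real WordPiece tokens.

MAX_SEQ_LEN = 512
NUM_SPECIAL_TOKENS = 3  # [CLS], [SEP], [SEP]


def truncate_pair(query_tokens, candidate_tokens, max_len=MAX_SEQ_LEN):
    """Drop tokens from the end of the longer sequence until the pair fits."""
    q = list(query_tokens)
    c = list(candidate_tokens)
    budget = max_len - NUM_SPECIAL_TOKENS
    while len(q) + len(c) > budget:
        # On a tie, trim the candidate side (the second sequence).
        if len(q) > len(c):
            q.pop()
        else:
            c.pop()
    return q, c


def pack_pair(query_tokens, candidate_tokens, max_len=MAX_SEQ_LEN):
    """Build the single sequence that the cross-encoder reads jointly."""
    q, c = truncate_pair(query_tokens, candidate_tokens, max_len)
    return ["[CLS]", *q, "[SEP]", *c, "[SEP]"]


if __name__ == "__main__":
    packed = pack_pair(["tok"] * 400, ["tok"] * 300)
    print(len(packed))  # 512
```

Because both texts pass through the encoder together, every layer can attend across query and candidate. This joint encoding is what distinguishes a cross-encoder from a bi-encoder that embeds each text separately.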

Training Data

The model was trained on biomedical entity normalization data covering multiple ontologies including:

  • MONDO (diseases)
  • HPO (phenotypes)
  • UBERON (anatomy)
  • Cell Ontology (CL)
  • Gene Ontology (GO)
  • And other biomedical ontologies

Training data consists of query-candidate pairs with relevance labels, where queries are biomedical entity mentions and candidates are ontology terms.
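The card does not describe how negative candidates were chosen. The sketch below assumes that candidates come from a retriever and that any retrieved term outside the gold set counts as a negative. `LabeledPair` and `build_pairs` are hypothetical names, and the term ids in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LabeledPair:
    """One (query, candidate) training example for a cross-encoder."""
    query: str
    candidate: str
    label: float  # 1.0 = correct ontology term, 0.0 = incorrect


def build_pairs(
    mention: str,
    gold_ids: Set[str],
    candidates: Iterable[Tuple[str, str]],
) -> List[LabeledPair]:
    """Label each retrieved (term_id, term_text) candidate against the gold ids."""
    return [
        LabeledPair(mention, text, 1.0 if term_id in gold_ids else 0.0)
        for term_id, text in candidates
    ]


if __name__ == "__main__":
    # Hypothetical ids and texts, for illustration only.
    retrieved = [("T1", "myocardial infarction"), ("T2", "angina pectoris")]
    for pair in build_pairs("heart attack", {"T1"}, retrieved):
        print(pair.label, pair.candidate)
```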

Usage

With BOND Pipeline

from bond.config import BondSettings
from bond.pipeline import BondMatcher

# Configure BOND to use this reranker
settings = BondSettings(
    "model_path",  # Replace with your model path
    enable_reranker=True
)

matcher = BondMatcher(settings=settings)

Direct Usage

import torch
from sentence_transformers import CrossEncoder

# Load model from local path
model = CrossEncoder(
    "model_path",  # Replace with your model path
    device='cuda' if torch.cuda.is_available() else 'cpu'
)

# Example: Rank candidates for a query
query = "cell_type: C_BEST4; tissue: descending colon; organism: Homo sapiens"
candidates = [
    "label: smooth muscle fiber of descending colon; synonyms: non-striated muscle fiber of descending colon",
    "label: smooth muscle cell of colon; synonyms: non-striated muscle fiber of colon",
    "label: epithelial cell of colon; synonyms: colon epithelial cell"
]

# Get ranked results with probabilities
ranked_results = model.rank(query, candidates, return_documents=True, top_k=3)

print("Top 3 ranked results")

for result in ranked_results:
    prob = torch.sigmoid(torch.tensor(result['score'])).item()
    print(f"{prob:.8f} - {result['text']}")
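The example query and candidates follow a `key: value; key: value` layout. If you assemble these strings yourself, a helper along the following lines reproduces that layout for the fields shown. The function names are hypothetical, and how multiple synonyms are joined (", " here) is a guess, not BOND's documented format. Scores are likely more reliable when inputs match the formatting used in training.

```python
def format_fields(fields):
    """Join (key, value) pairs as 'key: value; key: value'."""
    return "; ".join(f"{key}: {value}" for key, value in fields)


def format_query(cell_type, tissue, organism):
    """Build a query string in the layout used by the example above."""
    return format_fields([
        ("cell_type", cell_type),
        ("tissue", tissue),
        ("organism", organism),
    ])


def format_candidate(label, synonyms):
    """Build a candidate string; the synonym separator is an assumption."""
    fields = [("label", label)]
    if synonyms:
        fields.append(("synonyms", ", ".join(synonyms)))
    return format_fields(fields)


if __name__ == "__main__":
    print(format_query("C_BEST4", "descending colon", "Homo sapiens"))
    print(format_candidate("epithelial cell of colon", ["colon epithelial cell"]))
```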

Performance

This reranker is designed to work as the final stage in the BOND pipeline:

  1. Retrieval: Exact + BM25 + Dense retrieval with LLM expansion
  2. Reranking: This cross-encoder model scores and re-ranks top candidates
  3. Output: Final ranked list of ontology terms
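The card does not say how the exact, BM25, and dense hits are merged before reranking. Reciprocal rank fusion (RRF) is one common way to combine ranked lists, and it is swapped in here purely for illustration. It is not necessarily what BOND does. The constant `k=60` is a widely used default, and the term ids in the test are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of term ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns ids sorted by fused score, best first;
    ties are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, term_id in enumerate(ranking, start=1):
            scores[term_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda term_id: (-scores[term_id], term_id))


if __name__ == "__main__":
    exact = ["A"]
    bm25 = ["B", "A", "C"]
    dense = ["A", "C", "B"]
    print(reciprocal_rank_fusion([exact, bm25, dense]))
```

RRF only uses ranks, so it avoids having to calibrate BM25 scores against dense similarity scores.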

The reranker improves precision by re-scoring the top-k candidates (typically k=100) returned by the initial retrieval stage.
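The retrieve-then-rerank step amounts to keeping the first k retrieved candidates, scoring each (query, candidate) pair, and sorting by the new score. In the sketch below, `rerank_top_k` is a hypothetical helper that takes a batch scorer. A toy word-overlap scorer stands in for the cross-encoder so the example runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_top_k(
    query: str,
    candidates: Sequence[str],
    score_pairs: PairScorer,
    k: int = 100,
) -> List[Tuple[str, float]]:
    """Re-score only the first k retrieved candidates and sort by the new score.

    Candidates beyond k are dropped. sorted() is stable even with
    reverse=True, so equal scores keep their retrieval order.
    """
    shortlist = list(candidates[:k])
    scores = score_pairs([(query, c) for c in shortlist])
    ranked = sorted(zip(shortlist, scores), key=lambda item: item[1], reverse=True)
    return [(c, float(s)) for c, s in ranked]


def toy_overlap_scores(pairs: List[Tuple[str, str]]) -> List[float]:
    """Toy stand-in for a cross-encoder: fraction of query words found in the candidate."""
    scores = []
    for query, candidate in pairs:
        q_words = set(query.lower().split())
        c_words = set(candidate.lower().split())
        scores.append(len(q_words & c_words) / len(q_words) if q_words else 0.0)
    return scores


if __name__ == "__main__":
    retrieved = ["venus twin", "mars red planet", "jupiter red spot"]
    print(rerank_top_k("red planet", retrieved, toy_overlap_scores, k=2))
```

With the real model, passing `model.predict` as `score_pairs` should work. `CrossEncoder.predict` accepts a list of (query, candidate) pairs and returns one score per pair, so the whole shortlist is scored in batches.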

Evaluation Metrics

Evaluated on a biomedical entity normalization development set:

Metric              Score
Accuracy            97.50%
F1 Score            82.37%
Precision           79.58%
Recall              85.36%
Average Precision   88.67%
Eval Loss           0.230

Best Model: Checkpoint at step 69,500 (epoch 2.28) with best metric score of 0.9734
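The F1 score in the table is the harmonic mean of precision and recall, F1 = 2PR/(P + R). With the reported values, 2(0.7958)(0.8536)/(0.7958 + 0.8536) ≈ 0.8237, which matches the table. The card does not state the decision threshold or how average precision was computed. The `average_precision` function below uses the common definition, the mean of precision at each positive, which agrees with scikit-learn's `average_precision_score` when no scores are tied. The high accuracy next to a lower F1 is consistent with a development set dominated by negative pairs, though the card does not report the class balance.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(labels, predictions):
    """Binary precision, recall and F1 from 0/1 gold labels and 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def average_precision(labels, scores):
    """Mean of precision@rank at each positive, ranking by score (ties not handled)."""
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    total_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / total_pos if total_pos else 0.0


if __name__ == "__main__":
    # Consistency check against the reported precision and recall.
    print(f"{f1_from(0.7958, 0.8536):.4f}")  # 0.8237
```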

Model Files

  • config.json - Model configuration
  • model.safetensors - Model weights in SafeTensors format
  • tokenizer.json - Fast tokenizer
  • vocab.txt - Vocabulary file
  • special_tokens_map.json - Special tokens mapping
  • tokenizer_config.json - Tokenizer configuration
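Before you point `CrossEncoder` or `BondSettings` at a local `model_path`, you can check that the files listed above are present. This avoids a confusing load error later. The sketch below uses a hypothetical helper, `missing_model_files`, which checks only for the six files named in this card and ignores any other files a saved model directory may contain.

```python
from pathlib import Path

# File names as listed in the model card.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "vocab.txt",
    "special_tokens_map.json",
    "tokenizer_config.json",
)


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir, in listed order."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("model_path")  # Replace with your model path
    if missing:
        print("Missing files:", ", ".join(missing))
```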

License

Apache 2.0

Model size: 41.5M params (Safetensors, tensor type F32)