
Semantic Highlight Bilingual Model (Preview)

What is Semantic Highlight?

Traditional search highlighting works by matching keywords. When you search for "iPhone performance" on an e-commerce site, only the words "iPhone" and "performance" get highlighted in the results. But what if the product description says "Powered by the A15 Bionic chip, scores over 1 million in benchmarks, runs smoothly with no lag"? This clearly answers the performance question, yet nothing gets highlighted because it doesn't contain the exact word "performance".

Semantic Highlight solves this problem by understanding meaning, not just matching words. It highlights text segments that are semantically relevant to your query, even if they don't contain the exact keywords. This is crucial in RAG (Retrieval-Augmented Generation) scenarios where users need to quickly identify relevant information in long retrieved documents.
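The keyword-matching gap described above is easy to demonstrate with a toy matcher (illustrative only; this is not the model's logic, and the description string is just the example from this card):

```python
# Toy illustration of the keyword-matching gap (not the model's logic):
# the description answers the "performance" query but shares no keyword with it.
query_terms = {"iphone", "performance"}
description = ("Powered by the A15 Bionic chip, scores over 1 million "
               "in benchmarks, runs smoothly with no lag")

# Naive keyword highlighting: mark only words that appear in the query.
matches = [w for w in description.lower().replace(",", "").split()
           if w in query_terms]
print(matches)  # [] -> keyword highlighting has nothing to mark
```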

Why a Lightweight Model?

Highlighting runs on every search query, so it needs to be fast and cost-effective. Large language models would be too slow and expensive for this real-time task. This model is designed to be:

  • Small: ~560MB, deployable on standard servers
  • Fast: Millisecond-level inference
  • Accurate: Trained on context-relevance datasets

Model Details

  • Base Model: BAAI/bge-reranker-v2-m3
  • Languages: Chinese and English
  • Task: Context relevance prediction for semantic highlighting
  • Status: ⚠️ Preview Version - This is an experimental release

Quick Start

Installation

pip install transformers torch

Usage

English Example

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "Zilliz/semantic-highlight-bilingual-pre",
    trust_remote_code=True
)

question = "What are the symptoms of dehydration?"
context = """
Dehydration occurs when your body loses more fluid than you take in.
Common signs include feeling thirsty and having a dry mouth.
The human body is composed of about 60% water.
Dark yellow urine and infrequent urination are warning signs.
Water is essential for many bodily functions.
Dizziness, fatigue, and headaches can indicate severe dehydration.
Drinking 8 glasses of water daily is often recommended.
"""

result = model.process(
    question=question,
    context=context,
    threshold=0.5,
    language="en",
)

print("Relevant sentences:")
print(result["pruned_context"])
# Output: Only the 3 sentences describing actual symptoms
# - "Common signs include feeling thirsty and having a dry mouth."
# - "Dark yellow urine and infrequent urination are warning signs."
# - "Dizziness, fatigue, and headaches can indicate severe dehydration."

Chinese Example

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "Zilliz/semantic-highlight-bilingual-pre",
    trust_remote_code=True
)

question = "北京有什么好吃的?"
context = """
北京烤鸭是北京最著名的特色美食,皮酥肉嫩,配上薄饼和甜面酱。
故宫是明清两代的皇家宫殿,也是世界上现存规模最大的木质结构古建筑群。
炸酱面是北京的传统面食,用黄酱配上黄瓜丝和豆芽菜。
长城是中国古代的军事防御工程,绵延数千公里。
老北京涮羊肉以铜锅为特色,羊肉鲜嫩,蘸料丰富。
天坛是明清两代皇帝祭天的场所,建筑精美。
豆汁儿是北京独特的传统小吃,口味特别,配上焦圈最地道。
颐和园是清朝的皇家园林,以昆明湖和万寿山为主体。
"""

result = model.process(
    question=question,
    context=context,
    threshold=0.5,
    language="zh",
)

print("相关句子:")
print(result["pruned_context"])
# Output: only the 4 sentences about food
# - "北京烤鸭是北京最著名的特色美食,皮酥肉嫩,配上薄饼和甜面酱。"
# - "炸酱面是北京的传统面食,用黄酱配上黄瓜丝和豆芽菜。"
# - "老北京涮羊肉以铜锅为特色,羊肉鲜嫩,蘸料丰富。"
# - "豆汁儿是北京独特的传统小吃,口味特别,配上焦圈最地道。"
# (all sentences about tourist attractions are filtered out)

Parameters

  • question: Query text
  • context: Document text to highlight
  • threshold: Relevance threshold (0-1), default 0.5. Lower values include more sentences.
  • language: Language code ("en", "zh", or "auto")
  • return_sentence_metrics: Return per-sentence relevance scores

Output

  • pruned_context: Highlighted text (relevant sentences only)
  • compression_rate: Percentage of text removed
  • sentence_probabilities: Relevance score for each sentence (if return_sentence_metrics=True)
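As a rough sketch of how these outputs relate to each other, the snippet below reproduces the threshold filtering and a character-based compression rate from per-sentence scores. The scores here are made up for illustration; only the field names (`pruned_context`, `compression_rate`, `sentence_probabilities`) follow the list above:

```python
# Hedged sketch of the relationship between the output fields,
# using made-up relevance scores (not actual model output).
sentences = [
    "Common signs include feeling thirsty and having a dry mouth.",
    "The human body is composed of about 60% water.",
    "Dark yellow urine and infrequent urination are warning signs.",
]
sentence_probabilities = [0.91, 0.12, 0.87]  # hypothetical scores
threshold = 0.5

# Keep only sentences whose score meets the threshold.
kept = [s for s, p in zip(sentences, sentence_probabilities) if p >= threshold]
pruned_context = " ".join(kept)

# Fraction of characters removed relative to the full context.
compression_rate = 1 - len(pruned_context) / len(" ".join(sentences))

print(pruned_context)
print(f"compression: {compression_rate:.0%}")
```

Lowering `threshold` keeps more sentences and lowers the compression rate, which matches the parameter description above.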

Notes

⚠️ This is a preview version. The model is still under development and improvements are ongoing.

Acknowledgments

This model is built upon the excellent work of the Open Provence project. Open Provence pioneered the approach of using lightweight reranker models for context pruning in RAG systems, demonstrating that semantic relevance prediction can be achieved efficiently without relying on large language models.

We extend our sincere gratitude to the Open Provence team for:

  • Developing the training methodology for context-relevance prediction
  • Creating comprehensive datasets for model training
  • Open-sourcing the entire framework and making it accessible to the community

This preview model represents our initial exploration of semantic highlighting for bilingual scenarios (Chinese and English), standing on the shoulders of their foundational work.

License

Same as base model: MIT License

Citation

If you use this model, please cite the base model:

@misc{bge-reranker-v2-m3,
  title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
  year={2024},
  eprint={2402.03216},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}