---
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:2879
- loss:BinaryCrossEntropyLoss
base_model: BAAI/bge-reranker-v2-m3
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- accuracy
- accuracy_threshold
- f1
- f1_threshold
- precision
- recall
- average_precision
model-index:
- name: CrossEncoder based on BAAI/bge-reranker-v2-m3
  results:
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: entity matching
      type: entity-matching
    metrics:
    - type: accuracy
      value: 0.8525121555915721
      name: Accuracy
    - type: accuracy_threshold
      value: 0.44037526845932007
      name: Accuracy Threshold
    - type: f1
      value: 0.8783068783068781
      name: F1
    - type: f1_threshold
      value: 0.3608097732067108
      name: F1 Threshold
    - type: precision
      value: 0.827930174563591
      name: Precision
    - type: recall
      value: 0.9352112676056338
      name: Recall
    - type: average_precision
      value: 0.9356992398880613
      name: Average Precision
---
# CrossEncoder based on BAAI/bge-reranker-v2-m3
This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) using the [sentence-transformers](https://www.SBERT.net) library. It computes a score for each pair of texts, which can be used for text reranking and semantic search. The evaluation reported below covers entity matching on pairs of names and addresses.
## Model Details
### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3)
- **Maximum Sequence Length:** 64 tokens
- **Number of Output Labels:** 1 label
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("pujithapsx/test_fine_flow")

# Get scores for pairs of texts
pairs = [
    ['Yamini Durga Fernandes', 'Roy Yamini Durga'],
    ['C/O Ramesh Yadav Village Bairiya Post Bairiya Ballia', 'Village Bairiya C/O Ramesh Yadav Post Bairiya Ballia'],
    ['Flat 5 Lotus Tower Brigade Road Bengaluru', 'Flat 6 Lotus Tower Brigade Road Bangalore'],
    ['House 7 Tinsukia Village Post Tinsukia Assam Assam', 'Tinsukia Village Assam'],
    ['Rudra', 'Rudhraa'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'Yamini Durga Fernandes',
    [
        'Roy Yamini Durga',
        'Village Bairiya C/O Ramesh Yadav Post Bairiya Ballia',
        'Flat 6 Lotus Tower Brigade Road Bangalore',
        'Tinsukia Village Assam',
        'Rudhraa',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
## Evaluation
### Metrics
#### Cross Encoder Classification
* Dataset: `entity-matching`
* Evaluated with [CrossEncoderClassificationEvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator)
| Metric | Value |
|:----------------------|:-----------|
| accuracy | 0.8525 |
| accuracy_threshold | 0.4404 |
| f1 | 0.8783 |
| f1_threshold | 0.3608 |
| precision | 0.8279 |
| recall | 0.9352 |
| **average_precision** | **0.9357** |
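
The two thresholds in the table are score cut-offs: a pair is treated as a match when its score reaches the cut-off. The sketch below shows how such a threshold could be applied and checks that the reported F1 is the harmonic mean of the reported precision and recall. The helper names `classify` and `f1_from` and the example scores are hypothetical, and treating the boundary as inclusive (`>=`) is an assumption rather than something this card states.

```python
# Thresholds reported in the metrics table (rounded to 4 decimals).
ACCURACY_THRESHOLD = 0.4404
F1_THRESHOLD = 0.3608


def classify(scores, threshold):
    """Label each pair 1 (match) if its score reaches the threshold, else 0.

    Assumption: the boundary is inclusive; the evaluator may differ.
    """
    return [1 if score >= threshold else 0 for score in scores]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores, for illustration only (not model output).
example_scores = [0.12, 0.40, 0.91]

print(classify(example_scores, F1_THRESHOLD))
print(classify(example_scores, ACCURACY_THRESHOLD))

# Consistency check against the table values.
print(round(f1_from(0.8279, 0.9352), 4))
```

A score of 0.40 falls between the two cut-offs, so it counts as a match under the F1 threshold but not under the accuracy threshold. Which one to use depends on whether missed matches or false matches are more costly.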
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 2,879 training samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | int   |
  | details |           |           |       |
* Samples:
  | sentence1            | sentence2                  | label |
  |:---------------------|:---------------------------|:------|
  | Village Buxar Bihar  | Village Buxar Rohtas Bihar | 0     |
  | Dhruv                | Dhruvi                     | 0     |
  | Venkat Prakash Verma | Venkat P Verma             | 1     |
* Loss: [BinaryCrossEntropyLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
```json
{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
```
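
With an identity activation, the loss is presumably applied to the model's raw logit, and `"pos_weight": null` means positive pairs are not reweighted. The following is a minimal sketch of binary cross-entropy on logits in plain Python, assuming the loss behaves like the standard formulation; the function names are hypothetical and this is not the library's implementation.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one raw logit and a 0/1 label.

    Equivalent to -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))],
    rewritten to avoid overflow for large |x|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-pair loss over a batch (no positive-class weighting)."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)


# Hypothetical logits and labels, for illustration only.
print(mean_bce([2.0, -1.5, 0.0], [1, 0, 1]))
```

A confident, correct logit (for example 2.0 with label 1) contributes little loss, while an uncommitted logit of 0.0 contributes log 2 regardless of the label.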
### Evaluation Dataset
#### Unnamed Dataset
* Size: 617 evaluation samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 617 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | int   |
  | details |           |           |       |
* Samples:
  | sentence1                                            | sentence2                                            | label |
  |:-----------------------------------------------------|:-----------------------------------------------------|:------|
  | Yamini Durga Fernandes                               | Roy Yamini Durga                                     | 0     |
  | C/O Ramesh Yadav Village Bairiya Post Bairiya Ballia | Village Bairiya C/O Ramesh Yadav Post Bairiya Ballia | 1     |
  | Flat 5 Lotus Tower Brigade Road Bengaluru            | Flat 6 Lotus Tower Brigade Road Bangalore            | 0     |
* Loss: [BinaryCrossEntropyLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
```json
{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 256
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `use_cpu`: True
- `bf16`: True
- `half_precision_backend`: cpu_amp
- `load_best_model_at_end`: True
- `dataloader_pin_memory`: False
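
To give a sense of scale for these settings, the arithmetic below estimates how many optimizer steps one epoch takes and how many of them fall in the warmup phase. It assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these are stated in this card, so the real step counts may differ.

```python
import math

# Values taken from this card.
train_samples = 2879
batch_size = 256      # per_device_train_batch_size
epochs = 1            # num_train_epochs
warmup_ratio = 0.1

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Assumption: warmup length is the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the whole run is only about a dozen optimizer steps, so the learning-rate warmup covers a noticeable share of training.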
#### All Hyperparameters