tags:
  - sentence-transformers
  - cross-encoder
  - generated_from_trainer
  - dataset_size:1674
  - loss:BinaryCrossEntropyLoss
base_model: yoriis/ce-final
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
  - accuracy
  - accuracy_threshold
  - f1
  - f1_threshold
  - precision
  - recall
  - average_precision
model-index:
  - name: CrossEncoder based on yoriis/ce-final
    results:
      - task:
          type: cross-encoder-classification
          name: Cross Encoder Classification
        dataset:
          name: eval
          type: eval
        metrics:
          - type: accuracy
            value: 0.9247311827956989
            name: Accuracy
          - type: accuracy_threshold
            value: 0.6233179569244385
            name: Accuracy Threshold
          - type: f1
            value: 0.9435483870967742
            name: F1
          - type: f1_threshold
            value: 0.4820916950702667
            name: F1 Threshold
          - type: precision
            value: 0.936
            name: Precision
          - type: recall
            value: 0.9512195121951219
            name: Recall
          - type: average_precision
            value: 0.981435073118385
            name: Average Precision

CrossEncoder based on yoriis/ce-final

This is a Cross Encoder model fine-tuned from yoriis/ce-final using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: yoriis/ce-final
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("yoriis/ce-final-task")
# Get scores for pairs of texts
pairs = [
    ['هل ذكر القرآن أوقات نوم معيّنة؟', 'ألم تر إلى ربك كيف مد الظل ولو شاء لجعله ساكنا ثم جعلنا الشمس عليه دليلا. ثم قبضناه إلينا قبضا يسيرا. وهو الذي جعل لكم الليل لباسا والنوم سباتا وجعل النهار نشورا. وهو الذي أرسل الرياح بشرا بين يدي رحمته وأنزلنا من السماء ماء طهورا. لنحيي به بلدة ميتا ونسقيه مما خلقنا أنعاما وأناسي كثيرا. ولقد صرفناه بينهم ليذكروا فأبى أكثر الناس إلا كفورا. ولو شئنا لبعثنا في كل قرية نذيرا. فلا تطع الكافرين وجاهدهم به جهادا كبيرا. وهو الذي مرج البحرين هذا عذب فرات وهذا ملح أجاج وجعل بينهما برزخا وحجرا محجورا. وهو الذي خلق من الماء بشرا فجعله نسبا وصهرا وكان ربك قديرا. ويعبدون من دون الله ما لا ينفعهم ولا يضرهم وكان الكافر على ربه ظهيرا.'],
    ['ما الدلائل على أن القرآن أُنزل لجميع الناس أو جميع الأمم والشعوب؟', 'وبالحق أنزلناه وبالحق نزل وما أرسلناك إلا مبشرا ونذيرا. وقرآنا فرقناه لتقرأه على الناس على مكث ونزلناه تنزيلا. قل آمنوا به أو لا تؤمنوا إن الذين أوتوا العلم من قبله إذا يتلى عليهم يخرون للأذقان سجدا. ويقولون سبحان ربنا إن كان وعد ربنا لمفعولا. ويخرون للأذقان يبكون ويزيدهم خشوعا. قل ادعوا الله أو ادعوا الرحمن أيا ما تدعوا فله الأسماء الحسنى ولا تجهر بصلاتك ولا تخافت بها وابتغ بين ذلك سبيلا. وقل الحمد لله الذي لم يتخذ ولدا ولم يكن له شريك في الملك ولم يكن له ولي من الذل وكبره تكبيرا.'],
    ['ما هي عقوبة من يتهم المرأة بالزنا بغير دليل؟', 'يا أيها الذين آمنوا إذا جاءكم المؤمنات مهاجرات فامتحنوهن الله أعلم بإيمانهن فإن علمتموهن مؤمنات فلا ترجعوهن إلى الكفار لا هن حل لهم ولا هم يحلون لهن وآتوهم ما أنفقوا ولا جناح عليكم أن تنكحوهن إذا آتيتموهن أجورهن ولا تمسكوا بعصم الكوافر واسألوا ما أنفقتم وليسألوا ما أنفقوا ذلكم حكم الله يحكم بينكم والله عليم حكيم. وإن فاتكم شيء من أزواجكم إلى الكفار فعاقبتم فآتوا الذين ذهبت أزواجهم مثل ما أنفقوا واتقوا الله الذي أنتم به مؤمنون.'],
    ['هل هناك دليل أو أدلة على وحدة الأديان التي أنزلها الله؟', 'قل آمنا بالله وما أنزل علينا وما أنزل على إبراهيم وإسماعيل وإسحاق ويعقوب والأسباط وما أوتي موسى وعيسى والنبيون من ربهم لا نفرق بين أحد منهم ونحن له مسلمون. ومن يبتغ غير الإسلام دينا فلن يقبل منه وهو في الآخرة من الخاسرين.'],
    ['ماذا تشمل الأمانة في القرآن؟', 'يا أيها الذين آمنوا لا يحل لكم أن ترثوا النساء كرها ولا تعضلوهن لتذهبوا ببعض ما آتيتموهن إلا أن يأتين بفاحشة مبينة وعاشروهن بالمعروف فإن كرهتموهن فعسى أن تكرهوا شيئا ويجعل الله فيه خيرا كثيرا. وإن أردتم استبدال زوج مكان زوج وآتيتم إحداهن قنطارا فلا تأخذوا منه شيئا أتأخذونه بهتانا وإثما مبينا. وكيف تأخذونه وقد أفضى بعضكم إلى بعض وأخذن منكم ميثاقا غليظا.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'هل ذكر القرآن أوقات نوم معيّنة؟',
    [
        'ألم تر إلى ربك كيف مد الظل ولو شاء لجعله ساكنا ثم جعلنا الشمس عليه دليلا. ثم قبضناه إلينا قبضا يسيرا. وهو الذي جعل لكم الليل لباسا والنوم سباتا وجعل النهار نشورا. وهو الذي أرسل الرياح بشرا بين يدي رحمته وأنزلنا من السماء ماء طهورا. لنحيي به بلدة ميتا ونسقيه مما خلقنا أنعاما وأناسي كثيرا. ولقد صرفناه بينهم ليذكروا فأبى أكثر الناس إلا كفورا. ولو شئنا لبعثنا في كل قرية نذيرا. فلا تطع الكافرين وجاهدهم به جهادا كبيرا. وهو الذي مرج البحرين هذا عذب فرات وهذا ملح أجاج وجعل بينهما برزخا وحجرا محجورا. وهو الذي خلق من الماء بشرا فجعله نسبا وصهرا وكان ربك قديرا. ويعبدون من دون الله ما لا ينفعهم ولا يضرهم وكان الكافر على ربه ظهيرا.',
        'وبالحق أنزلناه وبالحق نزل وما أرسلناك إلا مبشرا ونذيرا. وقرآنا فرقناه لتقرأه على الناس على مكث ونزلناه تنزيلا. قل آمنوا به أو لا تؤمنوا إن الذين أوتوا العلم من قبله إذا يتلى عليهم يخرون للأذقان سجدا. ويقولون سبحان ربنا إن كان وعد ربنا لمفعولا. ويخرون للأذقان يبكون ويزيدهم خشوعا. قل ادعوا الله أو ادعوا الرحمن أيا ما تدعوا فله الأسماء الحسنى ولا تجهر بصلاتك ولا تخافت بها وابتغ بين ذلك سبيلا. وقل الحمد لله الذي لم يتخذ ولدا ولم يكن له شريك في الملك ولم يكن له ولي من الذل وكبره تكبيرا.',
        'يا أيها الذين آمنوا إذا جاءكم المؤمنات مهاجرات فامتحنوهن الله أعلم بإيمانهن فإن علمتموهن مؤمنات فلا ترجعوهن إلى الكفار لا هن حل لهم ولا هم يحلون لهن وآتوهم ما أنفقوا ولا جناح عليكم أن تنكحوهن إذا آتيتموهن أجورهن ولا تمسكوا بعصم الكوافر واسألوا ما أنفقتم وليسألوا ما أنفقوا ذلكم حكم الله يحكم بينكم والله عليم حكيم. وإن فاتكم شيء من أزواجكم إلى الكفار فعاقبتم فآتوا الذين ذهبت أزواجهم مثل ما أنفقوا واتقوا الله الذي أنتم به مؤمنون.',
        'قل آمنا بالله وما أنزل علينا وما أنزل على إبراهيم وإسماعيل وإسحاق ويعقوب والأسباط وما أوتي موسى وعيسى والنبيون من ربهم لا نفرق بين أحد منهم ونحن له مسلمون. ومن يبتغ غير الإسلام دينا فلن يقبل منه وهو في الآخرة من الخاسرين.',
        'يا أيها الذين آمنوا لا يحل لكم أن ترثوا النساء كرها ولا تعضلوهن لتذهبوا ببعض ما آتيتموهن إلا أن يأتين بفاحشة مبينة وعاشروهن بالمعروف فإن كرهتموهن فعسى أن تكرهوا شيئا ويجعل الله فيه خيرا كثيرا. وإن أردتم استبدال زوج مكان زوج وآتيتم إحداهن قنطارا فلا تأخذوا منه شيئا أتأخذونه بهتانا وإثما مبينا. وكيف تأخذونه وقد أفضى بعضكم إلى بعض وأخذن منكم ميثاقا غليظا.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
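
The list returned by rank pairs each document's position in the input list (corpus_id) with its score, ordered from most to least relevant. A minimal sketch of that ordering step in plain Python, assuming you already hold one score per document (for example from model.predict). The helper name rank_by_score and the sample scores are hypothetical and not part of the library:

```python
def rank_by_score(scores, top_k=None):
    """Sort document indices by descending score.

    Returns a list of {"corpus_id": int, "score": float} dicts, the same
    shape as the output sketched in the usage comment above.
    """
    ranked = sorted(
        ({"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for five candidate passages (illustration only).
example_scores = [0.91, 0.12, 0.05, 0.33, 0.08]
for item in rank_by_score(example_scores, top_k=3):
    print(item)
```

Keeping corpus_id rather than the text itself lets the caller map results back to the original documents without copying long passages around.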

Evaluation

Metrics

Cross Encoder Classification

Metric Value
accuracy 0.9247
accuracy_threshold 0.6233
f1 0.9435
f1_threshold 0.4821
precision 0.936
recall 0.9512
average_precision 0.9814
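
The two thresholds in the table are the score cut-offs at which accuracy and F1 were highest on the eval set. A minimal sketch of how they might be applied to turn scores into binary relevance labels, plus a check that the reported F1 follows from the reported precision and recall. It assumes the scores are on the same scale the evaluator used; the helper names and the sample scores are hypothetical:

```python
F1_THRESHOLD = 0.4821        # f1_threshold from the table above
ACCURACY_THRESHOLD = 0.6233  # accuracy_threshold from the table above


def to_labels(scores, threshold):
    """Label a pair as relevant (1) when its score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up scores (illustration only).
scores = [0.97, 0.55, 0.10]
print(to_labels(scores, F1_THRESHOLD))        # [1, 1, 0]
print(to_labels(scores, ACCURACY_THRESHOLD))  # [1, 0, 0]

# Precision and recall as reported in the model card metadata.
print(round(f1_from(0.936, 0.9512195121951219), 4))  # 0.9435
```

The lower F1 threshold admits more borderline pairs, which favours recall; the higher accuracy threshold is stricter.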

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,674 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 11 characters, mean 41.23 characters, max 201 characters
    • sentence_1 (string): min 47 characters, mean 381.93 characters, max 1086 characters
    • label (float): min 0.0, mean 0.66, max 1.0
  • Samples:
    • sentence_0: هل ذكر القرآن أوقات نوم معيّنة؟
      sentence_1: ألم تر إلى ربك كيف مد الظل ولو شاء لجعله ساكنا ثم جعلنا الشمس عليه دليلا. ثم قبضناه إلينا قبضا يسيرا. وهو الذي جعل لكم الليل لباسا والنوم سباتا وجعل النهار نشورا. وهو الذي أرسل الرياح بشرا بين يدي رحمته وأنزلنا من السماء ماء طهورا. لنحيي به بلدة ميتا ونسقيه مما خلقنا أنعاما وأناسي كثيرا. ولقد صرفناه بينهم ليذكروا فأبى أكثر الناس إلا كفورا. ولو شئنا لبعثنا في كل قرية نذيرا. فلا تطع الكافرين وجاهدهم به جهادا كبيرا. وهو الذي مرج البحرين هذا عذب فرات وهذا ملح أجاج وجعل بينهما برزخا وحجرا محجورا. وهو الذي خلق من الماء بشرا فجعله نسبا وصهرا وكان ربك قديرا. ويعبدون من دون الله ما لا ينفعهم ولا يضرهم وكان الكافر على ربه ظهيرا.
      label: 1.0
    • sentence_0: ما الدلائل على أن القرآن أُنزل لجميع الناس أو جميع الأمم والشعوب؟
      sentence_1: وبالحق أنزلناه وبالحق نزل وما أرسلناك إلا مبشرا ونذيرا. وقرآنا فرقناه لتقرأه على الناس على مكث ونزلناه تنزيلا. قل آمنوا به أو لا تؤمنوا إن الذين أوتوا العلم من قبله إذا يتلى عليهم يخرون للأذقان سجدا. ويقولون سبحان ربنا إن كان وعد ربنا لمفعولا. ويخرون للأذقان يبكون ويزيدهم خشوعا. قل ادعوا الله أو ادعوا الرحمن أيا ما تدعوا فله الأسماء الحسنى ولا تجهر بصلاتك ولا تخافت بها وابتغ بين ذلك سبيلا. وقل الحمد لله الذي لم يتخذ ولدا ولم يكن له شريك في الملك ولم يكن له ولي من الذل وكبره تكبيرا.
      label: 1.0
    • sentence_0: ما هي عقوبة من يتهم المرأة بالزنا بغير دليل؟
      sentence_1: يا أيها الذين آمنوا إذا جاءكم المؤمنات مهاجرات فامتحنوهن الله أعلم بإيمانهن فإن علمتموهن مؤمنات فلا ترجعوهن إلى الكفار لا هن حل لهم ولا هم يحلون لهن وآتوهم ما أنفقوا ولا جناح عليكم أن تنكحوهن إذا آتيتموهن أجورهن ولا تمسكوا بعصم الكوافر واسألوا ما أنفقتم وليسألوا ما أنفقوا ذلكم حكم الله يحكم بينكم والله عليم حكيم. وإن فاتكم شيء من أزواجكم إلى الكفار فعاقبتم فآتوا الذين ذهبت أزواجهم مثل ما أنفقوا واتقوا الله الذي أنتم به مؤمنون.
      label: 0.0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
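
With the activation function set to identity, the loss is computed directly on the model's raw logit rather than on a probability. To my understanding, this loss in Sentence Transformers wraps PyTorch's BCEWithLogitsLoss; the dependency-free sketch below reproduces the same per-example formula for illustration. The function name and sample values are hypothetical:

```python
import math


def bce_with_logits(logit, label, pos_weight=1.0):
    """Binary cross-entropy on a raw logit, in a numerically stable form.

    Equivalent to
        -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    pos_weight=1.0 corresponds to "pos_weight": null above (no reweighting).
    """
    stable_tail = math.log1p(math.exp(-abs(logit)))
    # softplus(-x) = -log(sigmoid(x))
    softplus_neg = max(-logit, 0.0) + stable_tail
    # softplus(x) = -log(1 - sigmoid(x))
    softplus_pos = max(logit, 0.0) + stable_tail
    return pos_weight * label * softplus_neg + (1.0 - label) * softplus_pos


print(round(bce_with_logits(0.0, 1.0), 4))  # 0.6931, i.e. log(2)
# A confident positive logit is cheap for label 1 and costly for label 0.
print(bce_with_logits(4.0, 1.0) < bce_with_logits(4.0, 0.0))  # True
```

Working on logits avoids taking the log of a sigmoid output that has saturated to 0 or 1, which is relevant here because training used fp16.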
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step eval_average_precision
1.0 105 0.9728
2.0 210 0.9801
3.0 315 0.9811
4.0 420 0.9814
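
The step counts in this log follow from the dataset size and batch size listed earlier. A quick check, assuming a single training device (the card does not state the device count) together with the listed gradient_accumulation_steps: 1 and dataloader_drop_last: False:

```python
import math

train_samples = 1674  # training set size
batch_size = 16       # per_device_train_batch_size
epochs = 4            # num_train_epochs

# The final batch of each epoch is partial and is kept, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 105
print(total_steps)      # 420
```

Both numbers match the Step column: evaluation ran once at the end of each epoch.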

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.54.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}