Text Classification
sentence-transformers
Safetensors
xlm-roberta
cross-encoder
reranker
Generated from Trainer
dataset_size:82796
loss:CrossEntropyLoss
Eval Results (legacy)
text-embeddings-inference
How to use Chimalpopoka/CrossEncoderRanker with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)
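Because the card lists `CrossEntropyLoss` as the training loss, `predict` may return one raw score per class for each pair rather than a single relevance value. The sketch below shows one way to turn such output into a ranking. It assumes a two-column layout in which column 1 is the "match" class; that layout is an assumption, not something the card states. The helper names `match_probabilities` and `rank_pairs` are hypothetical, and the logits are made-up numbers, not model output.

```python
import math

def match_probabilities(logits):
    """Softmax each row and return the probability of class 1 (assumed 'match')."""
    probs = []
    for row in logits:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(value - peak) for value in row]
        probs.append(exps[1] / sum(exps))
    return probs

def rank_pairs(logits):
    """Return pair indices ordered from most to least likely match."""
    probs = match_probabilities(logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Made-up logits for four (query, passage) pairs; NOT real model output.
example_logits = [[2.0, -1.0], [-3.0, 4.0], [0.5, 0.2], [1.0, -3.0]]
print(rank_pairs(example_logits))  # [1, 2, 0, 3]
```

If the installed sentence-transformers version already applies a softmax or returns a single score per pair, the softmax step is unnecessary and only the sorting applies.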
Full training
- README.md +41 -37
- model.safetensors +1 -1
README.md
CHANGED
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:
+- dataset_size:82796
 - loss:CrossEntropyLoss
 base_model: deepvk/USER-bge-m3
 pipeline_tag: text-classification
@@ -24,13 +24,13 @@ model-index:
 type: softmax_accuracy_eval
 metrics:
 - type: f1_macro
-value: 0.
+value: 0.9771728083627488
 name: F1 Macro
 - type: f1_micro
-value: 0.
+value: 0.9771739130434782
 name: F1 Micro
 - type: f1_weighted
-value: 0.
+value: 0.9771740511285696
 name: F1 Weighted
 ---
 
@@ -74,11 +74,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
 # Get scores for pairs of texts
 pairs = [
-['
-['
-['
-['
-['
+['Панель №6 IgE (Сазан, карп, щука, судак, кефаль, ледяная рыба, пикша, осетр)', 'Сазан, (Cyprinus carpio), IgE, аллерген - e82. Метод: ИФА'],
+['Определение антител класса M (IgM) к цитомегаловирусу (CytomegАlovirus) в крови', 'Бактериологическое исследование гнойного отделяемого: На аэробные и факультативно-анаэробные микроорганизмы. Метод: культуральный'],
+['Исследования уровня бетта-изомеризованного C-концевого телопептида коллагена 1 типа (Beta-Cross laps) в крови', 'Глюкоза, в венозной крови'],
+['Посев кала на диарогенные эшерихиозы (E. coli), закл., Кал', 'Коклюш (Bordetella pertussis): Антитела: IgG, (количественно). Метод: ИФА'],
+['Ультразвуковое исследование поджелудочной железы (детям)', 'УЗИ поджелудочной железы, для детей'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -120,9 +120,9 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------|:-----------|
-| **f1_macro** | **0.
-| f1_micro | 0.
-| f1_weighted | 0.
+| **f1_macro** | **0.9772** |
+| f1_micro | 0.9772 |
+| f1_weighted | 0.9772 |
 
 <!--
 ## Bias, Risks and Limitations
@@ -142,19 +142,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 82,796 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
 | | sentence_0 | sentence_1 | label |
 |:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
-| details | <ul><li>min:
+| details | <ul><li>min: 4 characters</li><li>mean: 66.18 characters</li><li>max: 504 characters</li></ul> | <ul><li>min: 3 characters</li><li>mean: 62.27 characters</li><li>max: 385 characters</li></ul> | <ul><li>0: ~50.60%</li><li>1: ~49.40%</li></ul> |
 * Samples:
-| sentence_0
-|:--------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:---------------|
-| <code>
-| <code>
-| <code>
+| sentence_0 | sentence_1 | label |
+|:---------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+| <code>Панель №6 IgE (Сазан, карп, щука, судак, кефаль, ледяная рыба, пикша, осетр)</code> | <code>Сазан, (Cyprinus carpio), IgE, аллерген - e82. Метод: ИФА</code> | <code>1</code> |
+| <code>Определение антител класса M (IgM) к цитомегаловирусу (CytomegАlovirus) в крови</code> | <code>Бактериологическое исследование гнойного отделяемого: На аэробные и факультативно-анаэробные микроорганизмы. Метод: культуральный</code> | <code>0</code> |
+| <code>Исследования уровня бетта-изомеризованного C-концевого телопептида коллагена 1 типа (Beta-Cross laps) в крови</code> | <code>Глюкоза, в венозной крови</code> | <code>0</code> |
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)
 
 ### Training Hyperparameters
@@ -287,24 +287,28 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch | Step
-|:------:|:----:|:-------------:|:------------------------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| Epoch | Step | Training Loss | softmax_accuracy_eval_f1_macro |
+|:------:|:-----:|:-------------:|:------------------------------:|
+| 0.0483 | 500 | 0.5573 | - |
+| 0.0966 | 1000 | 0.2189 | - |
+| 0.1449 | 1500 | 0.2144 | - |
+| 0.1932 | 2000 | 0.1876 | 0.9683 |
+| 0.2415 | 2500 | 0.1812 | - |
+| 0.2899 | 3000 | 0.1657 | - |
+| 0.3382 | 3500 | 0.1796 | - |
+| 0.3865 | 4000 | 0.1592 | 0.9702 |
+| 0.4348 | 4500 | 0.156 | - |
+| 0.4831 | 5000 | 0.1491 | - |
+| 0.5314 | 5500 | 0.1555 | - |
+| 0.5797 | 6000 | 0.1216 | 0.9683 |
+| 0.6280 | 6500 | 0.1276 | - |
+| 0.6763 | 7000 | 0.1305 | - |
+| 0.7246 | 7500 | 0.1156 | - |
+| 0.7729 | 8000 | 0.1197 | 0.9759 |
+| 0.8213 | 8500 | 0.1215 | - |
+| 0.8696 | 9000 | 0.1065 | - |
+| 0.9179 | 9500 | 0.0896 | - |
+| 0.9662 | 10000 | 0.1135 | 0.9772 |
 
 
 ### Framework Versions
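The Epoch and Step columns of the Training Logs table in the README diff can be cross-checked against the stated dataset size of 82,796 samples. The sketch below is a rough consistency check, not the author's training code. The batch size of 8 is an assumption inferred from the logged numbers, since the hyperparameter section is not visible in this diff, and `epoch_fraction` is a hypothetical helper.

```python
import math

DATASET_SIZE = 82_796    # "Size: 82,796 training samples" in the model card
ASSUMED_BATCH_SIZE = 8   # assumption: inferred from the log, not shown in this diff

def epoch_fraction(step, dataset_size=DATASET_SIZE, batch_size=ASSUMED_BATCH_SIZE):
    """Fraction of one epoch completed after `step` optimizer steps."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return step / steps_per_epoch

for step in (500, 2000, 10000):
    print(step, round(epoch_fraction(step), 4))
```

Under that assumption the computed fractions round to the logged values (0.0483, 0.1932, 0.9662), which suggests the run covered slightly less than one full epoch by step 10000.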
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ed49e2e839ea10f57c59bb591de5c8959cc56706354a02d0c30591220f7f0da1
 size 1436163192
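The `model.safetensors` entry is a Git LFS pointer: it records the hash algorithm, the digest, and the byte size of the real weights file. A minimal sketch of checking a downloaded file against such a pointer follows, assuming the standard three-line pointer layout shown in the diff. The helper names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
import os

def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def file_matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named in the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a file of over a gigabyte is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:ed49e2e839ea10f57c59bb591de5c8959cc56706354a02d0c30591220f7f0da1
size 1436163192
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `file_matches_pointer("model.safetensors", pointer)` on a local copy would then report whether the download corresponds to this commit; the path here is only an example.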