Text Classification
sentence-transformers
Safetensors
xlm-roberta
cross-encoder
reranker
Generated from Trainer
dataset_size:82796
loss:CrossEntropyLoss
Eval Results (legacy)
text-embeddings-inference
Instructions to use Chimalpopoka/CrossEncoderRanker with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Chimalpopoka/CrossEncoderRanker with sentence-transformers:
from sentence_transformers import CrossEncoder

model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
- Notebooks
- Google Colab
- Kaggle
Lab tests only
- README.md +37 -45
- model.safetensors +1 -1
README.md
CHANGED
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:
 - loss:CrossEntropyLoss
 base_model: deepvk/USER-bge-m3
 pipeline_tag: text-classification
@@ -24,13 +24,13 @@ model-index:
 type: softmax_accuracy_eval
 metrics:
 - type: f1_macro
-value: 0.
 name: F1 Macro
 - type: f1_micro
-value: 0.
 name: F1 Micro
 - type: f1_weighted
-value: 0.
 name: F1 Weighted
 ---

@@ -74,11 +74,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
 # Get scores for pairs of texts
 pairs = [
-    ['
-    ['
-    ['
-    ['
-    ['
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -120,9 +120,9 @@ You can finetune this model on your own dataset.

 | Metric | Value |
 |:-------------|:-----------|
-| **f1_macro** | **0.
-| f1_micro | 0.
-| f1_weighted | 0.

 <!--
 ## Bias, Risks and Limitations
@@ -142,19 +142,19 @@ You can finetune this model on your own dataset.

 #### Unnamed Dataset

-* Size:
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
 | | sentence_0 | sentence_1 | label |
 |:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
-| details | <ul><li>min:
 * Samples:
-| sentence_0
-|:----------------------------------------------------------------------------------------------------------------------------
-| <code>
-| <code>
-| <code>
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)

 ### Training Hyperparameters
@@ -287,32 +287,24 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
-| Epoch | Step
-|:------:|:----
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.6871 | 8500 | 0.1507 | - |
-| 0.7276 | 9000 | 0.1396 | - |
-| 0.7680 | 9500 | 0.1285 | - |
-| 0.8084 | 10000 | 0.1504 | 0.9735 |
-| 0.8488 | 10500 | 0.1507 | - |
-| 0.8892 | 11000 | 0.1346 | - |
-| 0.9297 | 11500 | 0.119 | - |
-| 0.9701 | 12000 | 0.1144 | 0.9736 |


 ### Framework Versions
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
+- dataset_size:64800
 - loss:CrossEntropyLoss
 base_model: deepvk/USER-bge-m3
 pipeline_tag: text-classification
@@ -24,13 +24,13 @@ model-index:
 type: softmax_accuracy_eval
 metrics:
 - type: f1_macro
+value: 0.9854186637285883
 name: F1 Macro
 - type: f1_micro
+value: 0.9854186918483544
 name: F1 Micro
 - type: f1_weighted
+value: 0.9854186327968456
 name: F1 Weighted
 ---

@@ -74,11 +74,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
 # Get scores for pairs of texts
 pairs = [
+    ['Яичный белок (f1) IgE, ImmunoCAP', 'Яичный белок, IgE, аллерген - f1. Метод: ImmunoCAP'],
+    ['АТ к легионелле IgA Legionella pneumophila IgA) детское отделение', 'Посев на микрофлору (урогенитальный тракт), женщины, качественно. Метод: ИФА'],
+    ['Определение ДНК вируса гепатита B (HepАtitis B virus) в крови методом ПЦР, количественное исследование', 'Хламидия пневмония (Chlamydia pneumoniae). Антитела IgA, количественно. Метод: ИФА'],
+    ['Соматотропный гормон (СТГ) (кровь-сыворотка)', 'Соматотропный гормон'],
+    ['на яйца остриц (Enterobius vermicularis)', 'Яйца остриц (Enterobius vermicularis), перианальные складки, качественно. Метод: ИФА'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
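The card trains with CrossEntropyLoss on labels 0 and 1, so `model.predict` presumably returns one raw score per class for each pair rather than a single relevance value. Below is a minimal sketch of turning such rows into a match probability and a ranking. It assumes each row holds two logits in the order (label 0, label 1); the helper names and the `example_logits` values are hypothetical and are not outputs of this model.

```python
import math
from typing import List, Sequence


def match_probabilities(logits: Sequence[Sequence[float]]) -> List[float]:
    """Return P(label == 1) for each row of two-class logits."""
    probs = []
    for row in logits:
        # Subtract the row maximum before exponentiating for numerical stability.
        top = max(row)
        exps = [math.exp(value - top) for value in row]
        probs.append(exps[1] / sum(exps))
    return probs


def rank_candidates(logits: Sequence[Sequence[float]]) -> List[int]:
    """Return row indices ordered from most to least likely match."""
    probs = match_probabilities(logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


# Hypothetical logits for three pairs (columns: label 0, label 1).
example_logits = [[-2.0, 3.0], [4.0, -4.0], [0.0, 0.0]]
print(match_probabilities(example_logits))
print(rank_candidates(example_logits))
```

If the scores come back as a NumPy array, `scores.tolist()` can be passed to these helpers. `CrossEncoder.predict` also accepts `apply_softmax=True`, which should return the probability rows directly.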
@@ -120,9 +120,9 @@ You can finetune this model on your own dataset.

 | Metric | Value |
 |:-------------|:-----------|
+| **f1_macro** | **0.9854** |
+| f1_micro | 0.9854 |
+| f1_weighted | 0.9854 |

 <!--
 ## Bias, Risks and Limitations
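The three F1 values in this hunk agree to roughly seven decimal places. That is what one would expect when the two classes are close to balanced and are classified about equally well, although the diff does not show the class balance of the evaluation set. The sketch below shows how the three averages are commonly defined for single-label classification; the function name and the toy labels are hypothetical and are not taken from the evaluation data.

```python
from collections import Counter
from typing import Dict, Sequence


def f1_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute macro, micro, and weighted F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    correct = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        correct += tp
    total = len(y_true)
    return {
        # Unweighted mean of the per-class F1 scores.
        "f1_macro": sum(per_class.values()) / len(labels),
        # With exactly one label per sample, micro F1 equals accuracy.
        "f1_micro": correct / total,
        # Per-class F1 weighted by the number of true samples in each class.
        "f1_weighted": sum(per_class[l] * support[l] for l in labels) / total,
    }


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
print(f1_scores(y_true, y_pred))
```

In this balanced toy case each class has three true positives, one false positive, and one false negative, so all three averages coincide.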
@@ -142,19 +142,19 @@ You can finetune this model on your own dataset.

 #### Unnamed Dataset

+* Size: 64,800 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
 | | sentence_0 | sentence_1 | label |
 |:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
+| details | <ul><li>min: 5 characters</li><li>mean: 64.06 characters</li><li>max: 480 characters</li></ul> | <ul><li>min: 3 characters</li><li>mean: 65.64 characters</li><li>max: 442 characters</li></ul> | <ul><li>0: ~49.70%</li><li>1: ~50.30%</li></ul> |
 * Samples:
+| sentence_0 | sentence_1 | label |
+|:--------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:---------------|
+| <code>Яичный белок (f1) IgE, ImmunoCAP</code> | <code>Яичный белок, IgE, аллерген - f1. Метод: ImmunoCAP</code> | <code>1</code> |
+| <code>АТ к легионелле IgA Legionella pneumophila IgA) детское отделение</code> | <code>Посев на микрофлору (урогенитальный тракт), женщины, качественно. Метод: ИФА</code> | <code>0</code> |
+| <code>Определение ДНК вируса гепатита B (HepАtitis B virus) в крови методом ПЦР, количественное исследование</code> | <code>Хламидия пневмония (Chlamydia pneumoniae). Антитела IgA, количественно. Метод: ИФА</code> | <code>0</code> |
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)

 ### Training Hyperparameters
@@ -287,32 +287,24 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
+| Epoch | Step | Training Loss | softmax_accuracy_eval_f1_macro |
+|:------:|:----:|:-------------:|:------------------------------:|
+| 0.0617 | 500 | 0.4739 | - |
+| 0.1235 | 1000 | 0.1631 | - |
+| 0.1852 | 1500 | 0.1357 | - |
+| 0.2469 | 2000 | 0.113 | 0.9804 |
+| 0.3086 | 2500 | 0.1382 | - |
+| 0.3704 | 3000 | 0.1105 | - |
+| 0.4321 | 3500 | 0.1154 | - |
+| 0.4938 | 4000 | 0.1136 | 0.9822 |
+| 0.5556 | 4500 | 0.0976 | - |
+| 0.6173 | 5000 | 0.0861 | - |
+| 0.6790 | 5500 | 0.0933 | - |
+| 0.7407 | 6000 | 0.0927 | 0.9842 |
+| 0.8025 | 6500 | 0.0841 | - |
+| 0.8642 | 7000 | 0.0764 | - |
+| 0.9259 | 7500 | 0.0824 | - |
+| 0.9877 | 8000 | 0.0667 | 0.9854 |


 ### Framework Versions
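The new training log can be cross-checked against the stated dataset size with a little arithmetic. The sketch below uses only the epoch and step pairs and the 64,800 sample count that appear in this diff. The batch size itself is not visible here because the hyperparameter section is collapsed, so the implied value is an inference that assumes a single device and no gradient accumulation.

```python
# Values copied from the new training log and dataset size in this diff.
dataset_size = 64_800
log_rows = [(0.0617, 500), (0.4938, 4000), (0.9877, 8000)]  # (epoch, step)

# Each row gives an independent estimate of the number of steps per epoch.
for epoch, step in log_rows:
    print(f"step {step}: about {step / epoch:.0f} steps per epoch")

# Use the last row, where rounding in the epoch column matters least.
steps_per_epoch = round(8000 / 0.9877)

# Assumption: one device and no gradient accumulation, so each step is one batch.
implied_batch_size = dataset_size / steps_per_epoch
print(steps_per_epoch, implied_batch_size)
```

Under those assumptions the estimates cluster around 8,100 steps per epoch, which would correspond to 8 samples per step.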
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 1436163192
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
+oid sha256:09d83545dac14a4094a170564c150db2caede6bac78dc9404b4bc576877d3b3d
 size 1436163192
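A Git LFS pointer records the SHA-256 digest and byte size of the real file, so a downloaded copy of the weights can be checked against the values in this commit. The sketch below is one way to do that with the standard library; the function name and the local path in the commented call are hypothetical.

```python
import hashlib
import os


def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a local file against the oid and size from a Git LFS pointer."""
    # Compare sizes first, since that avoids hashing a file that cannot match.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so a large file is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values from the new pointer in this commit.
EXPECTED_OID = "09d83545dac14a4094a170564c150db2caede6bac78dc9404b4bc576877d3b3d"
EXPECTED_SIZE = 1436163192

# Example call, assuming the weights were saved locally as model.safetensors:
# print(matches_lfs_pointer("model.safetensors", EXPECTED_OID, EXPECTED_SIZE))
```

The size is unchanged between the old and new pointers, so only the digest distinguishes the two versions of the weights.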