Full training
- README.md +41 -37
- model.safetensors +1 -1
README.md
CHANGED
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:
+- dataset_size:82796
 - loss:CrossEntropyLoss
 base_model: deepvk/USER-bge-m3
 pipeline_tag: text-classification
@@ -24,13 +24,13 @@ model-index:
       type: softmax_accuracy_eval
     metrics:
     - type: f1_macro
-      value: 0.
+      value: 0.9771728083627488
       name: F1 Macro
     - type: f1_micro
-      value: 0.
+      value: 0.9771739130434782
       name: F1 Micro
     - type: f1_weighted
-      value: 0.
+      value: 0.9771740511285696
       name: F1 Weighted
 ---
 
@@ -74,11 +74,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
 # Get scores for pairs of texts
 pairs = [
-    ['
-    ['
-    ['
-    ['
-    ['
+    ['Панель №6 IgE (Сазан, карп, щука, судак, кефаль, ледяная рыба, пикша, осетр)', 'Сазан, (Cyprinus carpio), IgE, аллерген - e82. Метод: ИФА'],
+    ['Определение антител класса M (IgM) к цитомегаловирусу (CytomegАlovirus) в крови', 'Бактериологическое исследование гнойного отделяемого: На аэробные и факультативно-анаэробные микроорганизмы. Метод: культуральный'],
+    ['Исследования уровня бетта-изомеризованного C-концевого телопептида коллагена 1 типа (Beta-Cross laps) в крови', 'Глюкоза, в венозной крови'],
+    ['Посев кала на диарогенные эшерихиозы (E. coli), закл., Кал', 'Коклюш (Bordetella pertussis): Антитела: IgG, (количественно). Метод: ИФА'],
+    ['Ультразвуковое исследование поджелудочной железы (детям)', 'УЗИ поджелудочной железы, для детей'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -120,9 +120,9 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------|:-----------|
-| **f1_macro** | **0.
-| f1_micro | 0.
-| f1_weighted | 0.
+| **f1_macro** | **0.9772** |
+| f1_micro | 0.9772 |
+| f1_weighted | 0.9772 |
 
 <!--
 ## Bias, Risks and Limitations
@@ -142,19 +142,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 82,796 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
 | | sentence_0 | sentence_1 | label |
 |:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
 | type | string | string | int |
-| details | <ul><li>min:
+| details | <ul><li>min: 4 characters</li><li>mean: 66.18 characters</li><li>max: 504 characters</li></ul> | <ul><li>min: 3 characters</li><li>mean: 62.27 characters</li><li>max: 385 characters</li></ul> | <ul><li>0: ~50.60%</li><li>1: ~49.40%</li></ul> |
 * Samples:
-| sentence_0
-|
-| <code
-| <code
-| <code
+| sentence_0 | sentence_1 | label |
+|:---------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+| <code>Панель №6 IgE (Сазан, карп, щука, судак, кефаль, ледяная рыба, пикша, осетр)</code> | <code>Сазан, (Cyprinus carpio), IgE, аллерген - e82. Метод: ИФА</code> | <code>1</code> |
+| <code>Определение антител класса M (IgM) к цитомегаловирусу (CytomegАlovirus) в крови</code> | <code>Бактериологическое исследование гнойного отделяемого: На аэробные и факультативно-анаэробные микроорганизмы. Метод: культуральный</code> | <code>0</code> |
+| <code>Исследования уровня бетта-изомеризованного C-концевого телопептида коллагена 1 типа (Beta-Cross laps) в крови</code> | <code>Глюкоза, в венозной крови</code> | <code>0</code> |
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)
 
 ### Training Hyperparameters
@@ -287,24 +287,28 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch | Step
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| Epoch | Step | Training Loss | softmax_accuracy_eval_f1_macro |
+|:------:|:-----:|:-------------:|:------------------------------:|
+| 0.0483 | 500 | 0.5573 | - |
+| 0.0966 | 1000 | 0.2189 | - |
+| 0.1449 | 1500 | 0.2144 | - |
+| 0.1932 | 2000 | 0.1876 | 0.9683 |
+| 0.2415 | 2500 | 0.1812 | - |
+| 0.2899 | 3000 | 0.1657 | - |
+| 0.3382 | 3500 | 0.1796 | - |
+| 0.3865 | 4000 | 0.1592 | 0.9702 |
+| 0.4348 | 4500 | 0.156 | - |
+| 0.4831 | 5000 | 0.1491 | - |
+| 0.5314 | 5500 | 0.1555 | - |
+| 0.5797 | 6000 | 0.1216 | 0.9683 |
+| 0.6280 | 6500 | 0.1276 | - |
+| 0.6763 | 7000 | 0.1305 | - |
+| 0.7246 | 7500 | 0.1156 | - |
+| 0.7729 | 8000 | 0.1197 | 0.9759 |
+| 0.8213 | 8500 | 0.1215 | - |
+| 0.8696 | 9000 | 0.1065 | - |
+| 0.9179 | 9500 | 0.0896 | - |
+| 0.9662 | 10000 | 0.1135 | 0.9772 |
 
 
 ### Framework Versions
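The Epoch and Step columns of the new Training Logs table can be cross-checked against the stated dataset size of 82,796 samples. The sketch below is only a sanity check, not part of the commit. The batch size of 8 is an assumption inferred from the logged ratios (the hyperparameter section is outside this diff), and `steps_per_epoch` and `epoch_at` are hypothetical helper names.

```python
import math

# From the README hunk: "Size: 82,796 training samples".
DATASET_SIZE = 82_796

# ASSUMPTION: the batch size is not visible in this diff; 8 is inferred
# because it reproduces the logged Epoch values.
BATCH_SIZE = 8


def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Number of optimizer steps in one pass, keeping the last partial batch."""
    return math.ceil(dataset_size / batch_size)


def epoch_at(step: int, dataset_size: int, batch_size: int) -> float:
    """Fractional epoch reached after `step` steps, rounded like the log table."""
    return round(step / steps_per_epoch(dataset_size, batch_size), 4)


if __name__ == "__main__":
    for step in (500, 3000, 8500, 10000):
        print(step, epoch_at(step, DATASET_SIZE, BATCH_SIZE))
```

Under that assumption one epoch is 10,350 steps, and steps 500, 3000, 8500 and 10000 map to 0.0483, 0.2899, 0.8213 and 0.9662, matching the logged rows. That is consistent with the final evaluation at step 10000 being taken just before the end of the first epoch.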
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ed49e2e839ea10f57c59bb591de5c8959cc56706354a02d0c30591220f7f0da1
 size 1436163192
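The `model.safetensors` change touches only the Git LFS pointer: the `oid` is the SHA-256 of the stored file and `size` is its length in bytes. As a minimal sketch of how a downloaded weight file could be checked against this pointer, the snippet below parses the pointer text and compares digest and size. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the reader downloaded.

```python
import hashlib

# Pointer text as it appears on the new side of the diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ed49e2e839ea10f57c59bb591de5c8959cc56706354a02d0c30591220f7f0da1
size 1436163192
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer: dict, path: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file at `path` and compare its size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer(pointer, "model.safetensors")` on a local copy should return `True` only if the file is the one this commit refers to. The size is unchanged at 1436163192 bytes across the commit, so only the digest distinguishes the retrained weights from the previous ones.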