Lab tests only
- README.md +37 -45
- model.safetensors +1 -1
README.md
CHANGED
|
@@ -4,7 +4,7 @@ tags:
|
|
| 4 |
- cross-encoder
|
| 5 |
- reranker
|
| 6 |
- generated_from_trainer
|
| 7 |
-
- dataset_size:
|
| 8 |
- loss:CrossEntropyLoss
|
| 9 |
base_model: deepvk/USER-bge-m3
|
| 10 |
pipeline_tag: text-classification
|
|
@@ -24,13 +24,13 @@ model-index:
|
|
| 24 |
type: softmax_accuracy_eval
|
| 25 |
metrics:
|
| 26 |
- type: f1_macro
|
| 27 |
-
value: 0.
|
| 28 |
name: F1 Macro
|
| 29 |
- type: f1_micro
|
| 30 |
-
value: 0.
|
| 31 |
name: F1 Micro
|
| 32 |
- type: f1_weighted
|
| 33 |
-
value: 0.
|
| 34 |
name: F1 Weighted
|
| 35 |
---
|
| 36 |
|
|
@@ -74,11 +74,11 @@ from sentence_transformers import CrossEncoder
|
|
| 74 |
model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
|
| 75 |
# Get scores for pairs of texts
|
| 76 |
pairs = [
|
| 77 |
-
['
|
| 78 |
-
['
|
| 79 |
-
['
|
| 80 |
-
['
|
| 81 |
-
['
|
| 82 |
]
|
| 83 |
scores = model.predict(pairs)
|
| 84 |
print(scores.shape)
|
|
@@ -120,9 +120,9 @@ You can finetune this model on your own dataset.
|
|
| 120 |
|
| 121 |
| Metric | Value |
|
| 122 |
|:-------------|:-----------|
|
| 123 |
-
| **f1_macro** | **0.
|
| 124 |
-
| f1_micro | 0.
|
| 125 |
-
| f1_weighted | 0.
|
| 126 |
|
| 127 |
<!--
|
| 128 |
## Bias, Risks and Limitations
|
|
@@ -142,19 +142,19 @@ You can finetune this model on your own dataset.
|
|
| 142 |
|
| 143 |
#### Unnamed Dataset
|
| 144 |
|
| 145 |
-
* Size:
|
| 146 |
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
|
| 147 |
* Approximate statistics based on the first 1000 samples:
|
| 148 |
| | sentence_0 | sentence_1 | label |
|
| 149 |
|:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
|
| 150 |
| type | string | string | int |
|
| 151 |
-
| details | <ul><li>min:
|
| 152 |
* Samples:
|
| 153 |
-
| sentence_0
|
| 154 |
-
|
| 155 |
-
| <code
|
| 156 |
-
| <code
|
| 157 |
-
| <code
|
| 158 |
* Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)
|
| 159 |
|
| 160 |
### Training Hyperparameters
|
|
@@ -287,32 +287,24 @@ You can finetune this model on your own dataset.
|
|
| 287 |
</details>
|
| 288 |
|
| 289 |
### Training Logs
|
| 290 |
-
| Epoch | Step
|
| 291 |
-
|
| 292 |
-
| 0.
|
| 293 |
-
| 0.
|
| 294 |
-
| 0.
|
| 295 |
-
| 0.
|
| 296 |
-
| 0.
|
| 297 |
-
| 0.
|
| 298 |
-
| 0.
|
| 299 |
-
| 0.
|
| 300 |
-
| 0.
|
| 301 |
-
| 0.
|
| 302 |
-
| 0.
|
| 303 |
-
| 0.
|
| 304 |
-
| 0.
|
| 305 |
-
| 0.
|
| 306 |
-
| 0.
|
| 307 |
-
| 0.
|
| 308 |
-
| 0.6871 | 8500 | 0.1507 | - |
|
| 309 |
-
| 0.7276 | 9000 | 0.1396 | - |
|
| 310 |
-
| 0.7680 | 9500 | 0.1285 | - |
|
| 311 |
-
| 0.8084 | 10000 | 0.1504 | 0.9735 |
|
| 312 |
-
| 0.8488 | 10500 | 0.1507 | - |
|
| 313 |
-
| 0.8892 | 11000 | 0.1346 | - |
|
| 314 |
-
| 0.9297 | 11500 | 0.119 | - |
|
| 315 |
-
| 0.9701 | 12000 | 0.1144 | 0.9736 |
|
| 316 |
|
| 317 |
|
| 318 |
### Framework Versions
|
|
|
|
| 4 |
- cross-encoder
|
| 5 |
- reranker
|
| 6 |
- generated_from_trainer
|
| 7 |
+
- dataset_size:64800
|
| 8 |
- loss:CrossEntropyLoss
|
| 9 |
base_model: deepvk/USER-bge-m3
|
| 10 |
pipeline_tag: text-classification
|
|
|
|
| 24 |
type: softmax_accuracy_eval
|
| 25 |
metrics:
|
| 26 |
- type: f1_macro
|
| 27 |
+
value: 0.9854186637285883
|
| 28 |
name: F1 Macro
|
| 29 |
- type: f1_micro
|
| 30 |
+
value: 0.9854186918483544
|
| 31 |
name: F1 Micro
|
| 32 |
- type: f1_weighted
|
| 33 |
+
value: 0.9854186327968456
|
| 34 |
name: F1 Weighted
|
| 35 |
---
|
| 36 |
|
|
|
|
| 74 |
model = CrossEncoder("Chimalpopoka/CrossEncoderRanker")
|
| 75 |
# Get scores for pairs of texts
|
| 76 |
pairs = [
|
| 77 |
+
['Яичный белок (f1) IgE, ImmunoCAP', 'Яичный белок, IgE, аллерген - f1. Метод: ImmunoCAP'],
|
| 78 |
+
['АТ к легионелле IgA Legionella pneumophila IgA) детское отделение', 'Посев на микрофлору (урогенитальный тракт), женщины, качественно. Метод: ИФА'],
|
| 79 |
+
['Определение ДНК вируса гепатита B (HepАtitis B virus) в крови методом ПЦР, количественное исследование', 'Хламидия пневмония (Chlamydia pneumoniae). Антитела IgA, количественно. Метод: ИФА'],
|
| 80 |
+
['Соматотропный гормон (СТГ) (кровь-сыворотка)', 'Соматотропный гормон'],
|
| 81 |
+
['на яйца остриц (Enterobius vermicularis)', 'Яйца остриц (Enterobius vermicularis), перианальные складки, качественно. Метод: ИФА'],
|
| 82 |
]
|
| 83 |
scores = model.predict(pairs)
|
| 84 |
print(scores.shape)
|
|
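The model card lists `CrossEntropyLoss` with 0/1 labels, so `model.predict` presumably returns one row of class logits per pair rather than a single score. A minimal sketch of turning such logits into probabilities and labels, assuming a `(n_pairs, 2)` array; the helper name `logits_to_labels` and the example values are hypothetical and are not model output:

```python
import numpy as np


def logits_to_labels(logits):
    """Convert raw class logits to (probabilities, predicted labels)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Label 1 is assumed to mean "the two test names match", as in the samples.
    return probs, probs.argmax(axis=1)


# Hypothetical logits for three pairs (illustrative values only).
example_logits = [[-2.0, 3.0], [4.0, -1.0], [0.5, 0.2]]
probs, labels = logits_to_labels(example_logits)
print(labels)
```

With real output, `scores` from `model.predict(pairs)` would take the place of `example_logits`, provided its shape is indeed `(n_pairs, 2)`.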
|
|
| 120 |
|
| 121 |
| Metric | Value |
|
| 122 |
|:-------------|:-----------|
|
| 123 |
+
| **f1_macro** | **0.9854** |
|
| 124 |
+
| f1_micro | 0.9854 |
|
| 125 |
+
| f1_weighted | 0.9854 |
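The three F1 variants agree to four decimals, which is what one would expect when the two classes are nearly balanced (about 49.7% vs 50.3% in the dataset statistics). A small pure-Python sketch of how the three averages are defined; the function name `f1_scores` and the toy labels are hypothetical and unrelated to the actual evaluation set:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    tp_total = fp_total = fn_total = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_total += tp
        fp_total += fp
        fn_total += fn
    # Macro: unweighted mean of the per-class F1 values.
    macro = sum(per_class.values()) / len(labels)
    # Micro: F1 computed from the pooled counts (equals accuracy here).
    micro = 2 * tp_total / (2 * tp_total + fp_total + fn_total)
    # Weighted: per-class F1 weighted by the number of true samples per class.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted


# Toy, perfectly balanced example: all three averages coincide.
macro, micro, weighted = f1_scores([1, 0, 0, 1, 1, 0], [1, 0, 1, 1, 0, 0])
print(macro, micro, weighted)
```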
|
| 126 |
|
| 127 |
<!--
|
| 128 |
## Bias, Risks and Limitations
|
|
|
|
| 142 |
|
| 143 |
#### Unnamed Dataset
|
| 144 |
|
| 145 |
+
* Size: 64,800 training samples
|
| 146 |
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
|
| 147 |
* Approximate statistics based on the first 1000 samples:
|
| 148 |
| | sentence_0 | sentence_1 | label |
|
| 149 |
|:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------|
|
| 150 |
| type | string | string | int |
|
| 151 |
+
| details | <ul><li>min: 5 characters</li><li>mean: 64.06 characters</li><li>max: 480 characters</li></ul> | <ul><li>min: 3 characters</li><li>mean: 65.64 characters</li><li>max: 442 characters</li></ul> | <ul><li>0: ~49.70%</li><li>1: ~50.30%</li></ul> |
|
| 152 |
* Samples:
|
| 153 |
+
| sentence_0 | sentence_1 | label |
|
| 154 |
+
|:--------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|:---------------|
|
| 155 |
+
| <code>Яичный белок (f1) IgE, ImmunoCAP</code> | <code>Яичный белок, IgE, аллерген - f1. Метод: ImmunoCAP</code> | <code>1</code> |
|
| 156 |
+
| <code>АТ к легионелле IgA Legionella pneumophila IgA) детское отделение</code> | <code>Посев на микрофлору (урогенитальный тракт), женщины, качественно. Метод: ИФА</code> | <code>0</code> |
|
| 157 |
+
| <code>Определение ДНК вируса гепатита B (HepАtitis B virus) в крови методом ПЦР, количественное исследование</code> | <code>Хламидия пневмония (Chlamydia pneumoniae). Антитела IgA, количественно. Метод: ИФА</code> | <code>0</code> |
|
| 158 |
* Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)
|
| 159 |
|
| 160 |
### Training Hyperparameters
|
|
|
|
| 287 |
</details>
|
| 288 |
|
| 289 |
### Training Logs
|
| 290 |
+
| Epoch | Step | Training Loss | softmax_accuracy_eval_f1_macro |
|
| 291 |
+
|:------:|:----:|:-------------:|:------------------------------:|
|
| 292 |
+
| 0.0617 | 500 | 0.4739 | - |
|
| 293 |
+
| 0.1235 | 1000 | 0.1631 | - |
|
| 294 |
+
| 0.1852 | 1500 | 0.1357 | - |
|
| 295 |
+
| 0.2469 | 2000 | 0.113 | 0.9804 |
|
| 296 |
+
| 0.3086 | 2500 | 0.1382 | - |
|
| 297 |
+
| 0.3704 | 3000 | 0.1105 | - |
|
| 298 |
+
| 0.4321 | 3500 | 0.1154 | - |
|
| 299 |
+
| 0.4938 | 4000 | 0.1136 | 0.9822 |
|
| 300 |
+
| 0.5556 | 4500 | 0.0976 | - |
|
| 301 |
+
| 0.6173 | 5000 | 0.0861 | - |
|
| 302 |
+
| 0.6790 | 5500 | 0.0933 | - |
|
| 303 |
+
| 0.7407 | 6000 | 0.0927 | 0.9842 |
|
| 304 |
+
| 0.8025 | 6500 | 0.0841 | - |
|
| 305 |
+
| 0.8642 | 7000 | 0.0764 | - |
|
| 306 |
+
| 0.9259 | 7500 | 0.0824 | - |
|
| 307 |
+
| 0.9877 | 8000 | 0.0667 | 0.9854 |
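The Epoch column is consistent with 8,100 optimizer steps per epoch, which for 64,800 training samples would imply an effective batch size of 8. That batch size is inferred from the logged numbers and is not stated in this diff. A quick check of the arithmetic:

```python
DATASET_SIZE = 64_800    # from "Size: 64,800 training samples"
ASSUMED_BATCH_SIZE = 8   # assumption inferred from the log, not stated in the diff

steps_per_epoch = DATASET_SIZE // ASSUMED_BATCH_SIZE


def epoch_at(step):
    """Fraction of an epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


# A few (step, epoch) pairs copied from the training log.
logged = {500: 0.0617, 2000: 0.2469, 4000: 0.4938, 8000: 0.9877}
for step, epoch in logged.items():
    print(step, epoch_at(step), epoch)
```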
|
| 308 |
|
| 309 |
|
| 310 |
### Framework Versions
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 1436163192
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:09d83545dac14a4094a170564c150db2caede6bac78dc9404b4bc576877d3b3d
|
| 3 |
size 1436163192
|
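The `model.safetensors` change only touches the Git LFS pointer: the size is unchanged and the `oid` differs. A minimal sketch of parsing such a pointer and checking a downloaded file against it; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and any local file path passed in is up to the reader:

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New pointer contents, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:09d83545dac14a4094a170564c150db2caede6bac78dc9404b4bc576877d3b3d
size 1436163192
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer` with the path of a locally downloaded `model.safetensors` would then confirm whether that copy corresponds to this commit.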