Update README.md
README.md
CHANGED
|
@@ -78,16 +78,16 @@ widget:
|
|
| 78 |
olmayan şeydir (teknoloji tutkunlarından ayrı olarak), yüksek volatilite dışında.
|
| 79 |
Güvenilir bir işlem yeteneği tamamen eksikliği.'
|
| 80 |
datasets:
|
| 81 |
-
-
|
| 82 |
-
-
|
| 83 |
-
-
|
| 84 |
-
-
|
| 85 |
-
-
|
| 86 |
-
-
|
| 87 |
-
-
|
| 88 |
-
-
|
| 89 |
-
-
|
| 90 |
-
-
|
| 91 |
pipeline_tag: sentence-similarity
|
| 92 |
library_name: sentence-transformers
|
| 93 |
metrics:
|
|
@@ -338,7 +338,7 @@ model-index:
|
|
| 338 |
|
| 339 |
# SentenceTransformer
|
| 340 |
|
| 341 |
-
This is a [sentence-transformers](https://www.SBERT.net) model trained on the [msmarco-tr](https://huggingface.co/datasets/
|
| 342 |
|
| 343 |
## Model Details
|
| 344 |
|
|
@@ -349,14 +349,14 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained on the [m
|
|
| 349 |
- **Output Dimensionality:** 768 dimensions
|
| 350 |
- **Similarity Function:** Cosine Similarity
|
| 351 |
- **Training Datasets:**
|
| 352 |
-
- [msmarco-tr](https://huggingface.co/datasets/
|
| 353 |
-
- [fiqa-tr](https://huggingface.co/datasets/
|
| 354 |
-
- [scifact-tr](https://huggingface.co/datasets/
|
| 355 |
-
- [nfcorpus-tr](https://huggingface.co/datasets/
|
| 356 |
-
- [multinli-tr](https://huggingface.co/datasets/
|
| 357 |
-
- [snli-tr](https://huggingface.co/datasets/
|
| 358 |
-
- [stsb-tr](https://huggingface.co/datasets/
|
| 359 |
-
- [wmt16](https://huggingface.co/datasets/
|
| 360 |
<!-- - **Language:** Unknown -->
|
| 361 |
<!-- - **License:** Unknown -->
|
| 362 |
|
|
@@ -390,7 +390,7 @@ Then you can load this model and run inference.
|
|
| 390 |
from sentence_transformers import SentenceTransformer
|
| 391 |
|
| 392 |
# Download from the 🤗 Hub
|
| 393 |
-
model = SentenceTransformer("
|
| 394 |
# Run inference
|
| 395 |
sentences = [
|
| 396 |
'Stoklara nasıl yatırım yapabilirim?',
|
|
@@ -480,7 +480,7 @@ You can finetune this model on your own dataset.
|
|
| 480 |
|
| 481 |
#### msmarco-tr
|
| 482 |
|
| 483 |
-
* Dataset: [msmarco-tr](https://huggingface.co/datasets/
|
| 484 |
* Size: 253,304 training samples
|
| 485 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 486 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -506,7 +506,7 @@ You can finetune this model on your own dataset.
|
|
| 506 |
|
| 507 |
#### fiqa-tr
|
| 508 |
|
| 509 |
-
* Dataset: [fiqa-tr](https://huggingface.co/datasets/
|
| 510 |
* Size: 14,166 training samples
|
| 511 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 512 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -532,7 +532,7 @@ You can finetune this model on your own dataset.
|
|
| 532 |
|
| 533 |
#### scifact-tr
|
| 534 |
|
| 535 |
-
* Dataset: [scifact-tr](https://huggingface.co/datasets/
|
| 536 |
* Size: 919 training samples
|
| 537 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 538 |
* Approximate statistics based on the first 919 samples:
|
|
@@ -558,7 +558,7 @@ You can finetune this model on your own dataset.
|
|
| 558 |
|
| 559 |
#### nfcorpus-tr
|
| 560 |
|
| 561 |
-
* Dataset: [nfcorpus-tr](https://huggingface.co/datasets/
|
| 562 |
* Size: 110,575 training samples
|
| 563 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 564 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -584,7 +584,7 @@ You can finetune this model on your own dataset.
|
|
| 584 |
|
| 585 |
#### multinli-tr
|
| 586 |
|
| 587 |
-
* Dataset: [multinli-tr](https://huggingface.co/datasets/
|
| 588 |
* Size: 392,702 training samples
|
| 589 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 590 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -604,7 +604,7 @@ You can finetune this model on your own dataset.
|
|
| 604 |
|
| 605 |
#### snli-tr
|
| 606 |
|
| 607 |
-
* Dataset: [snli-tr](https://huggingface.co/datasets/
|
| 608 |
* Size: 550,152 training samples
|
| 609 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 610 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -624,7 +624,7 @@ You can finetune this model on your own dataset.
|
|
| 624 |
|
| 625 |
#### stsb-tr
|
| 626 |
|
| 627 |
-
* Dataset: [stsb-tr](https://huggingface.co/datasets/
|
| 628 |
* Size: 5,740 training samples
|
| 629 |
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
|
| 630 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -650,7 +650,7 @@ You can finetune this model on your own dataset.
|
|
| 650 |
|
| 651 |
#### wmt16
|
| 652 |
|
| 653 |
-
* Dataset: [wmt16](https://huggingface.co/datasets/
|
| 654 |
* Size: 205,756 training samples
|
| 655 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 656 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -678,7 +678,7 @@ You can finetune this model on your own dataset.
|
|
| 678 |
|
| 679 |
#### msmarco-tr
|
| 680 |
|
| 681 |
-
* Dataset: [msmarco-tr](https://huggingface.co/datasets/
|
| 682 |
* Size: 31,538 evaluation samples
|
| 683 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 684 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -704,7 +704,7 @@ You can finetune this model on your own dataset.
|
|
| 704 |
|
| 705 |
#### fiqa-tr
|
| 706 |
|
| 707 |
-
* Dataset: [fiqa-tr](https://huggingface.co/datasets/
|
| 708 |
* Size: 1,238 evaluation samples
|
| 709 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 710 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -730,7 +730,7 @@ You can finetune this model on your own dataset.
|
|
| 730 |
|
| 731 |
#### quora-tr
|
| 732 |
|
| 733 |
-
* Dataset: [quora-tr](https://huggingface.co/datasets/
|
| 734 |
* Size: 7,626 evaluation samples
|
| 735 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 736 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -756,7 +756,7 @@ You can finetune this model on your own dataset.
|
|
| 756 |
|
| 757 |
#### nfcorpus-tr
|
| 758 |
|
| 759 |
-
* Dataset: [nfcorpus-tr](https://huggingface.co/datasets/
|
| 760 |
* Size: 11,385 evaluation samples
|
| 761 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 762 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -782,7 +782,7 @@ You can finetune this model on your own dataset.
|
|
| 782 |
|
| 783 |
#### snli-tr
|
| 784 |
|
| 785 |
-
* Dataset: [snli-tr](https://huggingface.co/datasets/
|
| 786 |
* Size: 10,000 evaluation samples
|
| 787 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 788 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -802,7 +802,7 @@ You can finetune this model on your own dataset.
|
|
| 802 |
|
| 803 |
#### xnli-tr
|
| 804 |
|
| 805 |
-
* Dataset: [xnli-tr](https://huggingface.co/datasets/
|
| 806 |
* Size: 2,490 evaluation samples
|
| 807 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 808 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -822,7 +822,7 @@ You can finetune this model on your own dataset.
|
|
| 822 |
|
| 823 |
#### stsb-tr
|
| 824 |
|
| 825 |
-
* Dataset: [stsb-tr](https://huggingface.co/datasets/
|
| 826 |
* Size: 1,496 evaluation samples
|
| 827 |
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
|
| 828 |
* Approximate statistics based on the first 1000 samples:
|
|
@@ -848,7 +848,7 @@ You can finetune this model on your own dataset.
|
|
| 848 |
|
| 849 |
#### wmt16
|
| 850 |
|
| 851 |
-
* Dataset: [wmt16](https://huggingface.co/datasets/
|
| 852 |
* Size: 1,001 evaluation samples
|
| 853 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 854 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 78 |
olmayan şeydir (teknoloji tutkunlarından ayrı olarak), yüksek volatilite dışında.
|
| 79 |
Güvenilir bir işlem yeteneği tamamen eksikliği.'
|
| 80 |
datasets:
|
| 81 |
+
- trmteb/msmarco-tr_fine_tuning_dataset
|
| 82 |
+
- trmteb/fiqa-tr_fine_tuning_dataset
|
| 83 |
+
- trmteb/scifact-tr_fine_tuning_dataset
|
| 84 |
+
- trmteb/nfcorpus-tr_fine_tuning_dataset
|
| 85 |
+
- trmteb/multinli_tr_fine_tuning_dataset
|
| 86 |
+
- trmteb/snli_tr_fine_tuning_dataset
|
| 87 |
+
- trmteb/stsb-tr
|
| 88 |
+
- trmteb/wmt16_en_tr_fine_tuning_dataset
|
| 89 |
+
- trmteb/quora-tr_fine_tuning_dataset
|
| 90 |
+
- trmteb/xnli_tr_fine_tuning_dataset
|
| 91 |
pipeline_tag: sentence-similarity
|
| 92 |
library_name: sentence-transformers
|
| 93 |
metrics:
|
|
|
|
| 338 |
|
| 339 |
# SentenceTransformer
|
| 340 |
|
| 341 |
+
This is a [sentence-transformers](https://www.SBERT.net) model trained on the [msmarco-tr](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset), [fiqa-tr](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset), [scifact-tr](https://huggingface.co/datasets/trmteb/scifact-tr_fine_tuning_dataset), [nfcorpus-tr](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset), [multinli-tr](https://huggingface.co/datasets/trmteb/multinli_tr_fine_tuning_dataset), [snli-tr](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset), [stsb-tr](https://huggingface.co/datasets/trmteb/stsb-tr) and [wmt16](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
| 342 |
|
| 343 |
## Model Details
|
| 344 |
|
|
|
|
| 349 |
- **Output Dimensionality:** 768 dimensions
|
| 350 |
- **Similarity Function:** Cosine Similarity
|
| 351 |
- **Training Datasets:**
|
| 352 |
+
- [msmarco-tr](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset)
|
| 353 |
+
- [fiqa-tr](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset)
|
| 354 |
+
- [scifact-tr](https://huggingface.co/datasets/trmteb/scifact-tr_fine_tuning_dataset)
|
| 355 |
+
- [nfcorpus-tr](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset)
|
| 356 |
+
- [multinli-tr](https://huggingface.co/datasets/trmteb/multinli_tr_fine_tuning_dataset)
|
| 357 |
+
- [snli-tr](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset)
|
| 358 |
+
- [stsb-tr](https://huggingface.co/datasets/trmteb/stsb-tr)
|
| 359 |
+
- [wmt16](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset)
|
| 360 |
<!-- - **Language:** Unknown -->
|
| 361 |
<!-- - **License:** Unknown -->
|
| 362 |
|
|
|
|
| 390 |
from sentence_transformers import SentenceTransformer
|
| 391 |
|
| 392 |
# Download from the 🤗 Hub
|
| 393 |
+
model = SentenceTransformer("trmteb/turkish_embedding_model_fine_tuned")
|
| 394 |
# Run inference
|
| 395 |
sentences = [
|
| 396 |
'Stoklara nasıl yatırım yapabilirim?',
|
|
|
|
| 480 |
|
| 481 |
#### msmarco-tr
|
| 482 |
|
| 483 |
+
* Dataset: [msmarco-tr](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset) at [f03d837](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset/tree/f03d83704e5ea276665384ca6d8bee3b19632c80)
|
| 484 |
* Size: 253,304 training samples
|
| 485 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 486 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 506 |
|
| 507 |
#### fiqa-tr
|
| 508 |
|
| 509 |
+
* Dataset: [fiqa-tr](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset) at [bbc9e91](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset/tree/bbc9e91b5710d0ac4032b5c9e94066470f928c8c)
|
| 510 |
* Size: 14,166 training samples
|
| 511 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 512 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 532 |
|
| 533 |
#### scifact-tr
|
| 534 |
|
| 535 |
+
* Dataset: [scifact-tr](https://huggingface.co/datasets/trmteb/scifact-tr_fine_tuning_dataset) at [382de5b](https://huggingface.co/datasets/trmteb/scifact-tr_fine_tuning_dataset/tree/382de5b316d8c8042a23f34179a73fadc13cb53d)
|
| 536 |
* Size: 919 training samples
|
| 537 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 538 |
* Approximate statistics based on the first 919 samples:
|
|
|
|
| 558 |
|
| 559 |
#### nfcorpus-tr
|
| 560 |
|
| 561 |
+
* Dataset: [nfcorpus-tr](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset) at [22d1ef8](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset/tree/22d1ef8b6a9f1c196d1977541a66ca8eff946f06)
|
| 562 |
* Size: 110,575 training samples
|
| 563 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 564 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 584 |
|
| 585 |
#### multinli-tr
|
| 586 |
|
| 587 |
+
* Dataset: [multinli-tr](https://huggingface.co/datasets/trmteb/multinli_tr_fine_tuning_dataset) at [a700b72](https://huggingface.co/datasets/trmteb/multinli_tr_fine_tuning_dataset/tree/a700b72da7056aa52ceb234d2e8a211d035dc2c7)
|
| 588 |
* Size: 392,702 training samples
|
| 589 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 590 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 604 |
|
| 605 |
#### snli-tr
|
| 606 |
|
| 607 |
+
* Dataset: [snli-tr](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset) at [63eb107](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset/tree/63eb107dfdaf0b16cfd209db25705f27f2e5e2ca)
|
| 608 |
* Size: 550,152 training samples
|
| 609 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 610 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 624 |
|
| 625 |
#### stsb-tr
|
| 626 |
|
| 627 |
+
* Dataset: [stsb-tr](https://huggingface.co/datasets/trmteb/stsb-tr) at [3d2e87d](https://huggingface.co/datasets/trmteb/stsb-tr/tree/3d2e87d2a94c9af130b87ab8ed8d0c5c2e92e2df)
|
| 628 |
* Size: 5,740 training samples
|
| 629 |
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
|
| 630 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 650 |
|
| 651 |
#### wmt16
|
| 652 |
|
| 653 |
+
* Dataset: [wmt16](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset) at [9fc4e73](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset/tree/9fc4e7334bdb195b396c41eed05b0dd447981ef3)
|
| 654 |
* Size: 205,756 training samples
|
| 655 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 656 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 678 |
|
| 679 |
#### msmarco-tr
|
| 680 |
|
| 681 |
+
* Dataset: [msmarco-tr](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset) at [f03d837](https://huggingface.co/datasets/trmteb/msmarco-tr_fine_tuning_dataset/tree/f03d83704e5ea276665384ca6d8bee3b19632c80)
|
| 682 |
* Size: 31,538 evaluation samples
|
| 683 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 684 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 704 |
|
| 705 |
#### fiqa-tr
|
| 706 |
|
| 707 |
+
* Dataset: [fiqa-tr](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset) at [bbc9e91](https://huggingface.co/datasets/trmteb/fiqa-tr_fine_tuning_dataset/tree/bbc9e91b5710d0ac4032b5c9e94066470f928c8c)
|
| 708 |
* Size: 1,238 evaluation samples
|
| 709 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 710 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 730 |
|
| 731 |
#### quora-tr
|
| 732 |
|
| 733 |
+
* Dataset: [quora-tr](https://huggingface.co/datasets/trmteb/quora-tr_fine_tuning_dataset) at [6e1eee1](https://huggingface.co/datasets/trmteb/quora-tr_fine_tuning_dataset/tree/6e1eee1e44db0f777eceb1f9b55293a9c2e25d76)
|
| 734 |
* Size: 7,626 evaluation samples
|
| 735 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 736 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 756 |
|
| 757 |
#### nfcorpus-tr
|
| 758 |
|
| 759 |
+
* Dataset: [nfcorpus-tr](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset) at [22d1ef8](https://huggingface.co/datasets/trmteb/nfcorpus-tr_fine_tuning_dataset/tree/22d1ef8b6a9f1c196d1977541a66ca8eff946f06)
|
| 760 |
* Size: 11,385 evaluation samples
|
| 761 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 762 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 782 |
|
| 783 |
#### snli-tr
|
| 784 |
|
| 785 |
+
* Dataset: [snli-tr](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset) at [63eb107](https://huggingface.co/datasets/trmteb/snli_tr_fine_tuning_dataset/tree/63eb107dfdaf0b16cfd209db25705f27f2e5e2ca)
|
| 786 |
* Size: 10,000 evaluation samples
|
| 787 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 788 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 802 |
|
| 803 |
#### xnli-tr
|
| 804 |
|
| 805 |
+
* Dataset: [xnli-tr](https://huggingface.co/datasets/trmteb/xnli_tr_fine_tuning_dataset) at [3a66bc8](https://huggingface.co/datasets/trmteb/xnli_tr_fine_tuning_dataset/tree/3a66bc878d3d027177da71f47e4d8dee21cafe63)
|
| 806 |
* Size: 2,490 evaluation samples
|
| 807 |
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
|
| 808 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 822 |
|
| 823 |
#### stsb-tr
|
| 824 |
|
| 825 |
+
* Dataset: [stsb-tr](https://huggingface.co/datasets/trmteb/stsb-tr) at [3d2e87d](https://huggingface.co/datasets/trmteb/stsb-tr/tree/3d2e87d2a94c9af130b87ab8ed8d0c5c2e92e2df)
|
| 826 |
* Size: 1,496 evaluation samples
|
| 827 |
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
|
| 828 |
* Approximate statistics based on the first 1000 samples:
|
|
|
|
| 848 |
|
| 849 |
#### wmt16
|
| 850 |
|
| 851 |
+
* Dataset: [wmt16](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset) at [9fc4e73](https://huggingface.co/datasets/trmteb/wmt16_en_tr_fine_tuning_dataset/tree/9fc4e7334bdb195b396c41eed05b0dd447981ef3)
|
| 852 |
* Size: 1,001 evaluation samples
|
| 853 |
* Columns: <code>anchor</code> and <code>positive</code>
|
| 854 |
* Approximate statistics based on the first 1000 samples:
|
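The model card edited in this commit names Cosine Similarity as the similarity function for its 768-dimensional embeddings. As a minimal sketch of what that score computes, the snippet below implements cosine similarity in plain Python. The function name `cosine_similarity` and the 3-dimensional toy vectors are illustrative assumptions, not part of the model card or of the sentence-transformers API; real embeddings would come from `model.encode(...)`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for 768-dimensional embeddings.
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # a scaled copy of `query`
orthogonal = [3.0, 0.0, -1.0]      # dot product with `query` is zero

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because the score depends only on direction, a scaled copy of a vector scores about 1 and an orthogonal vector scores 0, which is why vector length does not affect the ranking of sentence pairs.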