Text Ranking · sentence-transformers · PyTorch · JAX · ONNX · Safetensors · OpenVINO · Transformers · English · roberta · text-classification · text-embeddings-inference
Instructions to use cross-encoder/stsb-roberta-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use cross-encoder/stsb-roberta-large with sentence-transformers:
```
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-large")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
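Each entry in `scores` is the model's similarity estimate for the corresponding (query, passage) pair; for this example, the Mars passage should receive the highest score.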
- Transformers
How to use cross-encoder/stsb-roberta-large with Transformers:
```
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/stsb-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/stsb-roberta-large")
```
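The snippet above only loads the model and tokenizer. A minimal sketch of actually scoring a sentence pair with plain Transformers (the example sentences are placeholders, and the final sigmoid is an assumption that mirrors what sentence-transformers applies by default to single-label cross-encoders, not behavior documented on this page):

```
import torch

# A cross-encoder reads both sentences in one forward pass,
# so they are tokenized together as a pair.
inputs = tokenizer(
    "A man is eating food.",
    "A man is eating a piece of bread.",
    return_tensors="pt",
    truncation=True,
)

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1): a single similarity logit

# Assumption: map the raw logit into [0, 1] to match CrossEncoder.predict.
score = torch.sigmoid(logits).item()
print(score)
```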
- Notebooks
- Google Colab
- Kaggle
nreimers committed
Commit · 6ed5194
1 Parent(s): 2c71c6f
upload
Browse files
- CECorrelationEvaluator_sts-dev_results.csv +5 -0
- README.md +19 -0
- config.json +29 -0
- merges.txt +0 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
- vocab.json +0 -0
CECorrelationEvaluator_sts-dev_results.csv
ADDED
@@ -0,0 +1,5 @@
+epoch,steps,Pearson_Correlation,Spearman_Correlation
+0,-1,0.9169803189330423,0.914689659080599
+1,-1,0.9184629642176918,0.9145474423018756
+2,-1,0.9140695557809773,0.911543895130673
+3,-1,0.9156662483528688,0.9134139344421147
README.md
ADDED
@@ -0,0 +1,19 @@
+# Cross-Encoder for Semantic Textual Similarity
+This model was trained using the [SentenceTransformers](https://sbert.net) [Cross-Encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html) class.
+
+## Training Data
+This model was trained on the [STS benchmark dataset](http://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark). The model predicts a score between 0 and 1 for the semantic similarity of two sentences.
+
+
+## Usage and Performance
+
+Pre-trained models can be used like this:
+```
+from sentence_transformers import CrossEncoder
+model = CrossEncoder('model_name')
+scores = model.predict([('Sentence 1', 'Sentence 2'), ('Sentence 3', 'Sentence 4')])
+```
+
+The model will predict scores for the pairs `('Sentence 1', 'Sentence 2')` and `('Sentence 3', 'Sentence 4')`.
+
+You can also use this model without sentence_transformers, using only the Transformers ``AutoModelForSequenceClassification`` class.
config.json
ADDED
@@ -0,0 +1,29 @@
+{
+  "_num_labels": 3,
+  "architectures": [
+    "RobertaForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "roberta",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "pad_token_id": 1,
+  "type_vocab_size": 1,
+  "vocab_size": 50265
+}
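This config describes a roberta-large encoder (24 layers, hidden size 1024, 16 attention heads) with a single classification label, which is why the model emits one similarity score per sentence pair. A small sketch, assuming only the Transformers library, that loads this config from the Hub and inspects those values:

```
from transformers import AutoConfig

# Downloads config.json from the Hub and parses it into a RobertaConfig.
config = AutoConfig.from_pretrained("cross-encoder/stsb-roberta-large")

print(config.model_type)         # roberta
print(config.num_hidden_layers)  # 24
print(config.hidden_size)        # 1024
print(config.id2label)           # a single label -> one regression-style output
```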
merges.txt
ADDED
The diff for this file is too large to render. See raw diff.
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:03023f7dcd714c15ff27d534432a80d3bff78c9b50778a44b10585ef5fa7fd25
+size 1421616585
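These three lines are a Git LFS pointer: the repository stores only this stub, while the roughly 1.4 GB of weights live in LFS storage, addressed by the SHA-256 digest above. A sketch, assuming the real pytorch_model.bin has already been downloaded to the working directory (a hypothetical path), that checks the file against the pointer's oid:

```
import hashlib

EXPECTED = "03023f7dcd714c15ff27d534432a80d3bff78c9b50778a44b10585ef5fa7fd25"

digest = hashlib.sha256()
# Hash in 1 MiB chunks so the ~1.4 GB file never sits in memory at once.
with open("pytorch_model.bin", "rb") as f:  # hypothetical local path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert digest.hexdigest() == EXPECTED, "checksum mismatch"
print("OK:", digest.hexdigest())
```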
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}}
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"model_max_length": 512, "special_tokens_map_file": "input-model/roberta-large-mnli/special_tokens_map.json", "full_tokenizer_file": null}
vocab.json
ADDED
The diff for this file is too large to render. See raw diff.