Sentence Similarity
sentence-transformers
Safetensors
Transformers
English
bert
feature-extraction
doping
anti-doping
text-embeddings-inference
Update README.md
#2
by louisbrulenaudet - opened
README.md
CHANGED
|
@@ -5,14 +5,22 @@ tags:
|
|
| 5 |
- feature-extraction
|
| 6 |
- sentence-similarity
|
| 7 |
- transformers
|
| 8 |
-
|
| 9 |
---
|
| 10 |
|
| 11 |
-
#
|
| 12 |
|
| 13 |
This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
## Usage (Sentence-Transformers)
|
| 18 |
|
|
@@ -28,7 +36,7 @@ Then you can use the model like this:
|
|
| 28 |
from sentence_transformers import SentenceTransformer
|
| 29 |
sentences = ["This is an example sentence", "Each sentence is converted"]
|
| 30 |
|
| 31 |
-
model = SentenceTransformer(
|
| 32 |
embeddings = model.encode(sentences)
|
| 33 |
print(embeddings)
|
| 34 |
```
|
|
@@ -51,8 +59,8 @@ def cls_pooling(model_output, attention_mask):
|
|
| 51 |
sentences = ['This is an example sentence', 'Each sentence is converted']
|
| 52 |
|
| 53 |
# Load model from HuggingFace Hub
|
| 54 |
-
tokenizer = AutoTokenizer.from_pretrained(
|
| 55 |
-
model = AutoModel.from_pretrained(
|
| 56 |
|
| 57 |
# Tokenize sentences
|
| 58 |
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
|
|
@@ -68,15 +76,6 @@ print("Sentence embeddings:")
|
|
| 68 |
print(sentence_embeddings)
|
| 69 |
```
|
| 70 |
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
## Evaluation Results
|
| 74 |
-
|
| 75 |
-
<!--- Describe how your model was evaluated -->
|
| 76 |
-
|
| 77 |
-
For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
|
| 78 |
-
|
| 79 |
-
|
| 80 |
## Training
|
| 81 |
The model was trained with the parameters:
|
| 82 |
|
|
@@ -96,7 +95,6 @@ Parameters of the fit()-Method:
|
|
| 96 |
{
|
| 97 |
"epochs": 1,
|
| 98 |
"evaluation_steps": 0,
|
| 99 |
-
"evaluator": "NoneType",
|
| 100 |
"max_grad_norm": 1,
|
| 101 |
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
|
| 102 |
"optimizer_params": {
|
|
@@ -120,4 +118,13 @@ SentenceTransformer(
|
|
| 120 |
|
| 121 |
## Citing & Authors
|
| 122 |
|
| 123 |
-
|
| 5 |
- feature-extraction
|
| 6 |
- sentence-similarity
|
| 7 |
- transformers
|
| 8 |
+
- doping
|
| 9 |
+
- anti-doping
|
| 10 |
+
pretty_name: Domain-adapted GTE for anti-doping practice
|
| 11 |
+
license: apache-2.0
|
| 12 |
+
language:
|
| 13 |
+
- en
|
| 14 |
+
library_name: sentence-transformers
|
| 15 |
---
|
| 16 |
|
| 17 |
+
# Domain-adapted GTE for anti-doping practice
|
| 18 |
|
| 19 |
This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
| 20 |
|
| 21 |
+
The underlying GTE model is a Transformer pretrained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios, which allows GTE models to be applied to downstream text-embedding tasks such as information retrieval, semantic textual similarity, and text reranking. This model was then fitted with a Transformer-based Sequential Denoising Auto-Encoder (TSDAE), an unsupervised sentence-embedding method, with a single objective: anti-doping domain adaptation.
|
| 22 |
+
|
| 23 |
+
This way, the model learns an inner representation of the anti-doping language in the training set, which can then be used to extract features for downstream tasks. For instance, if you have a dataset of labeled sentences, you can train a standard classifier using the features produced by the model as inputs.
|
| 24 |
|
| 25 |
## Usage (Sentence-Transformers)
|
| 26 |
|
|
|
|
| 36 |
from sentence_transformers import SentenceTransformer
|
| 37 |
sentences = ["This is an example sentence", "Each sentence is converted"]
|
| 38 |
|
| 39 |
+
model = SentenceTransformer("timotheeplanes/anti-doping-gte-base")
|
| 40 |
embeddings = model.encode(sentences)
|
| 41 |
print(embeddings)
|
| 42 |
```
|
|
|
|
| 59 |
sentences = ['This is an example sentence', 'Each sentence is converted']
|
| 60 |
|
| 61 |
# Load model from HuggingFace Hub
|
| 62 |
+
tokenizer = AutoTokenizer.from_pretrained("timotheeplanes/anti-doping-gte-base")
|
| 63 |
+
model = AutoModel.from_pretrained("timotheeplanes/anti-doping-gte-base")
|
| 64 |
|
| 65 |
# Tokenize sentences
|
| 66 |
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
|
|
|
|
| 76 |
print(sentence_embeddings)
|
| 77 |
```
|
| 78 |
|
| 79 |
## Training
|
| 80 |
The model was trained with the parameters:
|
| 81 |
|
|
|
|
| 95 |
{
|
| 96 |
"epochs": 1,
|
| 97 |
"evaluation_steps": 0,
|
|
|
|
| 98 |
"max_grad_norm": 1,
|
| 99 |
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
|
| 100 |
"optimizer_params": {
|
|
|
|
| 118 |
|
| 119 |
## Citing & Authors
|
| 120 |
|
| 121 |
+
If you use this model in your research, please cite it with the following BibTeX entry.
|
| 122 |
+
|
| 123 |
+
```BibTeX
|
| 124 |
+
@misc{louisbrulenaudet2023,
|
| 125 |
+
author = {Brulé Naudet, L. and Planes, T.},
|
| 126 |
+
title = {Domain-adapted GTE for anti-doping practice},
|
| 127 |
+
year = {2023},
|
| 128 |
+
howpublished = {\url{https://huggingface.co/timotheeplanes/anti-doping-gte-base}},
|
| 129 |
+
}
|
| 130 |
+
```
|
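The updated README says the features produced by the model can feed a standard classifier. Below is a minimal sketch of that idea, not the authors' method: it swaps in a nearest-centroid classifier written in plain Python so the example stays self-contained. The helper names (`embed`, `fit_centroids`, `predict`, `demo`) and the example sentences and labels are hypothetical; only the model ID and the `SentenceTransformer(...).encode(...)` call come from the README.

```python
import math
from typing import Dict, List, Sequence

MODEL_ID = "timotheeplanes/anti-doping-gte-base"


def embed(sentences: Sequence[str]) -> List[List[float]]:
    """Encode sentences with the model (downloads the weights on first use)."""
    # Imported lazily so the classifier helpers below run without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(list(sentences)).tolist()


def fit_centroids(
    vectors: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the vectors of each label into one centroid per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [v / counts[label] for v in acc] for label, acc in sums.items()
    }


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


def demo() -> None:
    """Hypothetical two-class example; sentences and labels are invented."""
    train_sentences = [
        "The athlete tested positive for a prohibited substance.",
        "The sample was collected out of competition.",
        "The stadium opens at nine in the morning.",
        "Tickets for the final are sold out.",
    ]
    train_labels = ["doping", "doping", "other", "other"]
    centroids = fit_centroids(embed(train_sentences), train_labels)
    query = embed(["A therapeutic use exemption was granted."])[0]
    print(predict(query, centroids))
```

Calling `demo()` requires `sentence-transformers` to be installed and network access to the Hub. In practice a library classifier such as logistic regression trained on the same embeddings would be the more usual choice; the centroid approach is used here only to keep the sketch dependency-free.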