---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- causal-lm
license: cc-by-sa-4.0
---

# TODO: Name of Model

TODO: Description

## Model Description

TODO: Add relevant content

- Base Transformer Type: RobertaModel
- Pooling: mean
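
For reference, a minimal sketch of how such an architecture is typically assembled with sentence-transformers, assuming `roberta-base` as a stand-in for the (TODO) base checkpoint and `max_seq_length=128` as an illustrative value:

```python
from sentence_transformers import SentenceTransformer, models

# Transformer encoder; 'roberta-base' is a placeholder for the actual base model
word_embedding_model = models.Transformer('roberta-base', max_seq_length=128)

# Mean pooling over the token embeddings, matching the description above
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode='mean',
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```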

## Usage (Sentence-Transformers)

Using this model is straightforward once you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:

```bash
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence"]

model = SentenceTransformer(TODO)  # TODO: model name on the Hugging Face Hub
embeddings = model.encode(sentences)
print(embeddings)
```
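
Since the pipeline tag is sentence-similarity, a natural next step is to compare embeddings. A short sketch using sentence-transformers' built-in `util.cos_sim` (the model name is still the TODO placeholder above, and the example sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer(TODO)  # TODO: model name on the Hugging Face Hub
embeddings = model.encode(["This is an example sentence", "And a second sentence"])

# Cosine similarity between the two sentence embeddings
print(util.cos_sim(embeddings[0], embeddings[1]))
```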

## Usage (HuggingFace Transformers)

Without [sentence-transformers](https://github.com/UKPLab/sentence-transformers), you can use the model by passing your input through the transformer and then applying a pooling operation on top of the contextualized token embeddings:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, using the attention mask to
# ignore padding tokens. This matches the pooling listed in the model description.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(TODO)  # TODO: model name on the Hugging Face Hub
model = AutoModel.from_pretrained(TODO)

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
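
If you need similarity scores from these raw embeddings, an illustrative follow-up (continuing from the snippet above, not part of the original card) is to L2-normalize them so that a dot product gives cosine similarity:

```python
import torch.nn.functional as F

# L2-normalize each embedding; the matrix product is then pairwise cosine similarity
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
print(normalized @ normalized.T)
```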

## TODO: Training Procedure

## TODO: Evaluation Results

## TODO: Citing & Authors