This model has been trained without supervision following the approach described in [Towards Unsupervised Dense Information Retrieval with Contrastive Learning](https://arxiv.org/abs/2112.09118). The associated GitHub repository is available at https://github.com/facebookresearch/contriever.

## Usage (HuggingFace Transformers)

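The model returns one embedding per token. To obtain a sentence embedding when using the model directly in HuggingFace Transformers, add a mean pooling step that averages the token embeddings while ignoring padding positions, as indicated by the attention mask.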
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('facebook/contriever')
model = AutoModel.from_pretrained('facebook/contriever')

sentences = [
    "Where was Marie Curie born?",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

# Apply tokenizer
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling: average the token embeddings, ignoring padding positions
def mean_pooling(token_embeddings, mask):
    # Zero out embeddings at padded positions (mask == 0)
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    # Divide by the number of real tokens in each sentence
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

# outputs[0] is the last hidden state, shape (batch_size, seq_len, hidden_size)
embeddings = mean_pooling(outputs[0], inputs['attention_mask'])
```
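To see what the masking does without downloading the model, the minimal sketch below applies the same `mean_pooling` function to a small hand-made batch of token embeddings (the values are made up for illustration). It then scores the two "passages" against the "query" with a dot product of their embeddings. Using the dot product as the relevance score is an assumption here, following the Contriever paper; this card does not state it.

```python
import torch

# Same masked mean pooling as in the Transformers example.
def mean_pooling(token_embeddings, mask):
    # Zero out embeddings at padded positions (mask == 0).
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    # Divide by the number of real tokens, not by the padded length.
    return token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]

# Toy batch (made-up values): 3 sequences, 3 positions, hidden size 2.
token_embeddings = torch.tensor([
    [[1., 0.], [3., 2.], [9., 9.]],  # query: last position is padding
    [[2., 1.], [2., 1.], [2., 1.]],  # passage 1: no padding
    [[0., 4.], [5., 5.], [5., 5.]],  # passage 2: last two positions are padding
])
attention_mask = torch.tensor([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
])

embeddings = mean_pooling(token_embeddings, attention_mask)
# query -> [2, 1], passage 1 -> [2, 1], passage 2 -> [0, 4]

# Assumed relevance score: dot product between query and passage embeddings.
query, passages = embeddings[0], embeddings[1:]
scores = passages @ query  # [2*2 + 1*1, 0*2 + 4*1] = [5, 4]

# Passage indices (0-based within `passages`), highest score first.
ranking = torch.argsort(scores, descending=True)
print(scores.tolist(), ranking.tolist())  # [5.0, 4.0] [0, 1]
```

Without the mask, the query's padding row `[9, 9]` would be averaged in and shift its embedding to `[13/3, 11/3]`. With the real model, `token_embeddings` and `attention_mask` would be `outputs[0]` and `inputs['attention_mask']`, with the first sentence acting as the query.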