	---
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- loss:ContrastiveLoss
	base_model: FacebookAI/xlm-roberta-large
	pipeline_tag: sentence-similarity
	datasets:
	- gabrielloiseau/CALE-SPCD
	---

	# CALE-XLM-R

	This is a [sentence-transformers](https://www.SBERT.net) model. It maps occurrences of a word to a 1024-dimensional dense vector space and can be used for tasks such as clustering or semantic search. The target word is marked in the input sentence with `<t>` and `</t>` tags.
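To make the semantic search use case concrete, here is a minimal sketch that ranks candidate vectors against a query vector. It assumes cosine similarity as the scoring function; the toy 3-dimensional vectors stand in for real 1024-dimensional embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for embeddings (assumption: real vectors come from the model).
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_by_similarity(query, candidates)
print(ranking)
```

With real embeddings, the query would be the vector of one marked word occurrence and the candidates the vectors of other occurrences.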



	## Usage (Sentence-Transformers)

	```bash
	pip install -U sentence-transformers
	```

	Then you can use the model like this:

	```python
	from sentence_transformers import SentenceTransformer

	# 1. Load CALE model
	model = SentenceTransformer("gabrielloiseau/CALE-XLM-R")

	# Each sentence marks the target word occurrence with <t>...</t>
	sentences = [
	    "the boy could easily <t>distinguish</t> the different note values",
	    "the patient’s ability to <t>recognize</t> forms and shapes",
	    "the government had refused to <t>recognize</t> their autonomy and existence as a state",
	]

	# 2. Calculate embeddings
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 1024)

	# 3. Calculate the embedding similarities
	similarities = model.similarity(embeddings, embeddings)
	print(similarities)
	# tensor([[1.0000, 0.9332, 0.5331],
	#         [0.9332, 1.0000, 0.5619],
	#         [0.5331, 0.5619, 1.0000]])
	```
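The example sentences above wrap the target word in `<t>` and `</t>`. When the inputs are plain sentences, the tags have to be added first. The following is a minimal sketch of one way to do that; `mark_target` is a hypothetical helper, not part of sentence-transformers, and it matches the target as a plain substring rather than as a tokenized word.

```python
def mark_target(sentence, target, occurrence=0):
    """Wrap the chosen occurrence of `target` in <t>...</t> tags.

    Hypothetical helper: matches `target` as a plain substring, so pass
    `occurrence` (0-based) when the word appears more than once.
    """
    start = -1
    for _ in range(occurrence + 1):
        start = sentence.find(target, start + 1)
        if start == -1:
            raise ValueError(f"{target!r} does not occur {occurrence + 1} time(s)")
    end = start + len(target)
    return f"{sentence[:start]}<t>{target}</t>{sentence[end:]}"


marked = mark_target(
    "the boy could easily distinguish the different note values",
    "distinguish",
)
print(marked)
# the boy could easily <t>distinguish</t> the different note values
```

The resulting strings can be passed to `model.encode` in the same way as the hand-tagged sentences.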

	## Full Model Architecture
	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
	  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
	)
	```
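The architecture above pairs an XLM-RoBERTa encoder with a pooling layer whose only enabled mode is `pooling_mode_mean_tokens`. As a rough illustration of what mean pooling computes, the sketch below averages the token vectors of one sequence while skipping padding positions. It uses plain Python lists so it runs without dependencies; the library's own `Pooling` module works on batched tensors, and the function name `mean_pool` is an assumption made for this example.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    return [value / max(count, 1) for value in total]


# Toy example: two real tokens and one padding token, dimension 2
# (the model itself uses dimension 1024).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)
# [2.0, 3.0]
```

The padding vector does not affect the result because its mask entry is 0.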