Sentence-Transformers ve E5-Large model aktarımı.

9bbba62 verified about 2 months ago

3.77 kB

	Usage
	=====

	Characteristics of Sparse Encoder models:

	1. Calculates sparse vector representations where most dimensions are zero
	2. Provides efficiency benefits for large-scale retrieval systems due to the sparse nature of embeddings
	3. Often more interpretable than dense embeddings, with non-zero dimensions corresponding to specific tokens
	4. Complementary to dense embeddings, enabling hybrid search systems that combine the strengths of both approaches

	Once you have `installed <../../installation.html>`_ Sentence Transformers, you can easily use Sparse Encoder models:

	.. sidebar:: Documentation

	1. :class:`SparseEncoder <sentence_transformers.sparse_encoder.model.SparseEncoder>`
	2. :meth:`SparseEncoder.encode <sentence_transformers.sparse_encoder.model.SparseEncoder.encode>`
	3. :meth:`SparseEncoder.similarity <sentence_transformers.sparse_encoder.model.SparseEncoder.similarity>`
	4. :meth:`SparseEncoder.sparsity <sentence_transformers.sparse_encoder.model.SparseEncoder.sparsity>`

	::

	from sentence_transformers import SparseEncoder

	# 1. Load a pretrained SparseEncoder model
	model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

	# The sentences to encode
	sentences = [
	"The weather is lovely today.",
	"It's so sunny outside!",
	"He drove to the stadium.",
	]

	# 2. Calculate sparse embeddings by calling model.encode()
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 30522] - sparse representation with vocabulary size dimensions

	# 3. Calculate the embedding similarities (using dot product by default)
	similarities = model.similarity(embeddings, embeddings)
	print(similarities)
	# tensor([[ 35.629, 9.154, 0.098],
	# [ 9.154, 27.478, 0.019],
	# [ 0.098, 0.019, 29.553]])

	# 4. Check sparsity statistics
	stats = SparseEncoder.sparsity(embeddings)
	print(f"Sparsity: {stats['sparsity_ratio']:.2%}") # Typically >99% zeros
	print(f"Avg non-zero dimensions per embedding: {stats['active_dims']:.2f}")

	Prompts
	-------

	Some Sparse Encoder models are trained with specific prompts for different use cases (e.g., queries vs. documents).
	You can use :meth:`SparseEncoder.encode <sentence_transformers.sparse_encoder.model.SparseEncoder.encode>` with the ``prompt_name`` parameter,
	or the convenience methods :meth:`~sentence_transformers.sparse_encoder.model.SparseEncoder.encode_query` and
	:meth:`~sentence_transformers.sparse_encoder.model.SparseEncoder.encode_document`:

	::

	from sentence_transformers import SparseEncoder

	model = SparseEncoder("model-with-prompts")

	# Encode queries and documents with the appropriate prompts
	query_embeddings = model.encode_query(["What is the weather like?"])
	document_embeddings = model.encode_document(["The weather is lovely today."])

	# Equivalent to:
	query_embeddings = model.encode(["What is the weather like?"], prompt_name="query")
	document_embeddings = model.encode(["The weather is lovely today."], prompt_name="document")

	You can inspect or set the available prompts via the ``prompts`` and ``default_prompt_name`` attributes:

	::

	print(model.prompts)
	# {'query': 'query: ', 'document': 'document: '}

	model.default_prompt_name = "query"

	.. toctree::
	:maxdepth: 1
	:caption: Tasks and Advanced Usage

	../../../examples/sparse_encoder/applications/computing_embeddings/README
	../../../examples/sparse_encoder/applications/semantic_textual_similarity/README
	../../../examples/sparse_encoder/applications/semantic_search/README
	../../../examples/sparse_encoder/applications/retrieve_rerank/README
	../../../examples/sparse_encoder/evaluation/README
	efficiency