File size: 2,506 Bytes
bd33eac |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
Usage
=====
Characteristics of Sparse Encoder models:
1. Calculates **sparse vector representations** where most dimensions are zero
2. Provides **efficiency benefits** for large-scale retrieval systems due to the sparse nature of embeddings
3. Often **more interpretable** than dense embeddings, with non-zero dimensions corresponding to specific tokens
4. **Complementary to dense embeddings**, enabling hybrid search systems that combine the strengths of both approaches
Once you have `installed <../../installation.html>`_ Sentence Transformers, you can easily use Sparse Encoder models:
.. sidebar:: Documentation
1. :class:`SparseEncoder <sentence_transformers.sparse_encoder.SparseEncoder>`
2. :meth:`SparseEncoder.encode <sentence_transformers.sparse_encoder.SparseEncoder.encode>`
3. :meth:`SparseEncoder.similarity <sentence_transformers.sparse_encoder.SparseEncoder.similarity>`
4. :meth:`SparseEncoder.sparsity <sentence_transformers.sparse_encoder.SparseEncoder.sparsity>`
::
from sentence_transformers import SparseEncoder
# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
# The sentences to encode
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
]
# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary size dimensions
# 3. Calculate the embedding similarities (using dot product by default)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 35.629, 9.154, 0.098],
# [ 9.154, 27.478, 0.019],
# [ 0.098, 0.019, 29.553]])
# 4. Check sparsity statistics
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}") # Typically >99% zeros
print(f"Avg non-zero dimensions per embedding: {stats['active_dims']:.2f}")
.. toctree::
:maxdepth: 1
:caption: Tasks and Advanced Usage
../../../examples/sparse_encoder/applications/computing_embeddings/README
../../../examples/sparse_encoder/applications/semantic_textual_similarity/README
../../../examples/sparse_encoder/applications/semantic_search/README
../../../examples/sparse_encoder/applications/retrieve_rerank/README
../../../examples/sparse_encoder/evaluation/README
|