| Usage |
| ===== |
|
|
| Characteristics of Sparse Encoder models: |
|
|
| 1. Calculates **sparse vector representations** where most dimensions are zero |
| 2. Provides **efficiency benefits** for large-scale retrieval systems due to the sparse nature of embeddings |
| 3. Often **more interpretable** than dense embeddings, with non-zero dimensions corresponding to specific tokens |
| 4. **Complementary to dense embeddings**, enabling hybrid search systems that combine the strengths of both approaches |
|
|
| Once you have `installed <../../installation.html>`_ Sentence Transformers, you can easily use Sparse Encoder models: |
|
|
| .. sidebar:: Documentation |
|
|
| 1. :class:`SparseEncoder <sentence_transformers.sparse_encoder.model.SparseEncoder>` |
| 2. :meth:`SparseEncoder.encode <sentence_transformers.sparse_encoder.model.SparseEncoder.encode>` |
| 3. :meth:`SparseEncoder.similarity <sentence_transformers.sparse_encoder.model.SparseEncoder.similarity>` |
| 4. :meth:`SparseEncoder.sparsity <sentence_transformers.sparse_encoder.model.SparseEncoder.sparsity>` |
|
|
| :: |
|
|
| from sentence_transformers import SparseEncoder |
|
|
| |
| model = SparseEncoder("naver/splade-cocondenser-ensembledistil") |
|
|
| |
| sentences = [ |
| "The weather is lovely today.", |
| "It's so sunny outside!", |
| "He drove to the stadium.", |
| ] |
|
|
| |
| embeddings = model.encode(sentences) |
| print(embeddings.shape) |
| |
|
|
| |
| similarities = model.similarity(embeddings, embeddings) |
| print(similarities) |
| |
| |
| |
|
|
| |
| stats = SparseEncoder.sparsity(embeddings) |
| print(f"Sparsity: {stats['sparsity_ratio']:.2%}") |
| print(f"Avg non-zero dimensions per embedding: {stats['active_dims']:.2f}") |
|
|
| Prompts |
| ------- |
|
|
| Some Sparse Encoder models are trained with specific prompts for different use cases (e.g., queries vs. documents). |
| You can use :meth:`SparseEncoder.encode <sentence_transformers.sparse_encoder.model.SparseEncoder.encode>` with the ``prompt_name`` parameter, |
| or the convenience methods :meth:`~sentence_transformers.sparse_encoder.model.SparseEncoder.encode_query` and |
| :meth:`~sentence_transformers.sparse_encoder.model.SparseEncoder.encode_document`: |
|
|
| :: |
|
|
| from sentence_transformers import SparseEncoder |
|
|
| model = SparseEncoder("model-with-prompts") |
|
|
| |
| query_embeddings = model.encode_query(["What is the weather like?"]) |
| document_embeddings = model.encode_document(["The weather is lovely today."]) |
|
|
| |
| query_embeddings = model.encode(["What is the weather like?"], prompt_name="query") |
| document_embeddings = model.encode(["The weather is lovely today."], prompt_name="document") |
|
|
| You can inspect or set the available prompts via the ``prompts`` and ``default_prompt_name`` attributes: |
|
|
| :: |
|
|
| print(model.prompts) |
| |
|
|
| model.default_prompt_name = "query" |
|
|
| .. toctree:: |
| :maxdepth: 1 |
| :caption: Tasks and Advanced Usage |
|
|
| ../../../examples/sparse_encoder/applications/computing_embeddings/README |
| ../../../examples/sparse_encoder/applications/semantic_textual_similarity/README |
| ../../../examples/sparse_encoder/applications/semantic_search/README |
| ../../../examples/sparse_encoder/applications/retrieve_rerank/README |
| ../../../examples/sparse_encoder/evaluation/README |
| efficiency |
|
|
|
|