license: apache-2.0
language:
  - multilingual
  - ar
  - da
  - nl
  - en
  - fi
  - fr
  - de
  - el
  - hu
  - it
  - 'no'
  - pt
  - ro
  - ru
  - es
  - sv
  - ta
  - tr
pipeline_tag: sentence-similarity

Repository with files to perform BM25 searches with FastEmbed.

BM25 (Best Matching 25) is a ranking function used by search engines to estimate the relevance of documents to a given search query.
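To make the ranking function concrete, here is a minimal pure-Python sketch of the textbook BM25 score. It is not FastEmbed's implementation: the whitespace tokenizer, the function name `bm25_scores`, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and the document list must be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the textbook BM25 formula.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, with no stemming or stopword removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is length-normalized by b.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
print(bm25_scores("cat", docs))
```

Only the first document contains the exact token `cat`, so it is the only one with a non-zero score in this sketch.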

Usage

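Note: This model is intended for use with Qdrant. Sparse vectors must be configured with Modifier.IDF, because the embeddings carry only the term-frequency part of BM25 and Qdrant applies the IDF weighting itself.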

Here's an example of BM25 with FastEmbed.

from fastembed import SparseTextEmbedding

documents = [
    "You should stay, study and sprint.",
    "History can only prepare us to be surprised yet again.",
]

model = SparseTextEmbedding(model_name="Qdrant/bm25")
embeddings = list(model.embed(documents))

# [
#     SparseEmbedding(
#         values=array([1.67419738, 1.67419738, 1.67419738, 1.67419738]),
#         indices=array([171321964, 1881538586, 150760872, 1932363795])),
#     SparseEmbedding(
#         values=array([1.66973021, 1.66973021, 1.66973021, 1.66973021, 1.66973021]),
#         indices=array([578407224, 1849833631, 1008800696, 2090661150, 1117393019])),
# ]
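To illustrate how such sparse vectors are compared, the following sketch computes a dot product over the indices two vectors share. It is plain Python rather than FastEmbed or Qdrant code: `sparse_dot` is a hypothetical helper, the query vector (one token index taken from the first document, with weight 1.0) is an assumption, and the IDF factor that Qdrant applies with Modifier.IDF is left out.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    lookup = dict(zip(indices_b, values_b))
    # Only indices present in both vectors contribute to the score.
    return sum(v * lookup[i] for i, v in zip(indices_a, values_a) if i in lookup)


# Document vectors copied from the example output above.
doc1 = ([171321964, 1881538586, 150760872, 1932363795], [1.67419738] * 4)
doc2 = (
    [578407224, 1849833631, 1008800696, 2090661150, 1117393019],
    [1.66973021] * 5,
)

# Assumed query: a single token that also occurs in the first document.
query = ([171321964], [1.0])

score_doc1 = sparse_dot(*query, *doc1)
score_doc2 = sparse_dot(*query, *doc2)
print(score_doc1, score_doc2)
```

The first document shares an index with the query and gets a positive score, while the second shares none and scores zero.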