Qdrant/bm25



	---
	license: apache-2.0
	language:
	- multilingual
	- ar
	- da
	- nl
	- en
	- fi
	- fr
	- de
	- el
	- hu
	- it
	- 'no'
	- pt
	- ro
	- ru
	- es
	- sv
	- ta
	- tr
	pipeline_tag: sentence-similarity
	---
	Repository with files to perform BM25 searches with [FastEmbed](https://github.com/qdrant/fastembed).

	[BM25 (Best Matching 25)](https://en.wikipedia.org/wiki/Okapi_BM25) is a ranking function used by search engines to estimate the relevance of documents to a given search query.
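To make the ranking function concrete, here is a minimal, self-contained sketch of textbook Okapi BM25 scoring in plain Python. It is an illustration only, not the code FastEmbed or Qdrant run: the function name `bm25_scores`, the whitespace tokenization, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Inverse document frequency (the "+ 1" keeps it non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency, saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Naive whitespace tokenization, for illustration only.
docs = [
    "you should stay study and sprint".split(),
    "history can only prepare us to be surprised yet again".split(),
]
scores = bm25_scores("study history".split(), docs)
```

Each score is a sum over query terms of an IDF weight multiplied by a saturated, length-normalized term frequency, so a document that contains none of the query terms scores zero.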

	### Usage

	> Note:
	> This model is intended for use with Qdrant. Sparse vectors must be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier).

	Here's an example of BM25 with [FastEmbed](https://github.com/qdrant/fastembed).

	```py
	from fastembed import SparseTextEmbedding

	documents = [
	    "You should stay, study and sprint.",
	    "History can only prepare us to be surprised yet again.",
	]

	model = SparseTextEmbedding(model_name="Qdrant/bm25")
	embeddings = list(model.embed(documents))

	# [
	#     SparseEmbedding(
	#         values=array([1.67419738, 1.67419738, 1.67419738, 1.67419738]),
	#         indices=array([171321964, 1881538586, 150760872, 1932363795])),
	#     SparseEmbedding(
	#         values=array([1.66973021, 1.66973021, 1.66973021, 1.66973021, 1.66973021]),
	#         indices=array([578407224, 1849833631, 1008800696, 2090661150, 1117393019])),
	# ]
	```


