---
license: apache-2.0
language:
- multilingual
- ar
- da
- nl
- en
- fi
- fr
- de
- el
- hu
- it
- 'no'
- pt
- ro
- ru
- es
- sv
- ta
- tr
pipeline_tag: sentence-similarity
---
This repository contains the files needed to perform BM25 searches with [FastEmbed](https://github.com/qdrant/fastembed).

[BM25 (Best Matching 25)](https://en.wikipedia.org/wiki/Okapi_BM25) is a ranking function used by search engines to estimate the relevance of documents to a given search query.

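To make the ranking function concrete, here is a minimal plain-Python sketch of Okapi BM25 scoring. It is an illustration only, not the FastEmbed implementation: the function name `bm25_scores`, the whitespace tokenization, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization.
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Assumption: naive lowercase whitespace tokenization, for illustration only.
docs = [
    "you should stay study and sprint".split(),
    "history can only prepare us to be surprised yet again".split(),
]
scores = bm25_scores("surprised history".split(), docs)
```

Only the second document contains the query terms, so it receives a positive score while the first scores zero.
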
### Usage

> Note:
> This model is intended to be used with Qdrant. Sparse vectors must be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier).

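As a rough sketch of the configuration mentioned in the note, a collection with an IDF-modified sparse vector could be created with the `qdrant-client` package as shown here. The helper name `create_bm25_collection`, the collection name `documents`, and the vector name `bm25` are hypothetical and chosen only for this example.

```python
def create_bm25_collection(client, collection_name="documents", vector_name="bm25"):
    """Create a collection with one sparse vector that uses the IDF modifier."""
    # Imported lazily so that qdrant-client is only required when this is called.
    from qdrant_client import models

    client.create_collection(
        collection_name=collection_name,
        vectors_config={},  # no dense vectors in this sketch
        sparse_vectors_config={
            # Hypothetical vector name; the IDF modifier is the relevant part.
            vector_name: models.SparseVectorParams(modifier=models.Modifier.IDF),
        },
    )
```

The helper expects a `QdrantClient` instance connected to your Qdrant deployment.
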
The following example computes BM25 sparse embeddings with [FastEmbed](https://github.com/qdrant/fastembed).

```py
from fastembed import SparseTextEmbedding

documents = [
    "You should stay, study and sprint.",
    "History can only prepare us to be surprised yet again.",
]

# Load the BM25 sparse model and embed each document.
model = SparseTextEmbedding(model_name="Qdrant/bm25")
embeddings = list(model.embed(documents))

# Resulting embeddings, one SparseEmbedding per document:
# [
#     SparseEmbedding(
#         values=array([1.67419738, 1.67419738, 1.67419738, 1.67419738]),
#         indices=array([171321964, 1881538586, 150760872, 1932363795])),
#     SparseEmbedding(values=array(
#         [1.66973021, 1.66973021, 1.66973021, 1.66973021, 1.66973021]),
#         indices=array([
#             578407224, 1849833631, 1008800696, 2090661150,
#             1117393019
#         ]))
# ]
```
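
Sparse embeddings like these are compared by a dot product over the indices that a query and a document share. The plain-Python sketch below illustrates that step with the values from the output above. The helper `sparse_dot` and the query vector are hypothetical, and the sketch leaves out the IDF weighting that Qdrant applies when the vector is configured with `Modifier.IDF`.

```python
def sparse_dot(query, document):
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    doc_weights = dict(zip(*document))
    q_indices, q_values = query
    return sum(v * doc_weights.get(i, 0.0) for i, v in zip(q_indices, q_values))


# Document embeddings copied from the example output.
doc_0 = ([171321964, 1881538586, 150760872, 1932363795], [1.67419738] * 4)
doc_1 = (
    [578407224, 1849833631, 1008800696, 2090661150, 1117393019],
    [1.66973021] * 5,
)

# Hypothetical query: a single index shared with doc_0, with weight 1.0.
query = ([150760872], [1.0])

scores = [sparse_dot(query, doc) for doc in (doc_0, doc_1)]
```

Only `doc_0` shares an index with the query, so it is the only document with a non-zero score.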