rasyosef / splade-mini

Feature Extraction · sentence-transformers · Generated from Trainer · dataset_size:1000000 · loss:SpladeLoss · loss:SparseMarginMSELoss · Eval Results (legacy) · text-embeddings-inference


rasyosef committed on Jul 20, 2025 · Commit ea1c4e6 · verified · 1 parent: 57d9bd8

Update README.md

Files changed (1)

README.md +18 -1

README.md CHANGED

@@ -137,7 +137,24 @@ datasets:
 # SPLADE-BERT-Mini-Distil
-This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+This is a SPLADE sparse retrieval model based on BERT-Mini (11M parameters), trained by distilling a cross-encoder on the MS MARCO dataset. The teacher cross-encoder was [ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2).
+This tiny SPLADE model is `6x` smaller than Naver's official `splade-v3-distilbert` while retaining `85%` of its MRR@10 on the MS MARCO dev set. It is small enough to run without a GPU on a collection of a few thousand documents.
+- `Collection:` https://huggingface.co/collections/rasyosef/splade-tiny-msmarco-687c548c0691d95babf65b70
+- `Distillation Dataset:` https://huggingface.co/datasets/yosefw/msmarco-train-distil-v2
+- `Code:` https://github.com/rasyosef/splade-tiny-msmarco
+## Performance
+The SPLADE models were evaluated on 55 thousand queries and 8 million documents from the [MS MARCO](https://huggingface.co/datasets/microsoft/ms_marco) dataset.
+|Model|Size (# Params)|MRR@10 (MS MARCO dev)|
+|:---|:----|:-------------------|
+|`BM25`|-|18.6|
+|`rasyosef/splade-tiny`|4.4M|30.8|
+|`rasyosef/splade-mini`|11.2M|32.8|
+|`naver/splade-v3-distilbert`|67.0M|38.7|
 ## Usage
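The model card says the encoder maps text to a 30522-dimensional sparse vector, one weight per BERT vocabulary term. As a rough illustration, the sketch below assumes the usual SPLADE max-pooling formulation: each term weight is the maximum, over non-padding tokens, of log(1 + ReLU(logit)) taken from the masked-language-model head, and query-document relevance is the dot product of the two sparse vectors. The function names and the 5-term toy vocabulary are hypothetical; this is not the sentence-transformers `SparseEncoder` implementation.

```python
import math

def splade_pool(token_logits, attention_mask):
    """Turn per-token MLM logits into one sparse SPLADE-style vector.

    token_logits: one row per token, each row a list of vocabulary logits.
    attention_mask: one 0/1 flag per token (0 marks padding).
    Returns {vocab_index: weight}, keeping only non-zero weights.
    """
    pooled = [0.0] * len(token_logits[0])
    for row, keep in zip(token_logits, attention_mask):
        if not keep:
            continue  # padding tokens contribute nothing
        for j, logit in enumerate(row):
            w = math.log1p(max(logit, 0.0))  # log-saturated ReLU
            if w > pooled[j]:
                pooled[j] = w                # max-pool over the sequence
    return {j: w for j, w in enumerate(pooled) if w > 0.0}

def sparse_dot(q, d):
    """Relevance score: dot product over the terms both vectors share."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller vector
    return sum(w * d[j] for j, w in q.items() if j in d)

# Hypothetical toy example with a 5-term vocabulary (BERT's has 30522 entries).
q_logits = [[2.0, -1.0, 0.0, 0.5, -3.0],
            [0.0, 3.0, -2.0, 0.0, 0.0],
            [9.0, 9.0, 9.0, 9.0, 9.0]]  # last row is padding
q_vec = splade_pool(q_logits, [1, 1, 0])
d_vec = {0: 1.0, 2: 5.0, 3: 2.0}
print(sorted(q_vec))                       # [0, 1, 3]
print(round(sparse_dot(q_vec, d_vec), 4))  # 1.9095
```

Because most weights are exactly zero, the vectors can be stored in an inverted index, which is what makes CPU-only retrieval over a small collection practical.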
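The card's tags list `SpladeLoss` and `SparseMarginMSELoss`, and the description says the model was distilled from a cross-encoder. A minimal sketch of what such an objective computes, assuming the standard Margin-MSE formulation (match the student's positive-minus-negative score margin to the teacher's) plus a FLOPS-style sparsity penalty of the kind SPLADE training adds. The helper names, the toy scores, and the weight `0.1` are hypothetical and do not reflect the library's internals or the hyperparameters actually used for this model.

```python
from collections import defaultdict

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument holds one score per (query, positive, negative) triple.
    """
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        total += ((sp - sn) - (tp - tn)) ** 2
    return total / len(student_pos)

def flops_reg(batch_vecs):
    """FLOPS-style penalty: sum over terms of the squared mean |weight| in the batch.

    batch_vecs: list of sparse vectors as {term_index: weight} dicts.
    """
    mean_w = defaultdict(float)
    for vec in batch_vecs:
        for j, w in vec.items():
            mean_w[j] += abs(w) / len(batch_vecs)
    return sum(m * m for m in mean_w.values())

# Hypothetical batch of two triples:
# student scores = SPLADE dot products, teacher scores = cross-encoder outputs.
student_pos, student_neg = [10.0, 8.0], [6.0, 7.0]
teacher_pos, teacher_neg = [5.0, 3.0], [-1.0, 1.0]
doc_vecs = [{0: 1.0, 2: 2.0}, {0: 3.0}]

reg_weight = 0.1  # hypothetical value, for illustration only
loss = (margin_mse(student_pos, student_neg, teacher_pos, teacher_neg)
        + reg_weight * flops_reg(doc_vecs))
print(loss)  # 3.0
```

Only score differences enter the distillation term, so the student is free to use a different score scale than the cross-encoder. In practice SPLADE training typically applies separate sparsity penalties to query and document vectors; treat the single weight here as a placeholder.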
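The Performance table reports MRR@10 (scaled by 100) on the MS MARCO dev queries. A minimal sketch of the metric, assuming one ranked list of passage IDs per query and a set of relevant IDs per query (the `runs` and `qrels` names and the toy data are hypothetical): each query scores 1/rank of its first relevant hit within the top 10, or 0 if there is none, and those scores are averaged. The last lines recompute, from the table values, the ratios behind the "`6x` smaller" and "`85%`" claims.

```python
def mrr_at_k(runs, qrels, k=10):
    """Mean Reciprocal Rank at cutoff k.

    runs:  {query_id: [doc_id, ...]} ranked best-first.
    qrels: {query_id: {relevant doc_id, ...}}.
    """
    total = 0.0
    for qid, ranking in runs.items():
        relevant = qrels.get(qid, set())
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(runs)

# Hypothetical toy run: q1 hits at rank 2, q2 misses, q3 hits at rank 1.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4"], "q3": ["d2"]}
qrels = {"q1": {"d1"}, "q2": {"d5"}, "q3": {"d2"}}
print(mrr_at_k(runs, qrels))  # 0.5

# Ratios from the table: splade-mini vs. naver/splade-v3-distilbert.
print(f"{67.0 / 11.2:.1f}x fewer parameters")  # 6.0x fewer parameters
print(f"{32.8 / 38.7:.1%} of the MRR@10")      # 84.8% of the MRR@10
```

The 84.8% figure rounds to the `85%` quoted in the card.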