e5-small-v2-pruned

This model is a token-embedding pruned version of intfloat/e5-small-v2.

Token-embedding pruning clusters semantically similar tokens in the embedding space (using DBSCAN) and merges each cluster into a single shared embedding, shrinking the vocabulary and reducing memory without retraining the transformer layers.
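The merge step described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes cluster labels have already been computed (with `-1` marking unclustered tokens, as in DBSCAN's usual convention), and it assumes the shared embedding is the mean of the cluster members, which the model card does not specify. The function name `merge_clusters` and the toy data are hypothetical.

```python
from collections import defaultdict


def merge_clusters(embeddings, labels):
    """Merge embedding rows that share a cluster label.

    Rows labelled -1 (noise) keep their own row. Returns the pruned
    embedding rows and a dict mapping old token ids to new token ids.
    """
    groups = defaultdict(list)
    order = []  # group keys in order of first appearance, for stable new ids
    for old_id, label in enumerate(labels):
        key = ("noise", old_id) if label == -1 else ("cluster", label)
        if key not in groups:
            order.append(key)
        groups[key].append(old_id)

    pruned, id_map = [], {}
    for new_id, key in enumerate(order):
        members = groups[key]
        dim = len(embeddings[members[0]])
        # Assumption: the shared embedding is the mean of the cluster members.
        shared = [sum(embeddings[m][d] for m in members) / len(members)
                  for d in range(dim)]
        pruned.append(shared)
        for m in members:
            id_map[m] = new_id
    return pruned, id_map


# Toy example: five 2-d "token embeddings", two clusters and one noise point.
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.5], [5.0, 5.0]]
labels = [0, 0, 1, 1, -1]
pruned, id_map = merge_clusters(embeddings, labels)
print(len(embeddings), "->", len(pruned))  # 5 -> 3
print(id_map)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
```

Only the embedding rows and the id mapping change in this sketch; nothing about the transformer layers is touched, which is consistent with the "no retraining" claim above.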

How to use

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jangedoo/e5-small-v2-pruned", trust_remote_code=True)
embeddings = model.encode(["Hello world", "How are you?"])
```

Note: trust_remote_code=True is required because the model ships a small custom tokenizer class (pruned_tokenizer.py) that remaps the original token ids to the pruned vocabulary after tokenization. No additional packages need to be installed.
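Conceptually, the remapping is a lookup from original token ids to pruned ids. The sketch below is a hypothetical illustration only; it does not reproduce the contents of pruned_tokenizer.py, and the function name `remap_ids` and the mapping values are made up for the example.

```python
def remap_ids(input_ids, id_map):
    """Map original-vocabulary token ids to pruned-vocabulary ids."""
    return [id_map[token_id] for token_id in input_ids]


# Hypothetical mapping: original ids 7 and 9 were merged into pruned id 3.
id_map = {0: 0, 5: 1, 6: 2, 7: 3, 9: 3}
original_ids = [0, 7, 5, 9]
pruned_ids = remap_ids(original_ids, id_map)
print(pruned_ids)  # [0, 3, 1, 3]
```

Because merged tokens share one pruned id, the mapping is many-to-one and cannot be inverted to recover the original ids.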

Pruning statistics

| Metric | Base | Pruned | Reduction |
| --- | --- | --- | --- |
| Vocab size | 30,522 | 22,225 | 27.18% |
| Total parameters | 33,360,000 | 30,173,952 | 9.55% |
| Embedding parameters | 11,720,448 | 8,534,400 | 27.18% |
| Embedding size (MB) | 44.7 | 32.6 | 12.2 MB saved |
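The figures in this table can be cross-checked with a little arithmetic. The sketch below assumes 4 bytes per parameter (float32 weights) and reads "MB" as 2^20 bytes, which is the interpretation under which the reported sizes come out; the variable and function names are illustrative.

```python
BYTES_PER_PARAM = 4  # assumption: float32 weights

base_vocab, pruned_vocab = 30_522, 22_225
base_total, pruned_total = 33_360_000, 30_173_952
base_emb, pruned_emb = 11_720_448, 8_534_400


def reduction_pct(base, pruned):
    """Percentage reduction relative to the base value."""
    return 100 * (base - pruned) / base


def to_mib(params):
    """Size in units of 2**20 bytes."""
    return params * BYTES_PER_PARAM / 2**20


# Embedding parameters = vocab size x hidden dimension.
hidden_dim = base_emb // base_vocab
assert hidden_dim * pruned_vocab == pruned_emb

# All removed parameters come from the embedding matrix.
assert base_total - pruned_total == base_emb - pruned_emb

print(hidden_dim)                                        # 384
print(f"{reduction_pct(base_vocab, pruned_vocab):.2f}")  # 27.18
print(f"{reduction_pct(base_total, pruned_total):.2f}")  # 9.55
print(f"{to_mib(base_emb):.1f}")                         # 44.7
print(f"{to_mib(pruned_emb):.1f}")                       # 32.6
print(f"{to_mib(base_emb - pruned_emb):.1f}")            # 12.2
```

The vocabulary and embedding reductions are both 27.18% because the embedding matrix scales linearly with vocabulary size, while the total-parameter reduction is smaller since the transformer layers are unchanged.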

Evaluation

Dataset / Metric Base Pruned Relative (base = 1.0)
stsb / stsb_pearson_cosine 0.8317 0.8234 0.9900
stsb / stsb_spearman_cosine 0.8492 0.8386 0.9876
nanobeir / NanoClimateFEVER_cosine_accuracy@1 0.3800 0.3600 0.9474
nanobeir / NanoClimateFEVER_cosine_accuracy@3 0.5400 0.5400 1.0000
nanobeir / NanoClimateFEVER_cosine_accuracy@5 0.5400 0.6000 1.1111
nanobeir / NanoClimateFEVER_cosine_accuracy@10 0.6400 0.6800 1.0625
nanobeir / NanoClimateFEVER_cosine_precision@1 0.3800 0.3600 0.9474
nanobeir / NanoClimateFEVER_cosine_precision@3 0.2000 0.2000 1.0000
nanobeir / NanoClimateFEVER_cosine_precision@5 0.1240 0.1400 1.1290
nanobeir / NanoClimateFEVER_cosine_precision@10 0.0860 0.0920 1.0698
nanobeir / NanoClimateFEVER_cosine_recall@1 0.1590 0.1750 1.1006
nanobeir / NanoClimateFEVER_cosine_recall@3 0.2457 0.2623 1.0678
nanobeir / NanoClimateFEVER_cosine_recall@5 0.2523 0.2930 1.1612
nanobeir / NanoClimateFEVER_cosine_recall@10 0.3507 0.3613 1.0304
nanobeir / NanoClimateFEVER_cosine_ndcg@10 0.3105 0.3301 1.0631
nanobeir / NanoClimateFEVER_cosine_mrr@10 0.4614 0.4614 0.9999
nanobeir / NanoClimateFEVER_cosine_map@100 0.2461 0.2706 1.0994
nanobeir / NanoDBPedia_cosine_accuracy@1 0.7200 0.7200 1.0000
nanobeir / NanoDBPedia_cosine_accuracy@3 0.8800 0.8600 0.9773
nanobeir / NanoDBPedia_cosine_accuracy@5 0.8800 0.9200 1.0455
nanobeir / NanoDBPedia_cosine_accuracy@10 0.9600 0.9600 1.0000
nanobeir / NanoDBPedia_cosine_precision@1 0.7200 0.7200 1.0000
nanobeir / NanoDBPedia_cosine_precision@3 0.6067 0.5800 0.9560
nanobeir / NanoDBPedia_cosine_precision@5 0.5240 0.5440 1.0382
nanobeir / NanoDBPedia_cosine_precision@10 0.4580 0.4500 0.9825
nanobeir / NanoDBPedia_cosine_recall@1 0.1074 0.0996 0.9271
nanobeir / NanoDBPedia_cosine_recall@3 0.1772 0.1736 0.9798
nanobeir / NanoDBPedia_cosine_recall@5 0.2166 0.2328 1.0747
nanobeir / NanoDBPedia_cosine_recall@10 0.3203 0.3081 0.9619
nanobeir / NanoDBPedia_cosine_ndcg@10 0.5891 0.5747 0.9757
nanobeir / NanoDBPedia_cosine_mrr@10 0.8120 0.7969 0.9813
nanobeir / NanoDBPedia_cosine_map@100 0.4573 0.4441 0.9712
nanobeir / NanoFEVER_cosine_accuracy@1 0.7000 0.6600 0.9429
nanobeir / NanoFEVER_cosine_accuracy@3 0.9200 0.8600 0.9348
nanobeir / NanoFEVER_cosine_accuracy@5 0.9600 0.9400 0.9792
nanobeir / NanoFEVER_cosine_accuracy@10 0.9800 0.9800 1.0000
nanobeir / NanoFEVER_cosine_precision@1 0.7000 0.6600 0.9429
nanobeir / NanoFEVER_cosine_precision@3 0.3133 0.3000 0.9574
nanobeir / NanoFEVER_cosine_precision@5 0.2000 0.1960 0.9800
nanobeir / NanoFEVER_cosine_precision@10 0.1020 0.1020 1.0000
nanobeir / NanoFEVER_cosine_recall@1 0.6567 0.6167 0.9391
nanobeir / NanoFEVER_cosine_recall@3 0.8667 0.8167 0.9423
nanobeir / NanoFEVER_cosine_recall@5 0.9167 0.8967 0.9782
nanobeir / NanoFEVER_cosine_recall@10 0.9367 0.9367 1.0000
nanobeir / NanoFEVER_cosine_ndcg@10 0.8179 0.7959 0.9731
nanobeir / NanoFEVER_cosine_mrr@10 0.8067 0.7774 0.9637
nanobeir / NanoFEVER_cosine_map@100 0.7671 0.7397 0.9643
nanobeir / NanoFiQA2018_cosine_accuracy@1 0.3400 0.3200 0.9412
nanobeir / NanoFiQA2018_cosine_accuracy@3 0.5400 0.5000 0.9259
nanobeir / NanoFiQA2018_cosine_accuracy@5 0.6200 0.5600 0.9032
nanobeir / NanoFiQA2018_cosine_accuracy@10 0.7200 0.6600 0.9167
nanobeir / NanoFiQA2018_cosine_precision@1 0.3400 0.3200 0.9412
nanobeir / NanoFiQA2018_cosine_precision@3 0.2533 0.2333 0.9211
nanobeir / NanoFiQA2018_cosine_precision@5 0.1960 0.1800 0.9184
nanobeir / NanoFiQA2018_cosine_precision@10 0.1220 0.1140 0.9344
nanobeir / NanoFiQA2018_cosine_recall@1 0.1850 0.1681 0.9084
nanobeir / NanoFiQA2018_cosine_recall@3 0.3576 0.3378 0.9447
nanobeir / NanoFiQA2018_cosine_recall@5 0.4472 0.4265 0.9537
nanobeir / NanoFiQA2018_cosine_recall@10 0.5463 0.5222 0.9559
nanobeir / NanoFiQA2018_cosine_ndcg@10 0.4303 0.4023 0.9349
nanobeir / NanoFiQA2018_cosine_mrr@10 0.4618 0.4242 0.9186
nanobeir / NanoFiQA2018_cosine_map@100 0.3702 0.3428 0.9259
nanobeir / NanoHotpotQA_cosine_accuracy@1 0.8000 0.8400 1.0500
nanobeir / NanoHotpotQA_cosine_accuracy@3 0.9400 0.9600 1.0213
nanobeir / NanoHotpotQA_cosine_accuracy@5 0.9400 0.9600 1.0213
nanobeir / NanoHotpotQA_cosine_accuracy@10 1.0000 0.9800 0.9800
nanobeir / NanoHotpotQA_cosine_precision@1 0.8000 0.8400 1.0500
nanobeir / NanoHotpotQA_cosine_precision@3 0.5133 0.5200 1.0130
nanobeir / NanoHotpotQA_cosine_precision@5 0.3240 0.3200 0.9877
nanobeir / NanoHotpotQA_cosine_precision@10 0.1800 0.1740 0.9667
nanobeir / NanoHotpotQA_cosine_recall@1 0.4000 0.4200 1.0500
nanobeir / NanoHotpotQA_cosine_recall@3 0.7700 0.7800 1.0130
nanobeir / NanoHotpotQA_cosine_recall@5 0.8100 0.8000 0.9877
nanobeir / NanoHotpotQA_cosine_recall@10 0.9000 0.8700 0.9667
nanobeir / NanoHotpotQA_cosine_ndcg@10 0.8212 0.8150 0.9925
nanobeir / NanoHotpotQA_cosine_mrr@10 0.8722 0.8953 1.0265
nanobeir / NanoHotpotQA_cosine_map@100 0.7578 0.7517 0.9919
nanobeir / NanoMSMARCO_cosine_accuracy@1 0.3800 0.4000 1.0526
nanobeir / NanoMSMARCO_cosine_accuracy@3 0.6400 0.6400 1.0000
nanobeir / NanoMSMARCO_cosine_accuracy@5 0.7200 0.7200 1.0000
nanobeir / NanoMSMARCO_cosine_accuracy@10 0.8200 0.8000 0.9756
nanobeir / NanoMSMARCO_cosine_precision@1 0.3800 0.4000 1.0526
nanobeir / NanoMSMARCO_cosine_precision@3 0.2133 0.2133 1.0000
nanobeir / NanoMSMARCO_cosine_precision@5 0.1440 0.1440 1.0000
nanobeir / NanoMSMARCO_cosine_precision@10 0.0820 0.0800 0.9756
nanobeir / NanoMSMARCO_cosine_recall@1 0.3800 0.4000 1.0526
nanobeir / NanoMSMARCO_cosine_recall@3 0.6400 0.6400 1.0000
nanobeir / NanoMSMARCO_cosine_recall@5 0.7200 0.7200 1.0000
nanobeir / NanoMSMARCO_cosine_recall@10 0.8200 0.8000 0.9756
nanobeir / NanoMSMARCO_cosine_ndcg@10 0.6007 0.5978 0.9951
nanobeir / NanoMSMARCO_cosine_mrr@10 0.5309 0.5329 1.0038
nanobeir / NanoMSMARCO_cosine_map@100 0.5393 0.5441 1.0089
nanobeir / NanoNFCorpus_cosine_accuracy@1 0.3800 0.4000 1.0526
nanobeir / NanoNFCorpus_cosine_accuracy@3 0.5200 0.5000 0.9615
nanobeir / NanoNFCorpus_cosine_accuracy@5 0.5800 0.6000 1.0345
nanobeir / NanoNFCorpus_cosine_accuracy@10 0.6800 0.6800 1.0000
nanobeir / NanoNFCorpus_cosine_precision@1 0.3800 0.4000 1.0526
nanobeir / NanoNFCorpus_cosine_precision@3 0.3600 0.3467 0.9630
nanobeir / NanoNFCorpus_cosine_precision@5 0.3320 0.3240 0.9759
nanobeir / NanoNFCorpus_cosine_precision@10 0.2720 0.2700 0.9926
nanobeir / NanoNFCorpus_cosine_recall@1 0.0214 0.0222 1.0366
nanobeir / NanoNFCorpus_cosine_recall@3 0.0711 0.0723 1.0177
nanobeir / NanoNFCorpus_cosine_recall@5 0.0907 0.0940 1.0368
nanobeir / NanoNFCorpus_cosine_recall@10 0.1301 0.1231 0.9462
nanobeir / NanoNFCorpus_cosine_ndcg@10 0.3243 0.3183 0.9815
nanobeir / NanoNFCorpus_cosine_mrr@10 0.4746 0.4735 0.9978
nanobeir / NanoNFCorpus_cosine_map@100 0.1370 0.1305 0.9525
nanobeir / NanoNQ_cosine_accuracy@1 0.4800 0.4400 0.9167
nanobeir / NanoNQ_cosine_accuracy@3 0.7000 0.5800 0.8286
nanobeir / NanoNQ_cosine_accuracy@5 0.7800 0.7200 0.9231
nanobeir / NanoNQ_cosine_accuracy@10 0.8000 0.7200 0.9000
nanobeir / NanoNQ_cosine_precision@1 0.4800 0.4400 0.9167
nanobeir / NanoNQ_cosine_precision@3 0.2467 0.1933 0.7838
nanobeir / NanoNQ_cosine_precision@5 0.1680 0.1440 0.8571
nanobeir / NanoNQ_cosine_precision@10 0.0860 0.0800 0.9302
nanobeir / NanoNQ_cosine_recall@1 0.4400 0.4100 0.9318
nanobeir / NanoNQ_cosine_recall@3 0.6600 0.5300 0.8030
nanobeir / NanoNQ_cosine_recall@5 0.7400 0.6700 0.9054
nanobeir / NanoNQ_cosine_recall@10 0.7600 0.7100 0.9342
nanobeir / NanoNQ_cosine_ndcg@10 0.6228 0.5679 0.9118
nanobeir / NanoNQ_cosine_mrr@10 0.5956 0.5387 0.9045
nanobeir / NanoNQ_cosine_map@100 0.5790 0.5234 0.9039
nanobeir / NanoQuoraRetrieval_cosine_accuracy@1 0.8200 0.8000 0.9756
nanobeir / NanoQuoraRetrieval_cosine_accuracy@3 0.9800 0.9800 1.0000
nanobeir / NanoQuoraRetrieval_cosine_accuracy@5 0.9800 0.9800 1.0000
nanobeir / NanoQuoraRetrieval_cosine_accuracy@10 1.0000 1.0000 1.0000
nanobeir / NanoQuoraRetrieval_cosine_precision@1 0.8200 0.8000 0.9756
nanobeir / NanoQuoraRetrieval_cosine_precision@3 0.3933 0.3933 1.0000
nanobeir / NanoQuoraRetrieval_cosine_precision@5 0.2480 0.2440 0.9839
nanobeir / NanoQuoraRetrieval_cosine_precision@10 0.1320 0.1300 0.9848
nanobeir / NanoQuoraRetrieval_cosine_recall@1 0.7207 0.7007 0.9722
nanobeir / NanoQuoraRetrieval_cosine_recall@3 0.9320 0.9320 1.0000
nanobeir / NanoQuoraRetrieval_cosine_recall@5 0.9460 0.9427 0.9965
nanobeir / NanoQuoraRetrieval_cosine_recall@10 0.9800 0.9733 0.9932
nanobeir / NanoQuoraRetrieval_cosine_ndcg@10 0.9048 0.8980 0.9924
nanobeir / NanoQuoraRetrieval_cosine_mrr@10 0.8967 0.8900 0.9926
nanobeir / NanoQuoraRetrieval_cosine_map@100 0.8709 0.8645 0.9927
nanobeir / NanoSCIDOCS_cosine_accuracy@1 0.3800 0.4800 1.2632
nanobeir / NanoSCIDOCS_cosine_accuracy@3 0.6800 0.6800 1.0000
nanobeir / NanoSCIDOCS_cosine_accuracy@5 0.7400 0.7600 1.0270
nanobeir / NanoSCIDOCS_cosine_accuracy@10 0.8400 0.8400 1.0000
nanobeir / NanoSCIDOCS_cosine_precision@1 0.3800 0.4800 1.2632
nanobeir / NanoSCIDOCS_cosine_precision@3 0.3400 0.3667 1.0784
nanobeir / NanoSCIDOCS_cosine_precision@5 0.2880 0.2960 1.0278
nanobeir / NanoSCIDOCS_cosine_precision@10 0.1980 0.1860 0.9394
nanobeir / NanoSCIDOCS_cosine_recall@1 0.0797 0.1007 1.2636
nanobeir / NanoSCIDOCS_cosine_recall@3 0.2107 0.2267 1.0759
nanobeir / NanoSCIDOCS_cosine_recall@5 0.2957 0.3037 1.0271
nanobeir / NanoSCIDOCS_cosine_recall@10 0.4077 0.3827 0.9387
nanobeir / NanoSCIDOCS_cosine_ndcg@10 0.3786 0.3816 1.0079
nanobeir / NanoSCIDOCS_cosine_mrr@10 0.5416 0.5926 1.0942
nanobeir / NanoSCIDOCS_cosine_map@100 0.2861 0.2954 1.0325
nanobeir / NanoArguAna_cosine_accuracy@1 0.1600 0.1400 0.8750
nanobeir / NanoArguAna_cosine_accuracy@3 0.5000 0.4600 0.9200
nanobeir / NanoArguAna_cosine_accuracy@5 0.6400 0.5800 0.9062
nanobeir / NanoArguAna_cosine_accuracy@10 0.8000 0.8000 1.0000
nanobeir / NanoArguAna_cosine_precision@1 0.1600 0.1400 0.8750
nanobeir / NanoArguAna_cosine_precision@3 0.1667 0.1533 0.9200
nanobeir / NanoArguAna_cosine_precision@5 0.1280 0.1160 0.9063
nanobeir / NanoArguAna_cosine_precision@10 0.0800 0.0800 1.0000
nanobeir / NanoArguAna_cosine_recall@1 0.1600 0.1400 0.8750
nanobeir / NanoArguAna_cosine_recall@3 0.5000 0.4600 0.9200
nanobeir / NanoArguAna_cosine_recall@5 0.6400 0.5800 0.9062
nanobeir / NanoArguAna_cosine_recall@10 0.8000 0.8000 1.0000
nanobeir / NanoArguAna_cosine_ndcg@10 0.4792 0.4573 0.9542
nanobeir / NanoArguAna_cosine_mrr@10 0.3765 0.3501 0.9297
nanobeir / NanoArguAna_cosine_map@100 0.3828 0.3555 0.9286
nanobeir / NanoSciFact_cosine_accuracy@1 0.5800 0.5800 1.0000
nanobeir / NanoSciFact_cosine_accuracy@3 0.7400 0.6600 0.8919
nanobeir / NanoSciFact_cosine_accuracy@5 0.7800 0.7200 0.9231
nanobeir / NanoSciFact_cosine_accuracy@10 0.8800 0.8400 0.9545
nanobeir / NanoSciFact_cosine_precision@1 0.5800 0.5800 1.0000
nanobeir / NanoSciFact_cosine_precision@3 0.2667 0.2333 0.8750
nanobeir / NanoSciFact_cosine_precision@5 0.1680 0.1600 0.9524
nanobeir / NanoSciFact_cosine_precision@10 0.0980 0.0940 0.9592
nanobeir / NanoSciFact_cosine_recall@1 0.5700 0.5700 1.0000
nanobeir / NanoSciFact_cosine_recall@3 0.7300 0.6400 0.8767
nanobeir / NanoSciFact_cosine_recall@5 0.7700 0.7150 0.9286
nanobeir / NanoSciFact_cosine_recall@10 0.8700 0.8300 0.9540
nanobeir / NanoSciFact_cosine_ndcg@10 0.7260 0.6892 0.9493
nanobeir / NanoSciFact_cosine_mrr@10 0.6767 0.6422 0.9491
nanobeir / NanoSciFact_cosine_map@100 0.6847 0.6526 0.9531
nanobeir / NanoTouche2020_cosine_accuracy@1 0.3878 0.3878 1.0000
nanobeir / NanoTouche2020_cosine_accuracy@3 0.7551 0.7755 1.0270
nanobeir / NanoTouche2020_cosine_accuracy@5 0.9388 0.8776 0.9348
nanobeir / NanoTouche2020_cosine_accuracy@10 0.9592 0.9796 1.0213
nanobeir / NanoTouche2020_cosine_precision@1 0.3878 0.3878 1.0000
nanobeir / NanoTouche2020_cosine_precision@3 0.4286 0.4626 1.0794
nanobeir / NanoTouche2020_cosine_precision@5 0.4653 0.4653 1.0000
nanobeir / NanoTouche2020_cosine_precision@10 0.3918 0.3878 0.9896
nanobeir / NanoTouche2020_cosine_recall@1 0.0270 0.0243 0.9011
nanobeir / NanoTouche2020_cosine_recall@3 0.0849 0.0953 1.1221
nanobeir / NanoTouche2020_cosine_recall@5 0.1568 0.1572 1.0024
nanobeir / NanoTouche2020_cosine_recall@10 0.2594 0.2583 0.9958
nanobeir / NanoTouche2020_cosine_ndcg@10 0.4184 0.4168 0.9963
nanobeir / NanoTouche2020_cosine_mrr@10 0.5988 0.5995 1.0012
nanobeir / NanoTouche2020_cosine_map@100 0.2994 0.3003 1.0032
nanobeir / NanoBEIR_mean_cosine_accuracy@1 0.5006 0.5021 1.0031
nanobeir / NanoBEIR_mean_cosine_accuracy@3 0.7181 0.6920 0.9636
nanobeir / NanoBEIR_mean_cosine_accuracy@5 0.7768 0.7644 0.9840
nanobeir / NanoBEIR_mean_cosine_accuracy@10 0.8522 0.8400 0.9856
nanobeir / NanoBEIR_mean_cosine_precision@1 0.5006 0.5021 1.0031
nanobeir / NanoBEIR_mean_cosine_precision@3 0.3309 0.3228 0.9754
nanobeir / NanoBEIR_mean_cosine_precision@5 0.2546 0.2518 0.9891
nanobeir / NanoBEIR_mean_cosine_precision@10 0.1760 0.1723 0.9790
nanobeir / NanoBEIR_mean_cosine_recall@1 0.3005 0.2959 0.9847
nanobeir / NanoBEIR_mean_cosine_recall@3 0.4804 0.4590 0.9553
nanobeir / NanoBEIR_mean_cosine_recall@5 0.5386 0.5255 0.9757
nanobeir / NanoBEIR_mean_cosine_recall@10 0.6216 0.6058 0.9746
nanobeir / NanoBEIR_mean_cosine_ndcg@10 0.5711 0.5573 0.9759
nanobeir / NanoBEIR_mean_cosine_mrr@10 0.6235 0.6134 0.9839
nanobeir / NanoBEIR_mean_cosine_map@100 0.4906 0.4781 0.9745
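The Relative column appears to be the pruned score divided by the base score. A short sketch that recomputes it for three rows copied from the table follows; because the table shows scores rounded to four decimals, the recomputed ratio can differ from the reported one in the last digit, so the comparison uses a small tolerance (the `5e-4` threshold is an arbitrary choice for this example).

```python
# (base, pruned, reported relative) copied from the table above.
rows = {
    "stsb_pearson_cosine": (0.8317, 0.8234, 0.9900),
    "stsb_spearman_cosine": (0.8492, 0.8386, 0.9876),
    "NanoBEIR_mean_cosine_ndcg@10": (0.5711, 0.5573, 0.9759),
}


def relative(base, pruned):
    """Pruned score as a fraction of the base score (base = 1.0)."""
    return pruned / base


recomputed = {name: relative(b, p) for name, (b, p, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    diff = abs(recomputed[name] - reported)
    status = "ok" if diff < 5e-4 else "mismatch"
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported:.4f} ({status})")
```

Values above 1.0 mean the pruned model scored higher than the base on that metric; with only a few dozen queries per NanoBEIR subset, such small differences in either direction may be within noise.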

Citation

If you use this model or the pruning approach, please cite:

@misc{subedi2025tokenpruning,
  author = {Sanjaya Subedi},
  title  = {Token Embedding Pruning for Sentence Transformers},
  year   = {2026},
  note   = {Available at: [link to be added upon publication]}
}