Sentence Similarity
sentence-transformers
Safetensors
English
modernbert
feature-extraction
dense
Generated from Trainer
dataset_size:3615666
loss:CachedMultipleNegativesSymmetricRankingLoss
loss:CachedMultipleNegativesRankingLoss
Eval Results (legacy)
text-embeddings-inference
Instructions to use johnnyboycurtis/ModernBERT-small-retrieval with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use johnnyboycurtis/ModernBERT-small-retrieval with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-retrieval")

    sentences = [
        "what is the difference between body spray and eau de toilette?",
        "Eau de Toilette (EDT) is ideal for those that may find the EDP or Perfume oil too strong, with 7%-12% fragrance concentration in alcohol. Gives four to five hours wear. Body Mist is a light refreshing fragrance perfect for layering with other products from the same family. 3-5% fragrance concentration in alcohol.",
        "To join the Army as an enlisted member you must usually take the Armed Services Vocational Aptitude Battery (ASVAB) test and get a good score. The maximum ASVAB score is 99. For enlistment into the Army you must get a minimum ASVAB score of 31.",
        "Points needed to redeem rewards with Redbox Perks: 1,500 points = FREE 1-night DVD rental. 1,750 points = FREE Blu-ray rental. 2,500 points = FREE 1-night Game rental."
    ]
    embeddings = model.encode(sentences)
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -337,6 +337,21 @@ model-index:
This is a [sentence-transformers](https://www.SBERT.net) model trained on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3), [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) and [natural_questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) datasets. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

+
+This model is based on the wide architecture of [johnnyboycurtis/ModernBERT-small](https://huggingface.co/johnnyboycurtis/ModernBERT-small).
+
+```
+small_modernbert_config = ModernBertConfig(
+    hidden_size=384,  # A common dimension for small embedding models
+    num_hidden_layers=12,  # Significantly fewer layers than the base's 22
+    num_attention_heads=6,  # Must be a divisor of hidden_size
+    intermediate_size=1536,  # 4 * hidden_size -- VERY WIDE!!
+    max_position_embeddings=1024,  # Max sequence length for the model; originally 8192
+)
+
+model = ModernBertModel(small_modernbert_config)
+```
+
## Model Details

### Model Description
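
The comments in the added configuration state two constraints: the number of attention heads must divide the hidden size, and the feed-forward width is four times the hidden size. A minimal sketch of those checks in plain Python follows. The `SmallConfig` dataclass and the `describe` helper are hypothetical names used only for illustration and are not part of `transformers`; the values are copied from the configuration in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallConfig:
    """Hypothetical stand-in holding the values from the README snippet."""

    hidden_size: int = 384
    num_hidden_layers: int = 12
    num_attention_heads: int = 6
    intermediate_size: int = 1536
    max_position_embeddings: int = 1024


def describe(cfg: SmallConfig) -> dict:
    """Derive the quantities that the README comments refer to."""
    # Each attention head gets an equal slice of the hidden dimension,
    # so the head count has to divide hidden_size exactly.
    if cfg.hidden_size % cfg.num_attention_heads != 0:
        raise ValueError("num_attention_heads must divide hidden_size")
    return {
        "head_dim": cfg.hidden_size // cfg.num_attention_heads,
        "mlp_ratio": cfg.intermediate_size / cfg.hidden_size,
    }


summary = describe(SmallConfig())
print(summary)
```

With the values from the diff, each head works on 64 dimensions and the feed-forward layer is 4 times the hidden size. Changing `num_attention_heads` to a value that does not divide 384, such as 5, raises a `ValueError` in this sketch.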