---
language:
- nep
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:3385
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: jangedoo/all-MiniLM-L6-v2-nepali
widget:
- source_sentence: नागरिकता टोलीले सर्जमिनको क्रममा कस्तो व्यक्तिको मतदाता परिचयपत्रको
    सक्कल प्रति जाँच गर्छ?
  sentences:
  - नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको जन्म, बसोबास, नाताको तथ्यको रेकर्ड
    राख्छ।
  - नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको मतदाता परिचयपत्रको सक्कल प्रति जाँच
    गर्छ।
  - राहदानीको विद्युतीय अभिलेखमा राहदानी जारी भएको मिति अवधि समाप्त हुने मिति राखिन्छ।
- source_sentence: नागरिकता टोलीले कस्तो अवस्थामा सर्जमिनको समयसीमा लम्ब्याउन सक्छ?
  sentences:
  - नागरिकता टोलीले सर्जमिनको क्रममा जन्मदर्ता, नागरिकता, स्थानीय तहको सिफारिसको
    मूल प्रति माग्न सक्छ।
  - नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको बसोबास भएको स्थानको नक्सा हेर्न सक्छ।
  - नागरिकता टोलीले जटिल तथ्य वा थप प्रमाण आवश्यक भएमा सर्जमिनको समयसीमा लम्ब्याउन
    सक्छ।
- source_sentence: नागरिकताको प्रमाणपत्रमा विवरण सच्याउन आवश्यक प्रमाण के-के हुन्?
  sentences:
  - नागरिकताको प्रमाणपत्रमा विवरण सच्याउन आवश्यक प्रमाणमा निवेदकसँग भएको सबुत प्रमाण
    आवश्यकता अनुसार साक्षी सरजमिन समावेश हुन्छ।
  - संवत् २०४६ साल चैत्र मसान्तसम्म नेपाल सरहदभित्र जन्म भई नेपालमा स्थायी रुपले बसोबास
    गर्दै आएको व्यक्ति जन्मको आधारमा नेपालको नागरिक हुनेछ।
  - नागरिकता निवेदनमा निवेदकको जन्म मिति विक्रम संवत् वा ईस्वी संवत्मा स्पष्ट रूपमा
    उल्लेख गर्नुपर्छ।
- source_sentence: राहदानी कुन कुन अवस्थामा रद्द गरिन्छ?
  sentences:
  - विदेशी नागरिकता त्यागेर पुनः नेपाली नागरिकता कायम गर्न अनुसूची-११ बमोजिमको ढाँचामा
    निवेदन दिनुपर्छ, जसमा पूरा नाम, थर, जन्मस्थान, जन्म मिति, उमेर, साविकको नागरिकता
    नम्बर, जारी मिति, नागरिकताको किसिम, नेपालमा बसोबास गरेको मिति, हालको बसोबासको
    स्थान, बाबुको नाम, थर, ठेगाना, नागरिकता नम्बर, दस्तखत, औंठाको छाप, विदेशी नागरिकता
    त्यागेको निस्सा उल्लेख हुनुपर्छ।
  - राहदानी हराएको, च्यातिएको, प्रयोग हुन नसक्ने, अवधि सकिएको, वा बुझी नलिएको अवस्थामा
    रद्द गरिन्छ।
  - दफा को उपदफा (४) बमोजिम अंगीकृत नागरिकता प्रमाणपत्र अनुसूची-८ बमोजिमको ढाँचामा
    जारी गरिन्छ, जसमा नागरिकताको किसिम, पूरा नाम, थर, जन्मस्थान, जन्म मिति, लिङ्ग,
    स्थायी वासस्थान, दुवै कान देखिने अटो साइजको फोटो, निर्णय मिति उल्लेख हुन्छ।
- source_sentence: राहदानी रद्द गर्न कस्तो सत्यताको घोषणा चाहिन्छ?
  sentences:
  - नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको बसोबास भएको स्थानको नक्सा हेर्न सक्छ।
  - राहदानी रद्द गर्न निवेदकले उल्लेखित विवरण साँचो भएको प्रचलित कानून बमोजिम अपराध
    ठहरिने कुनै काम नगरेको सत्यताको घोषणा गर्नुपर्छ।
  - नागरिकता टोलीले गलत तथ्य वा अपूर्ण जानकारी भएमा सर्जमिनको प्रतिवेदन रद्द गर्न
    सक्छ।
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: sentenceTransformer_nepali_embedding
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 384
      type: dim_384
    metrics:
    - type: cosine_accuracy@1
      value: 0.2891246684350133
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5013262599469496
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6153846153846154
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7771883289124668
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.2891246684350133
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.16710875331564987
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.12307692307692306
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07771883289124668
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.2891246684350133
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5013262599469496
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6153846153846154
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7771883289124668
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5114393487220035
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.42878931413414173
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.4378957928577126
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.29708222811671087
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5225464190981433
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6259946949602122
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7771883289124668
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.29708222811671087
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.17418213969938107
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.12519893899204243
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07771883289124668
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.29708222811671087
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5225464190981433
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6259946949602122
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7771883289124668
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5196017799940188
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.43912361584775383
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.44830863398887005
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.2891246684350133
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5039787798408488
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6127320954907162
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7771883289124668
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.2891246684350133
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.16799292661361626
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.12254641909814322
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07771883289124668
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.2891246684350133
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5039787798408488
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6127320954907162
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7771883289124668
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.513425703936886
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.43126815713022615
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.4397863110473721
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.28116710875331563
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.493368700265252
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.610079575596817
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7639257294429708
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.28116710875331563
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.16445623342175067
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.12201591511936338
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07639257294429708
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.28116710875331563
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.493368700265252
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.610079575596817
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7639257294429708
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5039737400654479
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.42297061176371525
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.43166547136933925
      name: Cosine Map@100
---
# sentenceTransformer_nepali_embedding
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [jangedoo/all-MiniLM-L6-v2-nepali](https://huggingface.co/jangedoo/all-MiniLM-L6-v2-nepali) on a JSON dataset of 3,385 anchor-positive pairs. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, it was also evaluated with embeddings truncated to 256, 128, and 64 dimensions.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [jangedoo/all-MiniLM-L6-v2-nepali](https://huggingface.co/jangedoo/all-MiniLM-L6-v2-nepali) <!-- at revision 418f7cf08ecbbc2ff0e8460bb6eb6457291102df -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** nep
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download the model from the 🤗 Hub
model = SentenceTransformer("ritesh-07/fine_tuned_model_03")

# Encode one query and two candidate passages (Nepali)
sentences = [
    'राहदानी रद्द गर्न कस्तो सत्यताको घोषणा चाहिन्छ?',
    'राहदानी रद्द गर्न निवेदकले उल्लेखित विवरण साँचो भएको र प्रचलित कानून बमोजिम अपराध ठहरिने कुनै काम नगरेको सत्यताको घोषणा गर्नुपर्छ।',
    'नागरिकता टोलीले गलत तथ्य वा अपूर्ण जानकारी भएमा सर्जमिनको प्रतिवेदन रद्द गर्न सक्छ।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Pairwise cosine similarity between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
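Since the model was trained with MatryoshkaLoss over dimensions 384, 256, 128, and 64, shorter embeddings can be obtained by keeping the leading components and re-normalizing. The library offers a `truncate_dim` argument on `SentenceTransformer` for this; the sketch below only illustrates the underlying operation in plain Python. The 8-dimensional vectors are made-up stand-ins, not output of this model, and the helper names are hypothetical.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 8-dimensional vectors standing in for 384-dimensional embeddings.
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.4, -0.1, 0.2]
doc = [0.8, 0.2, 0.25, -0.1, -0.3, 0.1, 0.5, -0.4]

# Compare the similarity at full size and at two truncated sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Smaller dimensions reduce storage and search cost; the evaluation tables in this card indicate how much retrieval quality changes at each size.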
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `dim_384`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
```json
{
"truncate_dim": 384
}
```
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.2891 |
| cosine_accuracy@3 | 0.5013 |
| cosine_accuracy@5 | 0.6154 |
| cosine_accuracy@10 | 0.7772 |
| cosine_precision@1 | 0.2891 |
| cosine_precision@3 | 0.1671 |
| cosine_precision@5 | 0.1231 |
| cosine_precision@10 | 0.0777 |
| cosine_recall@1 | 0.2891 |
| cosine_recall@3 | 0.5013 |
| cosine_recall@5 | 0.6154 |
| cosine_recall@10 | 0.7772 |
| **cosine_ndcg@10** | **0.5114** |
| cosine_mrr@10 | 0.4288 |
| cosine_map@100 | 0.4379 |
#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
```json
{
"truncate_dim": 256
}
```
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.2971 |
| cosine_accuracy@3 | 0.5225 |
| cosine_accuracy@5 | 0.626 |
| cosine_accuracy@10 | 0.7772 |
| cosine_precision@1 | 0.2971 |
| cosine_precision@3 | 0.1742 |
| cosine_precision@5 | 0.1252 |
| cosine_precision@10 | 0.0777 |
| cosine_recall@1 | 0.2971 |
| cosine_recall@3 | 0.5225 |
| cosine_recall@5 | 0.626 |
| cosine_recall@10 | 0.7772 |
| **cosine_ndcg@10** | **0.5196** |
| cosine_mrr@10 | 0.4391 |
| cosine_map@100 | 0.4483 |
#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
```json
{
"truncate_dim": 128
}
```
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.2891 |
| cosine_accuracy@3 | 0.504 |
| cosine_accuracy@5 | 0.6127 |
| cosine_accuracy@10 | 0.7772 |
| cosine_precision@1 | 0.2891 |
| cosine_precision@3 | 0.168 |
| cosine_precision@5 | 0.1225 |
| cosine_precision@10 | 0.0777 |
| cosine_recall@1 | 0.2891 |
| cosine_recall@3 | 0.504 |
| cosine_recall@5 | 0.6127 |
| cosine_recall@10 | 0.7772 |
| **cosine_ndcg@10** | **0.5134** |
| cosine_mrr@10 | 0.4313 |
| cosine_map@100 | 0.4398 |
#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
```json
{
"truncate_dim": 64
}
```
| Metric | Value |
|:--------------------|:----------|
| cosine_accuracy@1 | 0.2812 |
| cosine_accuracy@3 | 0.4934 |
| cosine_accuracy@5 | 0.6101 |
| cosine_accuracy@10 | 0.7639 |
| cosine_precision@1 | 0.2812 |
| cosine_precision@3 | 0.1645 |
| cosine_precision@5 | 0.122 |
| cosine_precision@10 | 0.0764 |
| cosine_recall@1 | 0.2812 |
| cosine_recall@3 | 0.4934 |
| cosine_recall@5 | 0.6101 |
| cosine_recall@10 | 0.7639 |
| **cosine_ndcg@10** | **0.504** |
| cosine_mrr@10 | 0.423 |
| cosine_map@100 | 0.4317 |
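In every table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example, 0.5013 / 3 ≈ 0.1671). That pattern is what one would expect if each evaluation query has exactly one relevant passage; this is an inference from the numbers, not something the card states. Under that assumption, the per-query metrics reduce to the following sketch (the function name and document IDs are hypothetical):

```python
import math

def metrics_at_k(ranked_ids, relevant_id, k):
    """Per-query retrieval metrics when exactly one document is relevant."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": (1.0 / k) if hit else 0.0,
        "recall": 1.0 if hit else 0.0,
        "mrr": (1.0 / rank) if hit else 0.0,
        # With a single relevant document the ideal DCG is 1 / log2(2) = 1,
        # so NDCG is just the discounted gain at the hit's rank.
        "ndcg": (1.0 / math.log2(rank + 1)) if hit else 0.0,
    }

# Hypothetical ranking: the relevant passage "d7" is retrieved at rank 3.
ranking = ["d2", "d9", "d7", "d1", "d4"]
for k in (1, 3, 5):
    print(k, metrics_at_k(ranking, "d7", k))
```

The reported values are the means of these per-query scores over the evaluation queries.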
## Training Details
### Training Dataset
#### json
* Dataset: json
* Size: 3,385 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 18 tokens</li><li>mean: 49.31 tokens</li><li>max: 103 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 81.7 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
| anchor | positive |
|:--------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>राहदानी नियमावली, २०७७ मा दस्तुर बुझाउने प्रक्रिया कस्तो छ?</code> | <code>राहदानी नियमावली, २०७७ मा दस्तुर तोकिएको बैङ्कमा बुझाई रसिद निवेदनमा संलग्न गर्नुपर्छ।</code> |
| <code>दफा ३ को उपदफा (६) मा विदेशी नागरिकसँग विवाह गरेकी नेपाली महिलाको सन्तानले कसरी नागरिकता प्राप्त गर्छ?</code> | <code>दफा ३ को उपदफा (६) मा विदेशी नागरिकसँग विवाह गरेकी नेपाली महिला नागरिकबाट नेपालमा जन्मिएको व्यक्तिले, यदि निजको आमा र बाबु दुवै नेपाली नागरिक रहेछन् भने, वंशजको आधारमा नेपालको नागरिकता प्राप्त गर्नेछ।</code> |
| <code>दफा ३ को उपदफा (४) मा कस्तो व्यवस्था थपिएको छ?</code> | <code>दफा ३ को उपदफा (४) मा थपिएको व्यवस्था अनुसार, संवत् २०७२ साल असोज ३ गतेभन्दा अघि जन्मको आधारमा नेपालको नागरिकता प्राप्त गरेको नागरिकको सन्तानले, यदि बाबु र आमा दुवै नेपालको नागरिक रहेछन् भने, निजको उमेर सोह्र वर्ष पूरा भएपछि वंशजको आधारमा नेपालको नागरिकता प्राप्त गर्नेछ।</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
384,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
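To make these parameters concrete, here is a simplified, framework-free sketch of how a Matryoshka wrapper around an in-batch-negatives ranking loss can be computed: the base loss is evaluated on embeddings truncated to each dimension, and the results are combined with the given weights. It is an illustration rather than the library's implementation; the `scale=20.0` value, the function names, and the toy vectors are assumptions.

```python
import math

def normalize(vec):
    """Rescale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy: anchor i should rank positive i first."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Scaled cosine similarity of anchor i against every positive in the batch.
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss on embeddings truncated to each dimension."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs in 4 dimensions.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
swapped = matryoshka_loss(anchors, positives[::-1], dims=(4, 2), weights=(1, 1))
print(aligned < swapped)
```

With `n_dims_per_step` set to -1, all four configured dimensions contribute at every training step, which corresponds to summing over the full `dims` list here.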
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
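The arithmetic implied by these settings can be checked with a short sketch. It assumes a single device and that the `no_duplicates` sampler yields roughly one batch per 32 samples; the ceiling on optimizer steps is chosen because it matches the 7 steps per epoch and 28 total steps reported in the training logs. The warmup and cosine formulas follow the usual linear-warmup, cosine-decay shape and are an approximation of the scheduler, not a copy of it.

```python
import math

train_samples = 3385
per_device_batch = 32
grad_accum = 16
epochs = 4
warmup_ratio = 0.1
base_lr = 2e-5

effective_batch = per_device_batch * grad_accum                  # 512 pairs per optimizer step
batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 106
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)      # 7
total_steps = steps_per_epoch * epochs                           # 28
warmup_steps = math.ceil(total_steps * warmup_ratio)             # 3

def lr_at(step):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 1, 3, 14, 28):
    print(step, lr_at(step))
```

Note that gradient accumulation enlarges the effective batch for the optimizer update only; with an in-batch-negatives loss, each forward pass still draws its negatives from the 32 pairs in that batch.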
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
</details>
### Training Logs
| Epoch | Step | Training Loss | dim_384_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 1.0 | 7 | - | 0.4635 | 0.4673 | 0.4674 | 0.4406 |
| 1.4528 | 10 | 2.6919 | - | - | - | - |
| 2.0 | 14 | - | 0.4977 | 0.5140 | 0.4963 | 0.4759 |
| 2.9057 | 20 | 1.0521 | - | - | - | - |
| 3.0 | 21 | - | 0.5111 | 0.5242 | 0.5130 | 0.5017 |
| **4.0** | **28** | **-** | **0.5114** | **0.5196** | **0.5134** | **0.504** |
* The bold row denotes the saved checkpoint.
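The fractional epoch values in the log (for example 1.4528 at step 10) are consistent with counting processed batches rather than optimizer steps. The sketch below reproduces them under the assumption of 106 batches per epoch (the ceiling of 3385 / 32) and 16 accumulated batches per step, with the last step of each epoch covering the remaining batches; the helper name is hypothetical.

```python
import math

batches_per_epoch = 106  # assumption: ceil(3385 / 32)
grad_accum = 16
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)  # 7

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    full_epochs, extra_steps = divmod(step, steps_per_epoch)
    return full_epochs + extra_steps * grad_accum / batches_per_epoch

for step in (7, 10, 14, 20, 21, 28):
    print(step, round(epoch_at(step), 4))
```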
### Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.54.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.9.0
- Datasets: 4.0.0
- Tokenizers: 0.21.2
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```