	---
	language:
	- nep
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:3385
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	base_model: jangedoo/all-MiniLM-L6-v2-nepali
	widget:
	- source_sentence: नागरिकता टोलीले सर्जमिनको क्रममा कस्तो व्यक्तिको मतदाता परिचयपत्रको
	सक्कल प्रति जाँच गर्छ?
	sentences:
	- नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको जन्म, बसोबास, र नाताको तथ्यको रेकर्ड
	राख्छ।
	- नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको मतदाता परिचयपत्रको सक्कल प्रति जाँच
	गर्छ।
	- राहदानीको विद्युतीय अभिलेखमा राहदानी जारी भएको मिति र अवधि समाप्त हुने मिति राखिन्छ।
	- source_sentence: नागरिकता टोलीले कस्तो अवस्थामा सर्जमिनको समयसीमा लम्ब्याउन सक्छ?
	sentences:
	- नागरिकता टोलीले सर्जमिनको क्रममा जन्मदर्ता, नागरिकता, र स्थानीय तहको सिफारिसको
	मूल प्रति माग्न सक्छ।
	- नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको बसोबास भएको स्थानको नक्सा हेर्न सक्छ।
	- नागरिकता टोलीले जटिल तथ्य वा थप प्रमाण आवश्यक भएमा सर्जमिनको समयसीमा लम्ब्याउन
	सक्छ।
	- source_sentence: नागरिकताको प्रमाणपत्रमा विवरण सच्याउन आवश्यक प्रमाण के-के हुन्?
	sentences:
	- नागरिकताको प्रमाणपत्रमा विवरण सच्याउन आवश्यक प्रमाणमा निवेदकसँग भएको सबुत प्रमाण
	र आवश्यकता अनुसार साक्षी र सरजमिन समावेश हुन्छ।
	- संवत् २०४६ साल चैत्र मसान्तसम्म नेपाल सरहदभित्र जन्म भई नेपालमा स्थायी रुपले बसोबास
	गर्दै आएको व्यक्ति जन्मको आधारमा नेपालको नागरिक हुनेछ।
	- नागरिकता निवेदनमा निवेदकको जन्म मिति विक्रम संवत् वा ईस्वी संवत्मा स्पष्ट रूपमा
	उल्लेख गर्नुपर्छ।
	- source_sentence: राहदानी कुन कुन अवस्थामा रद्द गरिन्छ?
	sentences:
	- विदेशी नागरिकता त्यागेर पुनः नेपाली नागरिकता कायम गर्न अनुसूची-११ बमोजिमको ढाँचामा
	निवेदन दिनुपर्छ, जसमा पूरा नाम, थर, जन्मस्थान, जन्म मिति, उमेर, साविकको नागरिकता
	नम्बर, जारी मिति, नागरिकताको किसिम, नेपालमा बसोबास गरेको मिति, हालको बसोबासको
	स्थान, बाबुको नाम, थर, ठेगाना, नागरिकता नम्बर, दस्तखत, औंठाको छाप, र विदेशी नागरिकता
	त्यागेको निस्सा उल्लेख हुनुपर्छ।
	- राहदानी हराएको, च्यातिएको, प्रयोग हुन नसक्ने, अवधि सकिएको, वा बुझी नलिएको अवस्थामा
	रद्द गरिन्छ।
	- दफा ५ को उपदफा (४) बमोजिम अंगीकृत नागरिकता प्रमाणपत्र अनुसूची-८ बमोजिमको ढाँचामा
	जारी गरिन्छ, जसमा नागरिकताको किसिम, पूरा नाम, थर, जन्मस्थान, जन्म मिति, लिङ्ग,
	स्थायी वासस्थान, दुवै कान देखिने अटो साइजको फोटो, र निर्णय मिति उल्लेख हुन्छ।
	- source_sentence: राहदानी रद्द गर्न कस्तो सत्यताको घोषणा चाहिन्छ?
	sentences:
	- नागरिकता टोलीले सर्जमिनको क्रममा निवेदकको बसोबास भएको स्थानको नक्सा हेर्न सक्छ।
	- राहदानी रद्द गर्न निवेदकले उल्लेखित विवरण साँचो भएको र प्रचलित कानून बमोजिम अपराध
	ठहरिने कुनै काम नगरेको सत्यताको घोषणा गर्नुपर्छ।
	- नागरिकता टोलीले गलत तथ्य वा अपूर्ण जानकारी भएमा सर्जमिनको प्रतिवेदन रद्द गर्न
	सक्छ।
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: sentenceTransformer_nepali_embedding
	results:
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 384
	type: dim_384
	metrics:
	- type: cosine_accuracy@1
	value: 0.2891246684350133
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.5013262599469496
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.6153846153846154
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.7771883289124668
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.2891246684350133
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.16710875331564987
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.12307692307692306
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.07771883289124668
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.2891246684350133
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.5013262599469496
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.6153846153846154
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.7771883289124668
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.5114393487220035
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.42878931413414173
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.4378957928577126
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 256
	type: dim_256
	metrics:
	- type: cosine_accuracy@1
	value: 0.29708222811671087
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.5225464190981433
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.6259946949602122
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.7771883289124668
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.29708222811671087
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.17418213969938107
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.12519893899204243
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.07771883289124668
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.29708222811671087
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.5225464190981433
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.6259946949602122
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.7771883289124668
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.5196017799940188
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.43912361584775383
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.44830863398887005
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 128
	type: dim_128
	metrics:
	- type: cosine_accuracy@1
	value: 0.2891246684350133
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.5039787798408488
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.6127320954907162
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.7771883289124668
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.2891246684350133
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.16799292661361626
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.12254641909814322
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.07771883289124668
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.2891246684350133
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.5039787798408488
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.6127320954907162
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.7771883289124668
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.513425703936886
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.43126815713022615
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.4397863110473721
	name: Cosine Map@100
	- task:
	type: information-retrieval
	name: Information Retrieval
	dataset:
	name: dim 64
	type: dim_64
	metrics:
	- type: cosine_accuracy@1
	value: 0.28116710875331563
	name: Cosine Accuracy@1
	- type: cosine_accuracy@3
	value: 0.493368700265252
	name: Cosine Accuracy@3
	- type: cosine_accuracy@5
	value: 0.610079575596817
	name: Cosine Accuracy@5
	- type: cosine_accuracy@10
	value: 0.7639257294429708
	name: Cosine Accuracy@10
	- type: cosine_precision@1
	value: 0.28116710875331563
	name: Cosine Precision@1
	- type: cosine_precision@3
	value: 0.16445623342175067
	name: Cosine Precision@3
	- type: cosine_precision@5
	value: 0.12201591511936338
	name: Cosine Precision@5
	- type: cosine_precision@10
	value: 0.07639257294429708
	name: Cosine Precision@10
	- type: cosine_recall@1
	value: 0.28116710875331563
	name: Cosine Recall@1
	- type: cosine_recall@3
	value: 0.493368700265252
	name: Cosine Recall@3
	- type: cosine_recall@5
	value: 0.610079575596817
	name: Cosine Recall@5
	- type: cosine_recall@10
	value: 0.7639257294429708
	name: Cosine Recall@10
	- type: cosine_ndcg@10
	value: 0.5039737400654479
	name: Cosine Ndcg@10
	- type: cosine_mrr@10
	value: 0.42297061176371525
	name: Cosine Mrr@10
	- type: cosine_map@100
	value: 0.43166547136933925
	name: Cosine Map@100
	---

	# sentenceTransformer_nepali_embedding

	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [jangedoo/all-MiniLM-L6-v2-nepali](https://huggingface.co/jangedoo/all-MiniLM-L6-v2-nepali) on the json dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss, so the embeddings were also evaluated after truncation to 256, 128, and 64 dimensions.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [jangedoo/all-MiniLM-L6-v2-nepali](https://huggingface.co/jangedoo/all-MiniLM-L6-v2-nepali) <!-- at revision 418f7cf08ecbbc2ff0e8460bb6eb6457291102df -->
	- Maximum Sequence Length: 256 tokens
	- Output Dimensionality: 384 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset: json
	- Language: nep
	- License: apache-2.0

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
	(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	(2): Normalize()
	)
	```
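
The three modules above form a pipeline: the BERT encoder produces one vector per token, the pooling layer averages the vectors of the non-padding tokens (`pooling_mode_mean_tokens: True`), and `Normalize()` scales the result to unit length. The following is a minimal sketch of the last two steps in plain Python. The token vectors and the 4-dimensional size are toy values chosen for illustration, not model output, and the helper names are hypothetical rather than library functions.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, so padding is skipped."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a normalization layer does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy input: three 4-dimensional token vectors; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 1.0], [3.0, 0.0, 2.0, 1.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vectors have unit length, cosine similarity between two embeddings reduces to a dot product.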

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("ritesh-07/fine_tuned_model_03")
	# Run inference
	sentences = [
	    'राहदानी रद्द गर्न कस्तो सत्यताको घोषणा चाहिन्छ?',
	    'राहदानी रद्द गर्न निवेदकले उल्लेखित विवरण साँचो भएको र प्रचलित कानून बमोजिम अपराध ठहरिने कुनै काम नगरेको सत्यताको घोषणा गर्नुपर्छ।',
	    'नागरिकता टोलीले गलत तथ्य वा अपूर्ण जानकारी भएमा सर्जमिनको प्रतिवेदन रद्द गर्न सक्छ।',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 384)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
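
Since the model was trained with Matryoshka dimensions 384, 256, 128, and 64, the leading components of an embedding are meant to be usable on their own. The sketch below shows the idea in plain Python: keep the first `dim` components, re-normalize, and compare with a dot product. The 8-dimensional vectors are hypothetical stand-ins for real 384-dimensional embeddings, and `truncate_and_normalize` is an illustrative helper, not a library function.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy embeddings (8 dimensions standing in for 384).
query = [0.5, 0.4, 0.1, -0.3, 0.2, 0.0, -0.1, 0.6]
doc = [0.4, 0.5, 0.0, -0.2, 0.3, 0.1, -0.2, 0.5]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which should make the manual truncation unnecessary; check the documentation for the installed version before relying on it.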

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Dataset: `dim_384`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 384
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.2891 |
	| cosine_accuracy@3   | 0.5013 |
	| cosine_accuracy@5   | 0.6154 |
	| cosine_accuracy@10  | 0.7772 |
	| cosine_precision@1  | 0.2891 |
	| cosine_precision@3  | 0.1671 |
	| cosine_precision@5  | 0.1231 |
	| cosine_precision@10 | 0.0777 |
	| cosine_recall@1     | 0.2891 |
	| cosine_recall@3     | 0.5013 |
	| cosine_recall@5     | 0.6154 |
	| cosine_recall@10    | 0.7772 |
	| cosine_ndcg@10      | 0.5114 |
	| cosine_mrr@10       | 0.4288 |
	| cosine_map@100      | 0.4379 |
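
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. That pattern is what one would expect if every query has exactly one relevant passage, which the card does not state explicitly, so treat it as an assumption. Under that assumption, all of the reported metric types can be derived from the rank of the single relevant passage per query, as in this sketch. The ranks are made-up toy data and `retrieval_metrics` is a hypothetical helper, not the evaluator's implementation.

```python
import math


def retrieval_metrics(ranks, k):
    """Compute @k metrics from 1-based ranks of the single relevant passage.

    A rank of None means the relevant passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]
    accuracy = len(hits) / n
    return {
        "accuracy": accuracy,
        # At most one of the k results can be relevant.
        "precision": accuracy / k,
        # With one relevant passage, recall is 1 on a hit and 0 otherwise.
        "recall": accuracy,
        "mrr": sum(1.0 / r for r in hits) / n,
        # Ideal DCG is 1 when there is a single relevant passage.
        "ndcg": sum(1.0 / math.log2(r + 1) for r in hits) / n,
    }


# Toy ranks for eight queries (hypothetical, not from the evaluation set).
toy_ranks = [1, 3, 2, None, 7, 1, 12, 4]
for k in (1, 3, 5, 10):
    print(k, retrieval_metrics(toy_ranks, k))
```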

	#### Information Retrieval

	* Dataset: `dim_256`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 256
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.2971 |
	| cosine_accuracy@3   | 0.5225 |
	| cosine_accuracy@5   | 0.626  |
	| cosine_accuracy@10  | 0.7772 |
	| cosine_precision@1  | 0.2971 |
	| cosine_precision@3  | 0.1742 |
	| cosine_precision@5  | 0.1252 |
	| cosine_precision@10 | 0.0777 |
	| cosine_recall@1     | 0.2971 |
	| cosine_recall@3     | 0.5225 |
	| cosine_recall@5     | 0.626  |
	| cosine_recall@10    | 0.7772 |
	| cosine_ndcg@10      | 0.5196 |
	| cosine_mrr@10       | 0.4391 |
	| cosine_map@100      | 0.4483 |

	#### Information Retrieval

	* Dataset: `dim_128`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 128
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.2891 |
	| cosine_accuracy@3   | 0.504  |
	| cosine_accuracy@5   | 0.6127 |
	| cosine_accuracy@10  | 0.7772 |
	| cosine_precision@1  | 0.2891 |
	| cosine_precision@3  | 0.168  |
	| cosine_precision@5  | 0.1225 |
	| cosine_precision@10 | 0.0777 |
	| cosine_recall@1     | 0.2891 |
	| cosine_recall@3     | 0.504  |
	| cosine_recall@5     | 0.6127 |
	| cosine_recall@10    | 0.7772 |
	| cosine_ndcg@10      | 0.5134 |
	| cosine_mrr@10       | 0.4313 |
	| cosine_map@100      | 0.4398 |

	#### Information Retrieval

	* Dataset: `dim_64`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 64
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.2812 |
	| cosine_accuracy@3   | 0.4934 |
	| cosine_accuracy@5   | 0.6101 |
	| cosine_accuracy@10  | 0.7639 |
	| cosine_precision@1  | 0.2812 |
	| cosine_precision@3  | 0.1645 |
	| cosine_precision@5  | 0.122  |
	| cosine_precision@10 | 0.0764 |
	| cosine_recall@1     | 0.2812 |
	| cosine_recall@3     | 0.4934 |
	| cosine_recall@5     | 0.6101 |
	| cosine_recall@10    | 0.7639 |
	| cosine_ndcg@10      | 0.504  |
	| cosine_mrr@10       | 0.423  |
	| cosine_map@100      | 0.4317 |
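
The four evaluations make it possible to weigh retrieval quality against storage cost. The sketch below takes the NDCG@10 values reported in the tables and computes, for each dimension, the share of the 384-dimensional score that is retained and the bytes needed per vector. The float32 storage format is an assumption for illustration, and `summarize` is a hypothetical helper.

```python
# NDCG@10 per truncation dimension, copied from the evaluation tables.
NDCG_AT_10 = {384: 0.5114, 256: 0.5196, 128: 0.5134, 64: 0.5040}
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32
FULL_DIM = 384


def summarize(scores=NDCG_AT_10, full_dim=FULL_DIM):
    """Return (dim, retained share of full-dimension NDCG@10, bytes per vector)."""
    baseline = scores[full_dim]
    return [
        (dim, ndcg / baseline, dim * BYTES_PER_FLOAT32)
        for dim, ndcg in scores.items()
    ]


for dim, retained, size in summarize():
    print(f"dim={dim:3d}  retained={retained:.1%}  bytes/vector={size}")
```

The gaps between the 384, 256, and 128 dimensional results are small, so on an evaluation set of this size they may not reflect a real difference in quality.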

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### json

	* Dataset: json
	* Size: 3,385 training samples
	* Columns: <code>anchor</code> and <code>positive</code>
	* Approximate statistics based on the first 1000 samples:
	|         | anchor | positive |
	|:--------|:-------|:---------|
	| type    | string | string |
	| details | <ul><li>min: 18 tokens</li><li>mean: 49.31 tokens</li><li>max: 103 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 81.7 tokens</li><li>max: 256 tokens</li></ul> |
	* Samples:
	| anchor | positive |
	|:-------|:---------|
	| <code>राहदानी नियमावली, २०७७ मा दस्तुर बुझाउने प्रक्रिया कस्तो छ?</code> | <code>राहदानी नियमावली, २०७७ मा दस्तुर तोकिएको बैङ्कमा बुझाई रसिद निवेदनमा संलग्न गर्नुपर्छ।</code> |
	| <code>दफा ३ को उपदफा (६) मा विदेशी नागरिकसँग विवाह गरेकी नेपाली महिलाको सन्तानले कसरी नागरिकता प्राप्त गर्छ?</code> | <code>दफा ३ को उपदफा (६) मा विदेशी नागरिकसँग विवाह गरेकी नेपाली महिला नागरिकबाट नेपालमा जन्मिएको व्यक्तिले, यदि निजको आमा र बाबु दुवै नेपाली नागरिक रहेछन् भने, वंशजको आधारमा नेपालको नागरिकता प्राप्त गर्नेछ।</code> |
	| <code>दफा ३ को उपदफा (४) मा कस्तो व्यवस्था थपिएको छ?</code> | <code>दफा ३ को उपदफा (४) मा थपिएको व्यवस्था अनुसार, संवत् २०७२ साल असोज ३ गतेभन्दा अघि जन्मको आधारमा नेपालको नागरिकता प्राप्त गरेको नागरिकको सन्तानले, यदि बाबु र आमा दुवै नेपालको नागरिक रहेछन् भने, निजको उमेर सोह्र वर्ष पूरा भएपछि वंशजको आधारमा नेपालको नागरिकता प्राप्त गर्नेछ।</code> |
	* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
	```json
	{
	"loss": "MultipleNegativesRankingLoss",
	"matryoshka_dims": [
	384,
	256,
	128,
	64
	],
	"matryoshka_weights": [
	1,
	1,
	1,
	1
	],
	"n_dims_per_step": -1
	}
	```
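
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss with four equally weighted dimensions. Conceptually, the inner loss treats every other positive in the batch as a negative for a given anchor, and the wrapper repeats that loss on embeddings truncated to each dimension and sums the weighted results. The following plain-Python sketch illustrates that idea on random toy vectors. It is not the library's implementation, the similarity scale of 20 is an assumption that the card does not state, and the function names are hypothetical.

```python
import math
import random

MATRYOSHKA_DIMS = [384, 256, 128, 64]
MATRYOSHKA_WEIGHTS = [1, 1, 1, 1]
SCALE = 20.0  # assumption: multiplier applied to cosine similarities


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def in_batch_ranking_loss(anchors, positives, scale=SCALE):
    """Cross-entropy over scaled cosine similarities; pair i is the correct match for anchor i."""
    a = [unit(v) for v in anchors]
    p = [unit(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims=MATRYOSHKA_DIMS, weights=MATRYOSHKA_WEIGHTS):
    """Sum the ranking loss over embeddings truncated to each dimension."""
    return sum(
        w * in_batch_ranking_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )


# Toy batch: each positive is its anchor plus a little noise.
rng = random.Random(0)
anchors = [[rng.gauss(0, 1) for _ in range(384)] for _ in range(4)]
positives = [[x + 0.1 * rng.gauss(0, 1) for x in vec] for vec in anchors]

aligned = matryoshka_loss(anchors, positives)
misaligned = matryoshka_loss(anchors, positives[1:] + positives[:1])
print(aligned, misaligned)
```

In the toy batch, pairing each anchor with its own positive gives a much lower loss than pairing it with a shifted positive, which is the behaviour the training objective rewards at every truncation size.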

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: epoch
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `gradient_accumulation_steps`: 16
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 4
	- `lr_scheduler_type`: cosine
	- `warmup_ratio`: 0.1
	- `bf16`: True
	- `tf32`: False
	- `load_best_model_at_end`: True
	- `optim`: adamw_torch_fused
	- `batch_sampler`: no_duplicates
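
These settings determine how many optimizer steps the run takes and how the learning rate evolves. The sketch below works through the arithmetic, assuming a single device, a final partial batch that is kept, and a warmup length rounded up from `warmup_ratio`. The schedule function is a plain-Python approximation of linear warmup followed by cosine decay, not the trainer's own code. The resulting step counts agree with the Training Logs table, where epoch 1.0 ends at step 7 and epoch 4.0 at step 28.

```python
import math

TRAIN_SAMPLES = 3385
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 16
EPOCHS = 4
PEAK_LR = 2e-05
WARMUP_RATIO = 0.1

# Assumption: one device, so the per-device batch size is the micro-batch size.
batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(batches_per_epoch, steps_per_epoch, total_steps, warmup_steps, effective_batch)
for step in (0, 3, 14, 28):
    print(step, learning_rate(step))
```

Gradient accumulation raises the number of pairs per optimizer step to 512, but with an in-batch-negatives loss the negatives for each anchor most likely still come only from its own micro-batch of 32 pairs.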

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: epoch
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 16
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 4
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: True
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: False
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `hub_revision`: None
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `liger_kernel_config`: None
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	| Epoch   | Step   | Training Loss | dim_384_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
	|:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
	| 1.0     | 7      | -             | 0.4635                 | 0.4673                 | 0.4674                 | 0.4406                |
	| 1.4528  | 10     | 2.6919        | -                      | -                      | -                      | -                     |
	| 2.0     | 14     | -             | 0.4977                 | 0.5140                 | 0.4963                 | 0.4759                |
	| 2.9057  | 20     | 1.0521        | -                      | -                      | -                      | -                     |
	| 3.0     | 21     | -             | 0.5111                 | 0.5242                 | 0.5130                 | 0.5017                |
	| **4.0** | **28** | **-**         | **0.5114**             | **0.5196**             | **0.5134**             | **0.504**             |

	* The bold row denotes the saved checkpoint.

	### Framework Versions
	- Python: 3.11.13
	- Sentence Transformers: 4.1.0
	- Transformers: 4.54.0
	- PyTorch: 2.6.0+cu124
	- Accelerate: 1.9.0
	- Datasets: 4.0.0
	- Tokenizers: 0.21.2

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	title={Matryoshka Representation Learning},
	author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	year={2024},
	eprint={2205.13147},
	archivePrefix={arXiv},
	primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	title={Efficient Natural Language Response Suggestion for Smart Reply},
	author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	year={2017},
	eprint={1705.00652},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->