---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:3312
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
- source_sentence: S3 buckets. Before creating a bucket, make sure that you choose the bucket type that best fits your application and performance requirements. For more information about the various bucket types and the appropriate use cases for each, see Buckets. The following sections provide more information about general purpose buckets, including bucket naming rules, quotas, and bucket configuration details. For a list of restriction and limitations related to Amazon S3 buckets see, General purpose bucket quotas, limitations, and restrictions. Topics • General purpose buckets overview • Common general purpose bucket patterns • Permissions • Managing public access to general purpose buckets • General purpose buckets configuration options • General purpose buckets operations General purpose buckets overview API Version 2006-03-01 53 Amazon
  sentences:
  - What should you test for your DB instance?
  - Where can you find a list of restrictions and limitations related to Amazon S3 buckets?
  - What does the 'Get started' section provide?
- source_sentence: geographies use DynamoDB to build modern, serverless applications that can start small and scale globally. DynamoDB scales to support tables of virtually any size while providing consistent single-digit millisecond performance and high availability. For events, such as Amazon Prime Day, DynamoDB powers multiple high-traffic Amazon properties and systems, including Alexa, Amazon.com sites, and all Amazon fulfillment centers. For such events, DynamoDB APIs have handled trillions of calls from Amazon properties and systems. DynamoDB continuously serves hundreds of customers with tables that have peak traffic of over half a million requests per second. It also serves hundreds of customers whose table sizes exceed 200 TB, and processes over one billion requests per hour. Topics • Characteristics of DynamoDB • DynamoDB use
  sentences:
  - What ensures that tasks are always started on secure and patched infrastructure?
  - What state is the environment in while Elastic Beanstalk creates your AWS resources?
  - What is the peak traffic that DynamoDB serves for some customers?
- source_sentence: Amazon Bedrock? 1 Amazon Bedrock User Guide • Create applications that reason through how to help a customer – Build agents that use foundation models, make API calls, and (optionally) query knowledge bases in order to reason through and carry out tasks for your customers. • Adapt models to specific tasks and domains with training data – Customize an Amazon Bedrock foundation model by providing training data for fine-tuning or continued-pretraining in order to adjust a model's parameters and improve its performance on specific tasks or in certain domains. • Improve your FM-based application's efficiency and output – Purchase Provisioned Throughput for a foundation model in order to run inference on models more efficiently and at discounted rates. • Determine
  sentences:
  - How can you access Amazon API Gateway?
  - What allocation strategy is recommended for Spot best practice?
  - What is the purpose of adapting models to specific tasks and domains?
- source_sentence: 'you create the example application, Elastic Beanstalk creates the following resources: • EC2 instance – An Amazon EC2 virtual machine configured to run web apps on the platform you selected. Every platform runs a different set of software, configuration files, and scripts to support a specific language version, framework, web container, or combination thereof. Most platforms use either Apache or nginx as a reverse proxy to forward web traffic to your web app, serve static assets, and generate access and error logs. You can connect to your Amazon EC2 instances to view configuration and logs. Step 2 - Deploy your application 10 AWS Elastic Beanstalk Developer Guide • Instance security group – An Amazon EC2 security group will be created'
  sentences:
  - What allows a client to securely access private API resources inside a VPC?
  - What resources does Elastic Beanstalk create when you create the example application?
  - Where can you find more information about using ACLs?
- source_sentence: change). Saved configuration A saved configuration is a template that you can use as a starting point for creating unique environment configurations. You can create and modify saved configurations, and apply them to environments, using the Elastic Beanstalk console, EB CLI, AWS CLI, or API. The API and the AWS CLI refer to saved configurations as configuration templates. Platform A platform is a combination of an operating system, programming language runtime, web server, application server, and Elastic Beanstalk components. You design and target your web application to a platform. Elastic Beanstalk provides a variety of platforms on which you can build your applications. For details, see Elastic Beanstalk platforms. Elastic Beanstalk web server environments The following diagram shows an example
  sentences:
  - What can you grant other people permission to do in your AWS account?
  - How can the fleet request be deleted?
  - What do the API and the AWS CLI refer to saved configurations as?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Embed AWS Docs
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.002717391304347826
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.22554347826086957
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.5081521739130435
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6983695652173914
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.002717391304347826
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.07518115942028984
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.10163043478260869
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.06983695652173912
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.002717391304347826
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.22554347826086957
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5081521739130435
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6983695652173914
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.30319890292610013
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.18024823153899258
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.1931834404953386
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.0
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.17119565217391305
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.49728260869565216
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6766304347826086
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.0
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.057065217391304345
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.09945652173913044
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.06766304347826087
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.0
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.17119565217391305
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.49728260869565216
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6766304347826086
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.2883913649143213
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.16803291062801948
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.18227351655190474
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.008152173913043478
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.1875
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.4945652173913043
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6657608695652174
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.008152173913043478
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.06249999999999999
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.09891304347826087
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.06657608695652174
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.008152173913043478
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.1875
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.4945652173913043
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6657608695652174
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.28990281751237307
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.17309459109730865
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.18770616923880445
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.002717391304347826
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.1875
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.44021739130434784
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.5842391304347826
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.002717391304347826
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.0625
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.08804347826086956
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.058423913043478264
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.002717391304347826
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.1875
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.44021739130434784
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.5842391304347826
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.25437162359674753
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.151576518288475
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.16929779832410816
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.008152173913043478
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.15760869565217392
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.33695652173913043
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.4782608695652174
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.008152173913043478
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.05253623188405797
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.0673913043478261
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.047826086956521734
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.008152173913043478
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.15760869565217392
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.33695652173913043
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.4782608695652174
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.2095240678369969
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.12627782091097317
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.14429296766748773
      name: Cosine Map@100
---

# Embed AWS Docs

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False,
      'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CadenShokat/modernbert-embed-aws")
# Run inference
sentences = [
    'change). Saved configuration A saved configuration is a template that you can use as a starting point for creating unique environment configurations. You can create and modify saved configurations, and apply them to environments, using the Elastic Beanstalk console, EB CLI, AWS CLI, or API. The API and the AWS CLI refer to saved configurations as configuration templates. Platform A platform is a combination of an operating system, programming language runtime, web server, application server, and Elastic Beanstalk components. You design and target your web application to a platform. Elastic Beanstalk provides a variety of platforms on which you can build your applications. For details, see Elastic Beanstalk platforms. Elastic Beanstalk web server environments The following diagram shows an example',
    'What do the API and the AWS CLI refer to saved configurations as?',
    'What can you grant other people permission to do in your AWS account?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5572, 0.1425],
#         [0.5572, 1.0000, 0.1790],
#         [0.1425, 0.1790, 1.0000]])
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 768
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0027     |
| cosine_accuracy@3   | 0.2255     |
| cosine_accuracy@5   | 0.5082     |
| cosine_accuracy@10  | 0.6984     |
| cosine_precision@1  | 0.0027     |
| cosine_precision@3  | 0.0752     |
| cosine_precision@5  | 0.1016     |
| cosine_precision@10 | 0.0698     |
| cosine_recall@1     | 0.0027     |
| cosine_recall@3     | 0.2255     |
| cosine_recall@5     | 0.5082     |
| cosine_recall@10    | 0.6984     |
| **cosine_ndcg@10**  | **0.3032** |
| cosine_mrr@10       | 0.1802     |
| cosine_map@100      | 0.1932     |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0        |
| cosine_accuracy@3   | 0.1712     |
| cosine_accuracy@5   | 0.4973     |
| cosine_accuracy@10  | 0.6766     |
| cosine_precision@1  | 0.0        |
| cosine_precision@3  | 0.0571     |
| cosine_precision@5  | 0.0995     |
| cosine_precision@10 | 0.0677     |
| cosine_recall@1     | 0.0        |
| cosine_recall@3     | 0.1712     |
| cosine_recall@5     | 0.4973     |
| cosine_recall@10    | 0.6766     |
| **cosine_ndcg@10**  | **0.2884** |
| cosine_mrr@10       | 0.168      |
| cosine_map@100      | 0.1823     |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 256
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0082     |
| cosine_accuracy@3   | 0.1875     |
| cosine_accuracy@5   | 0.4946     |
| cosine_accuracy@10  | 0.6658     |
| cosine_precision@1  | 0.0082     |
| cosine_precision@3  | 0.0625     |
| cosine_precision@5  | 0.0989     |
| cosine_precision@10 | 0.0666     |
| cosine_recall@1     | 0.0082     |
| cosine_recall@3     | 0.1875     |
| cosine_recall@5     | 0.4946     |
| cosine_recall@10    | 0.6658     |
| **cosine_ndcg@10**  | **0.2899** |
| cosine_mrr@10       | 0.1731     |
| cosine_map@100      | 0.1877     |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0027     |
| cosine_accuracy@3   | 0.1875     |
| cosine_accuracy@5   | 0.4402     |
| cosine_accuracy@10  | 0.5842     |
| cosine_precision@1  | 0.0027     |
| cosine_precision@3  | 0.0625     |
| cosine_precision@5  | 0.088      |
| cosine_precision@10 | 0.0584     |
| cosine_recall@1     | 0.0027     |
| cosine_recall@3     | 0.1875     |
| cosine_recall@5     | 0.4402     |
| cosine_recall@10    | 0.5842     |
| **cosine_ndcg@10**  | **0.2544** |
| cosine_mrr@10       | 0.1516     |
| cosine_map@100      | 0.1693     |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with
  [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0082     |
| cosine_accuracy@3   | 0.1576     |
| cosine_accuracy@5   | 0.337      |
| cosine_accuracy@10  | 0.4783     |
| cosine_precision@1  | 0.0082     |
| cosine_precision@3  | 0.0525     |
| cosine_precision@5  | 0.0674     |
| cosine_precision@10 | 0.0478     |
| cosine_recall@1     | 0.0082     |
| cosine_recall@3     | 0.1576     |
| cosine_recall@5     | 0.337      |
| cosine_recall@10    | 0.4783     |
| **cosine_ndcg@10**  | **0.2095** |
| cosine_mrr@10       | 0.1263     |
| cosine_map@100      | 0.1443     |

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 3,312 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:

|         | positive | anchor |
|:--------|:---------|:-------|
| type    | string   | string |
| details |          |        |

* Samples:

| positive | anchor |
|:---------|:-------|
| such as the Kubernetes Dashboard and the section called “Horizontal Pod Autoscaler”. In this topic you learn how to install the Metrics Server. • the section called “Deploy apps with Helm” – The Helm package manager for Kubernetes helps you install and manage applications on your Kubernetes cluster. This topic helps you install and run the Helm binaries so that you can install and manage charts using the Helm CLI on your local computer. • the section called “Tagging your resources” – To help you manage your Amazon EKS resources, you can assign your own metadata to each resource in the form of tags. This topic describes tags and shows you how to create them. • the section called “Service | What is the section called that helps you install the Metrics Server? |
| out orchestrations through cyclically interpreting inputs and producing outputs by using a foundation model. An agent can be used to carry out customer requests. For more information, see Automate tasks in your application using AI agents. • Retrieval augmented generation (RAG) – The process involves: 1. Querying and retrieving information from a data source 2. Augmenting a prompt with this information to provide better context to the foundation model 3. Obtaining a better response from the foundation model using the additional context For more information, see Retrieve data and generate AI responses with Amazon Bedrock Knowledge Bases. • Model customization – The process of using training data to adjust the model parameter values in a base model in order to | Where can you find more information about AI agents? |
| An application that allows your customers to register, discover, and subscribe to your API products (API Gateway usage plans), manage their API keys, and view their usage metrics for your APIs. Edge-optimized API endpoint The default hostname of an API Gateway API that is deployed to the specified Region while using a CloudFront distribution to facilitate client access typically from across AWS Regions. API API Gateway concepts 9 Amazon API Gateway Developer Guide requests are routed to the nearest CloudFront Point of Presence (POP), which typically improves connection time for geographically diverse clients. See API endpoints. Integration request The internal interface of a WebSocket API route or REST API method in API Gateway, in which you map the body of | What is the internal interface of a WebSocket API route or REST API method in API Gateway called? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: False
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
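
The batch-size and epoch settings above imply a small number of optimizer steps. The following is a rough back-of-the-envelope sketch, not output from the training run: it assumes a single training device (the card does not state the device count) and that the `no_duplicates` batch sampler did not change the number of batches. The warmup rounding is also an assumption about how the trainer converts `warmup_ratio` into steps.

```python
import math

# Values taken from this model card
num_samples = 3312
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1

# Assumption: one training device (not stated in the card)
num_devices = 1

# Samples consumed per optimizer step
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Mini-batches per epoch; the last partial batch is kept (dataloader_drop_last is False)
batches_per_epoch = math.ceil(
    num_samples / (per_device_train_batch_size * num_devices)
)

# Optimizer steps per epoch; a trailing partial accumulation window counts as a step
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warmup steps are the ratio times total steps, rounded up
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
print(f"warmup steps:         {warmup_steps}")
```

Under these assumptions the estimate is 7 steps per epoch and 28 steps in total, which is consistent with the evaluation steps (7, 14, 21, 28) reported in the Training Logs table.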

### Training Logs

| Epoch   | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 1.0     | 7      | -             | 0.2693                 | 0.2644                 | 0.2627                 | 0.2275                 | 0.1783                |
| 1.4615  | 10     | 5.1989        | -                      | -                      | -                      | -                      | -                     |
| 2.0     | 14     | -             | 0.2949                 | 0.2901                 | 0.2832                 | 0.2446                 | 0.1976                |
| 2.9231  | 20     | 2.6407        | -                      | -                      | -                      | -                      | -                     |
| 3.0     | 21     | -             | 0.3075                 | 0.2905                 | 0.2876                 | 0.2504                 | 0.2081                |
| **4.0** | **28** | **-**         | **0.3032**             | **0.2884**             | **0.2899**             | **0.2544**             | **0.2095**            |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.10.18
- Sentence Transformers: 5.1.0
- Transformers: 4.55.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.0
- Datasets: 4.0.0
- Tokenizers: 0.21.4

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
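
The two losses cited above are combined in this card's training configuration: an in-batch ranking loss evaluated on several truncated prefixes of each embedding. The NumPy sketch below illustrates that idea only and is not the Sentence Transformers implementation. The dimensions and weights are copied from the loss parameters in this card; the similarity scale of 20 and all function names are assumptions made for illustration.

```python
import numpy as np

MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]  # from the loss config in this card
MATRYOSHKA_WEIGHTS = [1, 1, 1, 1, 1]        # from the loss config in this card
SCALE = 20.0  # assumption: similarity scale, not stated in this card


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def in_batch_ranking_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Cross-entropy over scaled cosine similarities; row i should select column i."""
    logits = SCALE * (l2_normalize(anchors) @ l2_normalize(positives).T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def matryoshka_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """Weighted sum of the ranking loss over each truncated embedding prefix."""
    return sum(
        weight * in_batch_ranking_loss(anchors[:, :dim], positives[:, :dim])
        for dim, weight in zip(MATRYOSHKA_DIMS, MATRYOSHKA_WEIGHTS)
    )


# Synthetic embeddings standing in for encoder outputs (hypothetical data)
rng = np.random.default_rng(0)
positives = rng.normal(size=(8, 768))
matched_anchors = positives + 0.01 * rng.normal(size=(8, 768))  # near their positives
shuffled_anchors = np.roll(matched_anchors, 1, axis=0)          # paired with wrong positives

loss_matched = matryoshka_loss(matched_anchors, positives)
loss_shuffled = matryoshka_loss(shuffled_anchors, positives)
print(f"matched pairs:  {loss_matched:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

Because every prefix contributes to the total, the leading dimensions have to rank well on their own, which is what makes the truncated evaluations (`truncate_dim` of 512, 256, 128, 64) meaningful.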