How to use ronit01/golden_rag_tuned_minilm_100 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("ronit01/golden_rag_tuned_minilm_100")
sentences = [
"How do Interactive Control (IC) Ops execute relative to chunk boundaries, and why is this execution model important for correctness?",
"Compute Metrics Function\n------\n\nOptional user-provided function specifying custom evaluation metrics based on the generated \noutputs and ground truth.\n\nIt is passed to the :code:`compute_metrics` argument of :class:`RFModelConfig`. \nAlso read: :doc:`the LoRA and Model Configs page</models>`.\nYou can create multiple variants of these functions and pass them all as a single \n:code:`List` to your :class:`RFModelConfig` to create a multi-config specification.\n\nThis function is invoked by the underlying HF trainer at a cadence controlled by the \n:code:`eval_strategy` and :code:`eval_steps` arguments.\nAlso read: :doc:`the Trainer Configs page</trainers>`.\n\n.. py:function:: fit.compute_metrics_fn(eval_preds: Tuple) -> Dict[str, float]\n\n :param eval_preds: Tuple containing generated predictions and ground truth labels from the eval dataset.\n :type eval_preds: Tuple[List[str], List[str]]\n\n :return: Dictionary with user-defined metrics with names keys and numbers as values\n :rtype: Dict[str, float]\n\n\n**Example:**\n\n.. code-block:: python\n\n\t# From the SFT tutorial notebook\n\tdef sample_compute_metrics(eval_preds): \n\t\t\"\"\"Optional function to compute eval metrics based on predictions and labels\"\"\"\n\t\tpredictions, labels = eval_preds\n\n\t\t# Standard text-based eval metrics: Rouge and BLEU\n\t\timport evaluate\n\t\trouge = evaluate.load(\"rouge\")\n\t\tbleu = evaluate.load(\"bleu\")\n\n\t\trouge_output = rouge.compute(predictions=predictions, references=labels, use_stemmer=True)\n\t\trouge_l = rouge_output[\"rougeL\"]\n\t\tbleu_output = bleu.compute(predictions=predictions, references=labels)\n\t\tbleu_score = bleu_output[\"bleu\"]\n\n\t\treturn {\"rougeL\": round(rouge_l, 4), \"bleu\": round(bleu_score, 4)}",
"Port conflicts (services already running)\n----------------------------------------\n\nIf you encounter port conflicts, you can kill existing processes.\n\n.. code-block:: bash\n\n lsof -t -i:8852 | xargs kill -9 # mlflow\n lsof -t -i:8851 | xargs kill -9 # dispatcher\n lsof -t -i:8853 | xargs kill -9 # frontend server\n\nSelect specific GPU(s) to use\n-----------------------------\n\nSet the ``CUDA_VISIBLE_DEVICES`` environment variable BEFORE running ``rapidfireai start`` to control which GPU(s) RapidFire can see and use.\n\n.. code-block:: bash\n\n export CUDA_VISIBLE_DEVICES=2 # use GPU index 2 only\n rapidfireai start\n\nMultiple GPUs (example: GPUs 0 and 2):\n\n.. code-block:: bash\n\n export CUDA_VISIBLE_DEVICES=0,2\n rapidfireai start\n\nFrom a Python script (set before importing/starting RapidFire):\n\n.. code-block:: python\n\n import os\n os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"2\"\n # then start your RapidFire workflow\n",
"Compute Metrics Function\n------\n\nOptional user-provided function specifying custom evaluation metrics based on the generated \noutputs and ground truth.\n\nIt is passed to the :code:`compute_metrics` argument of :class:`RFModelConfig`. \nAlso read: :doc:`the LoRA and Model Configs page</models>`.\nYou can create multiple variants of these functions and pass them all as a single \n:code:`List` to your :class:`RFModelConfig` to create a multi-config specification.\n\nThis function is invoked by the underlying HF trainer at a cadence controlled by the \n:code:`eval_strategy` and :code:`eval_steps` arguments.\nAlso read: :doc:`the Trainer Configs page</trainers>`.\n\n.. py:function:: fit.compute_metrics_fn(eval_preds: Tuple) -> Dict[str, float]\n\n :param eval_preds: Tuple containing generated predictions and ground truth labels from the eval dataset.\n :type eval_preds: Tuple[List[str], List[str]]\n\n :return: Dictionary with user-defined metrics with names keys and numbers as values\n :rtype: Dict[str, float]\n\n\n**Example:**\n\n.. code-block:: python\n\n\t# From the SFT tutorial notebook\n\tdef sample_compute_metrics(eval_preds): \n\t\t\"\"\"Optional function to compute eval metrics based on predictions and labels\"\"\"\n\t\tpredictions, labels = eval_preds\n\n\t\t# Standard text-based eval metrics: Rouge and BLEU\n\t\timport evaluate\n\t\trouge = evaluate.load(\"rouge\")\n\t\tbleu = evaluate.load(\"bleu\")\n\n\t\trouge_output = rouge.compute(predictions=predictions, references=labels, use_stemmer=True)\n\t\trouge_l = rouge_output[\"rougeL\"]\n\t\tbleu_output = bleu.compute(predictions=predictions, references=labels)\n\t\tbleu_score = bleu_output[\"bleu\"]\n\n\t\treturn {\"rougeL\": round(rouge_l, 4), \"bleu\": round(bleu_score, 4)}"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
(1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
(2): Normalize({})
)
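The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes token embeddings and an attention mask are already available as plain lists; the helper names `mean_pool` and `l2_normalize` are hypothetical and are not part of the sentence-transformers API, and the toy vectors use 4 dimensions instead of 384.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy input: three token positions (the last one is padding), four dimensions.
tokens = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, ignored by the mask
]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)          # [2.0, 1.0, 0.0, 0.0]
sentence_embedding = l2_normalize(pooled)
print(sentence_embedding)
```

Because the output has unit length, cosine similarity between two such embeddings reduces to a plain dot product.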
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/golden_rag_tuned_minilm_100")
# Run inference
sentences = [
'What are all the Experiment class methods (experiment ops) provided by RapidFire AI, and what does each one do?',
'.. py:class:: RFLangChainRagSpec\n\n .. py:method:: __init__(document_loader: BaseLoader = None, text_splitter: TextSplitter = None, embedding_cfg: dict[str, Any] = None, vector_store_cfg: dict[str, Any] = None, retriever: BaseRetriever = None, search_cfg: dict[str, Any] = None, reranker_cfg: dict[str, Any] = None, enable_gpu_search: bool = False, document_template: Callable[[Document], str] = None)\n\n Initialize the RAG specification with document loading, chunking, embedding, indexing, retrieval, and reranking configurations.\n\n :param document_loader: The loader for source documents from various sources (files, directories, databases, etc.). Must be a LangChain BaseLoader implementation.\n :type document_loader: BaseLoader, optional\n\n :param text_splitter: The text splitter for chunking documents for RAG purposes. Controls chunk size, overlap, and splitting strategy. Must be a LangChain TextSplitter.\n :type text_splitter: TextSplitter, optional\n\n :param embedding_cfg: The embedding class and its kwargs to convert a chunk/query into a vector, provided as a single dictionary. Must include a key :code:`"class"` with the class itself as value, not an instance. Options for the class include :class:`HuggingFaceEmbeddings` and :class:`OpenAIEmbeddings`. The kwargs that follow must contain all parameters needed to initialize the embedding class; required parameters vary by embedding class. For example, :class:`HuggingFaceEmbeddings` needs :code:`model_name`, :code:`model_kwargs` and :code:`device`, while :class:`OpenAIEmbeddings` needs :code:`"model"` and :code:`"api_key"`.\n :type embedding_cfg: dict[str, Any], optional\n\n :param vector_store_cfg: The vector store type and args to store and possibly index embedding vectors for retrieval, provided as a single dictionary. \n \n - :code:`"type"`: The type of vector store to use. Must be one of :code:`"faiss"`, :code:`"pgvector"`, or :code:`"pinecone"`. 
Required.\n - :code:`"batch_size"`: Number of vectors per insert batch. Applies to all 3 types of stores. Optional; default is 128.\n\n The remaining keys are type-specific args as listed below. The vector store operates in one of 3 modes depending on the rest of the RAG spec:\n\n - **Create mode:** When :code:`document_loader` is provided and no pre-existing index/collection names are specified, a new vector store is *created* and populated from the loaded documents.\n - **Read mode:** When :code:`document_loader` is absent and pre-existing index/collection names are specified, the vector store is opened in *read-only* mode for retrieval against the existing index.\n - **Update mode:** When both :code:`document_loader` and pre-existing index/collection names are provided, the existing index/collection is *updated* with the new documents added to it.\n\n Supported vector store types and their arg keys:\n\n - **FAISS:** No additional keys. Uses a flat L2 index by default. Set :code:`enable_gpu_search=True` on the constructor to use GPU-accelerated FAISS. Only supports Create mode since it\'s an in-memory store that is not persistent. So, the notion of pre-existing indexes does not apply.\n\n - **Pinecone:**\n\n - :code:`"pinecone_api_key"`: Pinecone API key. Optional if the :code:`PINECONE_API_KEY` environment variable is set.\n - :code:`"index_namespace"`: A 2-tuple of strings (:code:`tuple[str, str]`) with index name and namespace. Required for Read/Update mode and must be a pre-existing index and namespace (NB: namespace can be empty string :code:`""` in Pinecone). N/A for Create mode.\n - :code:`"spec"`: A :code:`ServerlessSpec` or :code:`PodSpec` instance specifying the Pinecone deployment (e.g., cloud and region). Required for Create mode. N/A for Read/Update mode.\n - :code:`"metric"`: Distance metric for the index, must be one of :code:`"cosine"`, :code:`"euclidean"`, or :code:`"dotproduct"`. Optional for Create mode; default is :code:`"cosine"`. 
N/A for Read/Update mode.\n - :code:`"embedding_cfg"`: Embedding config dict (same format as the top-level :code:`embedding_cfg`). Required for any mode either here or in the top-level config for any mode. If provided here, *this takes precedence* over the top-level embedding config. For Create mode, we recommend providing it in the top-level config unless you want to couple different embedding configs with different vector stores.\n - :code:`"text_key"`: The metadata field name used to store the original raw text content associated with a vector in Pinecone. Optional; default is :code:`"text"`. Applicable to all modes. This is useful when the Pinecone index was populated by an external tool that stored text under a non-default metadata field name (e.g., :code:`"content"`, :code:`"original_text"`).\n - :code:`"vector_type"`: Vector type for the index. Accepts a :code:`VectorType` value or string. Optional for Create mode; default is :code:`"dense"`. N/A for Read/Update mode.\n - :code:`"tags"`: Arbitrary string key-value tags to attach to the index. Optional for Create mode; default is :code:`None`. N/A for Read/Update mode.\n - :code:`"timeout"`: Timeout in seconds for index operations. Optional for Create mode; default is :code:`None`. N/A for Read/Update mode.\n - :code:`"deletion_protection"`: Whether deletion protection is enabled. Accepts a :code:`DeletionProtection` value or string. Optional for Create mode; default is :code:`"disabled"`. N/A for Read/Update mode.\n\n To recap, for all 3 modes :code:`"pinecone_api_key"` is needed either here or as an environment variable; :code:`embedding_cfg` is also required either here or in the top-level config. The :code:`"text_key"` is optional for all modes and defaults to :code:`"text"`. \n \n For Create mode, :code:`"spec"` is required but the following are all optional: :code:`"metric"`, :code:`"vector_type"`, :code:`"tags"`, :code:`"timeout"`, and :code:`"deletion_protection"`. 
Although the argument :code:`"index_namespace"` is inapplicable, internally RapidFire AI creates an index name automatically with prefix "rf-" and an SHA hash per pre-processing worker to avoid naming conflicts; the namespace created is the default empty string.\n \n For Read/Update mode, :code:`"index_namespace"` is required and must point to a pre-existing index and namespace. All the other arguments are inapplicable.\n\n - **Postgres PGVector:**\n\n - :code:`"connection"`: DB connection string or engine. Required for all modes.\n - :code:`"collection_name"`: A pre-existing PGVector collection/table name to use for retrieval. Required for Read/Update mode. Inapplicable to Create mode; an SHA-based random name will be generated.\n - :code:`"embedding_cfg"`: Same explanation as above under Pinecone.\n - :code:`"pre_delete_collection"`: If :code:`True`, *deletes* the collection if it already exists before writing. **Use with caution.** Optional; default is :code:`False`. Applicable only to Update mode.\n\n The store is built from the documents provided via :code:`document_loader`. If this entire config is skipped, a default FAISS flat vector store will be created automatically.\n :type vector_store_cfg: dict[str, Any], optional\n\n :param retriever: The retriever for chunk retrieval. If not provided, a default FAISS vector store will be created automatically using the specified search configuration below. Must be a LangChain BaseRetriever implementation.\n :type retriever: BaseRetriever, optional\n\n :param search_cfg: The search algorithm type and its kwargs to use for retrieval of vectors/chunks, provided as a single dictionary. 
Must include a key :code:`"type"` with one of the following three options listed as value; default is :code:`"similarity"`.\n\n * :code:`"similarity"`: Standard cosine similarity search.\n * :code:`"similarity_score_threshold"`: Similarity search with minimum score threshold (SST).\n * :code:`"mmr"`: Maximum Marginal Relevance (MMR) search for diversity.\n\n Additional parameters for search configuration depend on the type; the keys can include the following:\n\n * :code:`"k"`: Number of documents to retrieve. Default is 5.\n * :code:`"filter"`: Optional filter criteria function for search results.\n * :code:`"score_threshold"`: Only for SST. Minimum similarity score threshold. \n * :code:`"fetch_k"`: Only for MMR. Number of documents to fetch before MMR reranking. Default is 20.\n * :code:`"lambda_mult"`: Only for MMR. Diversity parameter for MMR balancing relevance vs. diversity. Default is 0.5.\n :type search_cfg: dict, optional\n\n :param reranker_cfg: The reranker class and its kwargs for reordering retrieved chunks by relevance, provided as a single dictionary. Must include a key :code:`"class"` with the class itself as value, not an instance. Options include :class:`CrossEncoderReranker` from :code:`langchain.retrievers.document_compressors`. The instantiated reranker is applied to each query\'s results individually. The kwargs that follow must contain all parameters needed to initialize the reranker class; required parameters vary by reranker class. For example, :class:`CrossEncoderReranker` needs :code:`model_name`, :code:`model_kwargs` and :code:`top_n`.\n :type reranker_cfg: dict[str, Any], optional\n\n :param enable_gpu_search: If :code:`True`, uses GPU-accelerated FAISS (IndexFlatL2 on GPU) with matrix multiply for exact search. Otherwise uses CPU-based FAISS HNSW index (IndexHNSWFlat) for approximate search. GPU mode requires :code:`faiss-gpu` package and CUDA-compatible GPU. 
Default is :code:`False`.\n :type enable_gpu_search: bool, optional\n\n :param document_template: Optional function to format each retrieved chunk for context injection into prompts. Should accept a single LangChain :class:`Document` object and return a formatted string. Multiple documents are separated by double newlines when serialized. If not provided, the following default template is used:\n \n .. code-block:: python\n \n def default_template(doc: Document) -> str:\n """Default document formatting template."""\n metadata = "; ".join([f"{k}: {v}" for k, v in doc.metadata.items()])\n return f"{metadata}:\\n{doc.page_content}"\n \n You can provide a custom template to control what metadata fields are included and how the content is formatted. For example, to include only a specific metadata field:\n \n .. code-block:: python\n \n def sample_template(doc: Document) -> str:\n doc_source = doc.metadata.get("source", "")\n return f"Document Source: {doc_source}:\\nContent: {doc.page_content}"\n \n Or for a dataset like SciFact where documents have a :code:`"title"` metadata field ingested via :code:`metadata_func` in the document loader:\n \n .. code-block:: python\n \n def custom_template(doc: Document) -> str:\n return f"{doc.metadata[\'title\']}: {doc.page_content}"\n \n :type document_template: Callable[[Document], str], optional',
'Preprocess Function\n-------------------\n\nMandatory user-provided function to prepare the inputs to be given to the generator model. \nIt is invoked for each batch during the evaluation process before generation.\nPass it directly to the :code:`preprocess_fn` key in your eval config dictionary.\n\nThe system injects into this function the batch data, as well as the RAG spec and \nthe prompt manager of an individual leaf config.\n\n\n.. py:function:: preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list]\n\n :param batch: Dictionary with a batch of examples with dataset field names as keys and lists as values\n :type batch: dict[str, list]\n\n :param rag: RAG specification object for document chunk retrieval and context serialization\n :type rag: RFLangChainRagSpec\n\n :param prompt_manager: Prompt manager object for handling instructions and few-shot examples\n :type prompt_manager: RFPromptManager\n\n :return: Dictionary with the preprocessed batch. It must have a reserved key :code:`"prompts"` for the fully formatted prompts for the generator. Other key-value pairs from the original batch can also be copied over if you want.\n :rtype: dict[str, list]\n\n\n**Examples:**\n\n.. 
code-block:: python\n\n # Example 1 from FiQA use case: RAG-based preprocessing with document chunk retrieval\n # This example demonstrates how metadata fields ingested via metadata_func in the\n # document loader (e.g., "corpus_id") are accessible on each Document object\'s\n # .metadata dict after retrieval, enabling retrieval evaluation.\n def sample_preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list]:\n\t\t"""Function to prepare the final inputs given to the generator model"""\n \n\t\tINSTRUCTIONS = "Utilize your financial knowledge, give your answer or opinion to the input question or subject matter."\n \n\t\t# Perform batched retrieval over all queries; returns a list of lists of k documents per query\n\t\tall_context = rag.get_context(batch_queries=batch["query"], serialize=False)\n \n\t\t# Extract the retrieved document ids from the context.\n\t\t# The "corpus_id" metadata field was ingested via metadata_func in the document loader\n\t\t# (see RFLangChainRagSpec examples) and is now accessible on each Document object.\n\t\tretrieved_documents = [\n\t\t\t[doc.metadata["corpus_id"] for doc in docs] for docs in all_context\n\t\t]\n \n\t\t# Serialize the retrieved documents into a single string per query using the document_template.\n\t\t# If a custom document_template was provided in the RAG spec (e.g., to include title metadata),\n\t\t# it is applied here; otherwise the default "metadata:\\ncontent" template is used.\n\t\tserialized_context = rag.serialize_documents(all_context)\n\t\tbatch["query_id"] = [int(query_id) for query_id in batch["query_id"]]\n \n\t\t# Each batch to contain conversational prompt, retrieved documents, and original \'query_id\', \'query\', \'metadata\'\n\t\treturn {\n\t\t\t"prompts": [\n\t\t\t\t[\n\t\t\t\t\t{"role": "system", "content": INSTRUCTIONS},\n\t\t\t\t\t{\n\t\t\t\t\t\t"role": "user",\n\t\t\t\t\t\t"content": f"Here is some relevant context:\\n{context}. 
\\nNow answer the following question using the context provided earlier:\\n{question}",\n\t\t\t\t\t},\n\t\t\t\t]\n\t\t\t\tfor question, context in zip(batch["query"], serialized_context)\n\t\t\t],\n\t\t\t"retrieved_documents": retrieved_documents,\n\t\t\t**batch,\n\t\t}\n\n.. code-block:: python\n\n # Example 2 from GSM8K use case: Few-shot learning preprocessing without RAG\n def sample_preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list]:\n\t\t"""Function to prepare the final inputs given to the generator model"""\n\n\t\treturn {\n\t\t\t"prompts": [\n\t\t\t\t[\n\t\t\t\t\t{"role": "system", "content": prompt_manager.get_instructions()},\n\t\t\t\t\t{\n\t\t\t\t\t\t"role": "user",\n\t\t\t\t\t\t"content": f"Here are some examples: \\n{examples}. \\nNow answer the following question:\\n{question}",\n\t\t\t\t\t},\n\t\t\t\t]\n\t\t\t\tfor question, examples in zip(\n\t\t\t\t\tbatch["question"],\n\t\t\t\t\tprompt_manager.get_fewshot_examples(user_queries=batch["question"]),\n\t\t\t\t)\n\t\t\t],\n\t\t\t**batch,\n\t\t}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1773, 0.4343],
# [0.1773, 1.0000, 0.4398],
# [0.4343, 0.4398, 1.0000]])
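Since the model is intended for retrieval, the similarity scores are typically used to rank passages against a query. The following is a small sketch of that step using the first row of the matrix printed above, where index 0 is the question and indices 1 and 2 are the two documentation passages. The helper `rank_passages` is a hypothetical name introduced for illustration.

```python
# First row of the similarity matrix above: the query compared with itself
# and with the two passages.
query_scores = [1.0000, 0.1773, 0.4343]

def rank_passages(scores, query_index=0):
    """Return (index, score) pairs for every non-query entry, best match first."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_index]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

ranking = rank_passages(query_scores)
print(ranking)
# [(2, 0.4343), (1, 0.1773)]
```

In this toy case the Preprocess Function passage (index 2) ranks above the RFLangChainRagSpec passage (index 1) for the query.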
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| sentence_0 | sentence_1 | label |
|---|---|---|
| How does RapidFire AI's approach to multi-config experimentation unify training (run_fit) and evaluation (run_evals) workflows under a common adaptive execution model, and what are the key differences in how each workflow exposes parallelism controls, return values, and user-provided functions? | Step 5: Monitor training behaviors on ML metrics dashboard | 0.0 |
| How does the num_shards parameter in run_evals() relate to the online aggregation confidence interval computation, and what is the end-to-end flow from setting num_shards to seeing narrowing confidence intervals on eval metrics? | | 0.0 |
| How does the num_shards parameter in run_evals() relate to the online aggregation confidence interval computation, and what is the end-to-end flow from setting num_shards to seeing narrowing confidence intervals on eval metrics? | | 1.0 |
ContrastiveLoss with these parameters:
{
"distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
"margin": 0.5,
"size_average": true
}
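To make these parameters concrete, here is a minimal pure-Python sketch of a contrastive loss in the formulation of Hadsell et al. (2006), using cosine distance, a margin of 0.5, and averaging over the batch. It is an illustration under those assumptions, not the library's code; the function names are hypothetical, and label 1.0 is taken to mean a matching pair while 0.0 means a non-matching pair.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def contrastive_loss(pairs, labels, margin=0.5, size_average=True):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    losses = []
    for (u, v), y in zip(pairs, labels):
        d = cosine_distance(u, v)
        positive_term = y * d ** 2                             # penalize distance for matches
        negative_term = (1.0 - y) * max(0.0, margin - d) ** 2  # penalize closeness for non-matches
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total

# Toy pairs in two dimensions.
same = ([1.0, 0.0], [1.0, 0.0])        # distance 0
orthogonal = ([1.0, 0.0], [0.0, 1.0])  # distance 1

print(contrastive_loss([same], [1.0]))        # matching and identical: no loss
print(contrastive_loss([same], [0.0]))        # non-matching but identical: 0.5 * 0.5**2
print(contrastive_loss([orthogonal], [0.0]))  # non-matching and beyond the margin: no loss
```

With a margin of 0.5, a non-matching pair stops contributing to the loss once its cosine distance reaches 0.5, so only hard negatives keep producing gradient.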
Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 100
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- do_predict: False
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 100
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: None
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 17.8571 | 500 | 0.0078 |
| 35.7143 | 1000 | 0.0032 |
| 53.5714 | 1500 | 0.0027 |
| 71.4286 | 2000 | 0.0024 |
| 89.2857 | 2500 | 0.0022 |
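The epoch and step columns in the log above imply a fixed number of optimizer steps per epoch, which in turn bounds the size of the training set. The short calculation below is a sketch that assumes training ran on a single device; the batch size of 16, gradient_accumulation_steps of 1, dataloader_drop_last of False, and 100 epochs are taken from the hyperparameters listed in this card.

```python
# (epoch, step) pairs copied from the training log.
log = [
    (17.8571, 500),
    (35.7143, 1000),
    (53.5714, 1500),
    (71.4286, 2000),
    (89.2857, 2500),
]

# Each logged row should yield the same steps-per-epoch ratio.
steps_per_epoch = [round(step / epoch) for epoch, step in log]
print(steps_per_epoch)  # [28, 28, 28, 28, 28]

batch_size = 16
num_epochs = 100
spe = steps_per_epoch[0]

# With no dropped last batch, spe == ceil(num_samples / batch_size).
min_samples = (spe - 1) * batch_size + 1
max_samples = spe * batch_size
total_steps = spe * num_epochs

print(min_samples, max_samples, total_steps)  # 433 448 2800
```

Under these assumptions the training set holds between 433 and 448 sentence pairs, and the full run takes 2800 optimizer steps.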
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@inproceedings{hadsell2006dimensionality,
author={Hadsell, R. and Chopra, S. and LeCun, Y.},
booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
title={Dimensionality Reduction by Learning an Invariant Mapping},
year={2006},
volume={2},
number={},
pages={1735-1742},
doi={10.1109/CVPR.2006.100}
}
Base model
nreimers/MiniLM-L6-H384-uncased