Add new SentenceTransformer model

Files changed (11)

1_Pooling/config.json +10 -0
README.md +1025 -0
config.json +23 -0
config_sentence_transformers.json +10 -0
model.safetensors +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +73 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
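
The pooling config above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings that are not padding. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy vectors are illustrative assumptions.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an input that is all padding
    return [value / count for value in total]


# Toy example: three token positions, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
```

In the real model the vectors have 768 components (`word_embedding_dimension`), and the pooled vector is then passed through the `Normalize()` module listed in the architecture.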

README.md ADDED

	@@ -0,0 +1,1025 @@

+---
+language:
+- en
+license: apache-2.0
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:5146
+- loss:MatryoshkaLoss
+- loss:MultipleNegativesRankingLoss
+base_model: sentence-transformers/all-mpnet-base-v2
+widget:
+- source_sentence: 'import subprocess
+    zen_of_python = subprocess.check_output(["python", "-c", "import this"])
+    corpus = zen_of_python.split()
+    num_partitions = 3
+    chunk = len(corpus) // num_partitions
+    partitions = [
+    corpus[i * chunk: (i + 1) * chunk] for i in range(num_partitions)
+    ]
+    Mapping Data#
+    To determine the map phase, we require a map function to use on each document.
+    The output is the pair (word, 1) for every word found in a document.
+    For basic text documents we load as Python strings, the process is as follows:
+    def map_function(document):
+    for word in document.lower().split():
+    yield word, 1
+    We use the apply_map function on a large collection of documents by marking it
+    as a task in Ray using the @ray.remote decorator.
+    When we call apply_map, we apply it to three sets of document data (num_partitions=3).
+    The apply_map function returns three lists, one for each partition so that Ray
+    can rearrange the results of the map phase and distribute them to the appropriate
+    nodes.
+    import ray'
+  sentences:
+  - What does the map_function yield for each word in a document?
+  - What does PBT do differently from traditional hyperparameter tuning methods?
+  - What is returned by task_with_static_multiple_returns_good in the Actor class?
+- source_sentence: '192.168.0.15 7241 Worker ffffffffffffffffffffffffffffffffffffffff0100000001000000
+    10 MiB PINNED_IN_MEMORY (deserialize task arg)
+    __main__.f
+    192.168.0.15 7207 Driver ffffffffffffffffffffffffffffffffffffffff0100000001000000
+    15 MiB USED_BY_PENDING_TASK (put object)
+    test.py:
+    <module>:28
+    While the task is running, we see that ray memory shows both a LOCAL_REFERENCE
+    and a USED_BY_PENDING_TASK reference for the object in the driver process. The
+    worker process also holds a reference to the object because the Python arg is
+    directly referencing the memory in the plasma, so it can’t be evicted; therefore
+    it is PINNED_IN_MEMORY.
+    4. Serialized ObjectRef references
+    @ray.remote
+    def f(arg):
+    while True:
+    pass
+    a = ray.put(None)
+    b = f.remote([a])'
+  sentences:
+  - How can a dataset be created from in-memory data?
+  - What does Algorithm.training_step return for the new API stack?
+  - Why can't the object be evicted while the worker process holds a reference?
+- source_sentence: 'For distributed systems engineers, Ray automatically handles key
+    processes:
+    Orchestration–Managing the various components of a distributed system.
+    Scheduling–Coordinating when and where tasks are executed.
+    Fault tolerance–Ensuring tasks complete regardless of inevitable points of failure.
+    Auto-scaling–Adjusting the number of resources allocated to dynamic demand.
+    What you can do with Ray#
+    These are some common ML workloads that individuals, organizations, and companies
+    leverage Ray to build their AI applications:
+    Batch inference on CPUs and GPUs
+    Model serving
+    Distributed training of large models
+    Parallel hyperparameter tuning experiments
+    Reinforcement learning
+    ML platform
+    Ray framework#
+    Stack of Ray libraries - unified toolkit for ML workloads.
+    Ray’s unified compute framework consists of three layers:'
+  sentences:
+  - What does remote_worker_envs control when num_envs_per_env_runner > 1?
+  - How is the learning rate set in the config?
+  - According to the excerpt, what does Ray automatically handle for distributed systems
+    engineers?
+- source_sentence: 'RLlib component tree#
+    The following is the structure of the RLlib component tree, showing under which
+    name you can
+    access a subcomponent’s own checkpoint within the higher-level checkpoint. At
+    the highest level
+    is the Algorithm class:
+    algorithm/
+    learner_group/
+    learner/
+    rl_module/
+    default_policy/ # <- single-agent case
+    [module ID 1]/ # <- multi-agent case
+    [module ID 2]/ # ...
+    env_runner/
+    env_to_module_connector/
+    module_to_env_connector/
+    Note
+    The env_runner/ subcomponent currently doesn’t hold a copy of the RLModule
+    checkpoint because it’s already saved under learner/. The Ray team is working
+    on resolving
+    this issue, probably through soft-linking to avoid duplicate files and unnecessary
+    disk usage.
+    Creating instances from a checkpoint with from_checkpoint#
+    Once you have a checkpoint of either a trained Algorithm or
+    any of its subcomponents, you can recreate new objects directly
+    from this checkpoint.
+    The following are two examples:'
+  sentences:
+  - Why does RLlib convert each row into a single-step episode by default?
+  - What is at the highest level of the RLlib component tree?
+  - What is recommended regarding AOF when using storage options that do not support
+    append operations?
+- source_sentence: 'Option 2: Manually Create URL (slower to implement, but recommended
+    for production environments)#
+    The second option is to manually create this URL by pattern-matching your specific
+    use case with one of the following examples.
+    This is recommended because it provides finer-grained control over which repository
+    branch and commit to use when generating your dependency zip file.
+    These options prevent consistency issues on Ray Clusters (see the warning above
+    for more info).
+    To create the URL, pick a URL template below that fits your use case, and fill
+    in all parameters in brackets (e.g. [username], [repository], etc.) with the specific
+    values from your repository.
+    For instance, suppose your GitHub username is example_user, the repository’s name
+    is example_repository, and the desired commit hash is abcdefg.
+    If example_repository is public and you want to retrieve the abcdefg commit (which
+    matches the first example use case), the URL would be:'
+  sentences:
+  - What can Ray Train and Ray Tune be used together for?
+  - How do you create the URL for Option 2?
+  - Which function can you use to read a CSV file for batch processing in Ray?
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: Fine-tune-all-mpnet-base-v2
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 768
+      type: dim_768
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5874125874125874
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.6818181818181818
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.7954545454545454
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8863636363636364
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5874125874125874
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.5180652680652681
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.3944055944055945
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.23199300699300698
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.263986013986014
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.6073717948717948
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.7521853146853147
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8780594405594405
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7386606603331115
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.6635614385614379
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.6988731642119342
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 512
+      type: dim_512
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5734265734265734
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.666083916083916
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8006993006993007
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8811188811188811
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5734265734265734
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.5052447552447552
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.39370629370629373
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.23094405594405593
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.26005244755244755
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.5914918414918414
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.7543706293706294
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8726689976689977
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7303335650898982
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.652235958485958
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.689387057080973
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 256
+      type: dim_256
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5664335664335665
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.666083916083916
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.7797202797202797
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8583916083916084
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5664335664335665
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.5011655011655011
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.38636363636363635
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.22534965034965035
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.2577214452214452
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.5893065268065268
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.7354312354312353
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8487762237762237
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7167871578299232
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.6432942057942053
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.6823584299690649
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 128
+      type: dim_128
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5402097902097902
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.6398601398601399
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.743006993006993
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8304195804195804
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5402097902097902
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.47960372960372966
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.3678321678321678
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.2181818181818182
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.24519230769230768
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.5623543123543123
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.701048951048951
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8228438228438228
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.6886328428362513
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.6146582584082584
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.6543671947827556
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 64
+      type: dim_64
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.4353146853146853
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.5332167832167832
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.6311188811188811
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.7622377622377622
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.4353146853146853
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.3945221445221445
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.3094405594405594
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.19825174825174827
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.19842657342657344
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.46547202797202797
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.5910547785547785
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.7467948717948718
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.5953015131317417
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.5138784826284825
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.559206100539383
+      name: Cosine Map@100
+---
+# Fine-tune-all-mpnet-base-v2
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 -->
+- **Maximum Sequence Length:** 384 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - json
+- **Language:** en
+- **License:** apache-2.0
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("thanhpham1/Fine-tune-all-mpnet-base-v2")
+# Run inference
+sentences = [
+    'Option 2: Manually Create URL (slower to implement, but recommended for production environments)#\nThe second option is to manually create this URL by pattern-matching your specific use case with one of the following examples.\nThis is recommended because it provides finer-grained control over which repository branch and commit to use when generating your dependency zip file.\nThese options prevent consistency issues on Ray Clusters (see the warning above for more info).\nTo create the URL, pick a URL template below that fits your use case, and fill in all parameters in brackets (e.g. [username], [repository], etc.) with the specific values from your repository.\nFor instance, suppose your GitHub username is example_user, the repository’s name is example_repository, and the desired commit hash is abcdefg.\nIf example_repository is public and you want to retrieve the abcdefg commit (which matches the first example use case), the URL would be:',
+    'How do you create the URL for Option 2?',
+    'What can Ray Train and Ray Tune be used together for?',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
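
Because the card lists `MatryoshkaLoss` with dimensions 768, 512, 256, 128 and 64, the leading components of each embedding are meant to remain useful on their own. A hedged sketch of what using a shorter embedding involves: keep the first `dim` components and re-normalize before comparing. The helper names and the four-dimensional toy vectors are assumptions for illustration; in practice Sentence Transformers exposes a `truncate_dim` setting (the same name the evaluator parameters in this card use) that should make manual slicing unnecessary.

```python
import math
from typing import List


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head)) or 1.0  # avoid division by zero
    return [v / norm for v in head]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" standing in for 768-dimensional ones.
full_a = [0.6, 0.8, 0.0, 0.0]
full_b = [0.8, 0.6, 0.0, 0.0]
short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)
similarity = cosine(short_a, short_b)
```

The evaluation tables in this card report how retrieval quality changes as the truncation dimension shrinks.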
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Information Retrieval
+* Dataset: `dim_768`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+  ```json
+  {
+      "truncate_dim": 768
+  }
+  ```
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.5874     |
+| cosine_accuracy@3   | 0.6818     |
+| cosine_accuracy@5   | 0.7955     |
+| cosine_accuracy@10  | 0.8864     |
+| cosine_precision@1  | 0.5874     |
+| cosine_precision@3  | 0.5181     |
+| cosine_precision@5  | 0.3944     |
+| cosine_precision@10 | 0.232      |
+| cosine_recall@1     | 0.264      |
+| cosine_recall@3     | 0.6074     |
+| cosine_recall@5     | 0.7522     |
+| cosine_recall@10    | 0.8781     |
+| **cosine_ndcg@10**  | **0.7387** |
+| cosine_mrr@10       | 0.6636     |
+| cosine_map@100      | 0.6989     |
+#### Information Retrieval
+* Dataset: `dim_512`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+  ```json
+  {
+      "truncate_dim": 512
+  }
+  ```
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.5734     |
+| cosine_accuracy@3   | 0.6661     |
+| cosine_accuracy@5   | 0.8007     |
+| cosine_accuracy@10  | 0.8811     |
+| cosine_precision@1  | 0.5734     |
+| cosine_precision@3  | 0.5052     |
+| cosine_precision@5  | 0.3937     |
+| cosine_precision@10 | 0.2309     |
+| cosine_recall@1     | 0.2601     |
+| cosine_recall@3     | 0.5915     |
+| cosine_recall@5     | 0.7544     |
+| cosine_recall@10    | 0.8727     |
+| **cosine_ndcg@10**  | **0.7303** |
+| cosine_mrr@10       | 0.6522     |
+| cosine_map@100      | 0.6894     |
+#### Information Retrieval
+* Dataset: `dim_256`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+  ```json
+  {
+      "truncate_dim": 256
+  }
+  ```
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.5664     |
+| cosine_accuracy@3   | 0.6661     |
+| cosine_accuracy@5   | 0.7797     |
+| cosine_accuracy@10  | 0.8584     |
+| cosine_precision@1  | 0.5664     |
+| cosine_precision@3  | 0.5012     |
+| cosine_precision@5  | 0.3864     |
+| cosine_precision@10 | 0.2253     |
+| cosine_recall@1     | 0.2577     |
+| cosine_recall@3     | 0.5893     |
+| cosine_recall@5     | 0.7354     |
+| cosine_recall@10    | 0.8488     |
+| **cosine_ndcg@10**  | **0.7168** |
+| cosine_mrr@10       | 0.6433     |
+| cosine_map@100      | 0.6824     |
+#### Information Retrieval
+* Dataset: `dim_128`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+  ```json
+  {
+      "truncate_dim": 128
+  }
+  ```
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.5402     |
+| cosine_accuracy@3   | 0.6399     |
+| cosine_accuracy@5   | 0.743      |
+| cosine_accuracy@10  | 0.8304     |
+| cosine_precision@1  | 0.5402     |
+| cosine_precision@3  | 0.4796     |
+| cosine_precision@5  | 0.3678     |
+| cosine_precision@10 | 0.2182     |
+| cosine_recall@1     | 0.2452     |
+| cosine_recall@3     | 0.5624     |
+| cosine_recall@5     | 0.701      |
+| cosine_recall@10    | 0.8228     |
+| **cosine_ndcg@10**  | **0.6886** |
+| cosine_mrr@10       | 0.6147     |
+| cosine_map@100      | 0.6544     |
+#### Information Retrieval
+* Dataset: `dim_64`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
+  ```json
+  {
+      "truncate_dim": 64
+  }
+  ```
+| Metric              | Value      |
+|:--------------------|:-----------|
+| cosine_accuracy@1   | 0.4353     |
+| cosine_accuracy@3   | 0.5332     |
+| cosine_accuracy@5   | 0.6311     |
+| cosine_accuracy@10  | 0.7622     |
+| cosine_precision@1  | 0.4353     |
+| cosine_precision@3  | 0.3945     |
+| cosine_precision@5  | 0.3094     |
+| cosine_precision@10 | 0.1983     |
+| cosine_recall@1     | 0.1984     |
+| cosine_recall@3     | 0.4655     |
+| cosine_recall@5     | 0.5911     |
+| cosine_recall@10    | 0.7468     |
+| **cosine_ndcg@10**  | **0.5953** |
+| cosine_mrr@10       | 0.5139     |
+| cosine_map@100      | 0.5592     |
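
The tables above report accuracy, precision, recall and MRR at several cutoffs. The sketch below shows how these quantities are commonly defined for a single query; the evaluator then averages them over all queries. It is an illustrative pure-Python version with assumed names (`ir_metrics_at_k`, document IDs such as `d2`), not the evaluator's code. It also suggests why recall@1 can sit well below accuracy@1: when a query has more than one relevant document, a single hit at rank 1 counts as a full success for accuracy but only as a fraction for recall.

```python
from typing import Dict, List, Set


def ir_metrics_at_k(ranked: List[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Per-query retrieval metrics for the top-k ranked document IDs."""
    top_k = ranked[:k]
    hits = [doc in relevant for doc in top_k]
    num_hits = sum(hits)
    # 1-based rank of the first relevant document inside the top k, if any.
    first_hit_rank = next((i + 1 for i, hit in enumerate(hits) if hit), None)
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }


# Hypothetical query with two relevant documents, one of them ranked second.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4"}
scores = ir_metrics_at_k(ranked, relevant, k=3)
```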
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### json
+* Dataset: json
+* Size: 5,146 training samples
+* Columns: <code>anchor</code> and <code>positive</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                           | positive                                                                             |
+  |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                               |
+  | details | <ul><li>min: 8 tokens</li><li>mean: 17.8 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 66 tokens</li><li>mean: 225.02 tokens</li><li>max: 384 tokens</li></ul> |
+* Samples:
+  | anchor                                                                                                    | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+  |:----------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>Does Ray Train work with vanilla TensorFlow in addition to TensorFlow with Keras?</code>            | <code>Get Started with Distributed Training using TensorFlow/Keras#<br>Ray Train’s TensorFlow integration enables you<br>to scale your TensorFlow and Keras training functions to many machines and GPUs.<br>On a technical level, Ray Train schedules your training workers<br>and configures TF_CONFIG for you, allowing you to run<br>your MultiWorkerMirroredStrategy training script. See Distributed<br>training with TensorFlow<br>for more information.<br>Most of the examples in this guide use TensorFlow with Keras, but<br>Ray Train also works with vanilla TensorFlow.<br><br>Quickstart#<br>import ray<br>import tensorflow as tf<br><br>from ray import train<br>from ray.train import ScalingConfig<br>from ray.train.tensorflow import TensorflowTrainer<br>from ray.train.tensorflow.keras import ReportCheckpointCallback<br><br># If using GPUs, set this to True.<br>use_gpu = False<br><br>a = 5<br>b = 10<br>size = 100</code>                                                                                                             |
+  | <code>What type of failure can Ray automatically recover from?</code>                                     | <code>Ray can automatically recover from data loss but not owner failure.<br><br>Recovering from data loss#<br>When an object value is lost from the object store, such as during node<br>failures, Ray will use lineage reconstruction to recover the object.<br>Ray will first automatically attempt to recover the value by looking<br>for copies of the same object on other nodes. If none are found, then Ray will<br>automatically recover the value by re-executing<br>the task that previously created the value. Arguments to the task are<br>recursively reconstructed through the same mechanism.<br>Lineage reconstruction currently has the following limitations:</code>                                                                                                                                                                                                                                                                                                                                                                             |
+  | <code>From which directory should you run the zip command to ensure the proper zip file structure?</code> | <code>Suppose instead you want to host your files in your /some_path/example_dir directory remotely and provide a remote URI.<br>You would need to first compress the example_dir directory into a zip file.<br>There should be no other files or directories at the top level of the zip file, other than example_dir.<br>You can use the following command in the Terminal to do this:<br>cd /some_path<br>zip -r zip_file_name.zip example_dir<br><br>Note that this command must be run from the parent directory of the desired working_dir to ensure that the resulting zip file contains a single top-level directory.<br>In general, the zip file’s name and the top-level directory’s name can be anything.<br>The top-level directory’s contents will be used as the working_dir (or py_module).<br>You can check that the zip file contains a single top-level directory by running the following command in the Terminal:<br>zipinfo -1 zip_file_name.zip<br># example_dir/<br># example_dir/my_file_1.txt<br># example_dir/subdir/my_file_2.txt</code> |
+* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+  ```json
+  {
+      "loss": "MultipleNegativesRankingLoss",
+      "matryoshka_dims": [
+          768,
+          512,
+          256,
+          128,
+          64
+      ],
+      "matryoshka_weights": [
+          1,
+          1,
+          1,
+          1,
+          1
+      ],
+      "n_dims_per_step": -1
+  }
+  ```
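
The loss configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss` with five dimensions and equal weights. A rough pure-Python sketch of that combination follows, under stated assumptions: the inner loss is treated as cross-entropy over scaled cosine similarities with the other positives in the batch acting as negatives, the scale of 20.0 is assumed rather than taken from this card, and the function names and toy vectors are hypothetical. It is meant to convey the structure, not to reproduce the library's numerics.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mnr_loss(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """In-batch negatives: positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def matryoshka_loss(anchors, positives, dims, weights) -> float:
    """Weighted sum of the inner loss evaluated on truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * mnr_loss([a[:dim] for a in anchors], [p[:dim] for p in positives])
    return total


# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
shuffled = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
```

Correctly paired positives give a lower loss than shuffled ones at every truncation length, which is the training signal that pushes useful information into the leading dimensions.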
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: epoch
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 16
+- `gradient_accumulation_steps`: 16
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 4
+- `lr_scheduler_type`: cosine
+- `warmup_ratio`: 0.1
+- `bf16`: True
+- `tf32`: False
+- `load_best_model_at_end`: True
+- `optim`: adamw_torch_fused
+- `batch_sampler`: no_duplicates
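The batch size and gradient accumulation settings above determine how many optimizer steps make up an epoch. The sketch below works through that arithmetic, assuming a single device (the card does not state the device count) and taking the training set size from the `dataset_size:5146` tag in the card metadata. The helper name `epoch_at_step` is made up for this example.

```python
import math

dataset_size = 5146      # from the `dataset_size:5146` tag in the card metadata
per_device_batch = 32    # per_device_train_batch_size
grad_accum = 16          # gradient_accumulation_steps
epochs = 4               # num_train_epochs
num_devices = 1          # assumption: not stated in the card

# Mini-batches per epoch, keeping the final partial batch.
batches_per_epoch = math.ceil(dataset_size / (per_device_batch * num_devices))
# One optimizer step per `grad_accum` mini-batches; the last step may be partial.
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs
effective_batch = per_device_batch * grad_accum * num_devices

def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer steps."""
    full_epochs, step_in_epoch = divmod(step, steps_per_epoch)
    return full_epochs + step_in_epoch * grad_accum / batches_per_epoch

print(batches_per_epoch, steps_per_epoch, total_steps, effective_batch)
print(round(epoch_at_step(10), 4), round(epoch_at_step(20), 4))
```

Under these assumptions an epoch has 161 mini-batches and 11 optimizer steps, for 44 steps in total, and the fractional epochs agree with the Epoch and Step columns of the training log in this card.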
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: epoch
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 16
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 16
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 4
+- `max_steps`: -1
+- `lr_scheduler_type`: cosine
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: False
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: True
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+</details>
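The combination of `lr_scheduler_type: cosine`, `warmup_ratio: 0.1`, and `learning_rate: 2e-05` describes a linear warmup followed by a half-cosine decay. The following is a rough sketch of that curve, not the trainer's own code. It assumes 44 total optimizer steps (the last step in the training log) and that the warmup length is the ceiling of `warmup_ratio * total_steps`. The function name `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(step, base_lr=2e-5, total_steps=44, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    # Assumption: warmup length is rounded up (5 steps for 44 total steps).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Learning rate at a few steps of the run.
schedule = {step: cosine_lr(step) for step in (0, 5, 22, 44)}
```

With so few optimizer steps, the warmup covers only the first handful of updates and the rate then decays for the remaining steps.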
+### Training Logs
+| Epoch   | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+|:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+| 0.9938  | 10     | 44.0311       | -                      | -                      | -                      | -                      | -                     |
+| 1.0     | 11     | -             | 0.6797                 | 0.6651                 | 0.6439                 | 0.6180                 | 0.4996                |
+| 0.9938  | 10     | 14.5908       | -                      | -                      | -                      | -                      | -                     |
+| 1.0     | 11     | -             | 0.7179                 | 0.7034                 | 0.6927                 | 0.6658                 | 0.5720                |
+| 1.8944  | 20     | 8.5538        | -                      | -                      | -                      | -                      | -                     |
+| 2.0     | 22     | -             | 0.7295                 | 0.7209                 | 0.7109                 | 0.6793                 | 0.5942                |
+| 2.7950  | 30     | 6.9160        | -                      | -                      | -                      | -                      | -                     |
+| **3.0** | **33** | **-**         | **0.7382**             | **0.7293**             | **0.7149**             | **0.6916**             | **0.5939**            |
+| 3.6957  | 40     | 6.5704        | -                      | -                      | -                      | -                      | -                     |
+| 4.0     | 44     | -             | 0.7387                 | 0.7303                 | 0.7168                 | 0.6886                 | 0.5953                |
+* The bold row denotes the saved checkpoint.
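The metric columns report NDCG@10 under cosine similarity for embeddings truncated to each Matryoshka dimension. As a reminder of what that number measures, here is a small sketch of NDCG@k with binary relevance. It is an illustration under that assumption, not the evaluator used to produce the table, and the document IDs are invented.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (rank 0 is the best hit)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: 1.0 if a retrieved id is relevant, else 0.0."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_ids]
    # Ideal ranking places every relevant document first.
    ideal = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical query with one relevant passage retrieved at rank 2.
score = ndcg_at_k(["doc_b", "doc_a", "doc_c"], {"doc_a"})
```

A single relevant passage retrieved first scores 1.0, and the score shrinks logarithmically as that passage moves down the ranking.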
+### Framework Versions
+- Python: 3.11.12
+- Sentence Transformers: 4.1.0
+- Transformers: 4.52.3
+- PyTorch: 2.6.0+cu124
+- Accelerate: 1.7.0
+- Datasets: 3.6.0
+- Tokenizers: 0.21.1
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### MatryoshkaLoss
+```bibtex
+@misc{kusupati2024matryoshka,
+    title={Matryoshka Representation Learning},
+    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+    year={2024},
+    eprint={2205.13147},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->

config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "relative_attention_num_buckets": 32,
+  "torch_dtype": "float32",
+  "transformers_version": "4.52.3",
+  "vocab_size": 30527
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "4.1.0",
+    "transformers": "4.52.3",
+    "pytorch": "2.6.0+cu124"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ac5fd8d14b8c8509a1708bad750086732938e2b9a4c60527eac35a92e247911e
+size 437967672
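The three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and turns the byte size into a rough parameter count, assuming every weight is stored as 4-byte float32 (consistent with `torch_dtype` in config.json) and ignoring the safetensors header. The function name `parse_lfs_pointer` is made up for this example.

```python
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:ac5fd8d14b8c8509a1708bad750086732938e2b9a4c60527eac35a92e247911e
size 437967672
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])

bytes_per_param = 4  # assumption: float32 weights, header overhead ignored
approx_params = size_bytes // bytes_per_param
print(f"~{approx_params / 1e6:.1f}M parameters")
```

The estimate is a slight upper bound, since part of the file is metadata, not tensor data.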

modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
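modules.json chains three stages: a Transformer that emits one vector per token, a Pooling step (configured for mean pooling in 1_Pooling/config.json), and a Normalize step. The last two stages can be sketched in plain Python as follows. This illustrates the computation and is not the library code; the function names and the toy 2-dimensional token vectors are made up.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention mask is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    return [s / max(count, 1) for s in summed]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Two real tokens and one padding token (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Since the final stage produces unit-length vectors, a dot product between two sentence embeddings gives the same value as the `cosine` similarity function named in config_sentence_transformers.json.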

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 384,
+  "do_lower_case": false
+}

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,73 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30526": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 128,
+  "model_max_length": 384,
+  "pad_to_multiple_of": null,
+  "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "</s>",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "MPNetTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
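The tokenizer settings above fix the special-token IDs (`<s>` = 0, `<pad>` = 1, `</s>` = 2), right-side padding and truncation, and a `model_max_length` of 384. The sketch below shows how a single sequence would be framed under those settings, assuming the `<s> tokens </s>` layout for one input. The function name `build_inputs` and the example token IDs are hypothetical, and real inputs should come from the tokenizer itself.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2   # <s>, <pad>, </s> from added_tokens_decoder
MODEL_MAX_LENGTH = 384             # model_max_length

def build_inputs(token_ids, pad_to=None, max_length=MODEL_MAX_LENGTH):
    """Wrap token ids in <s> ... </s>, truncate on the right, pad on the right."""
    # Reserve two positions for the <s> and </s> markers.
    body = token_ids[: max_length - 2]
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding   # padding positions are masked out
    return input_ids, attention_mask

# Hypothetical word-piece ids, padded to a batch length of 6.
ids, mask = build_inputs([2023, 2003], pad_to=6)
```

The attention mask produced here is what the mean-pooling stage uses to ignore padding positions.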

vocab.txt ADDED

The diff for this file is too large to render. See raw diff