Sentence Similarity
sentence-transformers
Safetensors
bert
feature-extraction
dense
Generated from Trainer
dataset_size:111470
loss:MultipleNegativesRankingLoss
Eval Results (legacy)
text-embeddings-inference
How to use redis/model-b-structured with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/model-b-structured")

sentences = [
    "when was the first elephant brought to america",
    "Old Bet The first elephant brought to the United States was in 1796, aboard the America which set sail from Calcutta for New York on December 3, 1795.[4] However, it is not certain that this was Old Bet.[2] The first references to Old Bet start in 1804 in Boston as part of a menagerie.[1] In 1808, while residing in Somers, New York, Hachaliah Bailey purchased the menagerie elephant for $1,000 and named it \"Old Bet\".[5][6]",
    "Cronus Rhea secretly gave birth to Zeus in Crete, and handed Cronus a stone wrapped in swaddling clothes, also known as the Omphalos Stone, which he promptly swallowed, thinking that it was his son.",
    "Renal artery One or two accessory renal arteries are frequently found, especially on the left side since they usually arise from the aorta, and may come off above (more common) or below the main artery. Instead of entering the kidney at the hilus, they usually pierce the upper or lower part of the organ.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])
```
Upload model trained on identity trade-off experiment
- 1_Pooling/config.json +10 -0
- README.md +372 -0
- config.json +24 -0
- config_sentence_transformers.json +14 -0
- model.safetensors +3 -0
- modules.json +14 -0
- sentence_bert_config.json +4 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +58 -0
- vocab.txt +0 -0
1_Pooling/config.json
ADDED
```json
{
  "word_embedding_dimension": 512,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
```
README.md
ADDED
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:100000
- loss:MultipleNegativesRankingLoss
base_model: prajjwal1/bert-small
widget:
- source_sentence: How do I calculate IQ?
  sentences:
  - What is the easiest way to know my IQ?
  - How do I calculate not IQ ?
  - What are some creative and innovative business ideas with less investment in India?
- source_sentence: How can I learn martial arts in my home?
  sentences:
  - How can I learn martial arts by myself?
  - What are the advantages and disadvantages of investing in gold?
  - Can people see that I have looked at their pictures on instagram if I am not following
    them?
- source_sentence: When Enterprise picks you up do you have to take them back?
  sentences:
  - Are there any software Training institute in Tuticorin?
  - When Enterprise picks you up do you have to take them back?
  - When Enterprise picks you up do them have to take youback?
- source_sentence: What are some non-capital goods?
  sentences:
  - What are capital goods?
  - How is the value of [math]\pi[/math] calculated?
  - What are some non-capital goods?
- source_sentence: What is the QuickBooks technical support phone number in New York?
  sentences:
  - What caused the Great Depression?
  - Can I apply for PR in Canada?
  - Which is the best QuickBooks Hosting Support Number in New York?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on prajjwal1/bert-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small). It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small) <!-- at revision 0ec5f86f27c1a77d704439db5e01c307ea11b9d4 -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 512 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
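
In this architecture the Pooling module enables only `pooling_mode_cls_token`. Each sentence embedding is therefore the Transformer's 512-dimensional hidden state at the first (`[CLS]`) position. The card's similarity function is cosine similarity. The NumPy sketch below shows those two steps on random stand-in hidden states, with no model weights involved. The helper names `cls_pooling` and `cosine_matrix` are illustrative and are not Sentence Transformers functions.

```python
import numpy as np

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """(batch, seq_len, hidden) -> (batch, hidden): keep only the first ([CLS]) token."""
    return token_embeddings[:, 0, :]

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Random stand-ins for the Transformer output: 3 sentences, 10 tokens, 512 dims.
rng = np.random.default_rng(42)
hidden_states = rng.normal(size=(3, 10, 512))

embeddings = cls_pooling(hidden_states)
scores = cosine_matrix(embeddings, embeddings)
print(embeddings.shape)  # (3, 512)
print(np.round(scores, 3))
```

The architecture has no Normalize module, so the pooled vectors are not unit length. Cosine similarity handles the normalization itself.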

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/model-b-structured")
# Run inference
sentences = [
    'What is the QuickBooks technical support phone number in New York?',
    'Which is the best QuickBooks Hosting Support Number in New York?',
    'Can I apply for PR in Canada?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8563, 0.0594],
#         [0.8563, 1.0000, 0.1245],
#         [0.0594, 0.1245, 1.0000]])
```
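
A similarity matrix like the one in the example output can be turned into paraphrase pairs by taking, for each sentence, the highest-scoring *other* sentence. The plain-NumPy sketch below reuses the example scores. The function name `best_match` and the 0.5 cut-off are illustrative assumptions. For large collections, the library's own `sentence_transformers.util.paraphrase_mining` is the better fit.

```python
import numpy as np

# Cosine-similarity scores copied from the example output above.
similarities = np.array([
    [1.0000, 0.8563, 0.0594],
    [0.8563, 1.0000, 0.1245],
    [0.0594, 0.1245, 1.0000],
])

def best_match(sim: np.ndarray, threshold: float = 0.5):
    """For each row, return the index of the most similar other row,
    or None when that score falls below `threshold` (hypothetical cut-off)."""
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore each sentence's similarity to itself
    best = masked.argmax(axis=1)
    return [int(j) if masked[i, j] >= threshold else None
            for i, j in enumerate(best)]

print(best_match(similarities))
```

With these scores, the two QuickBooks questions pair with each other. The Canada question has no neighbour above the threshold.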

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 100,000 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | sentence_2 |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string | string | string |
  | details | <ul><li>min: 6 tokens</li><li>mean: 15.79 tokens</li><li>max: 66 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.68 tokens</li><li>max: 66 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 16.37 tokens</li><li>max: 67 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:-----------------------------------------------------------------|:-----------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | <code>Is masturbating bad for boys?</code> | <code>Is masturbating bad for boys?</code> | <code>How harmful or unhealthy is masturbation?</code> |
  | <code>Does a train engine move in reverse?</code> | <code>Does a train engine move in reverse?</code> | <code>Time moves forward, not in reverse. Doesn't that make time a vector?</code> |
  | <code>What is the most badass thing anyone has ever done?</code> | <code>What is the most badass thing anyone has ever done?</code> | <code>anyone is the most badass thing Whathas ever done?</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```
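
With three columns, MultipleNegativesRankingLoss treats `sentence_0`, `sentence_1` and `sentence_2` as (anchor, positive, hard negative). Each anchor is scored against every positive and every negative in the batch using cosine similarity multiplied by `scale` (20.0 here). Cross-entropy is then applied with the anchor's own positive as the target class. The NumPy function below illustrates that description. It computes the forward value only, with no gradients, and is not the library's code.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl_loss(anchors, positives, negatives, scale: float = 20.0) -> float:
    """In-batch softmax loss: anchor i should rank positives[i] above every
    other positive and every hard negative in the batch."""
    candidates = np.concatenate([positives, negatives], axis=0)  # (2B, d)
    logits = cos_sim(anchors, candidates) * scale                # (B, 2B)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(anchors))  # the positive for anchor i sits in column i
    return float(-log_probs[targets, targets].mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 512))
positives = anchors + 0.1 * rng.normal(size=(4, 512))  # near-duplicates of the anchors
negatives = rng.normal(size=(4, 512))                  # unrelated vectors
print(mnrl_loss(anchors, positives, negatives))
```

Shuffling the positives so that they no longer line up with their anchors makes the loss jump sharply. That is the signal the model is trained on.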

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
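
These settings combine `lr_scheduler_type: linear`, `warmup_steps: 0` and `learning_rate: 5e-05`, so the learning rate decays in a straight line from 5e-05 to 0 over the run. The sketch below reproduces that shape, in the manner of the Transformers `get_linear_schedule_with_warmup` schedule, but it is not the library's code. The total step count is an assumption derived from 100,000 samples, batch size 64, one device, no gradient accumulation and `dataloader_drop_last: False`.

```python
import math

BASE_LR = 5e-05       # learning_rate
WARMUP_STEPS = 0      # warmup_steps
# Assumption: single device, gradient_accumulation_steps = 1, last partial batch kept.
STEPS_PER_EPOCH = math.ceil(100_000 / 64)
TOTAL_STEPS = STEPS_PER_EPOCH * 3  # num_train_epochs = 3

def linear_lr(step: int) -> float:
    """Linear warmup (zero steps here) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)

for step in (0, 500, 2000, 4500, TOTAL_STEPS):
    print(step, f"{linear_lr(step):.3e}")
```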

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.3199 | 500  | 0.4294        |
| 0.6398 | 1000 | 0.1268        |
| 0.9597 | 1500 | 0.1           |
| 1.2796 | 2000 | 0.0792        |
| 1.5995 | 2500 | 0.0706        |
| 1.9194 | 3000 | 0.0687        |
| 2.2393 | 3500 | 0.0584        |
| 2.5592 | 4000 | 0.057         |
| 2.8791 | 4500 | 0.0581        |
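
The Epoch column is consistent with 1,563 optimizer steps per epoch. That figure is ceil(100,000 / 64), assuming a single device and that the last partial batch is kept. The quick check below recomputes each logged epoch value from its step number, using the (step, epoch) pairs copied from the table.

```python
import math

STEPS_PER_EPOCH = math.ceil(100_000 / 64)  # assumption: one device, no accumulation

# (step, epoch) pairs copied from the Training Logs table.
logged = [
    (500, 0.3199), (1000, 0.6398), (1500, 0.9597),
    (2000, 1.2796), (2500, 1.5995), (3000, 1.9194),
    (3500, 2.2393), (4000, 2.5592), (4500, 2.8791),
]

for step, epoch in logged:
    derived = step / STEPS_PER_EPOCH
    ok = math.isclose(round(derived, 4), epoch)
    print(f"step {step:>4}: logged {epoch:.4f}, derived {derived:.4f}, match={ok}")

print("total steps for 3 epochs:", 3 * STEPS_PER_EPOCH)
```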


### Framework Versions
- Python: 3.10.18
- Sentence Transformers: 5.2.0
- Transformers: 4.57.3
- PyTorch: 2.9.1+cu128
- Accelerate: 1.12.0
- Datasets: 4.4.2
- Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json
ADDED
```json
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 8,
  "num_hidden_layers": 4,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
config_sentence_transformers.json
ADDED
```json
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.2.0",
    "transformers": "4.57.3",
    "pytorch": "2.9.1+cu128"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
```
model.safetensors
ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:405b994d838bd6f6b303976624123a775a0dc7ca262686085e4d2d45696a0ec9
size 115062416
```
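
The LFS pointer records a weights file of 115,062,416 bytes. As a rough sanity check, the hyperparameters in `config.json` imply about 28.8 million parameters. At 4 bytes each (`"dtype": "float32"`) that comes to just under the file size, and the small remainder would plausibly be the safetensors JSON header. The count below assumes the standard BERT layer layout, including the pooler dense layer that `BertModel` adds by default. Treat the result as an estimate, not a reading of the actual file.

```python
# Values taken from config.json in this commit.
vocab_size = 30522
hidden = 512
intermediate = 2048
layers = 4
max_positions = 512
type_vocab = 2

def layer_norm(d: int) -> int:        # weight + bias
    return 2 * d

def linear(d_in: int, d_out: int) -> int:  # weight + bias
    return d_in * d_out + d_out

embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm(hidden)
per_layer = (
    4 * linear(hidden, hidden)        # query, key, value, attention output
    + layer_norm(hidden)
    + linear(hidden, intermediate)    # feed-forward up-projection
    + linear(intermediate, hidden)    # feed-forward down-projection
    + layer_norm(hidden)
)
pooler = linear(hidden, hidden)       # assumed present (BertModel default)

total_params = embeddings + layers * per_layer + pooler
float32_bytes = 4 * total_params
file_size = 115_062_416               # from the LFS pointer above

print(f"{total_params:,} parameters -> {float32_bytes:,} bytes")
print(f"unaccounted bytes: {file_size - float32_bytes:,}")
```

Under these assumptions, fewer than 8 KB of the file are left over for the header.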
modules.json
ADDED
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
```
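
`modules.json` lists the pipeline stages in `idx` order. The Transformer module loads from the repository root (`"path": ""`) and the Pooling module from the `1_Pooling` subfolder. The minimal sketch below parses that JSON and prints the order in which an input flows through the stages. It only reads the file contents and does not instantiate the real modules. `pipeline_order` is a hypothetical helper.

```python
import json

# Contents of modules.json from this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""

def pipeline_order(raw: str) -> list[tuple[str, str]]:
    """Return (class name, folder) pairs sorted by idx; '.' stands for the repo root."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    return [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]

for step, (cls, folder) in enumerate(pipeline_order(MODULES_JSON)):
    print(f"step {step}: {cls} (loaded from {folder})")
```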
sentence_bert_config.json
ADDED
```json
{
  "max_seq_length": 128,
  "do_lower_case": false
}
```
special_tokens_map.json
ADDED
```json
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
```
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 128,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
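
The tokenizer files describe the usual BERT input layout. Token ids are wrapped as `[CLS] ... [SEP]` (ids 101 and 102 in `added_tokens_decoder`), padded with `[PAD]` (id 0), and capped at `model_max_length: 128`. Because the Pooling module uses CLS pooling, the sentence embedding is read from position 0, the `[CLS]` slot. The sketch below frames an already-tokenized id list that way. The ids passed in are hypothetical placeholders rather than real vocabulary entries, and `frame` is an illustrative helper, not a tokenizer API. For simplicity it pads to the full 128; batched encoding usually pads only to the longest sequence in the batch.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from added_tokens_decoder
MAX_LEN = 128                         # model_max_length / max_seq_length

def frame(token_ids: list[int], max_len: int = MAX_LEN) -> tuple[list[int], list[int]]:
    """Truncate, add [CLS]/[SEP], then pad; return (input_ids, attention_mask)."""
    body = token_ids[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding

# Hypothetical placeholder ids standing in for a short tokenized sentence.
ids, mask = frame([1111, 2222, 3333, 4444])
print(len(ids), ids[:6], sum(mask))
```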
vocab.txt
ADDED
The diff for this file is too large to render.