tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:97975
  - loss:MultipleNegativesRankingLoss
base_model: google/embeddinggemma-300m
widget:
  - source_sentence: >-
      task: classification | query: Trong bài viết này chúng ta sẽ thảo luận:
      Lợi ích của năng lượng mặt trời trong trường học. Năng lượng mặt trời là
      nguồn tài nguyên đầy hứa hẹn và có giá trị cao cho tương lai, với xu hướng
      sử dụng loại năng lượng này ngày càng tăng, một số trường học đang tìm
      cách sử dụng nguồn năng lượng này hiệu quả hơn. Câu hỏi đặt ra là tại sao
      các trường học lại tìm kiếm sự chuyển đổi này? và làm thế nào năng lượng
      mặt trời có thể được sử dụng trong trường học? Năm 2020, Hoa Kỳ đã ghi
      nhận tổng cộng 7300 k-12 trường học sử dụng tấm pin mặt trời để tạo ra
      điện, với mức tăng trưởng hàng năm là 24% từ năm 2017 đến năm 2020. Việc
      lắp đặt tấm pin mặt trời trong trường học có thể giúp: Giúp các học khu
      giảm chi phí hóa đơn tiền điện. Mang lại một môi trường sạch sẽ cho học
      sinh. Nâng cao nhận thức về năng lượng tái tạo. Tận d
    sentences:
      - trông người đã thấy hãm tài
      - Sức khỏe - Đời sống
      - Khoa học môi trường
  - source_sentence: "task: classification | query: Vitaco chuẩn bị bán gần 19 triệu cổ phần\r\nNgày 9/12 tới, Công ty Vận tải Xăng dầu (Vitaco) tổ chức bán đấu giá gần 19 triệu cổ phần qua Trung tâm giao dịch chứng khoán TP HCM và Hà Nội, với giá khởi điểm 10.200 đồng/cổ phần.\r\nTheo Trung tâm giao dịch chứng khoán TP HCM, các pháp nhân, thể nhân có nhu cầu tham gia mua cổ phần của Vitaco phải nộp hồ sơ đăng ký theo mẫu và đúng thời hạn. Số lượng cổ phần đăng ký mua tối thiểu là 500. Mệnh giá 10.000 đồng/cổ phần.\r\nCông ty Vận tải Xăng dầu Vitaco, có trụ sở chính tại số 12 đường Lê Duẩn, quận 1, TP HCM. Vốn điều lệ của Vitaco hiện nay là 400 tỷ đồng. Vitaco kinh doanh các sản phẩm xăng dầu bằng đường biển, ngoại thương, cung ứng vật tư, đại lý tàu biển, vệ sinh... và dịch vụ môi giới hàng hải."
    sentences:
      - Kinh doanh quốc tế
      - Chứng khoán
      - bặm miệng lại
  - source_sentence: "task: classification | query: Nhật Bản học tập kinh nghiệm điều trị cúm gia cầm của VN\r\nTrung tâm Y tế Quốc tế Nhật Bản đã quyết định hợp tác với Bệnh viện Bạch Mai (Hà Nội) về chẩn đoán và điều trị cho các bệnh nhân nhiễm virút cúm gia cầm thông qua truyền hình trực tiếp trên Internet, đồng thời cử các bác sĩ, chuyên gia y tế đến thực tập tại bệnh viện Bạch Mai. Nhằm đối phó với dịch cúm gia cầm thể mới, Trung tâm Y tế Quốc tế Nhật Bản được chỉ định là nơi chuyên chữa trị cho các bệnh nhân nhiễm virút cúm gia cầm. Tuy có nhiều trang thiết bị hiện đại, nhưng nhân viên của Trung tâm vẫn còn thiếu kinh nghiệm thực tế. Thông qua hợp tác với bệnh viện Bạch Mai, Trung tâm hy vọng sẽ đào tạo được một đội ngũ nhân viên có kinh nghiệm thực tế, có thể xử lý nhanh khi Nhật Bản có nhiều người bị nhiễm virút cúm gia cầm."
    sentences:
      - Cúm gà
      - phục hồi lại nguyên trạng
      - '"Mỗi lần nắng mới hắt bên song, Xao xác, gà trưa gáy não nùng."'
  - source_sentence: >-
      task: sentence similarity | query: phần nước đậm đặc, tinh tuý nhất do
      vắt, ép, ngâm hoặc nấu lần đầu mà có
    sentences:
      - Giải trí; Âm nhạc
      - tóc bỏ lơi
      - nước cốt trầu
  - source_sentence: >-
      task: sentence similarity | query: tập hợp 500 tờ giấy hay 20 thếp giấy,
      làm thành đơn vị để tính số lượng giấy
    sentences:
      - bầu không khí nặng nề
      - Tổ chức toàn cầu
      - in hết hai ram giấy
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@2
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_accuracy@100
  - cosine_precision@1
  - cosine_precision@2
  - cosine_precision@5
  - cosine_precision@10
  - cosine_precision@100
  - cosine_recall@1
  - cosine_recall@2
  - cosine_recall@5
  - cosine_recall@10
  - cosine_recall@100
  - cosine_ndcg@10
  - cosine_mrr@1
  - cosine_mrr@2
  - cosine_mrr@5
  - cosine_mrr@10
  - cosine_mrr@100
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on google/embeddinggemma-300m
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.13612565445026178
            name: Cosine Accuracy@1
          - type: cosine_accuracy@2
            value: 0.1806741985854689
            name: Cosine Accuracy@2
          - type: cosine_accuracy@5
            value: 0.2604941673555617
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.3390281987691742
            name: Cosine Accuracy@10
          - type: cosine_accuracy@100
            value: 0.7170019289060348
            name: Cosine Accuracy@100
          - type: cosine_precision@1
            value: 0.13612565445026178
            name: Cosine Precision@1
          - type: cosine_precision@2
            value: 0.09033709929273445
            name: Cosine Precision@2
          - type: cosine_precision@5
            value: 0.05209883347111234
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.03390281987691743
            name: Cosine Precision@10
          - type: cosine_precision@100
            value: 0.007170019289060347
            name: Cosine Precision@100
          - type: cosine_recall@1
            value: 0.13612565445026178
            name: Cosine Recall@1
          - type: cosine_recall@2
            value: 0.1806741985854689
            name: Cosine Recall@2
          - type: cosine_recall@5
            value: 0.2604941673555617
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.3390281987691742
            name: Cosine Recall@10
          - type: cosine_recall@100
            value: 0.7170019289060348
            name: Cosine Recall@100
          - type: cosine_ndcg@10
            value: 0.22552433960734286
            name: Cosine Ndcg@10
          - type: cosine_mrr@1
            value: 0.13612565445026178
            name: Cosine Mrr@1
          - type: cosine_mrr@2
            value: 0.15839992651786533
            name: Cosine Mrr@2
          - type: cosine_mrr@5
            value: 0.1801919720767884
            name: Cosine Mrr@5
          - type: cosine_mrr@10
            value: 0.19070534830385946
            name: Cosine Mrr@10
          - type: cosine_mrr@100
            value: 0.20385519306962407
            name: Cosine Mrr@100
          - type: cosine_map@100
            value: 0.20385519306962605
            name: Cosine Map@100

SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google/embeddinggemma-300m
Maximum Sequence Length: 2048 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
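The module stack above is a small head on top of the Gemma3 encoder. The NumPy sketch below mirrors modules (1) to (4) for a batch of token embeddings: masked mean pooling (`pooling_mode_mean_tokens: True`), two bias-free linear projections from 768 to 3072 and back to 768 with identity activation, and L2 normalization. The random weights, the `head` function and the toy batch are hypothetical stand-ins. The real trained weights live in the checkpoint, and `torch.nn.Linear` stores them transposed, as (out_features, in_features).

```python
import numpy as np

HIDDEN = 768     # word_embedding_dimension of the Transformer module
EXPANDED = 3072  # out_features of the first Dense module

rng = np.random.default_rng(0)
# Hypothetical random weights standing in for the two trained Dense layers (bias=False).
# Stored here as (in, out) so that x @ W applies the projection directly.
W1 = rng.standard_normal((HIDDEN, EXPANDED)) / np.sqrt(HIDDEN)
W2 = rng.standard_normal((EXPANDED, HIDDEN)) / np.sqrt(EXPANDED)


def head(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Apply modules (1)-(4): mean pooling, Dense, Dense, Normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    # (1) Mean pooling over real tokens only; padding positions are masked out.
    pooled = (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
    # (2) and (3) Linear projections with identity activation and no bias.
    projected = pooled @ W1 @ W2
    # (4) L2-normalize so that cosine similarity reduces to a dot product.
    return projected / np.linalg.norm(projected, axis=1, keepdims=True)


# Toy batch: 2 sequences padded to 5 tokens; the second has only 3 real tokens.
tokens = rng.standard_normal((2, 5, HIDDEN))
mask = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
emb = head(tokens, mask)
print(emb.shape)                    # (2, 768)
print(np.linalg.norm(emb, axis=1))  # both norms are 1.0 up to float error
```

Because the final Normalize module makes every output unit-length, the cosine similarity listed under Model Description and a plain dot product give the same scores for these embeddings.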

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("meandyou200175/gemma_topic_modal")
# Run inference
queries = [
    "task: sentence similarity | query: tập hợp 500 tờ giấy hay 20 thếp giấy, làm thành đơn vị để tính số lượng giấy",
]
documents = [
    'in hết hai ram giấy',
    'Tổ chức toàn cầu',
    'bầu không khí nặng nề',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.4646,  0.0266, -0.0251]])
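The widget examples pair a `task: classification | query: ...` text with short topic labels such as `Chứng khoán` (securities) or `Khoa học môi trường` (environmental science). This suggests the model can rank candidate topic labels for a text. Below is a minimal sketch, assuming you already have one query embedding and one embedding per label (for example from `model.encode_query` and `model.encode_document` as above). The helper `rank_labels` is hypothetical, and the toy 3-dimensional vectors only stand in for real 768-dimensional embeddings.

```python
import numpy as np


def rank_labels(query_emb, label_embs, labels):
    """Return (label, cosine score) pairs sorted from best to worst match."""
    q = np.asarray(query_emb, dtype=float)
    L = np.asarray(label_embs, dtype=float)
    # Cosine similarity; for this model's unit-length outputs it equals a dot product.
    scores = L @ q / (np.linalg.norm(L, axis=1) * np.linalg.norm(q))
    order = np.argsort(-scores)
    return [(labels[i], float(scores[i])) for i in order]


# Toy vectors standing in for real embeddings (hypothetical values).
labels = ["Kinh doanh quốc tế", "Chứng khoán", "bặm miệng lại"]
label_embs = [[0.9, 0.1, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_emb = [0.5, 0.85, 0.1]

for label, score in rank_labels(query_emb, label_embs, labels):
    print(f"{score:.3f}  {label}")
```

With the real model, `query_emb` would be `model.encode_query([...])[0]` and `label_embs` would be `model.encode_document(labels)`; which labels to offer as candidates is an application choice.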

Evaluation

Metrics

Information Retrieval

Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.1361
cosine_accuracy@2	0.1807
cosine_accuracy@5	0.2605
cosine_accuracy@10	0.339
cosine_accuracy@100	0.717
cosine_precision@1	0.1361
cosine_precision@2	0.0903
cosine_precision@5	0.0521
cosine_precision@10	0.0339
cosine_precision@100	0.0072
cosine_recall@1	0.1361
cosine_recall@2	0.1807
cosine_recall@5	0.2605
cosine_recall@10	0.339
cosine_recall@100	0.717
cosine_ndcg@10	0.2255
cosine_mrr@1	0.1361
cosine_mrr@2	0.1584
cosine_mrr@5	0.1802
cosine_mrr@10	0.1907
cosine_mrr@100	0.2039
cosine_map@100	0.2039
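The table is consistent with each evaluation query having exactly one relevant document. Accuracy@k equals recall@k at every cutoff, precision@k equals accuracy@k divided by k (0.339 / 10 = 0.0339), and MAP@100 matches MRR@100. The pure-Python sketch below computes these metrics for that single-relevant-document case from the 1-based rank of the relevant document per query (`None` if it was not retrieved), which makes the identities explicit. The function name and toy ranks are hypothetical, and this is not the `InformationRetrievalEvaluator` implementation.

```python
import math


def single_relevant_metrics(ranks, k):
    """IR metrics at cutoff k when every query has exactly one relevant document.

    ranks: 1-based rank of the relevant document for each query, or None if missing.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n   # relevant doc appears somewhere in the top k
    recall = accuracy          # one relevant doc: it is either found or not
    precision = accuracy / k   # at most one relevant doc among the k results
    mrr = sum(1 / r for r, h in zip(ranks, hits) if h) / n
    # With one relevant doc the ideal DCG is 1, so nDCG is just the discounted gain.
    ndcg = sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n
    # Average precision of one relevant doc retrieved at rank r is 1/r, so MAP == MRR.
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg, "map": mrr}


# Toy example: four queries whose relevant document landed at ranks 1, 3, 12 and nowhere.
print(single_relevant_metrics([1, 3, 12, None], k=10))
```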

Training Details

Training Dataset

Unnamed Dataset

Size: 97,975 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 11 tokens mean: 137.68 tokens max: 301 tokens	min: 3 tokens mean: 8.56 tokens max: 39 tokens

Samples:

anchor	positive
task: sentence similarity | query: luống	trồng mấy liếp rau
task: sentence similarity | query: không còn có quan hệ tình cảm và tình dục, do bất hoà	vợ chồng sống li thân
task: sentence similarity | query: đánh bật khỏi một vị trí, một địa vị nào đó để chiếm lấy	Nhật hất cẳng Pháp ở chiến trường Đông Dương
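The anchors in these samples carry a task prefix of the form `task: <task name> | query: `. `sentence similarity` is used for dictionary-style definition pairs and `classification` for article and topic pairs, while the positives stay plain text. The prefix appears to follow the prompt templates published for the base EmbeddingGemma model. Below is a minimal sketch of building new rows in the same format, assuming you want extra data to match this training convention. `make_pair` is a hypothetical helper, not part of any library.

```python
def make_pair(task: str, anchor_text: str, positive_text: str) -> dict:
    """Build one (anchor, positive) row in the format used by the training samples."""
    allowed = {"sentence similarity", "classification"}  # the two tasks seen in the samples
    if task not in allowed:
        raise ValueError(f"unexpected task {task!r}; expected one of {sorted(allowed)}")
    # Only the anchor gets the prefix; positives are stored without one.
    return {"anchor": f"task: {task} | query: {anchor_text}", "positive": positive_text}


row = make_pair("sentence similarity", "luống", "trồng mấy liếp rau")
print(row["anchor"])  # task: sentence similarity | query: luống
```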

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
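MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor. It builds the anchor-by-positive cosine similarity matrix for the batch, multiplies it by `scale` (20.0 here), and applies cross-entropy with the matching positive (the diagonal) as the target class. Below is a minimal NumPy sketch of that computation for anchors paired one-to-one with positives and no extra hard negatives. The function name is hypothetical, and this is not the library's code.

```python
import numpy as np


def mnrl_loss(anchor_emb: np.ndarray, positive_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: row i of the score matrix should peak at column i."""
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    p = positive_emb / np.linalg.norm(positive_emb, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                    # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy with labels 0..batch-1


# Orthonormal, perfectly matched pairs: loss is log(1 + 3 * exp(-20)), essentially 0.
print(mnrl_loss(np.eye(4), np.eye(4)))
# Identical positives carry no signal: loss is log(4), chance level for a batch of 4.
print(mnrl_loss(np.eye(4), np.ones((4, 4))))
```

With `per_device_train_batch_size: 32`, each anchor is scored against its own positive and 31 in-batch negatives.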

Evaluation Dataset

Unnamed Dataset

Size: 10,887 evaluation samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 10 tokens mean: 130.94 tokens max: 350 tokens	min: 3 tokens mean: 8.25 tokens max: 36 tokens

Samples:

anchor	positive
task: sentence similarity | query: dải phù sa ở dọc sông hay cửa sông	doi cát
task: classification | query: Theo hãng phân tích JP Morgan, Apple khả năng kỳ vọng Phố Wall quý 2, bất chấp vấn đề chuỗi cung ứng biến động kinh tế vĩ mô. Cụ thể, ghi gửi đầu tư, phân tích Samik Chatterjee JP Morgan hay, "không lo lắng Phố Wall" báo cáo doanh thu Apple – dự kiến công bố 28/7. Mặc rủi ro trung hạn, hy vọng doanh thu doanh iPhone mẽ. iPhone 13 Series "đắt hàng". Nhà phân tích định, chuỗi cung ứng cải thiện yếu kém nhu cầu dự đoán, Apple doanh thu 4 - 8 tỷ USD 3 (tháng 4 – 6). Phố Wall dự kiến, "Nhà Táo" báo cáo doanh thu 82 tỷ USD quý 2, tương đương kỳ vọng 82,1 tỷ USD Chatterjee. Thêm nữa, phân tích hay, phân khúc sản phẩm Mac thể ảnh hưởng cung cấp. Mặt khác, quý nhất, Chatterjee doanh thu dự kiến khiêm tốn. Ông tốc độ trưởng Mac iPad khả năng chi tiêu tiêu xuống. iPhone 11 giá Việt Nam.	Sức khỏe - Đời sống
task: classification | query: Khó thống nhất việc hiệp thương giá bán than Cuộc họp do Bộ Tài chính chủ trì với sự tham gia của Bộ Công nghiệp, Tổng công ty Than Việt Nam (TVN) cuối tuần qua đã đi đến kết luận TVN sẽ tiến hành hiệp thương về giá với các đơn vị tiêu thụ lớn trong vòng 15 ngày tới. Trong trường hợp hai bên mua bán không hiệp thương được thì cơ quan hữu trách sẽ có những biện pháp giải quyết. Trước đó, các cơ quan hữu trách đã yêu cầu TVN trong thời gian hiệp thương về giá vẫn phải đảm bảo cung cấp đủ than cho các hộ tiêu thụ lớn với mức giá tạm tính theo giá của quý IV năm nay. Bình luận về việc hiệp thương giá giữa TVN và các hộ tiêu thụ lớn, các chuyên gia cho rằng khó có thể đi đến kết quả thống nhất bởi quyền lợi mỗi bên rất khác nhau.	Kinh doanh

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates
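These settings fix the length of training. Assuming one optimizer step per batch on a single device (`gradient_accumulation_steps: 1`) and a kept final partial batch (`dataloader_drop_last: False`), 97,975 samples at batch size 32 give 3,062 steps per epoch and 15,310 steps over 5 epochs. That is consistent with the Epoch column of the training logs, where step 15,300 is logged at epoch 4.9967. With `warmup_ratio: 0.1` and the `linear` scheduler, the learning rate ramps from 0 to 2e-05 over the first 1,531 steps and then decays linearly to 0. The sketch below reproduces that arithmetic. The `lr_at` helper is hypothetical and only mirrors the shape of a linear warmup and decay schedule.

```python
import math

num_samples = 97_975
batch_size = 32
epochs = 5
warmup_ratio = 0.1
peak_lr = 2e-05

steps_per_epoch = math.ceil(num_samples / batch_size)  # the last partial batch is kept
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 3062 15310 1531
print(round(15_300 / steps_per_epoch, 4))          # 4.9967, as in the training logs
```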

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
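`batch_sampler: no_duplicates` matters for MultipleNegativesRankingLoss. If the same text appeared twice in one batch, one copy would be scored as a negative for the other copy's anchor. This is likely common here, because the classification pairs reuse a small set of topic labels such as `Kinh doanh`. The sketch below is a simplified greedy version of the idea, assuming each row is an `(anchor, positive)` pair of strings. It is illustrative only and does not reproduce the library's `NoDuplicatesBatchSampler` exactly; for instance, it does no shuffling.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group row indices into batches in which no text value appears twice."""
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            # Defer a row if the batch is full or any of its texts is already in it.
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred  # deferred rows are retried in the next batch
    return batches


rows = [("a1", "Kinh doanh"), ("a2", "Chứng khoán"), ("a3", "Kinh doanh"), ("a4", "Cúm gà")]
print(no_duplicate_batches(rows, batch_size=3))  # [[0, 1, 3], [2]]
```

Row 2 is pushed to the next batch because its positive `Kinh doanh` already appears in the first one.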

Training Logs

Epoch	Step	Training Loss	Validation Loss	cosine_ndcg@10
0.0327	100	1.8498	-	-
0.0653	200	1.0175	-	-
0.0980	300	0.7418	-	-
0.1306	400	0.6431	-	-
0.1633	500	0.6181	-	-
0.1960	600	0.5806	-	-
0.2286	700	0.6151	-	-
0.2613	800	0.5552	-	-
0.2939	900	0.5811	-	-
0.3266	1000	0.6113	-	-
0.3592	1100	0.6057	-	-
0.3919	1200	0.6167	-	-
0.4246	1300	0.6254	-	-
0.4572	1400	0.6138	-	-
0.4899	1500	0.6281	-	-
0.5225	1600	0.6567	-	-
0.5552	1700	0.6276	-	-
0.5879	1800	0.6779	-	-
0.6205	1900	0.6172	-	-
0.6532	2000	0.6295	-	-
0.6858	2100	0.6065	-	-
0.7185	2200	0.5892	-	-
0.7511	2300	0.6015	-	-
0.7838	2400	0.5633	-	-
0.8165	2500	0.5123	-	-
0.8491	2600	0.5389	-	-
0.8818	2700	0.5092	-	-
0.9144	2800	0.5297	-	-
0.9471	2900	0.5423	-	-
0.9798	3000	0.5261	-	-
1.0124	3100	0.4951	-	-
1.0451	3200	0.4157	-	-
1.0777	3300	0.3943	-	-
1.1104	3400	0.4216	-	-
1.1430	3500	0.4047	-	-
1.1757	3600	0.3904	-	-
1.2084	3700	0.383	-	-
1.2410	3800	0.4125	-	-
1.2737	3900	0.3971	-	-
1.3063	4000	0.4039	-	-
1.3390	4100	0.3879	-	-
1.3717	4200	0.3985	-	-
1.4043	4300	0.405	-	-
1.4370	4400	0.3616	-	-
1.4696	4500	0.3866	-	-
1.5023	4600	0.3941	-	-
1.5349	4700	0.3875	-	-
1.5676	4800	0.3697	-	-
1.6003	4900	0.3829	-	-
1.6329	5000	0.3939	0.4345	0.1848
1.6656	5100	0.3656	-	-
1.6982	5200	0.3564	-	-
1.7309	5300	0.3925	-	-
1.7636	5400	0.371	-	-
1.7962	5500	0.3624	-	-
1.8289	5600	0.3683	-	-
1.8615	5700	0.3805	-	-
1.8942	5800	0.3601	-	-
1.9268	5900	0.3365	-	-
1.9595	6000	0.3538	-	-
1.9922	6100	0.3602	-	-
2.0248	6200	0.2514	-	-
2.0575	6300	0.2195	-	-
2.0901	6400	0.2327	-	-
2.1228	6500	0.2233	-	-
2.1555	6600	0.2073	-	-
2.1881	6700	0.242	-	-
2.2208	6800	0.2427	-	-
2.2534	6900	0.232	-	-
2.2861	7000	0.239	-	-
2.3187	7100	0.2219	-	-
2.3514	7200	0.2481	-	-
2.3841	7300	0.2252	-	-
2.4167	7400	0.2339	-	-
2.4494	7500	0.2243	-	-
2.4820	7600	0.223	-	-
2.5147	7700	0.2383	-	-
2.5474	7800	0.2269	-	-
2.5800	7900	0.2237	-	-
2.6127	8000	0.2331	-	-
2.6453	8100	0.2056	-	-
2.6780	8200	0.2438	-	-
2.7106	8300	0.2241	-	-
2.7433	8400	0.2172	-	-
2.7760	8500	0.2155	-	-
2.8086	8600	0.2312	-	-
2.8413	8700	0.2091	-	-
2.8739	8800	0.2284	-	-
2.9066	8900	0.2303	-	-
2.9393	9000	0.2068	-	-
2.9719	9100	0.2095	-	-
3.0046	9200	0.1915	-	-
3.0372	9300	0.1496	-	-
3.0699	9400	0.1416	-	-
3.1025	9500	0.1309	-	-
3.1352	9600	0.1436	-	-
3.1679	9700	0.1527	-	-
3.2005	9800	0.1426	-	-
3.2332	9900	0.1405	-	-
3.2658	10000	0.1395	0.4000	0.2179
3.2985	10100	0.1337	-	-
3.3312	10200	0.1356	-	-
3.3638	10300	0.1336	-	-
3.3965	10400	0.1274	-	-
3.4291	10500	0.1246	-	-
3.4618	10600	0.1294	-	-
3.4944	10700	0.1355	-	-
3.5271	10800	0.1323	-	-
3.5598	10900	0.1342	-	-
3.5924	11000	0.1576	-	-
3.6251	11100	0.1318	-	-
3.6577	11200	0.1317	-	-
3.6904	11300	0.1232	-	-
3.7231	11400	0.1307	-	-
3.7557	11500	0.1315	-	-
3.7884	11600	0.13	-	-
3.8210	11700	0.1234	-	-
3.8537	11800	0.1164	-	-
3.8863	11900	0.1322	-	-
3.9190	12000	0.128	-	-
3.9517	12100	0.1301	-	-
3.9843	12200	0.1227	-	-
4.0170	12300	0.0951	-	-
4.0496	12400	0.0983	-	-
4.0823	12500	0.091	-	-
4.1150	12600	0.0744	-	-
4.1476	12700	0.0815	-	-
4.1803	12800	0.0833	-	-
4.2129	12900	0.0738	-	-
4.2456	13000	0.0749	-	-
4.2782	13100	0.0656	-	-
4.3109	13200	0.0812	-	-
4.3436	13300	0.0948	-	-
4.3762	13400	0.098	-	-
4.4089	13500	0.0828	-	-
4.4415	13600	0.0896	-	-
4.4742	13700	0.0817	-	-
4.5069	13800	0.0771	-	-
4.5395	13900	0.0742	-	-
4.5722	14000	0.0718	-	-
4.6048	14100	0.0868	-	-
4.6375	14200	0.0902	-	-
4.6702	14300	0.0682	-	-
4.7028	14400	0.0784	-	-
4.7355	14500	0.0813	-	-
4.7681	14600	0.0796	-	-
4.8008	14700	0.0797	-	-
4.8334	14800	0.0742	-	-
4.8661	14900	0.073	-	-
4.8988	15000	0.0693	0.3748	0.2255
4.9314	15100	0.0765	-	-
4.9641	15200	0.0675	-	-
4.9967	15300	0.0801	-	-
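Only three rows of the log carry evaluation results, at steps 5,000, 10,000 and 15,000. Across them, validation loss falls from 0.4345 to 0.3748 and cosine_ndcg@10 rises from 0.1848 to 0.2255. The last value matches, to four decimals, the cosine_ndcg@10 reported in the Evaluation section. A small parsing sketch over an excerpt of the tab-separated log (rows copied from the table above) shows how to pull those checkpoints out. `eval_rows` is a hypothetical helper.

```python
LOG_EXCERPT = """\
Epoch\tStep\tTraining Loss\tValidation Loss\tcosine_ndcg@10
1.6003\t4900\t0.3829\t-\t-
1.6329\t5000\t0.3939\t0.4345\t0.1848
3.2658\t10000\t0.1395\t0.4000\t0.2179
4.8988\t15000\t0.0693\t0.3748\t0.2255
4.9967\t15300\t0.0801\t-\t-
"""


def eval_rows(log_text):
    """Return (step, validation loss, ndcg@10) for rows that carry evaluation results."""
    lines = log_text.strip().splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split("\t")))
        if record["Validation Loss"] != "-":  # "-" marks steps without an evaluation
            rows.append((int(record["Step"]), float(record["Validation Loss"]),
                         float(record["cosine_ndcg@10"])))
    return rows


for step, val_loss, ndcg in eval_rows(LOG_EXCERPT):
    print(step, val_loss, ndcg)
```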

Framework Versions

Python: 3.12.6
Sentence Transformers: 5.1.2
Transformers: 4.56.0
PyTorch: 2.8.0+cu129
Accelerate: 1.10.1
Datasets: 4.4.1
Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}