SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
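The similarity function listed above is cosine similarity. The following is a minimal NumPy sketch of the pairwise computation, not the library's own code. The toy 4-dimensional vectors are hypothetical stand-ins for the model's 384-dimensional embeddings.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    # Normalize each row to unit length so that a plain dot product equals the cosine.
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Hypothetical toy vectors standing in for real 384-dimensional embeddings.
emb = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [1.0, 2.0, 0.0, 1.0],   # identical to row 0
    [0.0, -1.0, 3.0, 0.5],
])
sims = cosine_similarity_matrix(emb, emb)
print(np.round(sims, 4))
```

Identical vectors score 1.0 (up to floating-point rounding). The same effect produces the 1.0000 entries for the repeated sentence in the usage example.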

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
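The Pooling module above enables only pooling_mode_mean_tokens, so each sentence embedding is the average of that sentence's token embeddings. The NumPy sketch below illustrates this idea and assumes that padding positions are excluded through the attention mask. The random arrays are hypothetical inputs, not real BertModel outputs, and the function is not the library's implementation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) output of the transformer.
    attention_mask:   (batch, seq_len) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # avoid division by zero
    return summed / counts

# Hypothetical batch: 2 sequences, 3 positions each, 384-dimensional token vectors.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 384))
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])  # the second sequence ends with one padding token
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings.shape)  # (2, 384)
```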

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("josangho99/ko-paraphrase-multilingual-MiniLM-L12-v2-multiTask-Fin")
# Run inference
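sentences = [
    # "The annual general meeting of shareholders is held at the investment company's
    #  registered office or at another place in Luxembourg specified in the convening notice."
    '연차주주총회는 이 투자회사의 등록사무소나 총회 소집 통지서에 기재되는 룩셈부르크의 다른 장소에서 개최됩니다.',
    # The same sentence repeated, which is why its similarity to the first one is 1.0000
    '연차주주총회는 이 투자회사의 등록사무소나 총회 소집 통지서에 기재되는 룩셈부르크의 다른 장소에서 개최됩니다.',
    # "② Article 41 of the Income Tax Act and Article 52 of the Corporate Tax Act
    #  do not apply to international transactions."
    '② 국제거래에 대해서는 「소득세법」 제41조와 「법인세법」 제52조를 적용하지 아니한다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)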

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 0.0730],
#         [1.0000, 1.0000, 0.0730],
#         [0.0730, 0.0730, 1.0000]])
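For semantic search, corpus embeddings can be ranked by their cosine similarity to a query embedding. sentence-transformers provides its own helper for this task (`sentence_transformers.util.semantic_search`). The NumPy sketch below only illustrates the idea. The small `corpus` and `query` arrays are hypothetical stand-ins for `model.encode(...)` outputs, and `top_k_search` is a hypothetical helper name.

```python
import numpy as np

def top_k_search(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return (index, score) pairs for the k corpus rows most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity to every corpus row
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical stand-ins for model.encode(...) outputs.
corpus = np.array([[0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.3, 0.1]])
query = np.array([1.0, 0.0, 0.0])
print(top_k_search(query, corpus, k=2))
```

With the real model, `corpus` would be `model.encode(documents)` and `query` would be `model.encode(query_text)`.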

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9873
spearman_cosine 0.8671
pearson_euclidean 0.975
spearman_euclidean 0.8667
pearson_manhattan 0.9749
spearman_manhattan 0.8667
pearson_dot 0.9289
spearman_dot 0.8659
pearson_max 0.9873
spearman_max 0.8671

Semantic Similarity

Metric Value
pearson_cosine 0.9874
spearman_cosine 0.8672
pearson_euclidean 0.9752
spearman_euclidean 0.867
pearson_manhattan 0.975
spearman_manhattan 0.867
pearson_dot 0.9293
spearman_dot 0.866
pearson_max 0.9874
spearman_max 0.8672
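Each table reports how well the similarity scores computed from the embeddings correlate with gold similarity labels:

- pearson_* is the linear (Pearson) correlation.
- spearman_* is the rank (Spearman) correlation.
- *_max is the best value across the four similarity functions.

The card does not describe the evaluation data, so the scores and labels in this NumPy sketch are hypothetical. SciPy's `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same statistics. This version computes them directly to show what they measure.

```python
import numpy as np

def average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    return ranks

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(np.asarray(x, dtype=float)),
                   average_ranks(np.asarray(y, dtype=float)))

# Hypothetical gold similarity labels and model cosine scores for five sentence pairs.
gold = np.array([1.0, 0.8, 0.5, 0.2, 0.0])
pred = np.array([0.95, 0.85, 0.40, 0.30, 0.05])
print(round(pearson(pred, gold), 4), round(spearman(pred, gold), 4))
```

A Spearman score can reach 1.0 while the Pearson score stays lower, because Spearman checks only that the ordering of pairs is correct. This distinction is one way to read the gap between the roughly 0.99 Pearson and 0.87 Spearman values above.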

Training Details

Training Dataset

Unnamed Dataset

  • Size: 203,584 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    Column      type    min        mean          max
    sentence_0  string  15 tokens  47.4 tokens   128 tokens
    sentence_1  string  14 tokens  45.62 tokens  128 tokens
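  • Samples (English translations of the Korean originals):
    • sentence_0: "It is necessary to overcome the limitations of the existing paradigm and, in order to respond to new challenges, shift the national development paradigm to a shared-growth strategy in which growth and welfare advance together."
      sentence_1: "It is necessary to overcome the limitations of the existing paradigm and shift the national development paradigm to a shared-growth strategy in which growth and welfare advance together."
    • sentence_0: "The Company determines the quantity of inventories recorded under the perpetual inventory method through physical stock counts and measures their amount using the specific identification method."
      sentence_1: identical to sentence_0
    • sentence_0: "In 2014, the financial authorities held several public hearings and, based on opinions gathered from various sectors, strongly expressed the direction of and their commitment to improving the K-NCR system."
      sentence_1: identical to sentence_0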
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
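MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) row as a positive pair and every other sentence_1 in the batch as a negative. The NumPy sketch below shows this in-batch-negatives cross-entropy with the listed settings (scale 20.0, cosine similarity). With gather_across_devices set to false, only the local batch supplies negatives, which matches the single-device case sketched here. The sketch is a simplified illustration, not the library's implementation, and the identity-matrix embeddings are hypothetical.

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss with cosine similarity.

    Row i of `anchors` should match row i of `positives`. Every other
    positive in the batch acts as a negative for anchor i.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy in which the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))

anchors = np.eye(4)                             # hypothetical, mutually orthogonal embeddings
matched = mnrl_loss(anchors, anchors)           # each positive is the anchor itself
mismatched = mnrl_loss(anchors, anchors[::-1])  # each anchor's true positive is orthogonal to it
print(matched)     # ≈ 0
print(mismatched)  # ≈ 20
```

Each pair is scored against the rest of its batch, so per_device_train_batch_size: 64 gives each anchor 63 in-batch negatives per step on a single device.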
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 2
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
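With MultipleNegativesRankingLoss, a text that appears twice in one batch is scored as a negative against a copy of itself. The no_duplicates batch sampler avoids this problem by keeping duplicate texts out of the same batch. The greedy sketch below illustrates the idea in plain Python. The function name and deferral strategy are hypothetical simplifications, not the library's sampler.

```python
from typing import List, Tuple

def no_duplicate_batches(pairs: List[Tuple[str, str]], batch_size: int) -> List[List[int]]:
    """Greedily group pair indices so that no text appears twice within a batch.

    If either text of a pair is already in the current batch, or the batch is full,
    the pair is deferred and retried when the next batch starts.
    """
    remaining = list(range(len(pairs)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)  # never empty: the first remaining pair always fits
        remaining = deferred
    return batches

# Hypothetical pairs; "a" appears in pairs 0 and 2, so they must not share a batch.
pairs = [("a", "a"), ("b", "c"), ("a", "d"), ("e", "f")]
batches = no_duplicate_batches(pairs, batch_size=3)
print(batches)  # [[0, 1, 3], [2]]
```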

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
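Several of these settings determine the length and shape of training. The calculation below assumes one optimizer step per batch of 64 (gradient_accumulation_steps: 1, dataloader_drop_last: False):

- Steps per epoch: ceil(203584 / 64) = 3181.
- Steps over two epochs: 6362.

These totals agree with the step counts at epochs 1.0 and 2.0 in the training logs. The sketch below computes the same numbers, then the learning rate under a linear schedule with warmup_steps 0, which decays from 5e-05 to 0. The helper names are hypothetical.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Optimizer steps per epoch, assuming gradient_accumulation_steps = 1."""
    return num_samples // batch_size if drop_last else math.ceil(num_samples / batch_size)

def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to 0 (the shape of lr_scheduler_type 'linear')."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

per_epoch = steps_per_epoch(203_584, 64)  # 203,584 samples, batch size 64
total = per_epoch * 2                     # num_train_epochs: 2
print(per_epoch, total)                   # 3181 6362
print(linear_lr(0, total, 5e-5), linear_lr(total // 2, total, 5e-5))  # 5e-05 2.5e-05
```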

Training Logs

Epoch Step Training Loss spearman_cosine
0.1000 318 - 0.8672
0.1572 500 0.0011 -
0.1999 636 - 0.8672
0.2999 954 - 0.8671
0.3144 1000 0.0015 -
0.3999 1272 - 0.8672
0.4715 1500 0.0012 -
0.4998 1590 - 0.8672
0.5998 1908 - 0.8672
0.6287 2000 0.0009 -
0.6998 2226 - 0.8671
0.7859 2500 0.0011 -
0.7997 2544 - 0.8672
0.8997 2862 - 0.8672
0.9431 3000 0.0004 -
0.9997 3180 - 0.8671
1.0 3181 - 0.8671
1.0997 3498 - 0.8672
1.1003 3500 0.0009 -
1.1996 3816 - 0.8672
1.2575 4000 0.0008 -
1.2996 4134 - 0.8671
1.3996 4452 - 0.8671
1.4146 4500 0.0007 -
1.4995 4770 - 0.8671
1.5718 5000 0.0003 -
1.5995 5088 - 0.8671
1.6995 5406 - 0.8671
1.7290 5500 0.0002 -
1.7994 5724 - 0.8671
1.8862 6000 0.0003 -
1.8994 6042 - 0.8672
1.9994 6360 - 0.8672
2.0 6362 - 0.8671
-1 -1 - 0.8672

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 0.1B params (Safetensors, tensor type F32)