SentenceTransformer based on klue/roberta-base

This is a sentence-transformers model finetuned from klue/roberta-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: klue/roberta-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '와디즈에서 보내는 광고 메일은 받지마',
    '지메일을 쓰면 첨부파일을 몇개까지 보낼 수 있지?',
    '문 여는 방법도 비교적 쉽고 간단합니다!',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.3477
spearman_cosine 0.3556

Semantic Similarity

Metric Value
pearson_cosine 0.9609
spearman_cosine 0.9191

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,501 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    sentence_0 sentence_1 label
    type string string float
    details
    • min: 7 tokens
    • mean: 20.09 tokens
    • max: 60 tokens
    • min: 5 tokens
    • mean: 19.5 tokens
    • max: 55 tokens
    • min: 0.0
    • mean: 0.44
    • max: 1.0
  • Samples:
    sentence_0 sentence_1 label
    지난 1990년 핀란드에서 처음 도입한 ‘탄소세’를 온실가스 저감 수단으로 활용하려는 각국의 움직임도 활발하다. 각국은 또한 온실 가스 배출을 줄이기 위한 수단으로 1990년 핀란드에서 처음 도입된 '탄소 세금'을 적극적으로 사용하려고 하고 있습니다. 0.42000000000000004
    그러므로 역 근처 숙소가 가장 편리하다고 생각합니다. 그래서 저는 역 근처에 있는 숙소가 가장 편리하다고 생각합니다. 0.82
    도는 그 일환으로 BC카드 매출의 64%는 10억원 이상 매장에서 사용되는 반면 지역화폐는 3억원 미만 매장에서 사용되는 비율이 36.7%라는 근거를 제시했다. 서울의 15억원 초과 집값 상승률은 12월 3주 0.40%에서 1월 1주 -0.08%로 하락 전환했다. 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss spearman_cosine
-1 -1 - 0.3556
0.0761 50 - 0.8659
0.1522 100 - 0.8890
0.2283 150 - 0.8969
0.3044 200 - 0.9008
0.3805 250 - 0.9026
0.4566 300 - 0.9055
0.5327 350 - 0.9023
0.6088 400 - 0.9076
0.6849 450 - 0.9019
0.7610 500 0.0282 0.9067
0.8371 550 - 0.9060
0.9132 600 - 0.9090
0.9893 650 - 0.9074
1.0 657 - 0.9077
1.0654 700 - 0.9091
1.1416 750 - 0.9120
1.2177 800 - 0.9085
1.2938 850 - 0.9117
1.3699 900 - 0.9137
1.4460 950 - 0.9126
1.5221 1000 0.008 0.9137
1.5982 1050 - 0.9148
1.6743 1100 - 0.9155
1.7504 1150 - 0.9134
1.8265 1200 - 0.9141
1.9026 1250 - 0.9142
1.9787 1300 - 0.9155
2.0 1314 - 0.9163
2.0548 1350 - 0.9174
2.1309 1400 - 0.9177
2.2070 1450 - 0.9171
2.2831 1500 0.005 0.9191

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Downloads last month
-
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kammbo/klue-roberta-base-klue-sts

Base model

klue/roberta-base
Finetuned
(419)
this model
Finetunes
1 model

Paper for kammbo/klue-roberta-base-klue-sts

Evaluation results