SentenceTransformer based on x2bee/ModernBert_MLM_kotoken_v03
This is a sentence-transformers model finetuned from x2bee/ModernBert_MLM_kotoken_v03 on the misc_sts_pairs_v2_kor dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: x2bee/ModernBert_MLM_kotoken_v03
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity (see the formula below)
- Training Dataset: misc_sts_pairs_v2_kor
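For reference, cosine similarity compares two embeddings $u, v \in \mathbb{R}^{768}$ by the angle between them, so scores are length-invariant and fall in $[-1, 1]$:

$$
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
$$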
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
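The Pooling module averages the token embeddings (`pooling_mode_mean_tokens: True`) to produce one 768-dimensional vector per input. As a rough illustration of that pooling step, here is a minimal sketch using the plain `transformers` API; it assumes the Hub repository exposes the transformer weights at its root, as sentence-transformers checkpoints normally do, and is not the card's official recipe (the `SentenceTransformer` usage below is the supported path):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")
model = AutoModel.from_pretrained("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Mask out padding tokens, then average the remaining token embeddings:
    # this mirrors pooling_mode_mean_tokens in the Pooling module above.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

batch = tokenizer(["문장 임베딩 예시입니다."], padding=True, return_tensors="pt")  # "An example sentence for embedding."
with torch.no_grad():
    output = model(**batch)
embedding = mean_pool(output.last_hidden_state, batch["attention_mask"])
print(embedding.shape)  # torch.Size([1, 768])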
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")
# Run inference
sentences = [
    '수동 운전석 창문을 어떻게 수리하나요?',  # "How do I fix a manual driver's-side window?"
    '1992년형 혼다 시빅에서 올라가지 않는 수동 창문을 어떻게 수리하나요?',  # "How do I fix a manual window that won't go up on a 1992 Honda Civic?"
    '아홉 번째 닥터가 멈춘 닥터 후 에피소드는 무엇입니까?',  # "Which Doctor Who episode did the Ninth Doctor stop in?"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
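Beyond pairwise scores, the same embeddings support the semantic-search use case mentioned above: embed a corpus once, then rank it against a query by cosine similarity. A minimal sketch; the corpus and query strings here are made up for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")

# Hypothetical corpus and query, for illustration only.
corpus = [
    '창문 수리 방법 안내',    # "Guide to repairing windows"
    '닥터 후 에피소드 목록',  # "List of Doctor Who episodes"
]
query = '자동차 창문이 올라가지 않아요'  # "My car window won't go up"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```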
Evaluation
Metrics
Semantic Similarity
- Dataset: `sts_dev`
- Evaluated with `EmbeddingSimilarityEvaluator`
| Metric | Value |
|---|---|
| pearson_cosine | 0.524 |
| spearman_cosine | 0.5139 |
| pearson_euclidean | 0.5051 |
| spearman_euclidean | 0.5001 |
| pearson_manhattan | 0.5087 |
| spearman_manhattan | 0.504 |
| pearson_dot | 0.4545 |
| spearman_dot | 0.4439 |
| pearson_max | 0.524 |
| spearman_max | 0.5139 |
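These scores can in principle be reproduced with the same evaluator. A minimal sketch, assuming a held-out split with `sentence1`/`sentence2`/`score` columns as described under Training Details; the pair shown is a stand-in, not the actual `sts_dev` data:

```python
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")

# Hypothetical pairs standing in for the real sts_dev split.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["용광로의 온도는 얼마나 뜨거운가?"],   # "How hot is the temperature of a furnace?"
    sentences2=["용광로의 온도는 얼마나 높습니까?"],   # "How high is the temperature of a furnace?"
    scores=[0.75],
    main_similarity=SimilarityFunction.COSINE,
    name="sts_dev",
)
results = evaluator(model)  # dict of pearson/spearman scores per similarity measure
print(results)
```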
Training Details
Training Dataset
misc_sts_pairs_v2_kor
- Dataset: misc_sts_pairs_v2_kor at 845f810
- Size: 449,904 training samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:
|  | sentence1 | sentence2 | score |
|:---|:---|:---|:---|
| type | string | string | float |
| details | min: 6 tokens<br>mean: 17.81 tokens<br>max: 49 tokens | min: 6 tokens<br>mean: 17.78 tokens<br>max: 80 tokens | min: 0.53<br>mean: 0.75<br>max: 0.98 |
- Samples:
| sentence1 | sentence2 | score |
|:---|:---|:---|
| 1999년형 유콘 4륜구동 차량의 앞쪽 조수석 타이어에서 발생하는 갈리는 소음의 원인은 무엇인가요? *(What causes a grinding noise from the front passenger-side tire of a 1999 Yukon 4WD?)* | 차의 오른쪽 앞쪽에서 발생하는 갈리는 소리의 원인은 무엇인가요? *(What causes a grinding sound from the front right of a car?)* | 0.8193586337477191 |
| 왜 제임스타운 정착민들은 그곳의 원주민들과 갈등을 겪었는가? *(Why did the Jamestown settlers have conflicts with the natives there?)* | 왜 제임스타운은 원주민들과 갈등을 겪었는가? *(Why did Jamestown have conflicts with the natives?)* | 0.8701910827908218 |
| 옥수수 전분을 섭취하는 것이 건강에 어떤 영향을 미칠 수 있습니까? *(How can consuming corn starch affect your health?)* | 옥수수 전분을 섭취하면 당신에게 어떤 영향을 미칠까요? *(How does consuming corn starch affect you?)* | 0.8809354609563622 |
- Loss: `CosineSimilarityLoss` with these parameters: `{"loss_fct": "torch.nn.modules.loss.MSELoss"}`
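For context, `CosineSimilarityLoss` regresses the cosine similarity of each pair's embeddings toward the gold score using the MSE objective named above. A minimal training sketch under that setup; the dataset's Hub repository id is an assumption, and this is illustrative rather than the exact training script:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss

# The base MLM checkpoint is wrapped with mean pooling automatically
# when it carries no sentence-transformers config of its own.
model = SentenceTransformer("x2bee/ModernBert_MLM_kotoken_v03")

# Hub repository id assumed; columns per the card: sentence1, sentence2, score.
train_dataset = load_dataset("x2bee/misc_sts_pairs_v2_kor", split="train")

# Pushes cos(u, v) toward the gold score via MSELoss.
loss = CosineSimilarityLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```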
Evaluation Dataset
misc_sts_pairs_v2_kor
- Dataset: misc_sts_pairs_v2_kor at 845f810
- Size: 449,904 evaluation samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:
|  | sentence1 | sentence2 | score |
|:---|:---|:---|:---|
| type | string | string | float |
| details | min: 7 tokens<br>mean: 17.76 tokens<br>max: 65 tokens | min: 6 tokens<br>mean: 17.65 tokens<br>max: 52 tokens | min: 0.53<br>mean: 0.75<br>max: 0.98 |
- Samples:
| sentence1 | sentence2 | score |
|:---|:---|:---|
| 용광로의 온도는 얼마나 뜨거운가? *(How hot is the temperature of a furnace?)* | 용광로의 온도는 얼마나 높습니까? *(How high is the temperature of a furnace?)* | 0.751853250408994 |
| 영어로 'Lei è il mio uno e solo'는 어떻게 철자하나요? *(How do you spell 'Lei è il mio uno e solo' in English?)* | 'Lei è il mio uno e solo'의 영어 동등어는 무엇인가요? *(What is the English equivalent of 'Lei è il mio uno e solo'?)* | 0.8265661603331053 |
| 버드와이저 포커 광고에 나오는 소녀는 누구인가요? *(Who is the girl in the Budweiser poker commercial?)* | 포커 스타일의 버드와이저 광고에 나오는 소녀는 누구인가요? *(Who is the girl in the poker-style Budweiser commercial?)* | 0.9301912848973812 |
- Loss: `CosineSimilarityLoss` with these parameters: `{"loss_fct": "torch.nn.modules.loss.MSELoss"}`
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `gradient_accumulation_steps`: 4
- `learning_rate`: 1e-05
- `num_train_epochs`: 2
- `warmup_ratio`: 0.3
- `push_to_hub`: True
- `hub_model_id`: x2bee/KoModernBERT-base-nli-sts-SBERT_v01
- `batch_sampler`: no_duplicates
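Expressed as `SentenceTransformerTrainingArguments`, the non-default values above map roughly to the following sketch; `output_dir` is a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    eval_strategy="epoch",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # effective batch size: 32 * 4 = 128 per device
    learning_rate=1e-5,
    num_train_epochs=2,
    warmup_ratio=0.3,
    push_to_hub=True,
    hub_model_id="x2bee/KoModernBERT-base-nli-sts-SBERT_v01",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```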
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.3
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: x2bee/KoModernBERT-base-nli-sts-SBERT_v01
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | sts_dev_spearman_max |
|---|---|---|---|---|
| 0 | 0 | - | - | 0.5070 |
| 0.2397 | 100 | 0.0311 | - | - |
| 0.4793 | 200 | 0.0082 | - | - |
| 0.7190 | 300 | 0.0065 | - | - |
| 0.9587 | 400 | 0.0061 | - | - |
| 1.0 | 418 | - | 0.0059 | 0.4899 |
| 1.1965 | 500 | 0.0058 | - | - |
| 1.4362 | 600 | 0.0057 | - | - |
| 1.6759 | 700 | 0.0055 | - | - |
| 1.9155 | 800 | 0.0053 | - | - |
| 1.9970 | 834 | - | 0.0057 | 0.5139 |
Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0.dev0
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
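To approximate this environment, the released versions above can be pinned directly; Transformers 4.48.0.dev0 was a development build, so substituting a nearby release is an assumption:

```bash
pip install "sentence-transformers==3.3.1" "accelerate==1.2.1" "datasets==3.2.0" "tokenizers==0.21.0"
```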
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```