SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("embeddinggemma-300M-KorNLI")
# Run inference
queries = [
    "\ub124, \uc65c\ub0d0\uba74 \ubd84\uba85\ud788 \uacf5\uc7a5\uc744 \ud1b5\uacfc\ud55c \uac83\uc774\uc5c8\uc73c\ub2c8\uae4c\uc694",
]
documents = [
    '그래, 긁힌 자국 하나 없이 해냈어!',
    '오늘날 아이들은 독서보다 TV를 보는 데 더 많은 시간을 보낸다.',
    '그들은 최근의 강도 사건과 관련하여 구금되었다.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.2977, -0.0117,  0.2256]])

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,696 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 label
    type string string float
    details
    • min: 8 tokens
    • mean: 13.14 tokens
    • max: 35 tokens
    • min: 8 tokens
    • mean: 12.96 tokens
    • max: 30 tokens
    • min: 0.0
    • mean: 0.45
    • max: 1.0
  • Samples:
    sentence1 sentence2 label
    비행기가 이륙하고 있다. 비행기가 이륙하고 있다. 1.0
    한 남자가 큰 플루트를 연주하고 있다. 남자가 플루트를 연주하고 있다. 0.76
    한 남자가 피자에 치즈를 뿌려놓고 있다. 한 남자가 구운 피자에 치즈 조각을 뿌려놓고 있다. 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,466 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 label
    type string string float
    details
    • min: 7 tokens
    • mean: 19.41 tokens
    • max: 159 tokens
    • min: 2 tokens
    • mean: 19.24 tokens
    • max: 51 tokens
    • min: 0.0
    • mean: 0.42
    • max: 1.0
  • Samples:
    sentence1 sentence2 label
    안전모를 가진 한 남자가 춤을 추고 있다. 안전모를 쓴 한 남자가 춤을 추고 있다. 1.0
    어린아이가 말을 타고 있다. 아이가 말을 타고 있다. 0.95
    한 남자가 뱀에게 쥐를 먹이고 있다. 남자가 뱀에게 쥐를 먹이고 있다. 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • max_steps: 1000
  • lr_scheduler_type: cosine
  • warmup_steps: 100
  • push_to_hub: True
  • hub_model_id: embeddinggemma-300M-KorSTS
  • prompts: {'sentence1': 'text: ', 'sentence2': 'text: '}

All Hyperparameters

Click to expand
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: 1000
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: embeddinggemma-300M-KorSTS
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: {'sentence1': 'text: ', 'sentence2': 'text: '}
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0145 500 0.1331
0.0291 1000 0.1414
0.0436 1500 0.1353
0.0582 2000 0.1335
0.0727 2500 0.1234
0.0872 3000 0.1187
0.1018 3500 0.1152
0.1163 4000 0.1102
1.4045 500 0.0396
2.8090 1000 0.0157

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.2
  • Transformers: 5.0.1.dev0
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Downloads last month
21
Safetensors
Model size
0.3B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for linu23/embeddinggemma-300M-KorSTS

Finetuned
(202)
this model

Paper for linu23/embeddinggemma-300M-KorSTS