SentenceTransformer based on Shailu1492/tibetan-mbert-v1-consecutive-segments

This is a Sentence Transformers model fine-tuned from Shailu1492/tibetan-mbert-v1-consecutive-segments. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
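The Pooling block above has pooling_mode_weightedmean_tokens set to True, i.e. token embeddings are combined with a position-weighted mean rather than a plain mean. A minimal NumPy sketch of that scheme, assuming the standard position weighting (token i weighted by i+1, masked positions excluded); this is an illustration, not the library implementation:

```python
import numpy as np

def weighted_mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    # Token at position i gets weight i+1, so later tokens contribute more;
    # padding positions are zeroed out by the mask.
    seq_len = token_embeddings.shape[1]
    w = np.arange(1, seq_len + 1, dtype=float)[None, :, None]
    w = w * attention_mask[:, :, None]
    return (token_embeddings * w).sum(axis=1) / w.sum(axis=1)
```

With this pooling, the sentence embedding is a convex combination of token embeddings whose weights grow linearly with position.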

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Intellexus/Bi-Tib-mbert-v4")
# Run inference
sentences = [
    "bden pa'i gsung gis rab tu bka' stsal na",
    "bden pa gsung gis rab tu bka' stsal na",
    "byang chub sems kyi bsod nams gang // gal te de la gzugs mchis na// nam mkha'i khams 'di kun gang ste// de ni de bas lhag par 'gyur//",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9909, 0.7003],
#         [0.9909, 1.0000, 0.7065],
#         [0.7003, 0.7065, 1.0000]])
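The same embeddings can back a small semantic-search loop: encode a query and a corpus, then rank corpus entries by cosine similarity. A minimal NumPy sketch of the ranking step (the 4-dimensional vectors are stand-ins for real 768-dimensional model outputs):

```python
import numpy as np

def rank_by_cosine(query_emb, corpus_embs):
    # Normalize both sides, then rank corpus rows by cosine similarity to the query.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # indices of corpus rows, most similar first
    return order, scores[order]

# Stand-in embeddings; in practice these come from model.encode(...).
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
order, scores = rank_by_cosine(query, corpus)
print(order)  # most similar corpus row first
```

For real workloads, `model.similarity(query_emb, corpus_embs)` computes the same cosine scores directly on the encoded vectors.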

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9507
spearman_cosine 0.9535
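spearman_cosine is the Spearman rank correlation between the model's cosine similarities and the gold labels (pearson_cosine is the corresponding Pearson correlation). A minimal NumPy sketch of the Spearman computation, with no tie handling, on hypothetical score arrays:

```python
import numpy as np

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks
    # (assumes distinct values; real implementations average tied ranks).
    ra = np.argsort(np.argsort(np.asarray(a))).astype(float)
    rb = np.argsort(np.argsort(np.asarray(b))).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))
```

Because only ranks matter, Spearman rewards getting the ordering of similarities right even when the absolute cosine values are shifted or rescaled.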

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,000 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:

                 text1               text2                label
        type     string              string               float
        details  min: 8 tokens       min: 6 tokens        min: 0.0
                 mean: 30.7 tokens   mean: 30.97 tokens   mean: 0.5
                 max: 190 tokens     max: 226 tokens      max: 1.0
  • Samples:

    text1: brdar brtags chos nyid thob pa ste// sdom la gnas pa rnams la yod//
    text2: sgra rnams kyis ni brdar btags ston// de ni tha snyad ched du byas//
    label: 0.3125

    text1: zhe sdang gi rang bzhin gsal ba ste/
    text2: gti mug gi rang bzhin mi rtog par shes/
    label: 0.125

    text1: gsang ba'i dbang dang shes rab ye shes kyi dbang dang bzhi pa'i dbang zhu bar bya'o//
    text2: gsang ba'i dbang ni ngag dag par byed pa'o// shes rab ye shes kyi dbang ni yid dag par byed pa'o//
    label: 0.5
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
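CoSENTLoss with scale=20.0 optimizes ranking consistency: whenever one pair has a higher gold label than another, its predicted cosine similarity should also be higher. A minimal NumPy sketch of the pairwise formulation (an illustration, not the library implementation):

```python
import numpy as np

def cosent_loss(cos_sims, labels, scale=20.0):
    # For every pair with labels[i] > labels[j], cos_sims[i] should exceed
    # cos_sims[j]; violations are penalized via
    # loss = log(1 + sum_{labels[i] > labels[j]} exp(scale * (cos_sims[j] - cos_sims[i]))).
    diffs = [0.0]  # the constant 1 inside the log
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                diffs.append(scale * (cos_sims[j] - cos_sims[i]))
    return float(np.logaddexp.reduce(diffs))

# Well-ordered predictions give a near-zero loss; reversed ones do not.
good = cosent_loss([0.9, 0.5, 0.1], [1.0, 0.5, 0.0])
bad = cosent_loss([0.1, 0.5, 0.9], [1.0, 0.5, 0.0])
```

The scale factor sharpens the penalty: with scale=20.0, even small ranking inversions in cosine space dominate the loss.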
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,000 evaluation samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:

                 text1               text2                label
        type     string              string               float
        details  min: 8 tokens       min: 6 tokens        min: 0.0
                 mean: 30.7 tokens   mean: 30.97 tokens   mean: 0.5
                 max: 190 tokens     max: 226 tokens      max: 1.0
  • Samples:

    text1: brdar brtags chos nyid thob pa ste// sdom la gnas pa rnams la yod//
    text2: sgra rnams kyis ni brdar btags ston// de ni tha snyad ched du byas//
    label: 0.3125

    text1: zhe sdang gi rang bzhin gsal ba ste/
    text2: gti mug gi rang bzhin mi rtog par shes/
    label: 0.125

    text1: gsang ba'i dbang dang shes rab ye shes kyi dbang dang bzhi pa'i dbang zhu bar bya'o//
    text2: gsang ba'i dbang ni ngag dag par byed pa'o// shes rab ye shes kyi dbang ni yid dag par byed pa'o//
    label: 0.5
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • weight_decay: 0.1
  • num_train_epochs: 7
  • lr_scheduler_type: reduce_lr_on_plateau
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • bf16: True
  • dataloader_drop_last: True
  • load_best_model_at_end: True
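With per_device_train_batch_size=32 and gradient_accumulation_steps=16, each optimizer step accumulates an effective batch of 512 samples (assuming a single device), which is why the Training Logs further down show only two optimizer steps per epoch on 1,000 training samples:

```python
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_devices = 1  # assumption: single-device training

# Samples contributing to one optimizer step:
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# 1,000 training samples -> 2 optimizer steps per epoch (ceiling division).
steps_per_epoch = -(-1000 // effective_batch)
print(effective_batch, steps_per_epoch)  # 512 2
```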

All Hyperparameters

  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.1
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: reduce_lr_on_plateau
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch  Step  Training Loss  Validation Loss  spearman_cosine
1.0    2     6.0687         2.5858           0.8629
2.0    4     5.5189         2.5479           0.8886
3.0    6     5.4863         2.4755           0.9129
4.0    8     5.3785         2.3440           0.9284
5.0    10    5.2816         2.2080           0.9355
6.0    12    5.2027         2.0943           0.9432
7.0    14    5.0971         1.9869           0.9535

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 5.2.2
  • Transformers: 5.1.0
  • PyTorch: 2.10.0+cu130
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@article{10531646,
    author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
    year={2024},
    doi={10.1109/TASLP.2024.3402087}
}
  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32
Model tree for Intellexus/Bi-Tib-mbert-v4