---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2500
  - loss:CoSENTLoss
base_model: Intellexus/mbert-tibetan-continual-wylie-final
widget:
  - source_sentence: gong bu gzhan min de de'i min//
    sentences:
      - >-
        de la byang chub kyi yan lag bdun gang zhe na/ 'di lta ste/ dran pa yang
        dag byang chub kyi yan lag dang / chos rab rnam 'byed yang dag byang
        chub kyi yan lag dang / brtson 'grus yang dag byang chub kyi yan lag
        dang / dga' ba yang dag byang chub kyi yan lag dang / shin tu sbyangs pa
        yang dag byang chub kyi yan lag dang / ting nge 'dzin yang dag byang
        chub kyi yan lag dang / btang snyoms yang dag byang chub kyi yan lag
        ste/ de dag ni byang chub kyi yan lag bdun ces bya'o//
      - phung myin gal te de de myin//
      - >-
        sha ra dwa ti'i bu gzhan yang byang chub sems dpa' sems dpa' chen po
        byang sa las 'da' bar 'dod pas/ shes rab kyi pha rol tu phyin pa la
        bslab par bya'o//
  - source_sentence: kun rdzob tu ni thugs brtse bas// rgyu mthun de dag thub pa bzhed//
    sentences:
      - yang na sku gzugs ma nyams spyan ras sngar zlas
      - kun rdzob 'jig rten grags pa la// brtan na tshad ma'i rnam gzhag 'gal//
      - "gzhan gyi dbang gi ngo bo nyid//\r\nrnam rtog yin te rkyen las byung //\r\ngrub ni de la snga ma po//\r\nrtag tu med par gyur pa gang //"
  - source_sentence: >
      bdag las ma yin gzhan las min// gnyis las ma yin rgyu med min// dngos po
      gang dag gang na yang // skye ba nam yang yod ma yin//
    sentences:
      - |-
        shing rta che bu sems can che//

        rtag mo bkres mthong stag phrug rnams//

        thar bar bya phyir snying rje yis//
      - >
        phyogs chos de chas khyab pa yi// gtan tshigs de ni rnam gsum nyid// med
        na mi 'byung nges phyir ro// gtan tshigs ltar snang de las gzhan//
      - sems can rnams kyi 'dod chags byang gyur cig//
  - source_sentence: >-
      gang gi tshe rgyal po pad ma chen po dpung dang mthu che ba de'i tshe na/
      des kyang dpung gi tshogs yan lag bzhi pa/ glang po che pa'i tshogs dang /
      rta pa'i tshogs dang / shing rta pa'i tshogs dang / dpung bu chung gi
      tshogs go bskon te/ yul ang ga tsam pa ma gtogs pa bcom nas phyir ldog par
      byed do//
    sentences:
      - >-
        de tshe rig pa'i rgyal po bsgrub// gal te de ni rab byung gyur// sdom pa
        gsum la yang dag gnas// so sor thar dang byang chub sems// rig 'dzin
        sdom pa mchog yin no//
      - >-
        spyir theg pa zhes bya ba'i nges tshig ni/ ya na zhes bya ba 'gro ba'i
        bya ba ston pa'i tshig yin pas tshig gzugs por lam la bya'o//
      - >-
        rgyal po chen po 'di ltar yang dge sbyong dang / bram ze kha cig dad pas
        byin pa dag spyad nas ltad mo sna tshogs rtsom pa la sbyor bar brtson
        pas gnas pa 'di lta ste/
  - source_sentence: >-
      dam tshig nyams pa'i nyes pa ni/ 'dod pa'i phyogs mi 'grub cing / mi 'dod
      pa'i phyogs rnams thob pa ste/
    sentences:
      - |
        dam tshig dang ni mi ldan na// bsgrubs kyang 'grub par mi 'gyur te//
        rgyu med pa yi 'bras bu bzhin// tshe yi  dus byas dmyal bar 'gro//
      - >-
        rang sangs rgyas rnams kyi rnam par grol ba ni/ ngag gi lam dang bral ba
        las skyes pa/
      - |
        lha dang lha mo ji lta bas// bdud rtsi'i  bum pas dbang bskur ba//
        chu'i dgongs pa  ye shes lnga'i// rtags su  sku lnga rdzogs pa'o//
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: >-
      SentenceTransformer based on
      Intellexus/mbert-tibetan-continual-wylie-final
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: pearson_cosine
            value: 0.8350341193647188
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8539838973084938
            name: Spearman Cosine
---

SentenceTransformer based on Intellexus/mbert-tibetan-continual-wylie-final

This is a sentence-transformers model finetuned from Intellexus/mbert-tibetan-continual-wylie-final. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Intellexus/mbert-tibetan-continual-wylie-final
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
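The Pooling module uses mean pooling (`pooling_mode_mean_tokens: True`): the BERT token embeddings are averaged over all non-padding positions to produce one 768-dimensional sentence vector. A minimal sketch of that operation, assuming PyTorch tensors shaped as in the architecture above:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, 768), output of the BertModel
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum of real-token embeddings
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per row
    return summed / counts                         # (batch, 768) sentence vectors
```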

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Shailu1492/Bi-Tib-mbert-v1")
# Run inference
sentences = [
    "dam tshig nyams pa'i nyes pa ni/ 'dod pa'i phyogs mi 'grub cing / mi 'dod pa'i phyogs rnams thob pa ste/",
    "dam tshig dang ni mi ldan na// bsgrubs kyang 'grub par mi 'gyur te//\nrgyu med pa yi 'bras bu bzhin// tshe yi  dus byas dmyal bar 'gro//\n",
    "lha dang lha mo ji lta bas// bdud rtsi'i  bum pas dbang bskur ba//\nchu'i dgongs pa  ye shes lnga'i// rtags su  sku lnga rdzogs pa'o//\n",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

| Metric          | Value |
|:----------------|:------|
| pearson_cosine  | 0.835 |
| spearman_cosine | 0.854 |
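These values are the Pearson and Spearman correlations between the model's cosine similarities and the gold labels on the evaluation pairs. A minimal sketch of how such scores are computed (the function and its arguments are illustrative, not the evaluator's API):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def similarity_metrics(model, texts1, texts2, labels):
    """Correlate cosine similarity of embedded pairs with gold scores."""
    emb1 = model.encode(texts1, convert_to_numpy=True)
    emb2 = model.encode(texts2, convert_to_numpy=True)
    # Row-wise cosine similarity between paired embeddings
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    pearson, _ = pearsonr(labels, cos)
    spearman, _ = spearmanr(labels, cos)
    return pearson, spearman
```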

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,500 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | text1 | text2 | label |
    |:--------|:------|:------|:------|
    | type    | string | string | float |
    | details | min: 6 tokens, mean: 19.74 tokens, max: 67 tokens | min: 5 tokens, mean: 22.11 tokens, max: 83 tokens | min: 0.02, mean: 0.51, max: 1.0 |

  • Samples:

    | text1 | text2 | label |
    |:------|:------|:------|
    | 'on pa rnams kyang rna bas sgra thos p | 'on pa rnams rna bas sgra thes par bya'o snyam pa dang / smyon pa rnams dran pa thob par | 0.5 |
    | com ldan 'das de bzhin gshegs pa dgra bc | mkhas pa yongs su gzung bar 'dod pa'i byang chub sems dpa' sems dpa' chen | 0.229 |
    | pa /sems can thams cad ng / | snying rje'i sems dang ldan pa | 0.3335 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
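CoSENT is a ranking loss over a batch of scored pairs: whenever the gold label says one pair is more similar than another, its cosine similarity should also be higher, and violations are penalized exponentially at the given scale. A minimal sketch of the objective (not the library's exact implementation):

```python
import torch

def cosent_loss(cos_sims: torch.Tensor, labels: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """log(1 + sum of exp(scale * (cos_i - cos_j))) over all (i, j) with label_i < label_j.

    cos_sims: (batch,) cosine similarity of each (text1, text2) pair
    labels:   (batch,) gold similarity scores for the same pairs
    """
    # diffs[i, j] = scale * (cos_sims[i] - cos_sims[j]) for every pair of rows
    diffs = scale * (cos_sims[:, None] - cos_sims[None, :])
    # Keep (i, j) where the gold label ranks pair j above pair i
    diffs = diffs[labels[:, None] < labels[None, :]]
    # Prepending a zero implements the "+1" inside the log-sum-exp
    return torch.logsumexp(torch.cat([diffs.new_zeros(1), diffs]), dim=0)
```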

Evaluation Dataset

Unnamed Dataset

  • Size: 150 evaluation samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 150 samples:

    |         | text1 | text2 | label |
    |:--------|:------|:------|:------|
    | type    | string | string | float |
    | details | min: 8 tokens, mean: 32.74 tokens, max: 126 tokens | min: 6 tokens, mean: 32.12 tokens, max: 121 tokens | min: 0.0, mean: 0.5, max: 1.0 |

  • Samples:

    | text1 | text2 | label |
    |:------|:------|:------|
    | khang ljon shing rgyal mtshan seng ge rta | khang bzangs ljong shing bram ze seng ge rta | 0.5625 |
    | rnam par thar pa'i sgo mtshan ma med pa/ | yod ces bya bar yang dag par rjes su mi mthong ba/ | 0.375 |
    | byang chub ni chos kyi dbyings kyi gnas kyis gnas pa'o// byang chub ni de bzhin nyid rjes su rtogs pa'o// | nges pa yod na mngon sum min// 'dra bar 'dzin pa rtog pa yin// | 0.0 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 7
  • load_best_model_at_end: True
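Together these settings give an effective batch size of 32 × 16 = 512. A hedged sketch of an equivalent training script (the dataset rows are placeholders for the real pairs, and `save_strategy="epoch"` is an assumption required for `load_best_model_at_end`):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("Intellexus/mbert-tibetan-continual-wylie-final")

# Placeholder rows; substitute the real text1/text2/label pairs
train_ds = Dataset.from_dict({"text1": ["..."], "text2": ["..."], "label": [0.5]})
eval_ds = Dataset.from_dict({"text1": ["..."], "text2": ["..."], "label": [0.5]})

args = SentenceTransformerTrainingArguments(
    output_dir="Bi-Tib-mbert-v1",
    eval_strategy="epoch",
    save_strategy="epoch",            # assumed: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=32,
    gradient_accumulation_steps=16,   # effective batch size 32 * 16 = 512
    learning_rate=2e-5,
    num_train_epochs=7,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    loss=CoSENTLoss(model, scale=20.0),
)
trainer.train()
```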

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | spearman_cosine |
|:-------|:-----|:--------------|:----------------|:----------------|
| 1.0    | 2    | 56.9409       | 2.7480          | 0.8357          |
| 2.0    | 4    | 53.1489       | 2.7016          | 0.8412          |
| 3.0    | 6    | 52.3657       | 2.6812          | 0.8462          |
| 3.8421 | 7    | 89.1774       | 2.6767          | 0.8471          |
| 0.8101 | 4    | 96.7978       | 2.7350          | 0.8455          |
| 1.8101 | 8    | 94.8279       | 2.6985          | 0.8497          |
| 2.8101 | 12   | 93.583        | 2.6846          | 0.8540          |

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.50.0
  • PyTorch: 2.5.1
  • Accelerate: 1.7.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}