SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
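
To make the two modules above concrete, here is a small illustrative sketch (not from the original card) that loads the underlying BertModel with plain transformers and applies the same attention-mask-aware mean pooling to produce the 384-dimensional sentence embedding. It assumes the usual Sentence Transformers repository layout, where the transformer and tokenizer files live at the repository root.

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "along26/all-MiniLM-L6-v2_multilingual_malaysian-v9"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

encoded = tokenizer(["What is finance generally?"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # [batch, seq_len, 384]

# Mean pooling: average the token embeddings while ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 384])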

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("along26/all-MiniLM-L6-v2_multilingual_malaysian-v9")
# Run inference
sentences = [
    'According to Bintulu police chief ACP Zailanni Amit, the late Bermau Bagu, 70, was in the garden before being shot by the suspect, his brother-in-law, who was also hunting in the garden.',
    'Nitih ku Ketuai Polis Pelilih Bintulu, ACP Zailanni Amit, rambau penusah nya nyadi, niang ti benama Bermau Bagu, 70 taun benung ba kebun nya sebedau kena timbak suspek, ipar niang empu, ke bela ngasu dalam kandang kebun nya.',
    'What is finance generally?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1941, 0.9593],
#         [0.1941, 1.0000, 0.1978],
#         [0.9593, 0.1978, 1.0000]])
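
Since the card lists semantic search among the intended uses, here is a small follow-up sketch built on the snippet above. The corpus and query strings are reused or shortened from the examples in this card purely for illustration; util.semantic_search ranks corpus entries by cosine similarity to the query.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("along26/all-MiniLM-L6-v2_multilingual_malaysian-v9")

# Toy corpus mixing English and Iban sentences (illustrative only)
corpus = [
    "Nitih ku Ketuai Polis Pelilih Bintulu, ACP Zailanni Amit, rambau penusah nya nyadi, niang ti benama Bermau Bagu, 70 taun benung ba kebun nya sebedau kena timbak suspek, ipar niang empu, ke bela ngasu dalam kandang kebun nya.",
    "What is finance generally?",
    "What are the basic skills required to be a good programmer?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "According to Bintulu police chief ACP Zailanni Amit, the late Bermau Bagu, 70, was in the garden before being shot by the suspect."
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query and keep the top 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}", corpus[hit["corpus_id"]])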

Training Details

Training Dataset

Unnamed Dataset

  • Size: 420,570 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 7 tokens, mean: 164.03 tokens, max: 512 tokens
    • sentence_1: string; min: 6 tokens, mean: 197.0 tokens, max: 512 tokens
    • sentence_2: string; min: 6 tokens, mean: 178.02 tokens, max: 512 tokens
  • Samples (first three rows; long texts are truncated):
    Sample 1
    • sentence_0: Why have some analysts suggested that the outcome of Najib Razak's corruption trial could have significant implications for Malaysia's political landscape and the future of its democracy?
    • sentence_1: Mengapa sesetengah penganalisis mencadangkan bahawa keputusan perbicaraan rasuah Najib Razak boleh memberi implikasi yang besar kepada landskap politik Malaysia dan masa depan demokrasinya?
    • sentence_2: A warming climate can significantly affect the emergence times of certain insect species, as many insects are ectothermic and rely on external environmental conditions to regulate their body temperature. This means that their development, reproduction, and behavior are closely linked to temperature and other climatic factors. As global temperatures rise, these changes can lead to shifts in the timing of insect emergence, which can have cascading effects on ecosystem interactions and services.
      1. Phenological shifts: Warmer temperatures can lead to earlier emergence of insects, a phenomenon known as phenological shifts. This can result in a mismatch between the timing of insect emergence and the availability of their food resources, such as plants or other prey species. This mismatch can negatively impact the survival and reproduction of insects, as well as the species that depend on them for food.
      2. Altered species interactions: Shifts in insect emergence times can also affect speci...
    Sample 2
    • sentence_0: Corruption can have a significant impact on economic development and social inequality in Malaysia.
      Economic Development:
      Corruption can hinder economic development by discouraging investment, distorting markets, and undermining the rule of law. When businesses and individuals perceive a high level of corruption in a country, they may be less likely to invest or start businesses there, as they see it as a risky and unpredictable environment. This can limit the creation of jobs and the growth of industries, ultimately hindering economic development.
      Corruption can also distort markets by giving certain individuals or companies an unfair advantage. For example, if a company is able to secure a government contract through bribery or cronyism, it can stifle competition and lead to lower quality goods and services at higher prices. This can discourage innovation and entrepreneurship, further hindering economic development.
      Social Inequality:
      Corruption can also contribute to social inequ...
    • sentence_1: Rasuah boleh memberi kesan yang ketara kepada pembangunan ekonomi dan ketidaksamaan sosial di Malaysia.
      Pembangunan Ekonomi:
      Rasuah boleh menghalang pembangunan ekonomi dengan menggalakkan pelaburan, memutarbelitkan pasaran, dan menjejaskan kedaulatan undang-undang. Apabila perniagaan dan individu melihat tahap rasuah yang tinggi di sesebuah negara, mereka mungkin kurang berkemungkinan untuk melabur atau memulakan perniagaan di sana, kerana mereka melihatnya sebagai persekitaran yang berisiko dan tidak dapat diramalkan. Ini boleh mengehadkan penciptaan pekerjaan dan pertumbuhan industri, akhirnya menghalang pembangunan ekonomi.
      Rasuah juga boleh memutarbelitkan pasaran dengan memberikan individu atau syarikat tertentu kelebihan yang tidak adil. Sebagai contoh, jika syarikat mampu mendapatkan kontrak kerajaan melalui rasuah atau kronisme, ia boleh menyekat persaingan dan membawa kepada barangan dan perkhidmatan berkualiti rendah pada harga yang lebih tinggi. Ini boleh menghalang inova...
    • sentence_2: "What are the specific mechanisms through which immunoglobulins act to neutralize antigens and prevent infections?"
    Sample 3
    • sentence_0: He, who is also Minister of Public Health, Housing and Local Government Councils, said there were 302,243 Sarawakians at risk or recipients who should be more than 60 years old and had the second dose of COVID-19 on April
    • sentence_1: Iya ti mega Menteri Pengerai Mensia Mayuh, Pengawa Berumah enggau Kaunsil Kandang Menua madahka, bisi 302,243 rayat Sarawak ti bisi risiko tauka penerima ke patut beumur lebih 60 taun merima tuchuk kedua dos penyungkak kedua COVID-19 berengkah kena 12 April tu tadi.
    • sentence_2: What are the basic skills required to be a good programmer?
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
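
A minimal sketch, under the assumption that the standard Sentence Transformers v3+ training API was used, of how this loss configuration can be instantiated. The toy triplets below are placeholders rather than the actual training data; the base model is the one named at the top of this card.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

# Start from the base model named at the top of this card
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (anchor, positive, negative) triplets with the column names used above
train_dataset = Dataset.from_dict({
    "sentence_0": ["Why is corruption harmful to the economy?"],
    "sentence_1": ["Mengapa rasuah memudaratkan ekonomi?"],
    "sentence_2": ["What is finance generally?"],
})

# TripletLoss with the Euclidean distance metric and a margin of 5, as listed above
loss = TripletLoss(
    model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()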
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 4
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
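
A minimal sketch (not the original training script) of how these non-default values could be passed to the Sentence Transformers trainer; the output_dir is a made-up placeholder, and the resulting args would be handed to the SentenceTransformerTrainer shown in the sketch above as its args argument.

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v2_multilingual_malaysian-v9",  # hypothetical path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=4,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)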

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0380 500 4.7509
0.0761 1000 2.033
0.1141 1500 1.5296
0.1522 2000 1.3832
0.1902 2500 1.3084
0.2283 3000 1.3416
0.2663 3500 1.3118
0.3043 4000 1.3306
0.3424 4500 1.2771
0.3804 5000 1.2593
0.4185 5500 1.2278
0.4565 6000 1.207
0.4946 6500 1.1735
0.5326 7000 1.1842
0.5706 7500 1.1501
0.6087 8000 1.1562
0.6467 8500 1.1422
0.6848 9000 1.1229
0.7228 9500 1.0865
0.7609 10000 1.1094
0.7989 10500 1.0848
0.8369 11000 1.0957
0.8750 11500 1.0564
0.9130 12000 1.0688
0.9511 12500 0.9947
0.9891 13000 1.048
1.0272 13500 1.0183
1.0652 14000 1.0139
1.1032 14500 1.0291
1.1413 15000 1.001
1.1793 15500 0.9803
1.2174 16000 0.9874
1.2554 16500 0.9895
1.2935 17000 0.9721
1.3315 17500 0.9689
1.3696 18000 0.9622
1.4076 18500 0.9234
1.4456 19000 0.9039
1.4837 19500 0.9223
1.5217 20000 0.9091
1.5598 20500 0.9377
1.5978 21000 0.9174
1.6359 21500 0.9039
1.6739 22000 0.9009
1.7119 22500 0.8912
1.7500 23000 0.9378
1.7880 23500 0.9056
1.8261 24000 0.8748
1.8641 24500 0.8869
1.9022 25000 0.8972
1.9402 25500 0.8856
1.9782 26000 0.87
2.0163 26500 0.869
2.0543 27000 0.8255
2.0924 27500 0.8421
2.1304 28000 0.8196
2.1685 28500 0.8292
2.2065 29000 0.8374
2.2445 29500 0.8101
2.2826 30000 0.8329
2.3206 30500 0.8073
2.3587 31000 0.8015
2.3967 31500 0.8221
2.4348 32000 0.7914
2.4728 32500 0.7768
2.5108 33000 0.8036
2.5489 33500 0.7825
2.5869 34000 0.7981
2.6250 34500 0.779
2.6630 35000 0.7965
2.7011 35500 0.783
2.7391 36000 0.7748
2.7771 36500 0.7962
2.8152 37000 0.7782
2.8532 37500 0.7611
2.8913 38000 0.7877
2.9293 38500 0.757
2.9674 39000 0.7789
3.0054 39500 0.7745
3.0434 40000 0.7471
3.0815 40500 0.7299
3.1195 41000 0.7119
3.1576 41500 0.7199
3.1956 42000 0.7318
3.2337 42500 0.7446
3.2717 43000 0.7316
3.3097 43500 0.7534
3.3478 44000 0.704
3.3858 44500 0.7005
3.4239 45000 0.713
3.4619 45500 0.7492
3.5000 46000 0.7337
3.5380 46500 0.7025
3.5760 47000 0.753
3.6141 47500 0.7378
3.6521 48000 0.7242
3.6902 48500 0.7123
3.7282 49000 0.7277
3.7663 49500 0.7272
3.8043 50000 0.7094
3.8423 50500 0.7074
3.8804 51000 0.7162
3.9184 51500 0.6984
3.9565 52000 0.693
3.9945 52500 0.7026

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
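
To approximate this environment, the library versions listed above can be pinned at install time; this is a convenience sketch rather than an official requirement, and compatible newer releases should also load the model.

pip install "sentence-transformers==5.1.2" "transformers==4.57.1" "accelerate==1.11.0" "datasets==4.0.0" "tokenizers==0.22.1"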

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}