SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32)
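
Cosine similarity scores the angle between two embedding vectors and ignores their magnitude, so values fall in [-1, 1]. A minimal sketch of the computation with NumPy (the toy vectors stand in for real sentence embeddings):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product normalized by the two vectors' magnitudes
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a, b = np.random.rand(384), np.random.rand(384)  # two toy 384-dim embeddings
print(cosine_similarity(a, b))  # value in [-1, 1]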

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
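
The Pooling module above has pooling_mode_mean_tokens set to True, so a sentence embedding is the mean of the token embeddings, with padding tokens masked out. A minimal sketch of that pooling step, using the base model directly from transformers:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

encoded = tokenizer(["An example sentence"], padding=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 384)

# Mean pooling: average the token embeddings, ignoring padding via the attention mask
mask = encoded["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embeddings.shape)                             # torch.Size([1, 384])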

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("along26/mpnet_manglish-sentence-transformer")
# Run inference
sentences = [
    'Why have critics accused Najib Razak of mishandling the economy and what evidence supports these claims?',
    'Mengapa pengkritik menuduh Najib Razak salah mengendalikan ekonomi dan bukti apa yang menyokong dakwaan ini?',
    "How does adding more reactant or product affect the equilibrium position of a chemical reaction? Explain using Le Chatelier's principle with at least three different examples.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.8678,  0.9288],
#         [-0.8678,  1.0000, -0.8804],
#         [ 0.9288, -0.8804,  1.0000]])
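
The same embeddings also support semantic search: encode a corpus once, then rank its entries against each query with the model's similarity function. A short sketch continuing from the snippet above (the corpus strings are illustrative):

corpus = [
    "Najib Razak faced criticism over his handling of the economy.",
    "Le Chatelier's principle predicts how equilibria respond to stress.",
    "Kuala Lumpur has a well developed public transport network.",
]
corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(["Why was the Malaysian economy criticized?"])

# Rank corpus entries by similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))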

Training Details

Training Dataset

Unnamed Dataset

  • Size: 139,404 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
      sentence_0 (string): min 12, mean 221.26, max 512 tokens
      sentence_1 (string): min 21, mean 262.59, max 512 tokens
      sentence_2 (string): min 13, mean 241.29, max 512 tokens
  • Samples:

    Sample 1
      sentence_0: Suppose there are 6 guests at a party. Among them, some are friends with each other and some are strangers. What is the minimum number of guests who must be friends with each other or at least strangers with each other, in order to guarantee that there are 3 guests who are all friends or 3 guests who are all strangers?
      sentence_1: Eh, imagine got 6 people go party lah. Some of them kakis, some don't know each other one. How many people must be friends or at least don't know each other, so can confirm got 3 people all kakis or 3 people all don't know each other? Aiyoh, very headache leh!
      sentence_2: Why is there still a lack of emphasis on science, technology, engineering, and mathematics (STEM) education in Malaysia?

    Sample 2
      sentence_0: A photon at point A is entangled with a photon at point B. Using the quantum teleportation protocol, how can you transfer the quantum state of the photon at point A to point C, without physically moving the photon from point A to point C, given that a shared entangled pair of photons is available between point A and point C? Provide a step-by-step explanation of the procedure.
      sentence_1: Foton pada titik A terikat dengan foton pada titik B. Menggunakan protokol teleportasi kuantum, bagaimana anda boleh memindahkan keadaan kuantum foton pada titik A ke titik C, tanpa memindahkan foton secara fizikal dari titik A ke titik C, diberikan bahawa sepasang foton terjerat yang dikongsi tersedia antara titik A dan titik C? Berikan penjelasan langkah demi langkah tentang prosedur.
      sentence_2: Civil society groups and activists in Malaysia have expressed concern about the government's handling of the 1MDB scandal and the prosecution of those involved for several reasons. The 1MDB scandal involves allegations of massive corruption and money laundering at the state-owned investment fund, with billions of dollars allegedly misappropriated and used for personal gain.

        Firstly, some civil society groups and activists have criticized the government's investigation and prosecution of those involved in the 1MDB scandal as being politically motivated and lacking in transparency. They argue that the government has selectively targeted certain individuals for prosecution, while others with close ties to the ruling party have been left untouched.

        Secondly, there are concerns about the slow pace of the investigations and prosecutions, which have dragged on for years without any significant progress. Some civil society groups and activists have accused the government of deliberately slow...

    Sample 3
      sentence_0: To solve this problem, we can use the generating functions method. Let's represent each number in the set {1, 2, 3, ..., 10} as a variable x raised to the power of that number. The generating function for this set is:

        G(x) = x^1 + x^2 + x^3 + ... + x^10

        Now, we want to find the coefficient of x^15 in the expansion of G(x)^3, as this will represent the number of ways to select 3 numbers from the set such that their sum is 15.

        G(x)^3 = (x^1 + x^2 + x^3 + ... + x^10)^3

        Expanding G(x)^3, we are looking for the coefficient of x^15. We can do this by finding the possible combinations of terms that multiply to x^15:

        1. x^1 * x^5 * x^9
        2. x^2 * x^4 * x^9
        3. x^3 * x^4 * x^8
        4. x^3 * x^5 * x^7
        5. x^4 * x^5 * x^6

        Now, let's count the number of ways each combination can be formed:

        1. x^1 * x^5 * x^9: There is only 1 way to form this combination.
        2. x^2 * x^4 * x^9: There is only 1 way to form this combination.
        3. x^3 * x^4 * x^8: There is only 1 way to form this combination.
        4. x^3 * x^5 * ...
      sentence_1: Untuk menyelesaikan masalah ini, kita boleh menggunakan kaedah fungsi penjanaan. Mari kita wakili setiap nombor dalam set {1, 2, 3, ..., 10} sebagai pembolehubah x dinaikkan kepada kuasa nombor itu. Fungsi penjanaan untuk set ini ialah:

        G(x) = x^1 + x^2 + x^3 + ... + x^10

        Sekarang, kita ingin mencari pekali x^15 dalam pengembangan G(x)^3, kerana ini akan mewakili bilangan cara untuk memilih 3 nombor daripada set supaya jumlahnya ialah 15.

        G(x)^3 = (x^1 + x^2 + x^3 + ... + x^10)^3

        Mengembangkan G(x)^3, kita sedang mencari pekali bagi x^15. Kita boleh melakukan ini dengan mencari kemungkinan gabungan istilah yang didarab kepada x^15:

        1. x^1 * x^5 * x^9
        2. x^2 * x^4 * x^9
        3. x^3 * x^4 * x^8
        4. x^3 * x^5 * x^7
        5. x^4 * x^5 * x^6

        Sekarang, mari kita kira bilangan cara setiap gabungan boleh dibentuk:

        1. x^1 * x^5 * x^9: Terdapat hanya 1 cara untuk membentuk gabungan ini.
        2. x^2 * x^4 * x^9: Terdapat hanya 1 cara untuk membentuk gabungan ini.
        3. x^3 * x^4 * x^8: Terdapat hanya 1 cara u...
      sentence_2: Why does the Malaysian government still insist on implementing the controversial National Security Council Act, which gives the government excessive power to declare a security area and restrict civil liberties?
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
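
With this loss, each triplet treats sentence_0 as the anchor, sentence_1 as the positive, and sentence_2 as the negative: a triplet is penalized whenever the negative is not at least triplet_margin farther from the anchor, in Euclidean distance, than the positive. A minimal sketch of the computation on toy tensors (an illustration of the formula, not the library's internals verbatim):

import torch
import torch.nn.functional as F

def euclidean_triplet_loss(anchor, positive, negative, margin=5.0):
    # max(d(a, p) - d(a, n) + margin, 0), averaged over the batch
    d_pos = F.pairwise_distance(anchor, positive, p=2)
    d_neg = F.pairwise_distance(anchor, negative, p=2)
    return torch.relu(d_pos - d_neg + margin).mean()

# Toy batch of four 384-dim embedding triplets
a, p, n = (torch.randn(4, 384) for _ in range(3))
print(euclidean_triplet_loss(a, p, n))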
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
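
The non-default values above map directly onto the Sentence Transformers v3+ training API. A sketch of a comparable training run, assuming the triplet column layout described earlier (the toy triplet strings and output path are illustrative, not the author's actual setup):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.losses import TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy single-triplet dataset with the same column names as the training data
train_dataset = Dataset.from_dict({
    "sentence_0": ["Why is traffic so bad in Kuala Lumpur?"],   # anchor
    "sentence_1": ["Kenapa trafik teruk sangat kat KL?"],       # positive
    "sentence_2": ["Explain Le Chatelier's principle."],        # negative
})

loss = losses.TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    multi_dataset_batch_sampler="round_robin",  # only relevant when training on multiple datasets
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()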

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0574 500 3.8344
0.1148 1000 0.2259
0.1722 1500 0.0166
0.2295 2000 0.0025
0.2869 2500 0.0024
0.3443 3000 0.0022
0.4017 3500 0.0014
0.4591 4000 0.0016
0.5165 4500 0.0003
0.5739 5000 0.0002
0.6312 5500 0.0013
0.6886 6000 0.0002
0.7460 6500 0.0
0.8034 7000 0.0
0.8608 7500 0.0011
0.9182 8000 0.0
0.9756 8500 0.0007
1.0329 9000 0.0008
1.0903 9500 0.0
1.1477 10000 0.0
1.2051 10500 0.0008
1.2625 11000 0.0
1.3199 11500 0.0
1.3773 12000 0.0003
1.4346 12500 0.0009
1.4920 13000 0.0
1.5494 13500 0.0012
1.6068 14000 0.0012
1.6642 14500 0.0
1.7216 15000 0.0
1.7790 15500 0.0
1.8363 16000 0.0
1.8937 16500 0.0015
1.9511 17000 0.0004
2.0085 17500 0.0
2.0659 18000 0.0
2.1233 18500 0.0007
2.1806 19000 0.0001
2.2380 19500 0.0006
2.2954 20000 0.0006
2.3528 20500 0.0001
2.4102 21000 0.0
2.4676 21500 0.0
2.5250 22000 0.0003
2.5823 22500 0.0001
2.6397 23000 0.0
2.6971 23500 0.0
2.7545 24000 0.0
2.8119 24500 0.0006
2.8693 25000 0.0
2.9267 25500 0.0
2.9840 26000 0.0

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}