SentenceTransformer based on jangedoo/all-MiniLM-L6-v2-nepali

This is a sentence-transformers model fine-tuned from jangedoo/all-MiniLM-L6-v2-nepali on the title_excerpt dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
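
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch: token vectors are averaged over the non-padding positions (pooling_mode_mean_tokens), and the result is scaled to unit length. The function names and toy vectors are illustrative assumptions, not the library's internals; real embeddings have 384 dimensions.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize() step does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: 3 token vectors of dimension 2; the last token is padding.
tokens = [[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

The padding vector does not influence the result because its mask entry is 0.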

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jangedoo/all-MiniLM-L6-v3-nepali")
# Run inference
sentences = [
    'सिटिइभिटीतर्फ स्वास्थ्य कार्यक्रममा भर्ना कहिले खुल्ने?',
    'सिटिइभिटीमा स्वास्थ्य कार्यक्रममा भर्ना अझै खुल्न सकेको छैन। चिकित्सा शिक्षा आयोगसँगको असमझदारीका कारण विद्यार्थीहरू मर्कामा छन् र भर्ना ढिलाइ भएको छ।',
    'Nepal has confirmed the spread of multiple Omicron subvariants including XFG, XFG.3, and JN.1 amidst a recent rise in Covid-19 cases, with health officials emphasizing the ongoing risks particularly for vulnerable populations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
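
For semantic search, the architecture ends with Normalize(), so embeddings have unit length and a plain dot product equals cosine similarity. The following is a minimal sketch of ranking documents against a query; `rank_by_similarity` and the two-dimensional toy vectors are hypothetical stand-ins for the output of `model.encode(...)`.

```python
def rank_by_similarity(query_embedding, document_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scores = [
        sum(q * d for q, d in zip(query_embedding, doc))
        for doc in document_embeddings
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Hypothetical unit-length vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
results = rank_by_similarity(query, documents, top_k=2)
```

With real embeddings, the same ranking can be read off a row of the matrix returned by `model.similarity`.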

Evaluation

Metrics

Information Retrieval

Metric | multi_lang_ir | en_ir | ne_ir
cosine_accuracy@10 | 0.9726 | 0.9865 | 0.9693
cosine_precision@10 | 0.0973 | 0.0986 | 0.0969
cosine_precision@50 | 0.0199 | 0.0199 | 0.0198
cosine_recall@10 | 0.9726 | 0.9865 | 0.9693
cosine_recall@50 | 0.9927 | 0.9973 | 0.9919
cosine_ndcg@10 | 0.8973 | 0.9306 | 0.8864
cosine_mrr@10 | 0.8725 | 0.9119 | 0.8592
cosine_map@100 | 0.8736 | 0.9126 | 0.8605
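
As a reading aid for the metrics above, here is a minimal per-query sketch of accuracy@k, precision@k, recall@k, MRR@k and nDCG@k with binary relevance. The function name and document IDs are illustrative assumptions. The reported numbers are consistent with each query having a single relevant document: in that case accuracy@k equals recall@k, and precision@k equals recall@k divided by k (for example 0.9726 / 10 ≈ 0.0973).

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k, MRR@k and nDCG@k for one query."""
    top = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top]
    # Reciprocal rank of the first relevant document within the top k.
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # Discounted cumulative gain and its ideal value for binary relevance.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant_ids))))
    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / ideal,
    }

# One query whose single relevant document ("d7") is ranked third.
metrics = ir_metrics(["d1", "d4", "d7", "d9"], {"d7"}, k=10)
```

The table values would then be averages of these per-query numbers over all queries.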

Translation

Metric Value
src2trg_accuracy 0.7384
trg2src_accuracy 0.7371
mean_accuracy 0.7377
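
The translation metrics can be read as nearest-neighbour matching over parallel sentence pairs: src2trg_accuracy is the share of source sentences whose most similar target is their own translation, trg2src_accuracy is the reverse direction, and mean_accuracy averages the two ((0.7384 + 0.7371) / 2 ≈ 0.7377). The sketch below assumes a precomputed similarity matrix; the function name and toy values are hypothetical.

```python
def translation_accuracy(similarity):
    """similarity[i][j] = score between source sentence i and target sentence j."""
    n = len(similarity)
    # Source -> target: the best-scoring column in row i should be column i.
    src2trg = sum(
        1 for i in range(n)
        if max(range(n), key=lambda j: similarity[i][j]) == i
    ) / n
    # Target -> source: the best-scoring row in column j should be row j.
    trg2src = sum(
        1 for j in range(n)
        if max(range(n), key=lambda i: similarity[i][j]) == j
    ) / n
    return src2trg, trg2src, (src2trg + trg2src) / 2

# Hypothetical 3x3 similarity matrix; row = source, column = target.
toy = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
src2trg, trg2src, mean_acc = translation_accuracy(toy)
```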

Triplet

  • Dataset: nepali_triplets
  • Evaluated with TripletEvaluator with these parameters:
    {
        "margin": {
            "cosine": 0.1,
            "dot": 0.1,
            "manhattan": 0.1,
            "euclidean": 0.1
        }
    }
    
Metric Value
cosine_accuracy 0.445
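
The margin of 0.1 makes this a stricter test than a plain comparison. A minimal sketch, assuming a triplet counts as correct only when the anchor-positive cosine similarity exceeds the anchor-negative similarity by more than the margin (this reading of the evaluator's rule is an assumption, and the scores below are made up):

```python
def triplet_accuracy(triplet_scores, margin=0.1):
    """triplet_scores: list of (sim(anchor, positive), sim(anchor, negative))."""
    correct = sum(1 for pos, neg in triplet_scores if pos > neg + margin)
    return correct / len(triplet_scores)

# Made-up cosine similarities for four triplets.
toy_scores = [(0.80, 0.30), (0.55, 0.50), (0.40, 0.60), (0.90, 0.85)]
accuracy_with_margin = triplet_accuracy(toy_scores, margin=0.1)
accuracy_without_margin = triplet_accuracy(toy_scores, margin=0.0)
```

In this toy data, two triplets rank the positive first but by less than the margin, so they count as errors only when the margin is applied.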

Training Details

Training Dataset

title_excerpt

  • Dataset: title_excerpt at 88677eb
  • Size: 11,688 training samples
  • Columns: title and excerpt
  • Approximate statistics based on the first 1000 samples:
    • title (string): min 6 tokens, mean 30.66 tokens, max 78 tokens
    • excerpt (string): min 19 tokens, mean 79.53 tokens, max 180 tokens
  • Samples:
    • title: कांग्रेस तल्लो तहका नागरिकलाई समृद्ध बनाउने अभियानमा छ: गगन थापा
      excerpt: कांग्रेस महामन्त्री गगनकुमार थापाले नेपाली कांग्रेसले तल्लो तहका नागरिकलाई समृद्ध बनाउने अभियानमा लगा परेको बताएका छन्। बेलायती संसदीय प्रणाली र लेबर पार्टीको अनुभवबाट सिक्दै नेपाली कांग्रेसले समृद्ध नेपाल र व्यवस्थित व्यवस्था निर्माणमा विश्वास गर्छ।
    • title: शिक्षा: अधिकार कि व्यापार?
      excerpt: नेपालमा शिक्षा क्षेत्र महिला़र बचाउने अधिकार र व्यवसायिकरण बीच द्वैधता देखिन्छ, जहाँ निजी विद्यालयहरूले निःशुल्क शिक्षा पालन नगर्दा र सरकारी विद्यालयहरू कम बजेटमा संघर्षरत छन्।
    • title: Another Saudi Arabia returnee has mpox
      excerpt: Nepal has reported its fourth case of mpox, involving a Saudi Arabia returnee migrant worker who is currently stable and isolated in a tropical disease hospital.
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
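
To make the loss parameters concrete, here is a minimal pure-Python sketch of a symmetric in-batch ranking loss: cosine similarities between every title and every excerpt in a batch are multiplied by scale (20.0), and a cross-entropy is taken in both directions with the matching pair as the correct class. The function names and toy batch are assumptions for illustration, not the library's implementation.

```python
import math

def cross_entropy_diagonal(score_rows):
    """Mean cross-entropy where the correct column for row i is column i."""
    losses = []
    for i, row in enumerate(score_rows):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        losses.append(log_sum_exp - row[i])
    return sum(losses) / len(losses)

def symmetric_ranking_loss(cosine_matrix, scale=20.0):
    """cosine_matrix[i][j] = cos_sim(title_i, excerpt_j) for one batch."""
    scaled = [[scale * s for s in row] for row in cosine_matrix]
    transposed = [list(column) for column in zip(*scaled)]
    forward = cross_entropy_diagonal(scaled)       # title -> excerpt
    backward = cross_entropy_diagonal(transposed)  # excerpt -> title
    return (forward + backward) / 2

# Toy batch of two pairs: matching pairs score high, mismatches score low.
toy_batch = [[0.9, 0.1], [0.2, 0.8]]
loss = symmetric_ranking_loss(toy_batch)
```

Every other excerpt in the batch acts as a negative for a given title, which is why the batch size of 64 matters for this loss.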
    

Evaluation Dataset

title_excerpt

  • Dataset: title_excerpt at 88677eb
  • Size: 585 evaluation samples
  • Columns: title and excerpt
  • Approximate statistics based on the first 585 samples:
    • title (string): min 5 tokens, mean 30.05 tokens, max 77 tokens
    • excerpt (string): min 22 tokens, mean 76.93 tokens, max 254 tokens
  • Samples:
    • title: किङ्स कलेजलाई हाइसेन्स नेपाल सायम फुटसल
      excerpt: किङ्स कलेजले हाइसेन्स नेपाल प्रथम सायम अन्तर कलेज सेभेन ‘ए’ साइड फुटसल प्रतियोगितामा उपाधि जितेको छ। फाइनलमा टेकस्पाइर कलेजलाई २–१ ले पराजित गर्दै किङ्स कलेजले उपाधि उचालेको हो।
    • title: 'इरानमा रहेका नेपालीको उद्धारका लागि भारतसँग आग्रह गरेकी छु, सकारात्मक जवाफ आउनेमा विश्वस्त छु'
      excerpt: इरानमा रहेका नेपालीलाई भारतको सहयोगमा उद्धार गरिने भएको छ। परराष्ट्रमन्त्री डा आरजु राणा देउवाले भारतसँग सकारात्मक जवाफ आउनेमा विश्वस्त व्यक्त गरिन्।
    • title: ADB unveils $2.3 billion plan to boost green, job-rich growth in Nepal
      excerpt: The Asian Development Bank unveiled a $2.3 billion strategy for Nepal from 2025-2029 focused on green, inclusive, and employment-intensive economic growth, aligning with Nepal's national development plans and emphasizing private sector investment and climate resilience.
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
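
The interaction of learning_rate, warmup_ratio, num_train_epochs and the batch size can be worked through in a short sketch. It assumes a single device (the card does not state the device count), no gradient accumulation, and a linear warmup followed by linear decay to zero; the helper name `learning_rate` is illustrative.

```python
import math

TRAIN_SAMPLES = 11_688
BATCH_SIZE = 64
EPOCHS = 10
BASE_LR = 2e-05
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # last batch is partial
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup to BASE_LR, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
print(learning_rate(100), learning_rate(warmup_steps), learning_rate(1000))
```

Under these assumptions there are 183 steps per epoch, which agrees with the training log, where step 100 corresponds to epoch 0.5464 (100 / 183).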

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | title_excerpt loss | multi_lang_ir_cosine_ndcg@10 | en_ir_cosine_ndcg@10 | ne_ir_cosine_ndcg@10 | translation_mean_accuracy | nepali_triplets_cosine_accuracy
0.5464 | 100 | 0.3128 | 0.0851 | - | - | - | - | -
1.0929 | 200 | 0.2365 | 0.0707 | - | - | - | - | -
1.6393 | 300 | 0.1786 | 0.0652 | - | - | - | - | -
2.1858 | 400 | 0.1476 | 0.0660 | - | - | - | - | -
2.7322 | 500 | 0.1242 | 0.0657 | - | - | - | - | -
3.2787 | 600 | 0.1112 | 0.0672 | - | - | - | - | -
3.8251 | 700 | 0.097 | 0.0632 | - | - | - | - | -
4.3716 | 800 | 0.0853 | 0.0618 | - | - | - | - | -
4.9180 | 900 | 0.0792 | 0.0614 | - | - | - | - | -
5.4645 | 1000 | 0.0723 | 0.0616 | - | - | - | - | -
6.0109 | 1100 | 0.0672 | 0.0628 | - | - | - | - | -
6.5574 | 1200 | 0.0576 | 0.0595 | - | - | - | - | -
7.1038 | 1300 | 0.0559 | 0.0615 | - | - | - | - | -
7.6503 | 1400 | 0.0554 | 0.0592 | - | - | - | - | -
8.1967 | 1500 | 0.0511 | 0.0597 | - | - | - | - | -
8.7432 | 1600 | 0.0492 | 0.0600 | - | - | - | - | -
9.2896 | 1700 | 0.051 | 0.0607 | - | - | - | - | -
9.8361 | 1800 | 0.0497 | 0.0608 | - | - | - | - | -
-1 | -1 | - | - | 0.8973 | 0.9306 | 0.8864 | 0.7377 | 0.4450
  • The bold row denotes the saved checkpoint.
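
One way to read the log table is to scan the logged validation losses for the minimum. The sketch below does only that; whether the lowest title_excerpt loss was the criterion used to pick the saved checkpoint is an assumption, since the card does not state a `metric_for_best_model`.

```python
# (step, title_excerpt validation loss) pairs copied from the log table above.
validation_log = [
    (100, 0.0851), (200, 0.0707), (300, 0.0652), (400, 0.0660),
    (500, 0.0657), (600, 0.0672), (700, 0.0632), (800, 0.0618),
    (900, 0.0614), (1000, 0.0616), (1100, 0.0628), (1200, 0.0595),
    (1300, 0.0615), (1400, 0.0592), (1500, 0.0597), (1600, 0.0600),
    (1700, 0.0607), (1800, 0.0608),
]

# Lower validation loss is treated as better (assumption, see lead-in).
best_step, best_loss = min(validation_log, key=lambda pair: pair[1])
print(best_step, best_loss)
```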

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.8.1
  • Datasets: 2.21.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 22.7M params (Safetensors, tensor type F32)