SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
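
As an illustration of the similarity function listed above, here is a minimal pure-Python sketch of cosine similarity. The function name and the toy vectors are assumptions made for this example; the model itself produces 768-dimensional vectors and the library computes the same quantity on tensors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy low-dimensional vectors (hypothetical values, for illustration only).
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Scores close to 1 indicate vectors pointing in nearly the same direction; scores near 0 indicate unrelated directions.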

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
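
The last two modules above can be read as: take the first ([CLS]) token vector from the Transformer output, then scale it to unit length. The sketch below illustrates those two steps on a toy token matrix; the helper names and the numbers are assumptions for illustration and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position in BERT-style models)."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Hypothetical Transformer output: 3 tokens, 2 dimensions each.
# The real model emits 768 dimensions per token.
tokens = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)  # [0.6, 0.8]
```

Because the output is unit length, the dot product of two embeddings equals their cosine similarity.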

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Ambika14/bge_grievance_classifier-code-B")
# Run inference
sentences = [
    'the msme portal software keeps crashing during udyam registration renewal and scheme applications with error messages and failed uploads every time i try. support team gives no help and i can t access my digital certificates or track status. this software glitch blocks my business from government benefits and loans. please fix the bugs improve server speed and add better error guides right away. issue software glitch in msme portal during udyam registration renewal and scheme applications context the user is reporting frequent crashes of the msme portal software during udyam registration renewal and scheme applications resulting in failed uploads error messages and inability to access digital certificates or track status which is hindering business access to government benefits and loans. details - software msme portal software issue frequent crashes during udyam registration renewal and scheme applications error messages failed uploads and error messages impact inability to access digital certificates track status and access government benefits and loans',
    'Technology, Quality and Institutions. Software Related. software-related initiatives for msmes mainly center on the digital msme scheme under the national manufacturing competitiveness programme which promotes adoption of information and communication technologies through cloud-based erp crm and accounting software to digitalize day-to-day business operations. the scheme combines awareness workshops needs assessment and financial support in the form of subsidies covering about <NUM> <NUM> of eligible costs subject to a ceiling of <NUM> lakh over two years specifically targeting micro and small enterprises. these initiatives are reinforced by complementary efforts such as software-enabled facilities under technology centre programmes for electronics and esdm sectors digital quality and process parameters under zed certification and software-focused modules within entrepreneurship and skill development programmes. together these measures aim to standardize workflows automate inventory finance and customer management reduce operational inefficiencies and inventory holding support online sales and compliance and enhance overall competitiveness without requiring heavy upfront investment in hardware. examples of grievances include subsidy denial an msme implementing a cloud-based erp costing <NUM> . <NUM> lakh receives no reimbursement beyond the <NUM> lakh cap despite meeting all eligibility conditions. software ineligibility a cloud application selected after needs assessment is later rejected as non-standard or non-approved forcing the enterprise to abandon or restart implementation mid-way. inadequate training awareness workshops focus only on theoretical benefits of digitalization and fail to provide hands-on demonstrations or practical guidance on using erp or crm software. post-subsidy continuity issue after the two-year subsidized period ends steep renewal or subscription costs make the software unaffordable disrupting business operations. '
    'needs mismatch an msme assessed for crm requirements is instead provided accounting software limiting the usefulness of the digital intervention and affecting adoption outcomes.',
    'Technology, Quality and Institutions. Related to NSIC. this category encompasses grievances related to the support and facilitation services provided by the national small industries corporation nsic to micro small and medium enterprises msmes . the scope of this category includes issues arising from the areas of raw material assistance market access and risk mitigation through guarantees. specifically it covers situations where approved raw material assistance is not released on time supplier coordination fails after nsic approval material supplied through nsic is delayed or does not meet specifications or documentation and regional office processes stall procurement. the category also captures failures in marketing support including - delayed or missing inclusion in tenders gem or psu vendor listings - late communication of bid opportunities - problems in nsic-sponsored exhibitions or buyer-connect programs additionally it includes issues related to performance and emd guarantees such as - delayed issuance - incorrect formats - non-renewal despite payment - rejection by psus - lack of response when guarantees are invoked these grievances typically result in missed orders blocked working capital contract delays or loss of business credibility and arise from execution coordination or service delivery breakdowns rather than policy interpretation. the category is further divided into the following subcategories <NUM> . corporate communication single point registration scheme and exhibition consortia and tender marketing <NUM> . internal audit and law recovery <NUM> . human resource <NUM> . vigilance law recovery <NUM> . international cooperation <NUM> . bank guarantee monitoring <NUM> . finance accounts <NUM> . national sc st hub <NUM> . chief vigilance officer <NUM> . contract procurement grievance officer <NUM> . '
    'digital services facilitation and training <NUM> .space marketing cell event management cell <NUM> .raw material assistance bank guarantee bill discounting bank tieup csr administration <NUM> .technology liaison officer for sc st pwd cmr <NUM> .epf trust superannuation pension trust <NUM> .center public information officers cpio <NUM> .company secretary',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6265, 0.5981],
#         [0.6265, 1.0000, 0.7013],
#         [0.5981, 0.7013, 1.0000]])
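
In the example above, the first sentence is a grievance and the other two are category descriptions, so the first row of the similarity matrix can be read as a ranking of candidate categories. The sketch below shows one way to pick the best match from that row; `best_match` is a hypothetical helper, and the scores are copied from the printed output above rather than recomputed.

```python
# Similarity matrix copied from the example output above.
similarities = [
    [1.0000, 0.6265, 0.5981],
    [0.6265, 1.0000, 0.7013],
    [0.5981, 0.7013, 1.0000],
]

def best_match(scores, query_index, candidate_indices):
    """Return (index, score) of the highest-scoring candidate for one query row."""
    row = scores[query_index]
    best = max(candidate_indices, key=lambda i: row[i])
    return best, row[best]

# Sentence 0 is the grievance; sentences 1 and 2 are category descriptions.
index, score = best_match(similarities, query_index=0, candidate_indices=[1, 2])
print(index, score)  # 1 0.6265
```

Here the grievance about the portal software scores higher against the "Software Related" description (index 1) than against the NSIC description (index 2).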

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine nan
spearman_cosine nan

Training Details

Training Dataset

Unnamed Dataset

  • Size: 88 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 88 samples:
    • sentence_0 (type: string): min 46 tokens, mean 172.95 tokens, max 256 tokens
    • sentence_1 (type: string): min 199 tokens, mean 253.88 tokens, max 256 tokens
  • Samples:
    Sample 1
    sentence_0: with due respect i md mafijul husen would like to intimate that when i trying to edit my existing udyam registration certificate certificate of my enterprise viz. md mafijul husen then i failed to enter otp as my earlier mobile number has been changed and the given gmail id is also inactive. hence it is my request to change my mobile number so that i can edit my existing udyam registration certificate. my pan no is and aadhaar number is . issue update of mobile number and gmail id for udyam registration certificate editing context the user is requesting an update of the mobile number and gmail id associated with the existing udyam registration certificate udyam-wb- - to facilitate editing of the certificate. details - udyam registration certificate no udyam-wb- - old mobile no old gmail id inactive pan no aetph0941n aadhar no
    sentence_1: UAM/Udyam Registration/Certificate related issues. Updation of Email ID/Mobile No. Linked to UDYAM Certificate. this category includes grievances related to updating or correcting the email id or mobile number associated with an existing udyam registration. contact details provided during registration are used for communication verification and authentication when accessing the enterprise profile on the portal. if these contact details become outdated incorrect or inaccessible the enterprise owner may face difficulty receiving otps accessing the portal or managing the registration information. common grievances under this category include requests to change the registered mobile number or email address because the original number is no longer active the sim card has been lost the email account is no longer accessible or the contact details were entered incorrectly during registration. some complaints arise when the registered contact details belong to an employee or consultant who is n...
    Sample 2
    sentence_0: we had applied for msme registration under the application number m on 22nd march . after reviewing the status and considering our circumstances we kindly request that our case be transferred to the micro and small enterprises facilitation council msefc for further processing and resolution. we believe that the msefc councils intervention will help address any concerns or disputes that may have arisen regarding our application. we are hopeful that this request will be processed swiftly and in accordance with the necessary regulations. thank you for your attention to this matter. we look forward to your prompt assistance in facilitating this request. issue request for transfer of msme registration case to msefc context the user is requesting to transfer their msme registration case to the micro and small enterprises facilitation council msefc for further processing and resolution. details - application number udyam-dl- - m application date 22nd m...
    sentence_1: Technology, Quality and Institutions. Related to NI-MSME. this category encompasses grievances related to training capacity-building and certification programs administered by the national institute for micro small and medium enterprises ni-msme for micro small and medium enterprises msmes entrepreneurs and their employees. the scope of this category includes issues arising from the delivery of training programs such as repeatedly postponed schedules without prior notification inaccessible online training portals unclear eligibility criteria unavailable trainers insufficient mentoring outdated or non-practical course content additionally this category captures certification-related issues including delayed issuance of certificates certificates issued with incorrect details difficulty verifying certificates online failure to deliver certificates after course completion furthermore the category includes course enrollment and admission disputes such as unjustified rejection of enrollment ...
    Sample 3
    sentence_0: insurancy company national insurance company limited branch name of insurance company branch if other khamgaon branch date of application - - policy number my claim is kept pending even after submitting all the documents after changing all the requirements as changed by various surveyors. issue delayed insurance claim under national insurance company limited context the user is reporting that the insurance claim submitted on - - with policy number is still pending despite submission of all required documents as per changes made by various surveyors. details - policy number claim submission date - - branch khamgaon
    sentence_1: Starter, Credit and Finance. Insurance Claim related issues. this category encompasses grievances related to insurance claims associated with various government-backed and private insurance products. the scope includes . esic employees state insurance corporation insurance benefits . epfo employees provident fund organisation -linked insurance benefits including edli employees deposit linked insurance . cgtmse credit guarantee fund trust for micro and small enterprises -linked insurance elements . private or general business insurance products where a government department psu public sector undertaking or bank acts as an intermediary or implementing authority the category covers a range of issues including opaque rejection decisions undocumented policy exclusions administrative closure without explanation shifting of risk and liability onto msmes micro small and medium enterprises or employees document and data mismatches across multiple systems such as aadhaar ...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 32,
        "gather_across_devices": false
    }
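
To make the loss parameters above concrete, the following is a minimal sketch of the objective as commonly described for multiple negatives ranking: each sentence_0 should score highest against its own sentence_1, with the other sentence_1 entries in the batch acting as negatives. Cosine similarities are multiplied by `scale` and passed through a softmax cross-entropy. The function name and the toy matrices are assumptions for illustration. The cached variant is understood to compute the same loss value while processing the batch in chunks of `mini_batch_size` to limit memory use; that chunking is not shown here.

```python
import math

def multiple_negatives_ranking_loss(sim_matrix, scale=20.0):
    """Mean cross-entropy where the correct column for row i is column i.

    sim_matrix[i][j] is the cosine similarity between anchor i and positive j.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [scale * s for s in row]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Hypothetical 2x2 similarity matrices.
well_separated = [[1.0, 0.0], [0.0, 1.0]]
indistinguishable = [[0.5, 0.5], [0.5, 0.5]]
print(multiple_negatives_ranking_loss(well_separated))     # close to 0
print(multiple_negatives_ranking_loss(indistinguishable))  # log(2)
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.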
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 6
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
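
The combination of learning_rate 5e-05, lr_scheduler_type linear, and warmup_steps 0 listed above implies a learning rate that starts at its peak and decays linearly to zero. The sketch below is a simplified stand-in for that schedule, not the library's implementation; the function name is hypothetical, and the total of 12 steps is taken from the Training Logs table.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Start, midpoint, and end of a 12-step run.
for step in (0, 6, 12):
    print(step, linear_lr(step, total_steps=12))
```

With so few steps, each optimizer update is taken at a noticeably different learning rate.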

Training Logs

Epoch Step spearman_cosine
1.0 2 nan
2.0 4 nan
3.0 6 nan
4.0 8 nan
5.0 10 nan
6.0 12 nan
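
The step counts in the table above follow from the dataset size and batch size: 88 training samples with per_device_train_batch_size 64 and dataloader_drop_last False give two batches per epoch. The sketch below reproduces that arithmetic, assuming a single device and gradient_accumulation_steps of 1; the function name is illustrative.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of optimizer steps in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

per_epoch = steps_per_epoch(88, 64)
total = per_epoch * 6  # num_train_epochs
print(per_epoch, total)  # 2 12
```

The second batch of each epoch holds only the remaining 24 samples, so it provides fewer in-batch negatives than the first.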

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}