How to use Ambika14/sbert-grievance-classifier-code-A with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Ambika14/sbert-grievance-classifier-code-A")
sentences = [
"udyam registration cancel <udyam_no> is still pending for cancellation and its passing more than <NUM> days issue delayed cancellation of udyam registration context the user is reporting that the cancellation of udyam registration for udyam-up- <NUM> - <NUM> is still pending and has been delayed for more than <NUM> days. details - udyam registration no udyam-up- <NUM> - <NUM> cancellation status pending",
"Policy and Schemes. Definition of MSMEs (Clarifications related to definition) Policy. this category pertains to grievances seeking policy interpretation and clarification regarding the definition and classification of micro small and medium enterprises msmes under the micro small and medium enterprises development msmed act <NUM> as amended . the category encompasses disputes or doubts related to the application of turnover investment and structural factors to specific enterprise cases. key issues include turnover and investment threshold calculations treatment of export turnover or goods and services tax gst classification of enterprises as micro small or medium clubbing of multiple units or related businesses under a single msme identity the category also captures concerns arising from the transition between old and revised msme definitions including impact of reclassification on eligibility continuity of benefits already availed applicable financial year for revised criteria grievances in this category are clarification-driven rather than system-error driven arising from the intersection of policy intent numerical calculations and enterprise structure. example issues include turnover classification discrepancies my turnover is within limits but udyam shows a higher msme category please clarify the correct classification as per policy. export turnover treatment export turnover has been included while determining msme status kindly clarify whether it should be excluded. post-migration classification changes after migration from uam to udyam my enterprise category has changed despite no change in investment please confirm if this is correct. revised definition impact on eligibility due to the revised msme definition my eligibility under schemes is affected kindly clarify whether benefits already availed will continue. the operational procedural policy and institutional causes of these grievances include",
# The next entry is one string split over two adjacent literals so that it stays valid Python.
"Policy and Schemes. DBT / IT desk including Annual Report. dbt it desk including the annual report in msme refers to the data dbt wing functioning under the office of the development commissioner msme which is responsible for administering direct benefit transfer dbt of subsidies under msme schemes managing it and digital infrastructure and compiling the ministry s annual report. the wing oversees end-to-end dbt processes for scheme reimbursements such as ict and cloud computing subsidies where msmes initially incur eligible expenses and subsequently receive reimbursements directly into aadhaar-linked bank accounts through the public financial management system often after technical verification by agencies like telecommunications consultants india limited. it ensures compliance with national dbt standards in coordination with the dbt mission and national informatics centre maintains and upgrades msme it systems including the udyam registration portal supports cloud-based it adoption for msmes undertakes data analytics and mis reporting and onboards schemes to the national dbt framework. the wing also prepares the annual report of the ministry of msme consolidating performance indicators financial outlays scheme outcomes udyam registration trends and macro-level contributions such as msme share in gdp and employment which are used for parliament cabinet briefings and policy evaluation. while this framework promotes transparency leak-proof subsidy delivery evidence-based policymaking and digital efficiency stakeholders frequently raise grievances related to dbt execution data accuracy it reliability and reporting quality. "
"examples of grievances include msmes experiencing delays in receipt of approved ict or cloud service subsidies due to pfms transaction or verification glitches reimbursement failures arising from aadhaar bank account linkage mismatches despite valid udyam registration inaccuracies or under-reporting of scheme achievements udyam registrations or msme gdp contribution in the annual report affecting policy advocacy and planning temporary downtime or access issues on udyam or other msme it portals during registration or subsidy claim periods and gaps in mis capture where scheme data duplications or leakages are not properly reflected in dbt dashboards or the annual report prompting appeals for correction and system strengthening.",
"UAM/Udyam Registration/Certificate related issues. Time Taken for Cancellation of UDYAM Certificate (Technical). this category refers to grievances concerning delays in processing requests for cancellation of an existing udyam registration. when a business owner submits a request to cancel a registration the request is expected to be processed within a reasonable timeframe. however in some cases users report that the cancellation request remains pending for an extended period. grievances under this category usually involve complaints where the enterprise owner has already submitted a cancellation request but the status continues to show as pending or unprocessed. entrepreneurs may also report that they cannot proceed with other actions related to their registration because the cancellation has not yet been completed. in some situations users may have submitted the request multiple times or may be seeking clarification about the delay in processing the cancellation. these grievances are typically raised by msme proprietors partners company directors or authorized representatives who previously requested cancellation of their enterprise registration. business owners who closed their operations or who submitted cancellation due to incorrect registration details may seek updates on the status of their request. compliance managers accountants or consultants handling enterprise registrations may also raise grievances when the cancellation process takes longer than expected or prevents further registration-related actions from being completed."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'MPNetModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
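The architecture printout above lists three stages: a Transformer that produces one vector per token (inputs are truncated at max_seq_length 128, so the long category descriptions are cut off there), mean pooling over tokens, and L2 normalization. The following is a minimal pure-Python sketch of the last two stages on toy data. The helper names `mean_pool` and `l2_normalize` are illustrative and are not the library's internals.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, which is what a Normalize() stage does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: 3 token positions, 2 dimensions (the real model uses 768).
# The last position is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

Because the output is unit length, the dot product of two sentence vectors equals their cosine similarity.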
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Ambika14/sbert-grievance-classifier-code-A")
# Run inference
sentences = [
'dear sir mam i am trying to register udyam with my pan but error showing udyam registration has already done through this pan and i have not registered earlier so please guide me aadhaar <uam_no> pan <pan_no> mobile <phone_no> issue clarification on existing udyam registration context the user is requesting clarification as the udyam registration portal indicates that registration has already been done through the pan although the user states that no registration was made. details - aadhar no <NUM> pan no gnips2021g mobile no <NUM>',
'UAM/Udyam Registration/Certificate related issues. Existing / Unauthorized UDYAM Registration Against PAN. this category refers to grievances where an entrepreneur discovers that a udyam registration already exists against their pan either due to duplicate registration or because someone else created the registration without their authorization. since pan is used as a key identifier for enterprise registration the presence of an existing registration can prevent the legitimate owner from creating a new one or managing the enterprise details. grievances under this category usually include complaints about duplicate registrations created for the same enterprise or multiple registrations linked to the same pan. some business owners report that when they attempt to register their enterprise the system indicates that a registration already exists even though they are unaware of creating one earlier. in other cases entrepreneurs may find that an employee consultant former partner or third party registered the enterprise using the business pan without informing the owner. there may also be situations where an earlier registration contains incorrect enterprise information leading to confusion about the valid record. such grievances are generally raised by business proprietors partners of partnership firms directors of companies or authorized representatives responsible for registering the enterprise under msme. these complaints may also be submitted by compliance managers accountants or consultants who are attempting to complete the msme registration process for the business but encounter an existing record linked to the pan. the purpose of raising this grievance is to identify the existing registration verify its legitimacy and resolve conflicts arising from duplicate or unauthorized registrations associated with the enterprise s pan.',
# The next entry is one string split over two adjacent literals so that it stays valid Python.
'Marketing and Skilling. National SC ST HUB. national sc-st hub nssh is a central sector scheme launched in <NUM> by the ministry of micro small and medium enterprises and implemented by the national small industries corporation to empower scheduled caste and scheduled tribe entrepreneurs and strengthen their participation in the msme ecosystem. the scheme focuses on capacity building market access financial facilitation and handholding support while also operationalizing the mandatory <NUM> procurement target for sc st owned mses under the public procurement policy for mses <NUM> . through a network of national sc-st hub offices across the country the hub assists eligible sc st entrepreneurs holding at least <NUM> ownership and control in activities such as udyam and gem registration participation in government tenders access to credit and skill upgradation. financial support is provided in the form of reimbursements for testing and certification charges from recognized laboratories bank loan processing and bank guarantee fees membership fees of export promotion councils onboarding costs for e-commerce and government procurement platforms and fees for short-term skill and management training programs at reputed institutions. by reducing entry barriers and providing structured handholding nssh aims to enhance competitiveness ensure inclusive growth and enable sc st entrepreneurs to scale up operations and integrate with formal supply chains. '
'examples of grievances reported under the scheme include rejection of reimbursement claims where testing or certification expenses exceed the prescribed financial ceiling despite compliance with quality standards blockage of financial assistance due to delays or discrepancies in caste certificate verification even when enterprises are otherwise registered as sc st-owned instances where sc st msmes fail to secure tenders despite the mandated procurement quota because of non-compliance by procuring cpses partial reimbursement of approved training or capacity-building expenses owing to scheme-specific limits leading to out-of-pocket costs for entrepreneurs and gaps in timely support from local nssh offices particularly in remote or north-eastern regions affecting onboarding to procurement portals and access to scheme benefits.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7751, 0.1988],
# [0.7751, 1.0000, 0.2777],
# [0.1988, 0.2777, 1.0000]])
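The example above pairs one grievance with two category descriptions, and the matching category scores far higher (0.7751 versus 0.1988). One plausible way to turn that into a classifier is to pick the category whose description embedding is most similar to the grievance embedding. The sketch below shows that selection step on toy unit vectors so it runs without downloading the model. The function `best_category` and the toy vectors are assumptions for illustration, not part of the model card.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def best_category(query_vec, category_vecs, labels):
    """Return the label (and score) of the most similar category vector."""
    scores = [dot(query_vec, c) for c in category_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]

# Toy stand-ins for normalized embeddings. With the real model one would use
#   query_vec = model.encode([grievance_text])[0]
#   category_vecs = model.encode(category_descriptions)
labels = [
    "Existing / Unauthorized UDYAM Registration Against PAN",
    "National SC ST HUB",
]
category_vecs = [[0.8, 0.6], [0.0, 1.0]]
query_vec = [1.0, 0.0]

label, score = best_category(query_vec, category_vecs, labels)
print(label, score)
```

Since the model ends with a Normalize() stage, a plain dot product is enough here. With unnormalized embeddings the vectors would need to be divided by their norms first.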
EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | nan |
| spearman_cosine | nan |
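The card does not say why both correlations are reported as nan. One situation that is known to produce an undefined correlation is an evaluation set whose gold scores are all identical, because the variance in the denominator is then zero. The sketch below illustrates only that general property with a hand-written Pearson function. The score lists are made up, and whether this is what happened in this evaluation is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

predicted = [0.78, 0.20, 0.28]     # hypothetical cosine similarities
constant_gold = [1.0, 1.0, 1.0]    # every pair labelled as a positive
varied_gold = [1.0, 0.0, 0.0]      # a mix of positives and negatives

print(pearson(predicted, constant_gold))
print(pearson(predicted, varied_gold))
```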
Columns: sentence_0 and sentence_1
|  | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

| sentence_0 | sentence_1 |
|---|---|
| sub - request for clarification on msme dev act . dear sir . your august office is kindly requested to define the specific word _x0080__x009c_tender x0080__x009d as referred in the public procurement policy for micro and small enterprises mse _x0080__x0099_s order gazetted notification no. d.l.- dtd. . . sub sec heading as price quotation in tenders and further word _x0080__x009c_rate contract x0080__x009d as referred in sub sec developing micro and small enterprises vendors before substitution dt. . . as extracted. _x0080__x009c_7. developing micro and small enterprise vendors. _x0080__x0093_the central ministries or departments or public sector undertakings shall take necessary steps to develop appropriate vendors by organizing vendor development programmes or buyer-seller meets and entering into rate contract with micro and small enterprises for a specified period in respect of periodic requirements also. _x... | Policy and Schemes. Related to Public Procurement by PSUs. this category pertains to grievances related to public sector undertakings psus violating or diluting mandatory msme procurement norms under the public procurement policy for msmes and related guidelines including gem . the scope encompasses cases where psus fail to meet prescribed msme procurement quotas deny msmes their l1 price-matching rights bypass eligible msme vendors despite valid registration or design tenders with disproportionate eligibility conditions that effectively exclude msmes. key issues and scenarios within this category include failure to meet msme procurement quotas denial of l1 price-matching rights to msmes bypassing eligible msme vendors despite valid registration designing tenders with disproportionate eligibility conditions such as excessive turnover requirements prior psu experience requirements high emd pbg requirements unnecessary technical specifications post-award payment delays including wi... |
| banks approved clcs-tu loan for new machines but subsidy claim is rejected over minor tech list mismatch despite empanelled vendor. this ties up my finance without aid. release subsidy and simplify verification for tech upgrades.special clcs-tu for sc st promises subsidy but nodal agency delays processing my plant machinery finance claim for months with extra document demands. please fast-track special aid and approve higher subsidy for sc st beginners. issue delayed subsidy claim and non-approval under clcs-tu and special clcs for sc st context the user is reporting delayed subsidy claim and non-approval under clcs-tu and special clcs for sc st schemes citing minor technical list mismatch and extra document demands and requesting simplification of verification and fast-tracking of special aid. details - issue with clcs-tu loan minor tech list mismatch issue with special clcs for sc st delayed processing and extra document demands requested action simplify verification and ... | Starter, Credit and Finance. Credit Linked Capital Subsidy for Technology Upgradation (CLCS- TU) & Special CLCS for SC&ST. credit linked capital subsidy scheme for technology upgradation clcss tu and the special clcs for sc st entrepreneurs is a flagship technology modernisation program of the ministry of micro small and medium enterprises designed to help micro and small manufacturing enterprises upgrade to proven state-of-the-art technologies. under the standard clcss tu eligible mses receive an upfront capital subsidy of on institutional term loans used for purchasing approved plant and machinery subject to a maximum subsidy of lakh on an eligible investment ceiling of crore across notified sub-sectors. the scheme is implemented through nodal agencies such as small industries development bank of india national bank for agriculture and rural development and national institute for entrepreneurship and small business development with technical vetting by expert bodies... |
| i am unable to change enterprise name or trade name in my udayam certificate pls give proper solution issue update of enterprise trade name in udyam certificate context the user is requesting an update of the enterprise trade name in the udyam certificate. details - enterprise trade name update required | UAM/Udyam Registration/Certificate related issues. Update Company/Owner Name Details. this category includes grievances related to corrections or updates to the name of the enterprise or the name of the owner associated with a udyam registration. accurate naming details are important for maintaining correct enterprise records and ensuring that the information recorded in the registration reflects the official business identity. grievances under this category typically arise when the name of the enterprise or the owner s name recorded during registration contains an error or needs to be updated due to changes in the business structure. for example the enterprise name may have been entered incorrectly during registration or the owner s name may not match official identification documents. in some cases the enterprise name may change due to business rebranding conversion of the business structure or correction of typographical errors made during the registration process. users may also re... |

CachedMultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"mini_batch_size": 32,
"gather_across_devices": false
}
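To make the loss parameters concrete: a multiple-negatives ranking loss treats each (sentence_0, sentence_1) pair as a positive and every other sentence_1 in the batch as a negative, then applies cross-entropy over the cosine similarities multiplied by the scale (20.0 here). The cached variant follows the gradient-caching idea of Gao et al. (2021), cited below, processing the batch in chunks of mini_batch_size to limit memory. The following is a simplified pure-Python sketch of the uncached objective on a precomputed similarity matrix. The function `mnr_loss` is illustrative and is not the library's implementation.

```python
import math

def mnr_loss(cos_sim, scale=20.0):
    """Mean cross-entropy where row i's correct 'class' is column i.

    cos_sim[i][j] is the cosine similarity between anchor i and positive j.
    """
    total = 0.0
    for i, row in enumerate(cos_sim):
        logits = [scale * s for s in row]
        # Numerically stable log-sum-exp over the row.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(cos_sim)

# Each anchor matches only its own positive: loss is close to zero.
well_separated = [[1.0, 0.0], [0.0, 1.0]]
# Every candidate looks equally similar: loss equals log(batch size).
uninformative = [[0.5, 0.5], [0.5, 0.5]]

print(mnr_loss(well_separated))
print(mnr_loss(uninformative))
```

A larger scale sharpens the softmax, so small differences in cosine similarity translate into larger differences in the loss.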
Non-default hyperparameters:
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
num_train_epochs: 5
fp16: True
multi_dataset_batch_sampler: round_robin

All hyperparameters:
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: None
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: True
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}

| Epoch | Step | spearman_cosine |
|---|---|---|
| 1.0 | 3 | nan |
| 2.0 | 6 | nan |
| 3.0 | 9 | nan |
| 4.0 | 12 | nan |
| 5.0 | 15 | nan |
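The log shows 3 optimizer steps per epoch and 15 steps over 5 epochs. With per_device_train_batch_size 32, gradient_accumulation_steps 1, and dataloader_drop_last False, that step count bounds the size of the training set. The short calculation below assumes training ran on a single device, which the card does not state explicitly.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

# Which training-set sizes are consistent with 3 steps per epoch?
compatible = [n for n in range(1, 200) if steps_per_epoch(n) == 3]
print(min(compatible), max(compatible))

# Five epochs at 3 steps each reproduces the final step count in the log.
print(steps_per_epoch(max(compatible)) * 5)
```

Under that assumption the model was fine-tuned on a small number of pairs, fewer than a hundred, which is worth keeping in mind when judging how well it generalizes.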
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Base model
sentence-transformers/all-mpnet-base-v2