This is a sentence-transformers model fine-tuned from keepitreal/vietnamese-sbert. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture (input text is truncated to `max_seq_length` = 256 tokens):

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
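The Pooling module is configured with `pooling_mode_mean_tokens: True`, so each sentence embedding is the average of the token embeddings at non-padding positions. The NumPy sketch below illustrates that computation on toy 4-dimensional vectors. `mean_pool` is a hypothetical helper written for illustration, not the library's own (torch-based) implementation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                          # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                          # avoid division by zero
    return summed / counts

# Toy batch: 2 sentences, 3 token positions, 4-dim vectors (the real model uses 768 dims).
tokens = np.array([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0], [6.0, 6.0, 6.0, 6.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
print(mean_pool(tokens, mask))
# [[2. 2. 2. 2.]
#  [4. 4. 4. 4.]]
```

Masking before averaging is what keeps padding (the `9.0` row in the first sentence) from pulling short inputs toward the padding vectors.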
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hungq/hust_sbert_2")
# Run inference
# Sample inputs: verbatim text extracted from academic-regulation PDFs
# (English and Vietnamese), extraction artifacts included.
sentences = [
    "............... 22 \\nArticle 34. Master's degree requirements and graduation classifications. ....... 22 \\nArticle 35. Temporary leave and academic record retention ........................... 23 \\nArticle 36. Extension of study period and withdrawal from study .................. 24 \\nCHAPTER V : DOCTORAL PROGRAMS ................................ .................... 25 \\nArticle 37. Planning and progress reporting ................................ .................... 25 \\nArticle 38. Supplementary courses and doctoral courses ................................ . 25 \\nArticle 39. Literature review and doctoral thematic studies ............................ 26 \\nArticle 40. Doctoral dissertation ................................ ................................ ...... 26",
    '30 \\nCHAPTER VI : ORGANIZATION AND IMPLEMENTATION ................. 32 \\nArticle 47. Transitional provisions ................................ ................................ ... 32 \\nArticle 48. Commencement ................................ ................................ .............. 32 \\n \\n \\n 1 \\n \\nMINISTRY OF EDUCATION AND TRANING \\nHANOI UNIVERSITY OF SCIENCE AND \\nTECHNOLOGY SOCIALIST REPUBLIC OF VIETNAM \\nIndependence - Freedom - Happiness \\n \\n \\nACADEMIC REGULATIONS \\n(Issued together with Decision No. 5445/QĐ -ĐHBK, dated 28/05/2025 \\nby the President of Hanoi University of Science and Technology) \\nCHAPTER I \\nGENERAL PROVISIONS \\nArticle 1. Scope and Applicability \\n1. These regulations govern the training activities for full -time and part -time/in',
    'Điều 45. Công nh ận và chuy ển đổi kết quả học tập, nghiên c ứu .............................. 27 \nCHƯƠNG VI T Ổ CHỨC TH ỰC HI ỆN ................................ ................................ ....... 28 \nĐiều 46. Quy định chuy ển tiếp ................................ ................................ .................. 28 \nĐiều 47. Hi ệu lực thi hành ................................ ................................ ......................... 28 \nBỘ GIÁO DỤC VÀ ĐÀO TẠO CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM \nĐẠI HỌC BÁCH KHOA HÀ NỘI \n Độc lập – Tự do – Hạnh phúc \n \n \nQUY CHẾ ĐÀO T ẠO \n(Ban hành kèm theo Quyết định số 4600 /QĐ–ĐHBK ngày 09 tháng 6 năm 202 3 \ncủa Giám đốc Đại học Bách khoa Hà Nội) \n \nCHƯƠNG I \nNHỮNG QUY Đ ỊNH CHUNG',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7949, 0.7070],
#         [0.7949, 1.0000, 0.7457],
#         [0.7070, 0.7457, 1.0000]])
```
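`model.similarity` scores embeddings with the model's configured similarity function, which in sentence-transformers defaults to cosine similarity; the 1.0000 diagonal in the output above is consistent with that. Below is a minimal NumPy sketch of the same pairwise computation, plus a brute-force top-k ranking of the kind used for semantic search. The helper names and the toy 3-dimensional vectors are hypothetical stand-ins for the model's 768-dimensional embeddings.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

def rank_corpus(query: np.ndarray, corpus: np.ndarray, top_k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus_index, score) pairs for the top_k corpus rows most similar to query."""
    scores = cosine_similarity_matrix(query[np.newaxis, :], corpus)[0]
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors in place of real model embeddings.
corpus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
query = np.array([1.0, 0.2, 0.0])

sims = cosine_similarity_matrix(corpus, corpus)
print(np.round(np.diag(sims), 4))  # [1. 1. 1.]
print(rank_corpus(query, corpus))  # corpus row 0 ranks first, then row 2
```

For real corpora, sentence-transformers also provides `util.semantic_search`, which performs this kind of ranking over query and corpus embeddings in chunks.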
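Training dataset columns: `sentence_0`, `sentence_1`, and `label`.

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Sinh viên được công nhận tốt nghiệp cử nhân sẽ được cấp bằng cử nhân và chính thức chuyển sang học CTĐT KS 180 TC. Các học phần tích lũy trước sẽ được xét công nhận thuộc CTĐT KS. | Theo đề nghị của Trưởng Phòng Đào tạo; | 0.6376281516217733 |
| b) Học bổng từ các nguồn hợp tác quốc tế song phương và đa phương: | 5. Học bổng gắn kết quê hương | 0.8607079764250578 |
| thuộc diện miễn làm nghĩa vụ quân sự theo quy định hiện hành; | lớp phù hợp để hoàn thành chương trình. | 0.8607079764250578 |

English translations of the samples (sentence_0 / sentence_1, row by row):

- Students recognized as bachelor's graduates are awarded the bachelor's degree and officially move on to the 180-credit engineer program (CTĐT KS 180 TC). Previously accumulated courses are considered for recognition toward the engineer program. / At the proposal of the Head of the Training Department;
- b) Scholarships from bilateral and multilateral international cooperation sources: / 5. Hometown-connection scholarship
- are exempt from military service under current regulations; / a suitable class in which to complete the program.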
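The training data consists of sentence pairs with a float similarity label, matching the `sentence_0` / `sentence_1` / `label` columns in the table above. As an illustrative sketch, two of the sample rows are stored below in the column-oriented dict layout that Hugging Face `datasets.Dataset.from_dict` accepts (the call itself is not made here). `validate_pairs` is a hypothetical helper that checks the column names and types listed in the table; nothing here comes from the author's training script.

```python
# Column-oriented layout of training pairs (two sample rows from the table above).
train_columns = {
    "sentence_0": [
        "Sinh viên được công nhận tốt nghiệp cử nhân sẽ được cấp bằng cử nhân và chính thức chuyển sang học CTĐT KS 180 TC. Các học phần tích lũy trước sẽ được xét công nhận thuộc CTĐT KS.",
        "b) Học bổng từ các nguồn hợp tác quốc tế song phương và đa phương:",
    ],
    "sentence_1": [
        "Theo đề nghị của Trưởng Phòng Đào tạo;",
        "5.Học bổng gắn kết quê hương",
    ],
    "label": [0.6376281516217733, 0.8607079764250578],
}

def validate_pairs(columns: dict) -> int:
    """Check the sentence_0 / sentence_1 / label schema and return the row count."""
    expected = {"sentence_0": str, "sentence_1": str, "label": float}
    if set(columns) != set(expected):
        raise ValueError(f"unexpected columns: {sorted(columns)}")
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    for name, typ in expected.items():
        for value in columns[name]:
            if not isinstance(value, typ):
                raise TypeError(f"{name} expects {typ.__name__}, got {type(value).__name__}")
    return lengths.pop()

print(validate_pairs(train_columns))  # 2
```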
Training loss: `CosineSimilarityLoss` with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
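With `loss_fct` set to `MSELoss`, `CosineSimilarityLoss` takes the cosine similarity of each pair's two embeddings (`sentence_0`, `sentence_1`) and penalizes its squared distance from the float `label`, averaged over the batch. The NumPy sketch below reproduces only that forward computation on toy 2-dimensional vectors. `cosine_similarity_loss` is a hypothetical name; the library applies the same idea to torch tensors so that gradients flow back into the encoder.

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared error between cos(u_i, v_i) and the gold score label_i."""
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels) ** 2))

# Toy batch of two pairs; rows of u embed sentence_0, rows of v embed sentence_1.
u = np.array([[1.0, 0.0], [1.0, 1.0]])
v = np.array([[1.0, 0.0], [1.0, 0.0]])
labels = np.array([0.6376281516217733, 0.8607079764250578])  # label values from the samples table
print(cosine_similarity_loss(u, v, labels))
```

Because the target is a graded score rather than a binary match, the loss pushes each pair's cosine toward the labeled value instead of simply toward 1 or 0.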
Non-default training hyperparameters:

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All training hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Citation (Sentence-BERT):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
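The training hyperparameters combine `learning_rate: 5e-05` with `lr_scheduler_type: linear`, `warmup_steps: 0`, `per_device_train_batch_size: 16`, `num_train_epochs: 1`, and `gradient_accumulation_steps: 1`. The sketch below shows how, under those settings, the learning rate decays linearly to zero over one epoch, following the multiplier used by the transformers `get_linear_schedule_with_warmup` scheduler. The card does not state the dataset size, so the 1,000-pair figure is hypothetical, and a single-device run is assumed.

```python
import math

def total_training_steps(num_samples: int, batch_size: int = 16, epochs: int = 1) -> int:
    """Optimizer steps for a single-device run with gradient_accumulation_steps = 1."""
    # dataloader_drop_last: False keeps the final partial batch, hence the ceiling.
    return math.ceil(num_samples / batch_size) * epochs

def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

# Hypothetical dataset size; the card does not report how many pairs were used.
steps = total_training_steps(1000)
for step in (0, steps // 2, steps - 1, steps):
    print(f"step {step:>3}: lr = {linear_lr(step, steps):.2e}")
```

With `warmup_steps: 0`, the first optimizer step already uses the full 5e-05, and the rate reaches zero exactly at the last step of the single epoch.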
Base model: keepitreal/vietnamese-sbert