How to use arulpm/ipbgpt with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("arulpm/ipbgpt", trust_remote_code=True)
sentences = [
"Judul: Pelestarian Lanskap Sejarah Lasem Sebagai Kota Pusaka di Kabupaten Rembang Jawa Tengah\nAbstrak: Lasem merupakan sebuah kota kecil yang berada di Kabupaten Rembang yang mempunyai banyak peninggalan warisan budaya dan sejarah dengan ciri khas dari Arab, Cina dan Pribumi. Sejarah Lasem meninggalkan beberapa tapak bersejarah dan keberadaannya tersebar di wilayah Lasem. Penelitian ini bertujuan untuk menganalisis karakter dan kualitas lanskap sejarah Lasem; mengkaji upaya pengembangan pengelolaan lanskap sejarah sebagai kota pusaka; dan menyusun rencana pelestarian lanskap sejarah kota Lasem sebagai kota pusaka. Analisis yang digunakan berupa analisis deskriptif dan analisis spasial dengan menggunakan metode identifikasi lanskap sejarah, skoring, mental map dan kuisioner. Berdasarkan hasil penelitian diperoleh 11 elemen potensial peninggalan sejarah pada kawasan Prioritas kota Lasem. Terdapat juga penilaian terhadap elemen sejarah yang menghasilkan kategori elemen sejarah tingkat tinggi, sedang dan rendah berdasarkan kriteria yang diuji. Pada mental map menhasilkan citra lanskap pembentuk suatu kota. Sehingga rekomendasi yang diberikan berupa pelestarian lanskap sejarah yang terdapat di kota Lasem untuk menjaga keberlanjutannya.\nKeyword: historical landscape management, cultural heritage, sustainability, landscape preservation, historical landscape management",
"Judul: Responses on Performance and Minerals Digestibility by Novel Consensus Bacterial 6-Phytase Variant (PHY-G) Supplementation in Broiler Diet\nAbstrak: Plant feedstuffs normally contain phytic acid which poorly hydrolized by monogastric especially broiler. Further, 80% of phosphorus (P) in plant feedstuff is complexed with phytic acid. In that case, high amount of inorganic P sources is needed to fulfill P requirement. Since it has capability to bind with some nutrients such as bi- or trivalent minerals and amino acids then phytase is widely utilized in poultry feed to improve the nutrient digestibility. Today, the use of 500-1000 FTU phytases in poultry feed is widely applied by feed industry and its capability releasing phytate-bound phosphorus is very well documented. Novel consensus bacterial 6-phytase variant (PHY-G) is newest 6-phytase derived from bacterial phytase gene Buttiauxella sp. expressed in Trichoderma reesei with enhanced functionality for better phytic acids degradation. The present research was conducted to evaluate the efficacy of PHY-G in different doses (1000, 1500 and 2000 FTU/kg) on high phytic acids content diets which at least contained 0.30% Phytate-P. A total 3,675 male broilers Indian River/IR (105 pens, 35 birds/pen) were provided mixed grain diets in seven treatments with fifteen replications, they were divided into two phases of rearing which were starter (1 – 21 d) and finisher (22 – 35 d). Treatments were positive control (PC) using standard diet following IR’s nutrient requirement, negative control 1 (NC1) with nutrient reduction at 0.21 percent unit calcium (Ca), 0.21 percent units available phosphorus (AvP), 0.34 percent unit crude protein (CP) and 66 kcal/kg AME, NC2 with nutrient reduction at 0.23 percent unit Ca & AvP, 0.45 percent unit CP and 75 kcal/kg AME, followed by NC3 with nutrient reduction at 0.24 percent unit Ca & AvP, 0.52 percent unit CP and 79 kcal/kg AME. 
PHY-G supplementation with dose 1000, 1500 and 2000 FTU/kg on top of NC1, NC2 and NC3 respectively. PHY-G supplementation at any level significantly improved body weight gain/BWG and corrected FCR/McFCR (P<0.05) on starter (1,083 – 1,093 g/bird) and overall phase (2,482 – 2,532 g/bird) compared to any NCs (1,063-1,084 g/bird on starter and 2,387 – 2,398 g/bird on finisher). No significant different were observed on mortality of all treatments but PHY-G supplementation significantly improved (P<0.05) broiler index/BI (444 - 463) versus NCs (427 - 430) and able to maintain it equivalent to PC (455). Toe ash was significantly improved (P<0.05) by all doses of PHY-G (13.28 – 13.56%) compared to NC (12.3 – 12.7%). Apparent ileal digestibility (AID) of Ca was not affected by PHY-G but 1000, 1500 and 2000 FTU/kg PHY-G supplementation significantly improvemed on AID of P (P<0.05) which were 64.97%, 75.60% and 78.29% compared to NCs (42.64%, 48.88% and 46.17% for NC1, NC2 and NC3 respectively). This These data indicated that PHYG supplementation effectively improved broiler growth performance, bone mineralization and P digestibility at any level of dose on high content of phytic acid in the diets.\nKeyword: broiler, growth performance, phytase, phytic acid, toe ash",
"Judul: Pelestarian Lanskap Sejarah Kota Banda Aceh Sebagai Kota Pusaka Di Provinsi Aceh\nAbstrak: Banda Aceh menjadi salah satu dari sepuluh kota pusaka yang ada di Indonesia untuk dipersiapkan menjadi The World Heritage City oleh Kementrian Pekerjaan Umum melalui Program Penataan dan Pelestarian Kota Pusaka (P3KP). Program kota pusaka ini mewujudkan ruang kota yang aman, nyaman, produktif dan berkelanjutan berbasis rencana tata ruang, bercirikan nilai-nilai pusaka, melalui transformasi upaya-upaya pelestarian menuju urban (heritage) development dengan dukungan dan pengelolaan yang baik serta penyediaan infrastruktur yang tepat. Hal ini didasarkan melalui UU Cagar Budaya Nomor 11 Tahun 2010 dan UU Penataan Ruang nomor 26 tahun 2007. Banda Aceh memiliki kawasan situs sejarah yang dapat dibedakan berdasarkan periodenya, yaitu: masa kerajaan, masa kolonial dan masa kemerdekaan. Tetapi, dalam pengelolaannya hingga saat ini belum terlihat adanya strategi pelestarian peninggalan sejarah tersebut. Beberapa lanskap sejarah yang ada dalam kondisi tidak terawat, terlantar, tidak fungsional dan rusak. Dari berbagai masalah di atas, dirasakan sudah saatnya perlu dilakukan kajian pelestarian lanskap sejarah Kota Banda Aceh sebagai kota pusaka di Indonesia. Penelitian juga dilakukan untuk mengevaluasi proses perlindungan pusaka peninggalan sejarahnya yang kemudian diharapkan bermanfaat dalam meningkatkan ekonomi daerah. Tujuan penelitian ini yaitu: menganalisis karakter dan kualitas lanskap sejarah Kota Banda Aceh, mengkaji persepsi masyarakat dalam mendukung Kota Banda Aceh sebagai kota pusaka, dan menyusun strategi pelestarian lanskap sejarah di Kota Banda Aceh. Metode penelitian yakni analisis karakter dan kualitas lanskap sejarah, analisis dilakukan dengan tahapan yaitu: penentuan karakter lanskap sejarah, penilaian signifikansi, serta penilaian keaslian, keunikan dan kenyamanan. 
Kemudian dilakukan analisis persepsi masyarakat, yakni untuk mengetahui pengetahuan terhadap kota pusaka, persepsi masyarakat Kota Banda Aceh mengenai pelestarian lanskap sejarah yang perlu dilindungi serta aktor yang berperan untuk melestarikan pusaka di Kota Banda Aceh. Hasil assessment lanskap sejarah dan survei kepada masyarakat menjadi dasar dalam menyusun kriteria dalam metode AHP, hasilnya berupa strategi pelestarian lanskap sejarah Kota Banda Aceh sebagai kota pusaka. Hasil penelitian ini dapat diidentifikasi bahwa di Kota Pusaka Banda Aceh terdapat 12 lanskap sejarah dengan karakter tiga masa peninggalan, yaitu masa kerajaan dan kesultanan, masa kolonial, dan masa kemerdekaan. Dari penilaian kualitas lanskap sejarah, Lanskap Baiturrahman dan Putroe Phang yang merupakan lanskap masa Kerajaan dan Kesultanan memperoleh skor tertinggi sehingga menjadi prioritas untuk dilestarikan. Sebagian besar masyarakat tidak mengetahui bahwa Kota Banda Aceh telah ditetapkan sebagai kota pusaka, tetapi mereka setuju 12 lanskap sejarah di Kota Banda Aceh perlu dilestarikan. Perlu peningkatan upaya sosialisasi melalui berbagai media serta kegiatan-kegiatan terkait program kota pusaka. v Hasil Analytical Hierarchy Process (AHP), menunjukkan bahwa komponen prioritas dalam upaya pelestarian lanskap sejarah di Kota Banda Aceh adalah komponen keunikan (0,547), keaslian (0,231), kenyamanan (0,166), dan nilai penting (0,058). Alternatif prioritas untuk pelestarian lanskap sejarah di Kota Banda Aceh yaitu peninggalan Lanskap Kolonial (0,551), Lanskap Kerajaan dan Kesultanan (0,355), dan Lanskap Kemerdekaan (0,095). Komponen keunikan (integritas, keberagaman, dan kualitas estetik) merupakan komponen prioritas dalam upaya pelestarian lanskap sejarah sedangkan alternatif prioritasnya yaitu peninggalan dengan karakter lanskap kerajaan-kesultanan dan kolonial. Rekomendasi untuk melestarikan lanskap sejarah di Kota Banda Aceh yaitu penetapan kawasan prioritas pusaka. 
Produk rekomendasi berupa usulan deliniasi kawasan prioritas. Produk selanjutnya dari penelitian ini adalah peta pusaka Banda Aceh beserta informasi mengenai situs-situs sejarah Banda\nKeyword: budaya, keaslian, keunikan, lanskap sejarah, masa kesejarahan",
"Judul: Kajian Pendayagunaan Sumber Air Ciparay di Cinagara, Kecamatan Caringin Kabupaten Bogor\nAbstrak: Manfaat air bagi kehidupan manusia diantaranya digunakan untuk memenuhi kebutuhan air mmah tangga (domestik), industri, dan irigasi. Pemenuhan kebutuhan air untuk layanan tersebut memerlukan pengembangan sumber air yang bam. Salah satunya adalah pemanfaatan'mata air dan limpasan permukaan. Pengembangan sumberdaya air memerlukan adanya konsepsi, perencanaan, perancangan, kontmksi dan operasi fasilitas-fasilitas untuk pengendalian dan pemanfaatan air. Penelitian masalah khusus ini diharapkan dapat bermanfaat sebagai bahan pertimbangan pendayagunaan sumber air Ciparay secara berkelanjutan. Dengan diketahuinya debit sumber air dan kebutuhan air untuk tanaman (padi, palawija, sayuran dan buah-buahan), usaha tani ternak, perikanan, dan domestik maka efisiensi pemanfaatan sumber air dapat ditingkatkan agar pemenuhan kebutuhan air domestik serta pengembangan pertanian dan industri dapat direncanaltan dengan baik. Penelitian masalah khusus ini bertujuan untuk mengkaji pemanfaatan surnber air untuk memenuhi kebutuhan air tanaman (padi, palawija, dan hortiltultura, peternakan, perikanan dan domestik, yaitu meliputi kajian efisiensi pemanfaatan air, analisis biaya irigasi, sistem distribusi, dan pola pemanfaatan.\nKeyword: "
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
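The Pooling module above has `pooling_mode_cls_token` set to True and every other mode set to False, so the sentence embedding is the vector at the first token position. The following is a minimal sketch of that idea on toy data, not the library's implementation; the helper names `cls_pool` and `mean_pool` and the 3-dimensional vectors are hypothetical (the real model produces 768-dimensional vectors).

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Average all token vectors. Shown only for contrast; this model does not use it."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


# Toy "transformer output": 3 tokens, 3 dimensions each (hypothetical values).
toy_tokens = [
    [1.0, 0.0, 2.0],  # position 0, the [CLS] token
    [3.0, 4.0, 0.0],
    [5.0, 2.0, 1.0],
]

cls_vec = cls_pool(toy_tokens)
mean_vec = mean_pool(toy_tokens)
print(cls_vec)   # [1.0, 0.0, 2.0]
print(mean_vec)  # [3.0, 2.0, 1.0]
```

With CLS pooling, the tokens after position 0 influence the embedding only through the transformer's attention layers, not through the pooling step itself.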
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("arulpm/ipbgpt", trust_remote_code=True)
# Run inference
sentences = [
'Judul: Formulasi Surfaktan Metil Ester Sulfonat sebagai Oil Well Cleaning\nAbstrak: Oil productivity reduction may be due to plugging in the oil rock formations. The plugging may be caused by the deposition of paraffin, asphaltene, and scale. Problem caused by the presence of the precipitate is the rock formation can be oil wet so that oil permeability decreases. The problem can be solved by well cleaning technique with surfactant formula. Surfactant MES is a type of anionic surfactant which has ability to lower the interfcial tension, surface tension, and able to change the properties of rock from oil wet to become water wet. Surfactant MES formula for well cleaning requires carrier agent. In this study, diesel oil and metil ester were used as carrying agent. Aromatic solvents were also needed. Xylene and toluene has ability to dissolve asphaltene that deposites in formation. Surfaktan formulation for well cleaning was done with several stages, those are determine the SMES concentration and aromatic solvents concentration. Surfactant performance tests for oil well cleaning were thermal stability, phase behavior, and wettability. The surfactant formula which gave the best performance was SMES 3% in metil ester carrying agent with xylene 15% as additive.\nKeyword: methyl sulfonic esters, oil well cleaning, Asphaltene',
'Judul: Formulasi Surfaktan SMES sebagai Acid Stimulation Agent untuk Aplikasi di Lapangan Karbonat OK\nAbstrak: Methyl Sulfonic Esters (MES) is one type of anionic surfactants which have advantages in terms of its hardness, resistance to deterjensi, the character of renewable and environmentally friendly. Excess MES this can be utilized as stimulation agent in oil wells, so can increase productivity an oil well. Increased productivity an oil well done by means of cleaning oil wells and pore a reservoir fromsediment of scale formed, enlarging the pores of rocks and can changing the nature of rocks being water-wet. This research was carried out to obtain the formula of solution of surfactants-based MES that can be applied as acid stimulation agent that is one method of IOR. Formula tested is a combination of surfactants sodium MES, HCl, and CH3COOH. The formulation is done by determining the optimum concentration of surfactant SMES and HCl gradually. The best results obtained from the solution of acid stimulation agent was with value of IFT < 10-2 dyne/cm with solubility of rock reaches 36%, and was can to change the contact angle of the reservoir rocks of the contact angle number 420 became 680 in formula SMES 6% + HCl 7% and CH3COOH 2%.\nKeyword: acid well stimulation, IOR, IFT, Sodium Methyl Sulfonic Esters',
'Judul: World Journal of Zoology\nAbstrak: A study on daily pattern of male western lowland gorilla (Gorilla gorilla gorilla, Savage & Wyman 1847) had been done at Schmutzer Primate Center, Taman Margasatwa Ragunan Jakarta, Indonesia. The aim of the study was to observe the daily activity pattern of adult male gorilla group without any female in captivity in order to obtain a condition of preparing incoming female gorillas leading to successfull conservation program.\nKeyword: ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
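The similarity matrix above can be used for semantic search by ranking documents against a query embedding. The sketch below shows that ranking step on toy 2-dimensional vectors using only the standard library, so it runs without downloading the model. It assumes cosine similarity as the scoring function; the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and are not part of sentence-transformers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for model.encode(...) output (hypothetical values).
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [2.0, 0.1]

ranking = rank_by_similarity(toy_query, toy_corpus)
print([index for index, _ in ranking])  # [0, 2, 1]
```

In practice the toy vectors would be replaced by rows of `embeddings`, and the query would be encoded with the same model so that both live in the same 768-dimensional space.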
all-nli-dev (TripletEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy | 1.0 |
| dot_accuracy | 0.0 |
| manhattan_accuracy | 1.0 |
| euclidean_accuracy | 1.0 |
| max_accuracy | 1.0 |
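Triplet accuracy is commonly defined as the fraction of (anchor, positive, negative) triplets in which the anchor scores closer to the positive than to the negative. The sketch below computes it for two scoring functions on toy vectors. It is an illustration under that assumed definition, not the library's evaluator code; `triplet_accuracy` and the toy triplets are hypothetical. The vectors are deliberately chosen so that cosine and dot product disagree, which only shows that the metrics are computed independently and is not a diagnosis of the values reported above.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative)


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of length-normalised vectors; ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def triplet_accuracy(
    triplets: List[Triplet], similarity: Callable[[Vector, Vector], float]
) -> float:
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1 for anchor, pos, neg in triplets
        if similarity(anchor, pos) > similarity(anchor, neg)
    )
    return correct / len(triplets)


# Toy triplets: negatives point in a different direction but have a large norm.
toy_triplets: List[Triplet] = [
    ([1.0, 0.0], [0.9, 0.1], [3.0, 4.0]),
    ([0.0, 1.0], [0.1, 0.8], [5.0, 2.0]),
]

cosine_acc = triplet_accuracy(toy_triplets, cosine)
dot_acc = triplet_accuracy(toy_triplets, dot)
print(cosine_acc, dot_acc)  # 1.0 0.0
```

For distance-based metrics such as Manhattan or Euclidean, the same comparison applies with the inequality reversed: a triplet counts as correct when the anchor-to-positive distance is smaller.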
all-nli-test (TripletEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy | 1.0 |
| dot_accuracy | 0.0 |
| manhattan_accuracy | 1.0 |
| euclidean_accuracy | 1.0 |
| max_accuracy | 1.0 |
Non-default hyperparameters:

- eval_strategy: steps
- gradient_accumulation_steps: 2
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 2
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | loss | all-nli-dev_max_accuracy | all-nli-test_max_accuracy |
|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.9998 | - |
| 0.0772 | 2000 | 0.0402 | 0.0164 | 1.0 | - |
| 0.1544 | 4000 | 0.0213 | 0.0135 | 1.0 | - |
| 0.2316 | 6000 | 0.0182 | 0.0115 | 1.0 | - |
| 0.3088 | 8000 | 0.015 | 0.0106 | 1.0 | - |
| 0.3860 | 10000 | 0.014 | 0.0094 | 1.0 | - |
| 0.4632 | 12000 | 0.0116 | 0.0085 | 1.0 | - |
| 0.5404 | 14000 | 0.0097 | 0.0072 | 1.0 | - |
| 0.6176 | 16000 | 0.0083 | 0.0056 | 1.0 | - |
| 0.6948 | 18000 | 0.0071 | 0.0050 | 1.0 | - |
| 0.7720 | 20000 | 0.0066 | 0.0046 | 1.0 | - |
| 0.8492 | 22000 | 0.0051 | 0.0034 | 1.0 | - |
| 0.9264 | 24000 | 0.0047 | 0.0031 | 1.0 | - |
| 1.0000 | 25907 | - | - | - | 1.0 |
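A few derived quantities follow from the hyperparameters and the training log above. The sketch below works them out with plain arithmetic. It assumes that the Step column counts optimizer updates and that training ran on a single device; `num_devices = 1` is an assumption, since the device count is not stated here, so the example counts are rough estimates.

```python
import math

# Values taken from the hyperparameter list and the last row of the training log.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
warmup_ratio = 0.1
total_steps = 25907

# Assumption: single-device training (not stated in the model card).
num_devices = 1

# Examples consumed per optimizer update.
examples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Warmup length implied by warmup_ratio (warmup_steps is 0, so the ratio applies).
warmup_steps = math.ceil(warmup_ratio * total_steps)

# Rough number of training examples seen in the single epoch.
approx_examples_seen = examples_per_step * total_steps


def epoch_at_step(step: int) -> float:
    """Fraction of the epoch completed at a given step, rounded as in the log table."""
    return round(step / total_steps, 4)


print(examples_per_step)     # 16
print(warmup_steps)          # 2591
print(approx_examples_seen)  # 414512
print(epoch_at_step(2000))   # 0.0772
```

The last value reproduces the Epoch entry logged at step 2000, which is a quick consistency check on the reading of the table.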
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: Alibaba-NLP/gte-base-en-v1.5