This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v5-text-nano-retrieval. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'EuroBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
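The Pooling module above is configured with pooling_mode_lasttoken set to True, and it is followed by Normalize(). The following is a minimal NumPy sketch of what those two steps compute, not the library's implementation. The function names last_token_pool and l2_normalize are hypothetical, the toy dimensions are made up, and the sketch assumes right-padded batches.

```python
import numpy as np

def last_token_pool(token_embeddings, attention_mask):
    """Return the embedding of the last non-padding token of each sequence.

    Assumption: batches are right-padded, so the last real token sits at
    position (number of real tokens - 1).
    """
    last_idx = attention_mask.sum(axis=1) - 1
    batch_idx = np.arange(token_embeddings.shape[0])
    return token_embeddings[batch_idx, last_idx]

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)

# Toy batch: 2 sequences, 4 token positions, 3 dimensions
# (the model in this card uses 768 dimensions).
tokens = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
mask = np.array([
    [1, 1, 1, 0],  # 3 real tokens, 1 padding token
    [1, 1, 1, 1],  # 4 real tokens
])

pooled = last_token_pool(tokens, mask)
embeddings = l2_normalize(pooled)
print(pooled)
print(np.linalg.norm(embeddings, axis=1))
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.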
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("duy95/jina_ct26_finetuned")
# Run inference
queries = [
    "That's odd, it's as though there already existed a sound and potent (let alone cheap!) treatment for coronavirus and we don't need a vaccine. So does that mean that Fauci and the mainstream media have been pulling the wool over our eyes the whole time!? 🤔",
]
documents = [
'The FDA-approved drug ivermectin inhibits the replication of SARS-CoV-2 in vitro Although several clinical trials are now underway to test possible therapies, the worldwide response to the COVID-19 outbreak has been largely limited to monitoring/containment. We report here that Ivermectin, an FDA-approved anti-parasitic previously shown to have broad-spectrum anti-viral activity in vitro, is an inhibitor of the causative virus (SARS-CoV-2), with a single addition to Vero-hSLAM cells 2 h post infection with SARS-CoV-2 able to effect ~5000-fold reduction in viral RNA at 48 h. Ivermectin therefore warrants further investigation for possible benefits in humans.',
'Crop Cultivation at Wartime – Plight and Resilience of Tigray’s Agrarian Society (North Ethiopia) During the 2021 conflict in Tigray (north Ethiopia), crop cultivation has been hampered by warfare. Oxen have been looted and killed, farm inputs and tools destroyed by Ethiopian and Eritrean soldiers. Farmers felt vulnerable out in the open with their oxen. To produce, farmers evaluated risks involved with ploughing and organised lookouts. Overall, a large part of the land had been tilled in difficult conditions, and crops sown that require minimal management, without fertiliser, what led to low yields. True Colour Composite images, produced from Sentinel satellite imagery show that smallholder irrigation schemes were operational. There was a shift from commercial crops to cereals. The situation in western Tigray was particular, as there has been ethnic cleansing of the population and often the 2020 rainfed crops had even not been harvested. Overall, our findings show that the Tigrayan smallholder farming system is resilient, thanks to community self-organisation, combining common strategies of agrarian societies in wartime: spatio-temporal shift in agricultural activities to avoid the proximity with soldiers and shifts in crop types. Rather unique is the relying on communal aid, while the blockade of the Tigray region made that outmigration and off-farm income were no options for the farmers.',
'Visualizing Speech-Generated Oral Fluid Droplets with Laser Light Scattering result on serum RT-qPCR assay for yellow fever.Liver-biopsy samples showed lobular necroinflammation, which included many foci of spotty necrosis, apoptosis, and hydropic hepatocyte degeneration in all lobular zones, without typical midzonal lesions associated with yellow fever, along with extensive hypercellularity and hypertrophy of Kupffer cells.Some of the patients had confluent necrosis.Among the patients who underwent liver biopsy, immunohistochemical analysis was positive for yellow fever antigen, which was found mainly in Kupffer-cell cytoplasm; such antigens are typically found in hepatocytes of the midzonal region in patients with acute yellow fever.All 26 patients recovered clinically with normal levels of liver enzymes.In a 2019 report, 5 researchers described rebound hepatitis associated with yellow fever in two travelers who had returned to France from Brazil.Similar to these investigators, we hypothesized that such cases of late-onset liver inflammation result from an impaired immune transition from an antiinflammatory pattern to a proinflammatory pattern owing to the presence of the virus or its antigens after the acute phase.In our study, the administration of sofosbuvir did not appear to be associated with subsequent changes in levels of liver enzymes.Thus, in this study, we characterized another possible clinical manifestation of yellow fever, a late-onset relapsing hepatitis occurring 1 to 4 months after the initial symptoms of severe acute yellow fever.Longer follow-up of the patients is needed to determine whether this condition will have serious health implications.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.4746, -0.0121, 0.1904]], dtype=torch.bfloat16)
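For retrieval, the similarity scores are typically turned into a ranking of the documents for each query. The sketch below does that with NumPy, using the three scores printed in the example above as input; it does not load the model, and the variable names are illustrative.

```python
import numpy as np

# Scores copied from the example output above (one query, three documents).
scores = np.array([0.4746, -0.0121, 0.1904])

# Sort document indices from most to least similar.
ranking = np.argsort(-scores)

for rank, doc_idx in enumerate(ranking, start=1):
    print(f"{rank}. documents[{doc_idx}] score={scores[doc_idx]:.4f}")
```

With these scores the ivermectin abstract (documents[0]) ranks first and the unrelated Tigray abstract (documents[1]) ranks last.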
Columns: sentence_0 and sentence_1
| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |
| sentence_0 | sentence_1 |
|---|---|
| Untersuchung entdeckte bei allen getesteten Personen seltene, aber periodische RBD-spezifische Antikörper mit hoher gegenviraler Aktivität, was darauf hindeutet, dass ein Impfstoff, der solche Antikörper auslösen soll, größtenteils wirksam sein könnte. | Convergent antibody responses to SARS-CoV-2 in convalescent individuals During the coronavirus disease-2019 (COVID-19) pandemic, severe acute respiratory syndrome-related coronavirus-2 (SARS-CoV-2) has led to the infection of millions of people and has claimed hundreds of thousands of lives. The entry of the virus into cells depends on the receptor-binding domain (RBD) of the spike (S) protein of SARS-CoV-2. Although there is currently no vaccine, it is likely that antibodies will be essential for protection. However, little is known about the human antibody response to SARS-CoV-21–5. Here we report on 149 COVID-19-convalescent individuals. Plasma samples collected an average of 39 days after the onset of symptoms had variable half-maximal pseudovirus neutralizing titres; titres were less than 50 in 33% of samples, below 1,000 in 79% of samples and only 1% of samples had titres above 5,000. Antibody sequencing revealed the expansion of clones of RBD-specific memory B cells that express... |
| 🚨AKTUELL +++ SCIENCE- Studie unter Beteiligung von „Top-Virologe“ @c_drosten und seinem Labor zurückgezogen, da die Ergebnisse zur Entstehung von Omicron nicht nachweisbar sind und Fehlinterpretationen vermutlich auf falsch-positiven, kontaminierten Proben beruhen! | Retraction In the Research Article “Gradual emergence followed by exponential spread of the SARS-CoV-2 Omicron variant in Africa”(1), we reported data from retrospective characterization of viral genomes of putative ancestors of the SARS-CoV-2 Omicron variant from western Africa months before the first detection of Omicron. After several social media posts suggested that these putative early Omicron ancestor sequences may have been false positives, we reanalyzed our data and the residual samples. We found a mixture of different SARS-CoV-2 genomic fragments contaminating some of the samples and sequence data on which we based our analysis. The residual samples are now exhausted, and the reconstruction of evolutionary intermediates cannot be replicated. Therefore, we are retracting our Research Article. The epidemiological data are not called into question and will be made available. |
| Turns out that Biden’s vaccine mandates forced babies (for which no safe dose has been established) to get mRNA via their mother’s breast milk. Is there any nastier abuse of power? | Detection of Messenger RNA COVID-19 Vaccines in Human Breast Milk This cohort study investigates the presence of COVID-19 vaccine mRNA in the expressed breast milk of lactating individuals who received the vaccination within 6 months after delivery. |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
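To make the loss parameters above concrete, here is a minimal NumPy sketch of the in-batch-negatives objective that MultipleNegativesRankingLoss is based on (Henderson et al., 2017): cosine similarities between every sentence_0 and every sentence_1 in the batch are multiplied by the scale, and a cross-entropy loss treats the matching pair as the correct class. This is an illustration under simplifying assumptions, not the library's code; the function names and the random toy vectors are hypothetical.

```python
import numpy as np

def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled similarities; row i's target is column i."""
    logits = scale * cos_sim(anchors, positives)  # shape (batch, batch)
    # Numerically stable log-softmax along each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))  # batch of 8, matching per_device_train_batch_size

# Positives that nearly match their anchors give a low loss ...
aligned = mnr_loss(anchors, anchors + 0.01 * rng.normal(size=(8, 16)))
# ... while unrelated positives give a much higher loss.
unrelated = mnr_loss(anchors, rng.normal(size=(8, 16)))
print(aligned, unrelated)
```

A larger scale sharpens the softmax, so the loss penalizes a wrong in-batch document that scores close to the correct one more strongly.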
Non-default hyperparameters:
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin
All hyperparameters:
- per_device_train_batch_size: 8
- num_train_epochs: 2
- max_steps: -1
- learning_rate: 5e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1
- label_smoothing_factor: 0.0
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: no
- per_device_eval_batch_size: 8
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
| Epoch | Step | Training Loss |
|---|---|---|
| 0.2078 | 500 | 0.1950 |
| 0.4156 | 1000 | 0.1858 |
| 0.6234 | 1500 | 0.1594 |
| 0.8313 | 2000 | 0.1608 |
| 1.0391 | 2500 | 0.1494 |
| 1.2469 | 3000 | 0.0823 |
| 1.4547 | 3500 | 0.0804 |
| 1.6625 | 4000 | 0.0887 |
| 1.8703 | 4500 | 0.0781 |
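The epoch and step columns of the log, combined with the hyperparameters listed in this card, allow a rough back-of-the-envelope estimate of the run's size. The sketch below is only an estimate: it assumes a single training device (the card does not state the device count), and the logged epoch values are rounded to four decimals. The linear_lr helper is a hypothetical illustration of a linear decay with zero warmup steps, not a call into the training library.

```python
# (epoch, step) pairs copied from the training log above.
log = [(0.2078, 500), (1.0391, 2500), (1.8703, 4500)]

batch_size = 8    # per_device_train_batch_size
num_epochs = 2    # num_train_epochs
base_lr = 5e-05   # learning_rate

# Each log row gives an independent estimate of steps per epoch; average them.
steps_per_epoch = round(sum(step / epoch for epoch, step in log) / len(log))
total_steps = steps_per_epoch * num_epochs

# Assumption: one device and gradient_accumulation_steps of 1,
# so one optimizer step consumes one batch of 8 pairs.
approx_pairs = steps_per_epoch * batch_size

def linear_lr(step):
    """Learning rate under linear decay to zero with no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(steps_per_epoch, total_steps, approx_pairs)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the run covers about 2,400 steps per epoch, which corresponds to roughly 19,000 training pairs.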
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Base model: EuroBERT/EuroBERT-210m