metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:21769
- loss:MultipleNegativesRankingLoss
base_model: am-azadi/bilingual-embedding-large_Fine_Tuned_1e
widget:
- source_sentence: >-
GOOD NEWS! Eriksen, has already gone out to the hospital window, where he
is under observation and looks optimistic after having suffered a cardiac
arrest.
sentences:
- >-
Bolsonaro with the two assassins of Marielle Franco No, the men next to
Jair Bolsonaro in this photo are not the ones accused of the murder of
Marielle Franco
- >-
This photo shows Christian Eriksen waving from the window of the
hospital where he was admitted after suffering cardiac arrest The photo
of Eriksen waving from the window was taken months before his heart
incident
- >-
Video of protests in the US during the COVID-19 pandemic This video has
been circulating in reports about the funeral procession of military
commanders in Iran in January 2020
- source_sentence: >-
What a dirty game... "US postman arrested in canadian border with
banknotes stolen in the trunk of the car". 91 Breaking911 5h U.S. Postal
Worker Caught at Canadian Border With Stolen Ballots In Car Trunk -
breaking911.com/u-s-postal-wor... 8218248 Claudia Wild IT 8206434 300
4:57 06 Nov 20 Twitter for iPhone 1,134 Retweets 113 Tweets with comment
sentences:
- >-
Postman arrested with stolen bills at US-Canada border Only three blank
bills were found in a US postal worker's car
- >-
Covid relief plan will cost every American $5,750 Misleading posts claim
US covid relief plan costs every American $5,750
- >-
CDC informs that 10% of the swabs used for PCR testing were sent to
LABORATORIES, being analyzed of GENETIC SEQUENCES We check the claim
that PCR tests aim to sequence the DNA of patients with covid-19
- source_sentence: >-
. Northeast Always in Our Hearts! Advance Northeast!! . Brazilian Army
through its Engineering Battalion finds a Huge Potable Water Well in
Seridó - Caicó/RN, one of the most needy areas. This well will supply the
homes of more than 3,000 people!! . It's our President Bolsonaro ridding
the Bravo People of the Northeast from the wounds of drought! . . .
BRAZIL LOVED HOMELAND . . Friends and Followers of : Follow and Turn on
our Notifications . . # pocket .
sentences:
- >-
Twitter suspended Elon Musk's Twitter account after he pulled out of
deal Imposter Elon Musk Twitter account shared in false posts claiming
he was 'suspended' over buyout row
- >-
The Brazilian Army found water in Caicó, Rio Grande do Norte, during the
government of President Jair Bolsonaro. The recording of the drilling of
an artesian well in Caicó, Rio Grande do Norte, has been circulating
since 2015
- >-
A video was published today about Syrian refugees in Sweden being
subjected to the separation of husbands, as well as the forcible removal
of their children and the handing over of children to Christian families
to change their religion. And to turn them into Christians, they will
have two children Swedish police did not take Syrian children to hand
over to Christian families
- source_sentence: >-
what hp Álvaro Uribe Vélez ... 3pm ✓ The coastal people are the least
intellectual of the country, that is why this region of Colombia is mired
in poverty. They don't like to work either. that's why there is currently
a level very high of misery in la guajira. With the democratic center we
will change. The entire Caribbean coast must feel outraged by the
statements of this individual. Now with more reasons, the coastal people
should support Petro. The how.. see more
sentences:
- >-
Covid-19: Omicron variant is transmitted by eye contact according to the
WHO The coronavirus is transmitted by interaction with contaminated
droplets, not by eye contact
- >-
5G causes suffocation in humans, affects the respiratory system There is
no evidence that 5G technology affects the respiratory system and
increases toxins in the body
- >-
Álvaro Uribe tweeted that the coastal people are the least intellectual
population in Colombia There is no record of Uribe tweeting that the
coast is the "least intellectual" region of Colombia
- source_sentence: >-
The terrorists evaporated in seconds A very rare scene of the moment the
Egyptian planes bombed the terrorist elements in Sinai Watch the video
here NB Please all our followers on our page subscribe to our YouTube
channel We will publish everything new on the ground Open the channel
link
sentences:
- >-
Cars melt due to hot weather in Saudi Arabia No, these cars did not melt
due to hot weather
- >-
Footage shows robbery in Sri Lanka Delhi crime footage falsely shared as
'Sri Lanka burglary'
- >-
A very rare scene of the moment the Egyptian planes bombed the terrorist
elements in Sinai This picture is not of an Egyptian warplane, but of an
Israeli plane
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on am-azadi/bilingual-embedding-large_Fine_Tuned_1e
This is a sentence-transformers model fine-tuned from am-azadi/bilingual-embedding-large_Fine_Tuned_1e. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: am-azadi/bilingual-embedding-large_Fine_Tuned_1e
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BilingualModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
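The Pooling and Normalize stages listed above can be illustrated with a minimal pure-Python sketch. It assumes the Transformer stage has already produced one vector per token plus an attention mask; the helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are hypothetical and stand in for the real 1024-dimensional outputs.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize() stage would."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: two real tokens and one padding token (mask 0).
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the two real tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.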
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'The terrorists evaporated in seconds A very rare scene of the moment the Egyptian planes bombed the terrorist elements in Sinai Watch the video here NB Please all our followers on our page subscribe to our YouTube channel We will publish everything new on the ground Open the channel link ',
'A very rare scene of the moment the Egyptian planes bombed the terrorist elements in Sinai This picture is not of an Egyptian warplane, but of an Israeli plane',
'Cars melt due to hot weather in Saudi Arabia No, these cars did not melt due to hot weather',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
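The similarity scores can then be used to pick the candidate that best matches a query, for example to find the fact-check closest to a post. The following is a minimal sketch under that assumption: `rank_candidates` is a hypothetical helper, and the short toy vectors stand in for the output of `model.encode`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    scored = [(cosine(query_vec, c), i) for i, c in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored]

# Toy stand-ins for real embeddings (assumption: real ones are 1024-dimensional).
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]

order = rank_candidates(query, candidates)
print(order)
```

With real embeddings, the first index in `order` would be the candidate the model considers the closest match.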
Training Details
Training Dataset
Unnamed Dataset
- Size: 21,769 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

|  | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 6 tokens; mean: 119.28 tokens; max: 512 tokens | min: 18 tokens; mean: 39.42 tokens; max: 98 tokens |
- Samples:

| sentence_0 | sentence_1 |
|---|---|
| HAPPENING NOW ; KENYA ELECTRIC BUS IS ON FIRE ALONG KAREN ROAD. | Electric bus catches fire in Nairobi Video shows a methane-powered bus that caught fire in Italy, not an electric bus in Kenya |
| RUPTLY Viewed 51,670 times 8 hours Snorr On the way down Khao Pak Thong Chai, route 3-4, Sattahip - Korat, all of them would have died. pity Incident 27 Jun. | Video showing road accidents in Thailand? This is a video published in a news report about a car crash in Russia. |
| The image that went around the world! This photo won the best of the decade award and led to the author to depression, the author narrated in his description; "Cheetahs chased a mother deer and her 2 babies, she offered herself so that her children could escape and in the photo looks like she watches her babies run to safety as she is about to be devoured" How many times have you stopped to think how many sacrifices your parents do for you. While you have fun, laugh and you enjoy life, they give theirs. | Cheetahs chased a mother deer and she volunteered so her children could escape Behind the picture: Cheetahs learned from their mother how to capture prey |

- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
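To make the loss parameters concrete, here is a pure-Python sketch of the idea behind a multiple-negatives ranking loss with a scale of 20.0 and cosine similarity. It is an illustration, not the library's implementation: `mnr_loss` is a hypothetical name, and the 2-dimensional vectors are toy data. Each anchor is scored against every positive in the batch, and a cross-entropy term rewards the anchor's own pair, so the other pairs in the batch act as negatives.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# A batch of two pairs, so each anchor sees exactly one in-batch negative.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # pairs match
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # pairs mismatched
print(aligned, swapped)
```

With a per-device batch size of 2, and assuming a single device, each anchor is contrasted against only one in-batch negative per step.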
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0459 | 500 | 0.0135 |
| 0.0919 | 1000 | 0.024 |
| 0.1378 | 1500 | 0.0073 |
| 0.1837 | 2000 | 0.0103 |
| 0.2297 | 2500 | 0.0265 |
| 0.2756 | 3000 | 0.0209 |
| 0.3215 | 3500 | 0.0308 |
| 0.3675 | 4000 | 0.0301 |
| 0.4134 | 4500 | 0.0382 |
| 0.4593 | 5000 | 0.0164 |
| 0.5053 | 5500 | 0.0251 |
| 0.5512 | 6000 | 0.0141 |
| 0.5972 | 6500 | 0.0131 |
| 0.6431 | 7000 | 0.006 |
| 0.6890 | 7500 | 0.0261 |
| 0.7350 | 8000 | 0.0111 |
| 0.7809 | 8500 | 0.0089 |
| 0.8268 | 9000 | 0.0201 |
| 0.8728 | 9500 | 0.0175 |
| 0.9187 | 10000 | 0.0086 |
| 0.9646 | 10500 | 0.0049 |
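
The Epoch column can be reproduced from the dataset size and batch size reported earlier. The short sketch below assumes training ran on a single device, with gradient_accumulation_steps of 1 and dataloader_drop_last set to False as listed in the hyperparameters; the helper name `epoch_at` is hypothetical.

```python
import math

dataset_size = 21769  # training samples reported in this card
batch_size = 2        # per_device_train_batch_size (single device assumed)

# With drop_last disabled, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(dataset_size / batch_size)

def epoch_at(step):
    """Fraction of the epoch completed after a given optimizer step."""
    return round(step / steps_per_epoch, 4)

for step in (500, 5500, 10500):
    print(step, epoch_at(step))
```

Under these assumptions one epoch is 10,885 steps, which is why the last logged row (step 10500) sits just below a full epoch.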
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}