SentenceTransformer based on sucharush/e5_stem_finetuned

This is a sentence-transformers model finetuned from sucharush/e5_stem_finetuned. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sucharush/e5_stem_finetuned
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
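The three modules above run BertModel, average the token vectors while skipping padding positions (mean pooling), and then scale each result to unit length. Below is a minimal NumPy sketch of the last two steps. Random arrays stand in for the BertModel output, and the function names are hypothetical, not the library's API. It also shows why the cosine similarity listed in the model description reduces to a plain dot product on these outputs.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as the Normalize() module does."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 4 positions, 768-dim random stand-ins for BertModel output
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0],   # last position of the first sequence is padding
                 [1, 1, 1, 1]])

emb = l2_normalize(mean_pool(tokens, mask))
print(emb.shape)  # (2, 768)

# With unit-norm rows, cosine similarity equals the plain dot product
dot = emb @ emb.T
cos = dot / np.outer(np.linalg.norm(emb, axis=1), np.linalg.norm(emb, axis=1))
print(np.allclose(dot, cos))  # True
```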

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M3_tokenizer_tuned_pos_neg_1")
# Run inference
sentences = [
    "The text presents a novel pilot-free multi-user uplink framework for integrated sensing and communication (ISAC) in mm-wave networks. It enables single-antenna users to transmit without dedicated pilots, allowing the base station to decode messages and detect targets by utilizing users' spatial and velocity diversities. The system employs a 3D super-resolution approach to jointly estimate multi-path parameters like delays, Doppler shifts, and angles-of-arrival, solved through semidefinite programming. A key innovation is multi-user fusion, enhancing sensing and decoding by combining diverse user observations. This approach improves robustness and integrates multi-user perspectives for high-resolution sensing and communication. Numerical results demonstrate substantial enhancements in target estimation and communication performance. The methodology transforms user transmissions into valuable sensing opportunities, crucial for the next generation of wireless networks focused on environmental awareness and reliable data transmission. Future work includes addressing user mobility, real-time challenges, and optimizing fusion techniques in noisy conditions.",
    "The text presents a novel pilot-free multi-user uplink framework for integrated sensing and communication (ISAC) in mm-wave networks. It enables single-antenna users to transmit without dedicated pilots, allowing the base station to decode messages and detect targets by utilizing users' spatial and velocity diversities. The system employs a 3D super-resolution approach to jointly estimate multi-path parameters like delays, Doppler shifts, and angles-of-arrival, solved through semidefinite programming. A key innovation is multi-user fusion, enhancing sensing and decoding by combining diverse user observations. This approach improves robustness and integrates multi-user perspectives for high-resolution sensing and communication. Numerical results demonstrate substantial enhancements in target estimation and communication performance. The methodology transforms user transmissions into valuable sensing opportunities, crucial for the next generation of wireless networks focused on environmental awareness and reliable data transmission. Future work includes addressing user mobility, real-time challenges, and optimizing fusion techniques in noisy conditions.",
    'Question: what is charlotte\'s real name from henry danger, Answer: List of Henry Danger characters Charlotte[5] (Riele Downs) is one of Henry\'s best friends. She is sarcastic, clever, and smart. She is the "sass master" of the bunch, always there to snap everyone back to reality. She and Henry have been best friends for a long time and therefore she is close enough to him to tell it like it is. She is a big fan of Captain Man. In the fourth episode, Charlotte figures out Henry\'s secret and gets a job as Henry and Ray\'s manager.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
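The example above produces a 3 x 3 matrix of cosine similarities. The NumPy sketch below mimics that matrix without downloading the model and then uses it for a simple semantic-search ranking. The vectors are random stand-ins, not real model outputs. The first two example sentences are identical, so an encoder in inference mode (dropout off) gives them identical embeddings. `cosine_matrix` is a hypothetical helper.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d); returns (n, m)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(42)
shared = rng.normal(size=768)
# Stand-ins for the three example sentences: the first two texts are identical,
# so they receive identical vectors; the third is unrelated.
embeddings = np.stack([shared, shared.copy(), rng.normal(size=768)])

sims = cosine_matrix(embeddings, embeddings)
print(sims.shape)                    # (3, 3)
print(np.isclose(sims[0, 1], 1.0))   # True: identical inputs

# Semantic search: rank every row by similarity to the first one, best first
scores = cosine_matrix(embeddings[:1], embeddings)[0]
ranking = np.argsort(-scores)
print(sorted(ranking[:2].tolist()))  # [0, 1]
```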

Training Details

Training Dataset

Unnamed Dataset

  • Size: 99,980 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    Column      Type    Details
    sentence_0  string  min: 16 tokens, mean: 186.84 tokens, max: 510 tokens
    sentence_1  string  min: 16 tokens, mean: 186.84 tokens, max: 510 tokens
    label       int     1: 100.00%
  • Samples:
    • sentence_0: Question: who was the leader of the texas rangers, Answer: History of the Texas Ranger Division However, the Texas Rangers were not formally constituted until 1835. Austin returned to Texas after having been imprisoned in Mexico City and helped organize a council to govern the group. On October 17, at a consultation of the Provisional Government of Texas, Daniel Parker proposed a resolution to establish the Texas Rangers. He proposed creating three companies that would total some 60 men and would be known by "uniforms" consisting of a light duster (clothing) and an identification badge made from a Mexican Peso. They were instituted by Texan lawmakers on November 24. On November 28, 1835 Robert McAlpin Williamson was chosen to be the first Major of the Texas Rangers. Within two years the Rangers grew to more than 300 men.
      sentence_1: Question: who was the leader of the texas rangers, Answer: History of the Texas Ranger Division However, the Texas Rangers were not formally constituted until 1835. Austin returned to Texas after having been imprisoned in Mexico City and helped organize a council to govern the group. On October 17, at a consultation of the Provisional Government of Texas, Daniel Parker proposed a resolution to establish the Texas Rangers. He proposed creating three companies that would total some 60 men and would be known by "uniforms" consisting of a light duster (clothing) and an identification badge made from a Mexican Peso. They were instituted by Texan lawmakers on November 24. On November 28, 1835 Robert McAlpin Williamson was chosen to be the first Major of the Texas Rangers. Within two years the Rangers grew to more than 300 men.
      label: 1
    • sentence_0: [{'from': 'human', 'value': 'Working together, Printer A and Printer B would finish a task in 40 minutes. Printer A alone would finish the task in 60 minutes. The task contains 480 pages. How many more pages per minute does Printer B print compared to Printer A?'}, {'from': 'gpt', 'value': "Let's denote the rate at which Printer A works as A pages per minute and the rate at which Printer B works as B pages per minute.\n\nWe know that Printer A alone can finish the task in 60 minutes, so:\nA * 60 = 480 pages\nA = 480 / 60\nA = 8 pages per minute\n\nWhen Printer A and Printer B work together, they can finish the task in 40 minutes, so:\n(A + B) * 40 = 480 pages\n8 + B = 480 / 40\n8 + B = 12\nB = 12 - 8\nB = 4 pages per minute\n\nNow we can find out how many more pages per minute Printer B prints compared to Printer A:\nB - A = 4 - 8\nB - A = -4\n\nThis result indicates that Printer A actually prints 4 pages per minute more than Printer B, not the other way around. If you're looking for h...
      sentence_1: [{'from': 'human', 'value': 'Working together, Printer A and Printer B would finish a task in 40 minutes. Printer A alone would finish the task in 60 minutes. The task contains 480 pages. How many more pages per minute does Printer B print compared to Printer A?'}, {'from': 'gpt', 'value': "Let's denote the rate at which Printer A works as A pages per minute and the rate at which Printer B works as B pages per minute.\n\nWe know that Printer A alone can finish the task in 60 minutes, so:\nA * 60 = 480 pages\nA = 480 / 60\nA = 8 pages per minute\n\nWhen Printer A and Printer B work together, they can finish the task in 40 minutes, so:\n(A + B) * 40 = 480 pages\n8 + B = 480 / 40\n8 + B = 12\nB = 12 - 8\nB = 4 pages per minute\n\nNow we can find out how many more pages per minute Printer B prints compared to Printer A:\nB - A = 4 - 8\nB - A = -4\n\nThis result indicates that Printer A actually prints 4 pages per minute more than Printer B, not the other way around. If you're looking for h...
      label: 1
    • sentence_0: Question: where does the time zone change from central to eastern, Answer: Eastern Time Zone The boundary between time zones is set forth in the Code of Federal Regulations, with the boundary between the Eastern and Central Time Zones being specifically detailed at 49 CFR 71.[4]
      sentence_1: Question: where does the time zone change from central to eastern, Answer: Eastern Time Zone The boundary between time zones is set forth in the Code of Federal Regulations, with the boundary between the Eastern and Central Time Zones being specifically detailed at 49 CFR 71.[4]
      label: 1
  • Loss: ContrastiveTensionLoss
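ContrastiveTensionLoss scores each (sentence_0, sentence_1) pair by the dot product of two embeddings, each produced by its own copy of the encoder. It then applies binary cross-entropy against the 0/1 label. Below is a minimal NumPy sketch of that computation. It assumes, to our understanding of the sentence-transformers implementation, that the cross-entropy terms are summed rather than averaged over the batch. The function names are hypothetical. In this dataset every sampled label is 1, so only the positive-pair branch is exercised.

```python
import numpy as np

def bce_with_logits_sum(logits: np.ndarray, labels: np.ndarray) -> float:
    """Binary cross-entropy on raw scores, summed over the batch.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    return float(np.sum(np.maximum(logits, 0.0) - logits * labels
                        + np.log1p(np.exp(-np.abs(logits)))))

def contrastive_tension_loss(reps_1: np.ndarray, reps_2: np.ndarray,
                             labels: np.ndarray) -> float:
    """reps_1, reps_2: (batch, dim) embeddings of sentence_0 and sentence_1,
    each from its own encoder copy. The row-wise dot product is the logit
    that is scored against the 0/1 label."""
    scores = np.sum(reps_1 * reps_2, axis=1)
    return bce_with_logits_sum(scores, labels)

# Toy unit vectors
v = np.array([[1.0, 0.0]])
w = np.array([[0.0, 1.0]])
print(round(contrastive_tension_loss(v, v, np.array([1.0])), 4))  # 0.3133 (positive pair, score 1)
print(round(contrastive_tension_loss(v, w, np.array([0.0])), 4))  # 0.6931 (negative pair, score 0)
```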

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 20
  • per_device_eval_batch_size: 20
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
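These settings fix the length of the run and the learning-rate curve. Below is a short arithmetic sketch. It assumes a single device and keeps the last partial batch (dataloader_drop_last: False). It uses the linear schedule with no warmup (lr_scheduler_type: linear, warmup_steps: 0) and the learning_rate of 5e-05 from the full list. The resulting 4,999 total steps are consistent with the Epoch column of the training logs (for example, step 4500 falls at epoch 0.9002).

```python
import math

num_samples = 99_980  # training rows (from the dataset section)
batch_size = 20       # per_device_train_batch_size
grad_accum = 1        # gradient_accumulation_steps
epochs = 1            # num_train_epochs
base_lr = 5e-5        # learning_rate

# Assumption: one device; the last partial batch is kept (dataloader_drop_last=False)
steps_per_epoch = math.ceil(num_samples / batch_size) // grad_accum
total_steps = steps_per_epoch * epochs

def linear_lr(step: int) -> float:
    """Linear decay from base_lr to 0 with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(total_steps)                        # 4999
print(round(4500 / total_steps, 4))       # 0.9002, the epoch value logged at step 4500
print(linear_lr(0), linear_lr(total_steps))  # 5e-05 0.0
```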

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 20
  • per_device_eval_batch_size: 20
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch    Step    Training Loss
0.1000   500     6.2692
0.2000   1000    6.2665
0.3001   1500    6.2663
0.4001   2000    6.2662
0.5001   2500    6.266
0.6001   3000    6.2658
0.7001   3500    6.2657
0.8002   4000    6.2656
0.9002   4500    6.2656
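The training loss barely moves (6.2692 to 6.2656). One reading is that it starts next to its floor. This rests on three assumptions: every label is 1, the Normalize() module makes embeddings unit-norm so each dot-product logit is at most 1, and the per-pair cross-entropy is summed over a batch of 20. Under those assumptions the smallest achievable per-batch loss is 20 · ln(1 + e^-1) ≈ 6.2652, just below every logged value. The sketch below checks that arithmetic; it is an interpretation, not a statement from the original training run.

```python
import math

batch_size = 20  # per_device_train_batch_size
# Assumptions: every label is 1, embeddings are unit-norm (Normalize() module), and the
# per-pair BCE-with-logits terms are summed over the batch. A dot product of unit
# vectors is at most 1, so each positive pair costs at least log(1 + e^-1).
per_pair_floor = math.log1p(math.exp(-1.0))
batch_floor = batch_size * per_pair_floor

logged = [6.2692, 6.2665, 6.2663, 6.2662, 6.266, 6.2658, 6.2657, 6.2656, 6.2656]
print(round(per_pair_floor, 4))                    # 0.3133
print(round(batch_floor, 4))                       # 6.2652
print(all(loss > batch_floor for loss in logged))  # True
```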

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.4.1
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveTensionLoss

@inproceedings{carlsson2021semantic,
    title={Semantic Re-tuning with Contrastive Tension},
    author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Ov_sMNau-PF}
}
Safetensors
  • Model size: 0.1B params
  • Tensor type: F32

Model tree for Stergios-Konstantinidis/MNLP_M3_document_encoder
