SentenceTransformer based on jiwonyou0420/MNLP_M2_document_encoder

This is a sentence-transformers model finetuned from jiwonyou0420/MNLP_M2_document_encoder. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jiwonyou0420/MNLP_M2_document_encoder
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
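Because the Pooling module is configured with `pooling_mode_cls_token: True` and the pipeline ends in `Normalize()`, a sentence embedding is the transformer's [CLS] token vector scaled to unit length, so cosine similarity between two outputs reduces to a plain dot product. A minimal NumPy sketch of these two post-processing stages (toy shapes and random values, not the real model):

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic the Pooling(cls) + Normalize() stages above.

    token_embeddings: (seq_len, dim) hidden states from the transformer.
    Returns a unit-length (dim,) sentence embedding.
    """
    cls = token_embeddings[0]             # pooling_mode_cls_token=True: take the first token
    return cls / np.linalg.norm(cls)      # Normalize(): L2-normalize the vector

# Toy "hidden states" for two sentences (seq_len=4, dim=3 instead of 512/384)
rng = np.random.default_rng(0)
a = cls_pool_and_normalize(rng.normal(size=(4, 3)))
b = cls_pool_and_normalize(rng.normal(size=(4, 3)))

# Outputs are unit vectors, so cosine similarity is just a dot product
cosine = float(a @ b)
print(round(float(np.linalg.norm(a)), 6))  # 1.0
```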

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jiwonyou0420/encoder-qa-finetuned-v2")
# Run inference
sentences = [
    'How can the renormalization technique be used to address the issue of infinity in the quantum electrodynamics calculation of the self-energy of an electron? Specifically, how can the divergent integrals be reorganized and regularized to yield a finite value for the self-energy?',
    'The prevalence of ALG6-CDG is unknown, but it is thought to be the second most common type of congenital disorder of glycosylation. More than 30 cases of ALG6-CDG have been described in the scientific literature.',
    'Superconductivity and superfluidity are two distinct quantum phenomena that share some similarities. Both phenomena involve the emergence of macroscopic quantum coherence, leading to the disappearance of electrical resistance or viscosity, respectively. They are both observed in materials at very low temperatures, where quantum effects become more pronounced.\n\nSuperconductivity is a phenomenon observed in certain materials, usually metals and alloys, where the electrical resistance drops to zero below a critical temperature. This allows for the flow of electric current without any energy loss. Superconductivity is explained by the BCS (Bardeen-Cooper-Schrieffer) theory, which states that electrons in a superconductor form Cooper pairs, which can move through the material without resistance due to their quantum mechanical nature.\n\nSuperfluidity, on the other hand, is a phenomenon observed in certain liquids, such as liquid helium, where the viscosity drops to zero below a critical temperature. This allows the liquid to flow without any resistance, leading to some unusual properties, such as the ability to climb the walls of a container or flow through extremely narrow channels. Superfluidity in liquid helium is explained by the Bose-Einstein condensation of helium atoms, which form a coherent quantum state that allows them to flow without resistance.\n\nWhile superconductivity and superfluidity are distinct phenomena, they share some similarities in their underlying mechanisms. Both involve the formation of a macroscopic quantum state, where particles (electrons in superconductors or atoms in superfluids) form pairs or condensates that can move without resistance. In this sense, superconductivity can be thought of as a type of superfluidity for charged particles.\n\nIn the case of liquid helium, superconductivity does not directly contribute to its superfluidity, as the two phenomena involve different particles (electrons for superconductivity and helium atoms for superfluidity). However, the study of superconductivity has provided valuable insights into the understanding of superfluidity, as both phenomena share some common underlying principles related to quantum coherence and the behavior of particles at very low temperatures.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
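Because the embeddings come out L2-normalized, semantic search with this encoder reduces to a matrix-vector product followed by a top-k sort over the scores. A minimal sketch of that ranking step with toy unit vectors standing in for real query/document embeddings (the `top_k` helper and all values are illustrative, not part of the library):

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 2):
    """Rank documents by cosine similarity to the query.

    Both inputs are assumed L2-normalized (as this model guarantees),
    so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb            # (n_docs,) similarity scores
    order = np.argsort(-scores)[:k]          # indices of the k best documents
    return [(int(i), float(scores[i])) for i in order]

# Toy 4-dimensional unit vectors standing in for 384-dimensional embeddings
query = np.array([1.0, 0.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.7, 0.7, 0.0, 0.0],   # in between
])
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)

print(top_k(query, docs))  # best match first: doc 0, then doc 2
```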

Training Details

Training Dataset

Unnamed Dataset

  • Size: 72,812 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 7 tokens, mean 33.24 tokens, max 148 tokens
    • sentence_1 (string): min 12 tokens, mean 343.73 tokens, max 512 tokens
    • label (float): min 0.0, mean 0.5, max 1.0
  • Samples:
    sentence_0: What is (are) Multicentric Castleman Disease ?
    sentence_1: Multicentric Castleman disease (MCD) is a rare condition that affects the lymph nodes and related tissues. It is a form of Castleman disease that is "systemic" and affects multiple sets of lymph nodes and other tissues throughout the body (as opposed to unicentric Castleman disease which has more "localized" effects). The signs and symptoms of MCD are often nonspecific and blamed on other, more common conditions. They can vary but may include fever; weight loss; fatigue; night sweats; enlarged lymph nodes; nausea and vomiting; and an enlarged liver or spleen. The exact underlying cause is unknown. Treatment may involve immunotherapy, chemotherapy, corticosteroid medications and/or anti-viral drugs.
    label: 1.0

    sentence_0: What are the treatments for multiple sclerosis ?
    sentence_1: The rotation period of the Milky Way galaxy can be estimated based on the observed velocities of stars in the outer regions of the galaxy. The Milky Way has a diameter of about 100,000 light-years, and the Sun is located about 27,000 light-years from the galactic center. The Sun orbits the galactic center at a speed of approximately 220 km/s.

    To estimate the rotation period, we can use the formula for the circumference of a circle (C = 2πr) and divide it by the orbital speed of the Sun. The radius of the Sun's orbit is about 27,000 light-years, which is equivalent to 2.54 x 10^20 meters. Using this value, we can calculate the circumference of the Sun's orbit:

    C = 2π(2.54 x 10^20 m) ≈ 1.6 x 10^21 meters

    Now, we can divide the circumference by the Sun's orbital speed to find the time it takes for the Sun to complete one orbit around the Milky Way:

    T = C / v = (1.6 x 10^21 m) / (220 km/s) ≈ 7.3 x 10^15 seconds

    Converting this to years, we get:

    T ≈ 7.3 x 10^15 s * (1 year / 3.15 x 10...
    label: 0.0

    sentence_0: "How do black holes affect the large-scale structure of the cosmic web, specifically in terms of dark matter distribution and the formation of galaxy clusters?"
    sentence_1: Black holes, especially supermassive black holes (SMBHs) found at the centers of galaxies, play a significant role in the large-scale structure of the cosmic web, which is a complex network of dark matter, gas, and galaxies that spans the universe. The cosmic web is organized into filaments, nodes, and voids, with galaxy clusters typically forming at the intersections of these filaments. The influence of black holes on the cosmic web can be understood in terms of their impact on dark matter distribution and the formation of galaxy clusters.

    1. Dark matter distribution: Dark matter is a key component of the cosmic web, as it provides the gravitational scaffolding for the formation of galaxies and galaxy clusters. Black holes, particularly SMBHs, can influence the distribution of dark matter in several ways. For instance, when black holes merge, they release gravitational waves that can potentially redistribute dark matter in their vicinity. Additionally, the accretion of matter onto bl...
    label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
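With `MSELoss` as the `loss_fct`, CosineSimilarityLoss regresses the cosine similarity of a pair's embeddings onto its float label. A minimal NumPy sketch of the objective for a single pair (toy vectors; the real loss operates on batches of model outputs):

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, label: float) -> float:
    """Squared error between cos(u, v) and the target label in [0, 1],
    mirroring CosineSimilarityLoss(loss_fct=MSELoss) for one pair."""
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - label) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
print(cosine_similarity_loss(u, v, label=1.0))  # 0.0: identical pair, positive label
print(cosine_similarity_loss(u, v, label=0.0))  # 1.0: identical pair, negative label
```

Training thus pushes embeddings of label-1.0 pairs toward cosine similarity 1 and label-0.0 pairs toward 0.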
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.1099 500 0.0476
0.2197 1000 0.0277
0.3296 1500 0.0243
0.4395 2000 0.0225
0.5493 2500 0.0207
0.6592 3000 0.0206
0.7691 3500 0.0190
0.8789 4000 0.0200
0.9888 4500 0.0189

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 33.4M parameters (F32, stored as safetensors)
