SentenceTransformer based on jiwonyou0420/MNLP_M2_document_encoder

This is a sentence-transformers model finetuned from jiwonyou0420/MNLP_M2_document_encoder. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jiwonyou0420/MNLP_M2_document_encoder
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
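Because the Pooling module is configured with `pooling_mode_cls_token: True` and the pipeline ends in `Normalize()`, a sentence embedding is the transformer's [CLS] token vector scaled to unit length, so cosine similarity between two outputs reduces to a plain dot product. A minimal NumPy sketch of these two post-processing stages (toy shapes and random values, not the real model):

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic the Pooling(cls) + Normalize() stages above.

    token_embeddings: (seq_len, dim) hidden states from the transformer.
    Returns a unit-length (dim,) sentence embedding.
    """
    cls = token_embeddings[0]             # pooling_mode_cls_token=True: take the first token
    return cls / np.linalg.norm(cls)      # Normalize(): L2-normalize the vector

# Toy "hidden states" for two sentences (seq_len=4, dim=3 instead of 512/384)
rng = np.random.default_rng(0)
a = cls_pool_and_normalize(rng.normal(size=(4, 3)))
b = cls_pool_and_normalize(rng.normal(size=(4, 3)))

# Outputs are unit vectors, so cosine similarity is just a dot product
cosine = float(a @ b)
print(round(float(np.linalg.norm(a)), 6))  # 1.0
```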

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jiwonyou0420/encoder-qa-finetuned-v2")
# Run inference
sentences = [
    'How can the renormalization technique be used to address the issue of infinity in the quantum electrodynamics calculation of the self-energy of an electron? Specifically, how can the divergent integrals be reorganized and regularized to yield a finite value for the self-energy?',
    'The prevalence of ALG6-CDG is unknown, but it is thought to be the second most common type of congenital disorder of glycosylation. More than 30 cases of ALG6-CDG have been described in the scientific literature.',
    'Superconductivity and superfluidity are two distinct quantum phenomena that share some similarities. Both phenomena involve the emergence of macroscopic quantum coherence, leading to the disappearance of electrical resistance or viscosity, respectively. They are both observed in materials at very low temperatures, where quantum effects become more pronounced.\n\nSuperconductivity is a phenomenon observed in certain materials, usually metals and alloys, where the electrical resistance drops to zero below a critical temperature. This allows for the flow of electric current without any energy loss. Superconductivity is explained by the BCS (Bardeen-Cooper-Schrieffer) theory, which states that electrons in a superconductor form Cooper pairs, which can move through the material without resistance due to their quantum mechanical nature.\n\nSuperfluidity, on the other hand, is a phenomenon observed in certain liquids, such as liquid helium, where the viscosity drops to zero below a critical temperature. This allows the liquid to flow without any resistance, leading to some unusual properties, such as the ability to climb the walls of a container or flow through extremely narrow channels. Superfluidity in liquid helium is explained by the Bose-Einstein condensation of helium atoms, which form a coherent quantum state that allows them to flow without resistance.\n\nWhile superconductivity and superfluidity are distinct phenomena, they share some similarities in their underlying mechanisms. Both involve the formation of a macroscopic quantum state, where particles (electrons in superconductors or atoms in superfluids) form pairs or condensates that can move without resistance. In this sense, superconductivity can be thought of as a type of superfluidity for charged particles.\n\nIn the case of liquid helium, superconductivity does not directly contribute to its superfluidity, as the two phenomena involve different particles (electrons for superconductivity and helium atoms for superfluidity). However, the study of superconductivity has provided valuable insights into the understanding of superfluidity, as both phenomena share some common underlying principles related to quantum coherence and the behavior of particles at very low temperatures.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
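Because the embeddings come out L2-normalized, semantic search with this encoder reduces to a matrix-vector product followed by a top-k sort over the scores. A minimal sketch of that ranking step with toy unit vectors standing in for real query/document embeddings (the `top_k` helper and all values are illustrative, not part of the library):

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 2):
    """Rank documents by cosine similarity to the query.

    Both inputs are assumed L2-normalized (as this model guarantees),
    so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb            # (n_docs,) similarity scores
    order = np.argsort(-scores)[:k]          # indices of the k best documents
    return [(int(i), float(scores[i])) for i in order]

# Toy 4-dimensional unit vectors standing in for 384-dimensional embeddings
query = np.array([1.0, 0.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.7, 0.7, 0.0, 0.0],   # in between
])
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)

print(top_k(query, docs))  # best match first: doc 0, then doc 2
```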

Training Details

Training Dataset

Unnamed Dataset

  • Size: 72,812 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 7 tokens, mean 33.24 tokens, max 148 tokens
    • sentence_1 (string): min 12 tokens, mean 343.73 tokens, max 512 tokens
    • label (float): min 0.0, mean 0.5, max 1.0
  • Samples:
    sentence_0: What is (are) Multicentric Castleman Disease ?
    sentence_1: Multicentric Castleman disease (MCD) is a rare condition that affects the lymph nodes and related tissues. It is a form of Castleman disease that is "systemic" and affects multiple sets of lymph nodes and other tissues throughout the body (as opposed to unicentric Castleman disease which has more "localized" effects). The signs and symptoms of MCD are often nonspecific and blamed on other, more common conditions. They can vary but may include fever; weight loss; fatigue; night sweats; enlarged lymph nodes; nausea and vomiting; and an enlarged liver or spleen. The exact underlying cause is unknown. Treatment may involve immunotherapy, chemotherapy, corticosteroid medications and/or anti-viral drugs.
    label: 1.0

    sentence_0: What are the treatments for multiple sclerosis ?
    sentence_1: The rotation period of the Milky Way galaxy can be estimated based on the observed velocities of stars in the outer regions of the galaxy. The Milky Way has a diameter of about 100,000 light-years, and the Sun is located about 27,000 light-years from the galactic center. The Sun orbits the galactic center at a speed of approximately 220 km/s.

    To estimate the rotation period, we can use the formula for the circumference of a circle (C = 2πr) and divide it by the orbital speed of the Sun. The radius of the Sun's orbit is about 27,000 light-years, which is equivalent to 2.54 x 10^20 meters. Using this value, we can calculate the circumference of the Sun's orbit:

    C = 2π(2.54 x 10^20 m) ≈ 1.6 x 10^21 meters

    Now, we can divide the circumference by the Sun's orbital speed to find the time it takes for the Sun to complete one orbit around the Milky Way:

    T = C / v = (1.6 x 10^21 m) / (220 km/s) ≈ 7.3 x 10^15 seconds

    Converting this to years, we get:

    T ≈ 7.3 x 10^15 s * (1 year / 3.15 x 10...
    label: 0.0

    sentence_0: "How do black holes affect the large-scale structure of the cosmic web, specifically in terms of dark matter distribution and the formation of galaxy clusters?"
    sentence_1: Black holes, especially supermassive black holes (SMBHs) found at the centers of galaxies, play a significant role in the large-scale structure of the cosmic web, which is a complex network of dark matter, gas, and galaxies that spans the universe. The cosmic web is organized into filaments, nodes, and voids, with galaxy clusters typically forming at the intersections of these filaments. The influence of black holes on the cosmic web can be understood in terms of their impact on dark matter distribution and the formation of galaxy clusters.

    1. Dark matter distribution: Dark matter is a key component of the cosmic web, as it provides the gravitational scaffolding for the formation of galaxies and galaxy clusters. Black holes, particularly SMBHs, can influence the distribution of dark matter in several ways. For instance, when black holes merge, they release gravitational waves that can potentially redistribute dark matter in their vicinity. Additionally, the accretion of matter onto bl...
    label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
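With `MSELoss` as the `loss_fct`, CosineSimilarityLoss regresses the cosine similarity of a pair's embeddings onto its float label. A minimal NumPy sketch of the objective for a single pair (toy vectors; the real loss operates on batches of model outputs):

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, label: float) -> float:
    """Squared error between cos(u, v) and the target label in [0, 1],
    mirroring CosineSimilarityLoss(loss_fct=MSELoss) for one pair."""
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - label) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
print(cosine_similarity_loss(u, v, label=1.0))  # 0.0: identical pair, positive label
print(cosine_similarity_loss(u, v, label=0.0))  # 1.0: identical pair, negative label
```

Training thus pushes embeddings of label-1.0 pairs toward cosine similarity 1 and label-0.0 pairs toward 0.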
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.1099 500 0.0476
0.2197 1000 0.0277
0.3296 1500 0.0243
0.4395 2000 0.0225
0.5493 2500 0.0207
0.6592 3000 0.0206
0.7691 3500 0.0190
0.8789 4000 0.0200
0.9888 4500 0.0189

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 33.4M parameters (F32, stored as safetensors)
