SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
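
The Pooling and Normalize modules listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: it only assumes that mean pooling averages token vectors over non-padding positions (as indicated by an attention mask) and that Normalize applies L2 normalization. The function name `mean_pool_and_normalize` and the toy inputs are hypothetical.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    tokens = np.asarray(token_embeddings, dtype=np.float64)          # (batch, seq, dim)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]   # (batch, seq, 1)
    summed = (tokens * mask).sum(axis=1)                             # padding contributes 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: one sequence with two real tokens and one padding token.
toy_tokens = [[[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]]
toy_mask = [[1, 1, 0]]
sentence_vector = mean_pool_and_normalize(toy_tokens, toy_mask)
```

Because the output is unit length, the dot product of two such vectors equals their cosine similarity, which matches the similarity function listed in the model description.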

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
# Run inference
sentences = [
    'Construct:  Solve coordinate geometry questions involving ratio.\n\nQuestion:  A straight line on squared paper. Points P, Q and R lie on this line. The leftmost end of the line is labelled P. If you travel right 4 squares and up 1 square you get to point Q. If you then travel 8 squares right and 2 squares up from Q you reach point R. What is the ratio of  P Q: P R  ?\n\nOptions:\nA. 1: 12\nB. 1: 4\nC. 1: 2\nD. 1: 3\n\nCorrect Answer: 1: 3\n\nIncorrect Answer: 1: 2\n\nPredicted Misconception: Misunderstanding the ratio calculation by not considering the correct horizontal and vertical distances between points P, Q, and R.',
    'May have estimated when using ratios with geometry',
    'Thinks x = y is an axis',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
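
In a retrieval setting, the first sentence above would act as the query and the remaining sentences as candidate misconceptions. The sketch below shows one way to rank candidates by cosine similarity. It uses toy 2-dimensional vectors in place of real `model.encode(...)` output so that it runs without downloading the model; `rank_candidates` and the toy data are hypothetical names, not part of the library.

```python
import numpy as np

def rank_candidates(query_embedding, candidate_embeddings, top_k=3):
    """Return (candidate index, cosine similarity) pairs, best match first."""
    q = np.asarray(query_embedding, dtype=np.float64)
    c = np.asarray(candidate_embeddings, dtype=np.float64)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q                          # cosine similarity per candidate
    order = np.argsort(-scores)[:top_k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for embeddings of one query and three candidates.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
ranking = rank_candidates(query, candidates, top_k=2)
```

With real embeddings, `query` would be `embeddings[0]` and `candidates` would be `embeddings[1:]` from the snippet above.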

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 12,210 training samples
  • Columns: qa_pair_text, MisconceptionName, and negative
  • Approximate statistics based on the first 1000 samples:
    • qa_pair_text (string): min 54 tokens, mean 123.41 tokens, max 384 tokens
    • MisconceptionName (string): min 4 tokens, mean 15.16 tokens, max 39 tokens
    • negative (string): min 7 tokens, mean 14.49 tokens, max 40 tokens
  • Samples:
    Sample 1
      qa_pair_text:
        Construct: Construct frequency tables.

        Question: Dave has recorded the number of pets his classmates have in the frequency table on the right.
        Number of pets | Frequency
        0 | 4
        1 | [...]

    Sample 2
      qa_pair_text:
        Construct: Convert between any other time periods.

        Question: To work out how many hours in a year you could do...

        Options:
        A. 365 x 7
        B. 365 x 60
        C. 365 x 12
        D. 365 x 24

        Correct Answer: 365 x 24

        Incorrect Answer: 365 x 60

        Predicted Misconception: Multiplying days by hours per minute instead of hours per day.
      MisconceptionName: Answers as if there are 60 hours in a day
      negative: Confuses an equation with an expression

    Sample 3
      qa_pair_text:
        Construct: Given information about one part, work out other parts.

        Question: Jess and Heena share some sweets in the ratio 3 : 5.
        Jess gets 15 sweets.
        How many sweets does Heena get?

        Options:
        A. 17
        B. 9
        C. 5
        D. 25

        Correct Answer: 25

        Incorrect Answer: 17

        Predicted Misconception: Misunderstanding the direct proportionality between the ratio and actual quantities.
      MisconceptionName: Thinks a difference of one part in a ratio means the quantities will differ by one unit
      negative: Believes dividing two positives will give a negative answer
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
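
The complete qa_pair_text samples above share a consistent layout: construct, question, lettered options, correct answer, incorrect answer, and a predicted misconception, separated by blank lines. The sketch below assembles a string in that layout. The helper name `build_qa_pair_text` and its parameters are hypothetical, and the exact whitespace used when the dataset was built is an assumption (the usage example earlier shows two spaces after "Construct:" and "Question:").

```python
def build_qa_pair_text(construct, question, options, correct, incorrect,
                       predicted_misconception):
    """Assemble one query string in the layout seen in the dataset samples."""
    # Options are labelled A-D in the order given.
    option_lines = "\n".join(
        f"{label}. {text}" for label, text in zip("ABCD", options)
    )
    return (
        f"Construct: {construct}\n\n"
        f"Question: {question}\n\n"
        f"Options:\n{option_lines}\n\n"
        f"Correct Answer: {correct}\n\n"
        f"Incorrect Answer: {incorrect}\n\n"
        f"Predicted Misconception: {predicted_misconception}"
    )

# Field values copied from one of the training samples listed above.
text = build_qa_pair_text(
    construct="Convert between any other time periods.",
    question="To work out how many hours in a year you could do...",
    options=["365 x 7", "365 x 60", "365 x 12", "365 x 24"],
    correct="365 x 24",
    incorrect="365 x 60",
    predicted_misconception=(
        "Multiplying days by hours per minute instead of hours per day."
    ),
)
```

Queries formatted differently from the training data may embed less reliably, so matching this layout at inference time is probably a sensible default.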
    

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 9,640 evaluation samples
  • Columns: qa_pair_text, MisconceptionName, and negative
  • Approximate statistics based on the first 1000 samples:
    • qa_pair_text (string): min 56 tokens, mean 121.43 tokens, max 384 tokens
    • MisconceptionName (string): min 6 tokens, mean 14.51 tokens, max 39 tokens
    • negative (string): min 6 tokens, mean 13.86 tokens, max 40 tokens
  • Samples:
    Sample 1
      qa_pair_text:
        Construct: Identify when rounding a calculation will give an over or under approximation.

        Question: Tom and Katie are discussing how to estimate the answer to 38.8745 / 7.9302

        Tom says 40 / 7.9302 would give an overestimate.

        Katie says 38.8745 / 8 would give an overestimate.

        Who is correct?

        Options:
        A. Only Tom
        B. Only Katie
        C. Both Tom and Katie
        D. Neither is correct

        Correct Answer: Only Tom

        Incorrect Answer: Neither is correct

        Predicted Misconception: Rounding both numbers up leads to an overestimate.
      MisconceptionName: Believes that the larger the dividend, the smaller the answer.
      negative: Does not know how to calculate the mean

    Sample 2
      qa_pair_text:
        Construct: Substitute negative integer values into expressions involving no powers or roots.

        Question: Amy is trying to work out the distance between these two points: (1,-6) and (-5,2). She labels them like this: x_1, y_1, x_2 [...]

    Sample 3
      qa_pair_text:
        Construct: Round numbers to three or more decimal places.

        Question: What is 20.15349 rounded to 3 decimal places?

        Options:
        A. 20.153
        B. 20.15
        C. 20.154
        D. 20.253

        Correct Answer: 20.153

        Incorrect Answer: 20.154

        Predicted Misconception: Rounding up the fourth decimal place without considering the fifth decimal place.
      MisconceptionName: Rounds up instead of down
      negative: When dividing decimals, does not realize that the order and position of the digits (relative to each other) has to remain constant.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
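
To make the loss parameters concrete, here is a minimal NumPy sketch of the idea behind MultipleNegativesRankingLoss with `scale=20.0` and cosine similarity. It is an illustration, not the library's code: each anchor (qa_pair_text) is scored against every positive in the batch plus any explicit negatives, and cross-entropy pushes the anchor toward its own positive. The function name `mnr_loss` and the toy vectors are hypothetical.

```python
import numpy as np

def mnr_loss(anchors, positives, negatives=None, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""
    def l2_normalize(x):
        x = np.asarray(x, dtype=np.float64)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = l2_normalize(anchors)
    candidates = l2_normalize(positives)
    if negatives is not None:
        # Explicit negatives are appended as extra wrong candidates.
        candidates = np.vstack([candidates, l2_normalize(negatives)])
    logits = scale * (a @ candidates.T)               # (batch, n_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(a.shape[0])                   # anchor i matches positive i
    return float(-log_probs[targets, targets].mean())

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, positives=[[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, positives=[[0.0, 1.0], [1.0, 0.0]])
with_negative = mnr_loss(anchors, positives=[[1.0, 0.0], [0.0, 1.0]],
                         negatives=[[1.0, 1.0]])
```

The loss is near zero when each anchor is closest to its own positive, large when positives are swapped, and slightly higher once a moderately similar explicit negative is added.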
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 40
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {'num_cycles': 20}
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
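
The combination of `lr_scheduler_type: cosine`, `num_cycles: 20`, and `warmup_ratio: 0.1` is unusual enough to merit a sketch. The pure-Python function below mirrors, as far as I understand it, the multiplier used by the Transformers cosine schedule with warmup: linear warmup, then a cosine wave that completes `num_cycles` full periods over the remaining steps. The function name `cosine_lr` and the total step count of 2000 are hypothetical; the card does not state the planned number of optimizer steps.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1, num_cycles=20):
    """Learning rate at a given optimizer step (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # num_cycles full cosine periods between warmup end and the final step.
    wave = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    return base_lr * max(0.0, wave)

TOTAL_STEPS = 2000  # hypothetical value, chosen only for illustration
lr_mid_warmup = cosine_lr(100, TOTAL_STEPS)    # halfway through warmup
lr_peak = cosine_lr(200, TOTAL_STEPS)          # end of warmup
lr_trough = cosine_lr(245, TOTAL_STEPS)        # half a cycle later
lr_next_peak = cosine_lr(290, TOTAL_STEPS)     # one full cycle later
```

Under these assumptions the learning rate does not decay once; it swings between the peak value and zero twenty times after warmup.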

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 40
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {'num_cycles': 20}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.5026 12 1.6224 -
1.0026 24 1.4736 1.6492
1.5052 36 1.3341 -
2.0052 48 1.1563 1.3401
2.5079 60 1.0641 -
3.0079 72 0.9238 1.1597
3.5105 84 0.8253 -
4.0105 96 0.7101 1.0224
4.5131 108 0.6285 -
5.0131 120 0.5821 0.9944
5.5157 132 0.5676 -
6.0157 144 0.5018 0.9471
6.5183 156 0.4599 -
7.0183 168 0.4403 0.9292
7.5209 180 0.4161 -
8.0209 192 0.3784 0.9107
8.5236 204 0.3503 -
9.0236 216 0.3451 0.9042
9.5262 228 0.3141 -
10.0262 240 0.2916 0.9012
10.5288 252 0.2863 -
11.0288 264 0.2713 0.8977
11.5314 276 0.244 -
12.0314 288 0.2323 0.8922
12.5340 300 0.2293 -
13.0340 312 0.211 0.8933
13.5366 324 0.1972 -
14.0366 336 0.1918 0.9024
14.5393 348 0.1868 -
15.0393 360 0.1704 0.8930
15.5419 372 0.1661 -
16.0419 384 0.1666 0.9077
16.5445 396 0.1558 -
17.0445 408 0.1459 0.9153
  • The bold row denotes the saved checkpoint.
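
The bold formatting is not visible in this plain-text copy of the log. As a rough aid, the sketch below finds the evaluated step with the lowest validation loss, using the (step, loss) pairs from the rows above that report one. Whether that row is the one the card marked as the saved checkpoint is an assumption, based on `load_best_model_at_end: True` selecting by loss.

```python
# (step, validation loss) pairs copied from the evaluated rows of the log.
eval_log = [
    (24, 1.6492), (48, 1.3401), (72, 1.1597), (96, 1.0224), (120, 0.9944),
    (144, 0.9471), (168, 0.9292), (192, 0.9107), (216, 0.9042), (240, 0.9012),
    (264, 0.8977), (288, 0.8922), (312, 0.8933), (336, 0.9024), (360, 0.8930),
    (384, 0.9077), (408, 0.9153),
]

# Pick the row with the smallest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
print(best_step, best_loss)
```

The training loss keeps falling after this point while the validation loss drifts upward, which is consistent with the run ending well before the configured 40 epochs.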

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}