How to use minsuas/Misconceptions_1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("minsuas/Misconceptions_1")
sentences = [
"Subject: Range and Interquartile Range from a List of Data\nConstruct: Calculate the range from a list of data\nQuestion: What is the range of the following numbers?\n\\[\n1,5,5,17,-6\n\\]\nIncorrect Answer: \\( 5 \\)",
"To find the range adds the biggest and smallest number rather than subtract\nThe passage is clarifying a common misunderstanding about how to calculate the range of a set of numbers. The misconception here is that someone might think the range is found by adding the largest number to the smallest number in the dataset. However, this is incorrect. The correct method to find the range is to subtract the smallest number from the largest number in the dataset. This subtraction gives the difference, which represents how spread out the numbers are.",
"Finds the mode rather than the range\nThe passage is indicating a common mistake made in solving math problems, particularly those involving statistics. The misconception lies in a confusion between two statistical concepts: the mode and the range.\n\n- **Mode**: This is the value that appears most frequently in a set of data. It helps to identify the most typical or common value.\n- **Range**: This is the difference between the highest and lowest values in a set of data. It gives an idea about the spread or dispersion of the values.\n\nThe misconception described here suggests that a student might calculate the mode when asked to find the range, or simply mix up these two concepts. The important distinction is that while the mode tells you about the frequency of the most common value, the range informs you about the span of the data.",
"Believes a cubic expression should have three terms\nThe misconception described here is that someone might think a cubic expression, which is a polynomial of degree three, should consist of exactly three terms. This is a misunderstanding because a cubic expression can have any number of terms, but the highest power of the variable must be three. \n\nFor example, both \\( x^3 + 2x + 1 \\) and \\( 4x^3 - 3x^2 + x - 7 \\) are cubic expressions, even though they have different numbers of terms. The defining characteristic is that the highest power of the variable (x in these examples) is three. So, a cubic expression can have fewer or more than three terms, as long as the degree (the highest power) of the expression is three."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
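The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling over non-padding tokens (pooling_mode_mean_tokens: True) followed by L2 normalization. The function names and the toy 4-dimensional vectors are hypothetical and stand in for the library's internals and the real 384-dimensional token embeddings.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input (made up): 3 tokens, the last one is padding; 4 dims instead of 384.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two embeddings reduces to their dot product.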
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("minsuas/Misconceptions_1")
# Run inference
sentences = [
'Subject: Construct Triangle\nConstruct: Construct a triangle using Side-Side-Side\nQuestion: Tom and Katie are arguing about constructing triangles.\n\nTom says you can construct a triangle with lengths and .\n\nKatie says you can construct a triangle with lengths and .\n\nWho is correct?\nIncorrect Answer: Neither is correct',
'Does not realise that the sum of the two shorter sides must be greater than the third side for it to be a possible triangle\nThe passage is discussing a common misconception about the properties required to form a triangle. The misconception is that one might think any three given side lengths can form a triangle. However, for three lengths to actually form a triangle, they must satisfy the triangle inequality theorem. This theorem states that the sum of the lengths of any two sides of a triangle must be greater than the length of the remaining side. This rule must hold true for all three combinations of added side lengths. \n\nTo apply this to the misconception: one does not realize that the sum of the lengths of the two shorter sides must be greater than the length of the longest side to form a possible triangle. This ensures that the sides can actually meet to form a closed figure with three angles.',
'Draws both angles at the same end of the line when constructing a triangle\nThe misconception described refers to a common error in geometry when students are constructing a triangle based on given angles and a line segment. The mistake is to draw both given angles at the same end of the given line segment. This is incorrect because in a triangle, each angle is located at a different vertex, and each vertex connects two sides. To correctly construct the triangle, each given angle should be drawn at different ends of the line segment (if constructing based on one line segment and two angles) or at vertices defined by the construction steps (if additional sides are given). This ensures that the three angles are positioned to form the corners of the triangle, with each angle at a distinct vertex, thereby creating a proper triangle.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
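In the example above, the first sentence is a question with an incorrect answer and the other two are misconception descriptions, so the first row of the similarity matrix can be read as a ranking of candidate misconceptions. The following is a minimal pure-Python sketch of that ranking step, assuming cosine similarity as the score. The helper names and the 2-dimensional vectors are hypothetical and only stand in for real embeddings.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidate_vecs):
    """Return (candidate index, score) pairs, best match first."""
    scores = [cos_sim(query_vec, c) for c in candidate_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Made-up vectors: one query and three candidate misconceptions.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

ranking = rank_candidates(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```

With real embeddings, `query` would be the encoded question and `candidates` the encoded misconception texts.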
Columns: anchor, positive, and negative
| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | | | |
CachedMultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
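As a rough illustration of what these parameters control, here is a minimal pure-Python sketch of a multiple-negatives ranking objective: each anchor is scored against every positive and negative in the batch with cosine similarity multiplied by scale, and cross-entropy pushes the anchor's own positive to the top. This is a simplified sketch, not the library's implementation. The function names and toy vectors are hypothetical, and the gradient-caching step described in the cited Gao et al. (2021) paper, which lets large batches be processed in smaller chunks, is omitted.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives double as negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Made-up 2-d embeddings for a batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(aligned, swapped)
```

The loss is near zero when each anchor is closest to its own positive and grows when the positives are swapped, which is the behavior the scale of 20.0 sharpens.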
Non-default hyperparameters:
- per_device_train_batch_size: 512
- num_train_epochs: 1
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- fp16: True

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 512
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}