SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
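The three modules above form a pipeline: the BertModel produces per-token embeddings, the Pooling module averages them over non-padding tokens, and Normalize L2-normalizes the result so cosine similarity reduces to a dot product. A minimal numpy sketch of the pooling and normalization steps, using random stand-in token embeddings rather than real model outputs:

```python
import numpy as np

# Stand-in for BertModel output: 2 sequences, 5 tokens each, 384-dim states.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 384))
attention_mask = np.array([[1, 1, 1, 0, 0],   # sequence 1 has 2 padding tokens
                           [1, 1, 1, 1, 1]])

# Pooling module (pooling_mode_mean_tokens): mean over non-padding tokens only.
mask = attention_mask[:, :, None].astype(float)
mean_pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)

# Normalize module: L2-normalize each sentence embedding to unit length.
sentence_embeddings = mean_pooled / np.linalg.norm(mean_pooled, axis=1, keepdims=True)
print(sentence_embeddings.shape)  # (2, 384)
```

The actual model applies truncation at 256 tokens before this stage, per the `max_seq_length` shown above.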

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Chandar/sv-subject-based-all-MiniLM-L6-v2")
# Run inference
sentences = [
    'with\tthe\torigin\tof\tthe\tcoal\tformed\tduring\tthe\tcarboniferous\tepoch,\ttwo\tor\tthree\nconsiderations\tsuggest\tthemselves.\nIn\tthe\tfirst\tplace,\tthe\tgreat\tphantom\tof\tgeological\ttime\trises\tbefore\tthe\tstudent\tof\nthis,\tas\tof\tall\tother,\tfragments\tof\tthe\thistory\tof\tour\tearth—\tspringing\nirrepressibly\tout\tof\tthe\tfacts,\tlike\tthe\tDjin\tfrom\tthe\tjar\twhich\tthe\tfishermen\tso\nincautiously\topened;\tand\tlike\tthe\tDjin\tagain,\tbeing\tvaporous,\tshifting,\tand\nindefinable,\tbut\tunmistakably\tgigantic.\tHowever\tmodest\tthe\tbases\tof\tone\'s\ncalculation\tmay\tbe,\tthe\tminimum\tof\ttime\tassignable\tto\tthe\tcoal\tperiod\tremains\nsomething\tstupendous.\nPrincipal\tDawson\tis\tthe\tlast\tperson\tlikely\tto\tbe\tguilty\tof\texaggeration\tin\tthis\nmatter,\tand\tit\twill\tbe\twell\tto\tconsider\twhat\the\thas\tto\tsay\tabout\tit:—\n"The\trate\tof\taccumulation\tof\tcoal\twas\tvery\tslow.\tThe\tclimate\tof\tthe\tperiod,\tin\nthe\tnorthern\ttemperate\tzone,\twas\tof\tsuch\ta\tcharacter\tthat\tthe\ttrue\tconifers\tshow\nrings\tof\tgrowth,\tnot\tlarger,\tnor\tmuch\tless\tdistinct,\tthan\tthose\tof\tmany\tof\ttheir\nmodern\tcongeners.\tThe\t\nSigillarioe\n\tand\t\nCalamites\n\twere\tnot,\tas\toften\tsupposed,\ncomposed\twholly,\tor\teven\tprincipally,\tof\tlax\tand\tsoft\ttissues,\tor\tnecessarily\nshort-lived.\tThe\tformer\thad,\tit\tis\ttrue,\ta\tvery\tthick\tinner\tbark;\tbut\ttheir\tdense\nwoody\taxis,\ttheir\tthick\tand\tnearly\timperishable\touter\tbark,\tand\ttheir\tscanty\tand\nrigid\tfoliage,\twould\tindicate\tno\tvery\trapid\tgrowth\tor\tdecay.\tIn\tthe\tcase\tof\tthe\nSigillarioe\n,\tthe\tvariations\tin\tthe\tleaf-scars\tin\tdifferent\tparts\tof\tthe\ttrunk,\tthe\nintercalation\tof\tnew\tridges\tat\tthe\tsurface\trepresenting\tthat\tof\tnew\twoody\twedges\nin\tthe\taxis,\tthe\ttransverse\tmarks\tleft\tby\tthe\tstages\tof\tupward\tgrowth,\tall\tindicate\nthat\tseveral\tyears\tmust\thave\tbeen\trequired\tfor\tthe\tgrowth\
tof\tstems\tof\tmoderate\nsize.\tThe\tenormous\troots\tof\tthese\ttrees,\tand\tthe\tcondition\tof\tthe\tcoal-swamps,\nmust\thave\texempted\tthem\tfrom\tthe\tdanger\tof\tbeing\toverthrown\tby\tviolence.\nThey\tprobably\tfell\tin\tsuccessive\tgenerations\tfrom\tnatural\tdecay;\tand\tmaking\nevery\tallowance\tfor\tother\tmaterials,\twe\tmay\tsafely\tassert\tthat\tevery\tfoot\tof\nthickness\tof\tpure\tbituminous\tcoal\timplies\tthe\tquiet\tgrowth\tand\tfall\tof\tat\tleast\nfifty\tgenerations\tof\t\nSigillarioe\n,\tand\ttherefore\tan\tundisturbed\tcondition\tof\tforest\ngrowth\tenduring\tthrough\tmany\tcenturies.\tFurther,\tthere\tis\tevidence\tthat\tan\nimmense\tamount\tof\tloose\tparenchymatous\ttissue,\tand\teven\tof\twood,\tperished\tby\ndecay,\tand\twe\tdo\tnot\tknow\tto\twhat\textent\teven\tthe\tmost\tdurable\ttissues\tmay\nhave\tdisappeared\tin\tthis\tway;\tso\tthat,\tin\tmany\tcoal-seams,\twe\tmay\thave\tonly\ta\nvery\tsmall\tpart\tof\tthe\tvegetable\tmatter\tproduced."\nUndoubtedly\tthe\tforce\tof\tthese\treflections\tis\tnot\tdiminished\twhen\tthe',
    '31 \n \n \n2.Chapter Two:………………………………………………………….. Causes of Aging \n \n \n There are many types of free radicals and the most related to the \nbiological process are those which derived from oxygen: the  Reactive \nOxygen Species ( ROS ). These ROS include superoxide anion , peroxide \nand hydro radicals . ROS are produced in vivo within the mitochondria \nduring electron tra nsport chain. They are also produced as intermediate \nproducts in different enzymatic reactions and by different physiological \nprocesses such as: \n\uf0a7 Phagocytic activity of white blood cells, specifically neu trophils. \nNeutrophils generate ROS during phagocytic activity in order to kill the \ninvading pathogens as a host defense mechanism. \n \n\uf0a7 When the cells are exposed to abnormal conditions -such as hypoxia \nor hperoxia -produce ROS. Some drugs have the ability to induce the \ncells to produce ROS due to their oxidizing effect. \n \n\uf0a7 An exposure to radiation may induce the biological systems to \nproduce ROS.',
    'THEINHERITANCEOFDURATION177\npreferredtostatetheconclusionintermsofdeath,rates,\nasitwasoriginallystatedbyPearson,becauseofthe\nbearingithasuponagreatdealofthepublichealth\npropagandasolooselyflungabout.Itneedonlybere-\nmemberedthatthereisaperfectlydefinitefunctional\nrelationbetweendeathrateandaveragedurationoflife\ninanapproximatelystablepopulationgroup,expres-\nsiblebyanequation,inordertoseethatanyconclusion\nastotherelativeinfluenceofheredityandenvironment\nuponthegeneraldeathratemustapplywithequalforce\ntothedurationoflife.\nTHESELECTIVEDEATHBATEINMAN\nIfthedurationo; lifewereinheriteditwouldlogical-\nlybeexpectedthatsomeportionofthedeathratemust\nbeselectiveincharacter.Forinheritanceofduration\noflifecanonlymeanthatwhenapersondiesisinpart\ndeterminedbythatindividual\'sbiologicalconstitutionor\nmakeup.Andequallyitisobviousthatindividualsof\nweakandunsoundconstitutionmust,ontheaverage,\ndieearlierthanthoseofstrong,sound,andvigorouscon-\nstitution."Whenceitfollowsthatthechancesofleaving\noffspringwillbegreaterforthoseofsoundconstitution\nthanfortheweaklings.Themathematicaldiscussion\nwhichhasjustbeengivenindicatesthatfromone-half\ntothree-fourthsofthedeathrateisselectiveinchar-\nacter,becausethatproportionisdeterminedbyhereditary\nfactors.Justinproportionashereditydetermines\nthedeathrate,soisthemortalityselective.Therealityof\nthefactofaselectivedeathrateinmancanbeeasily\nshowngraphically.\nInFigure44areseenthegraphsofsomedatafrom\nEuropeanroyalfamilies,wherenoneglectofchildren,\n12',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.7934
cosine_accuracy_threshold 0.1476
cosine_f1 0.6493
cosine_f1_threshold 0.1067
cosine_precision 0.6543
cosine_recall 0.6444
cosine_ap 0.7384
cosine_mcc 0.4793
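The `cosine_accuracy_threshold` above implies a simple decision rule for pairwise classification: embed both texts and compare their cosine similarity to the threshold. A sketch with hypothetical 2-dimensional unit vectors standing in for real embeddings (this model's embeddings are 384-dimensional and already L2-normalized, so cosine similarity is just a dot product):

```python
import numpy as np

# Hypothetical pre-computed, L2-normalized embeddings for two passages.
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])
cosine_sim = float(a @ b)  # dot product of unit vectors = cosine similarity

# cosine_accuracy_threshold from the metrics table above.
threshold = 0.1476
is_positive_pair = cosine_sim >= threshold
print(is_positive_pair)  # True
```

Using `cosine_f1_threshold` (0.1067) instead would trade precision for recall, as reflected in the F1-optimal numbers above.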

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,124,250 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 256 tokens, mean: 256.0 tokens, max: 256 tokens
    • sentence2: string; min: 23 tokens, mean: 242.22 tokens, max: 256 tokens
    • label: int; 0: ~50.10%, 1: ~49.90%
  • Samples (three rows shown; long texts truncated). All three rows share the same sentence1, so it is shown once:

    sentence1 (shared by the three samples below):
    with the origin of the coal formed during the carboniferous epoch, two or three
    considerations suggest themselves.
    In the first place, the great phantom of geological time rises before the student of
    this, as of all other, fragments of the history of our earth— springing
    irrepressibly out of the facts, like the Djin from the jar which the fishermen so
    incautiously opened; and like the Djin again, being vaporous, shifting, and
    indefinable, but unmistakably gigantic. However modest the bases of one's
    calculation may be, the minimum of time assignable to the coal period remains
    something stupendous.
    Principal Dawson is the last person likely to be guilty of exaggeration in this
    matter, and it will be well to consider what he has to say about it:—
    "The rate of accumulation of coal was very slow. The climate of the period, in
    the northern temperate zone, was of such a character that the true conifers show
    rings of growth, not larger, nor much less distinct, than those of many of their
    moder...

    Sample 1 (label: 1), sentence2:
    organic coenzymes to catalyze its specific chemical reaction. Therefore, enzyme function is, in part,
    regulated by an abundance of various cofactors and coenzymes, which are supplied primarily by the diets
    of most organisms.
    Figure 6.20 Vitamins are important coenzymes or precursors of coenzymes, and are required for
    enzymes to function properly. Multivitamin capsules usually contain mixtures of all the vitamins at
    different percentages.
    Enzyme Compartmentalization
    In eukaryotic cells, molecules such as enzymes are usually compartmentalized into different organelles.
    This allows for yet another level of regulation of enzyme activity. Enzymes required only for certain
    cellular processes can be housed separately along with their substrates, allowing for more efficient
    chemical reactions. Examples of this sort of enzyme regulation based on location and proximity include
    the enzymes involved in the latter stages of cellular respiration, which take place exclusively in the
    mitochondria, and ...

    Sample 2 (label: 1), sentence2:
    Infertility
    Infertility is the inability to conceive a child or carry a child to birth. About 75 percent of causes of
    infertility can be identified; these include diseases, such as sexually transmitted diseases that can cause
    scarring of the reproductive tubes in either men or women, or developmental problems frequently related
    to abnormal hormone levels in one of the individuals. Inadequate nutrition, especially starvation, can
    delay menstruation. Stress can also lead to infertility. Short-term stress can affect hormone levels, while
    long-term stress can delay puberty and cause less frequent menstrual cycles. Other factors that affect
    fertility include toxins (such as cadmium), tobacco smoking, marijuana use, gonadal injuries, and aging.
    If infertility is identified, several assisted reproductive technologies (ART) are available to aid
    conception. A common type of ART is in vitro fertilization (IVF) where an egg and sperm are combined
    outside the body and then placed in the uterus. Eggs...

    Sample 3 (label: 1), sentence2:
    Figure 18.13 The honeycreeper birds illustrate adaptive radiation. From one original species of bird,
    multiple others evolved, each with its own distinctive characteristics.
    Notice the differences in the species’ beaks in Figure 18.13. Evolution in response to natural
    selection based on specific food sources in each new habitat led to evolution of a different beak suited to
    the specific food source. The seed-eating bird has a thicker, stronger beak which is suited to break hard
    nuts. The nectar-eating birds have long beaks to dip into flowers to reach the nectar. The insect-eating
    birds have beaks like swords, appropriate for stabbing and impaling insects. Darwin’s finches are another
    example of adaptive radiation in an archipelago.
    Click through this interactive site (http://openstaxcollege.org/l/bird_evolution) to see how island
    birds evolved in evolutionary increments from 5 million years ago to today.
    Sympatric Speciation
    Can divergence occur if no physical barriers are in place to ...
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
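CoSENTLoss optimizes a ranking objective over cosine similarities rather than regressing them directly: any pair labeled more similar should score higher than any pair labeled less similar. A minimal numpy sketch of the pairwise form (not the library implementation), using the `scale` of 20.0 configured above:

```python
import numpy as np

def cosent_loss(similarities, labels, scale=20.0):
    # For every ordered pair (i, j) where the gold label says sample i should
    # be more similar than sample j, penalize exp(scale * (sim_j - sim_i)).
    terms = [
        np.exp(scale * (similarities[j] - similarities[i]))
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]
    ]
    return float(np.log1p(np.sum(terms)))

labels = np.array([1, 0])             # 1 = positive pair, 0 = negative pair
low = cosent_loss(np.array([0.9, 0.1]), labels)   # correct ranking: small loss
high = cosent_loss(np.array([0.1, 0.9]), labels)  # flipped ranking: large loss
print(low < high)  # True
```

This ranking formulation is why the evaluation above reports decision thresholds: the loss constrains the ordering of similarities, not their absolute values.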
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • max_steps: 2000

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 2000
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss cosine_ap
-1 -1 - 0.7384
0.0071 500 0.5524 -
0.0142 1000 0.0016 -
0.0213 1500 0.0004 -
0.0285 2000 0.0001 -

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.1
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}