This is a sentence-transformers model fine-tuned from Alibaba-NLP/gte-large-en-v1.5. It maps sentences and paragraphs to a 1024-dimensional dense vector space and has been fine-tuned to match essay texts with relevant skills for pedagogical evaluation.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference to find matching skills for a given essay. The essay should be plain text, and each skill should ideally take the form "Short skill name: detailed skill description".
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("dpanea/skill-assignment-transformer")
# Prepare data
essay_text = ['Fighter Jet\nGreetings my fellow friends. I am going to talk about my greatest passion fighter jets...']
skills = [
'Noun Consistency Skills: I can use nouns, pronouns, plurals and tenses accurately and consistently throughout.',
'Adventurous Vocabulary Skills: I can select from a range of known adventurous vocabulary. (tier 2 and tier 3 words).',
'Descriptive Language Skills: I can use appropriate, interesting and varied word choice (adjectives, adverbs and descriptive phrases).',
'Dialogue Tagging Skills: I can use dialogue tags successfully (eg correct positioning, new line for new speaker).',
'Spell Words: I can spell commonly used words accurately.',
# ... (further skills)
]
# Get embeddings
essay_embedding = model.encode(essay_text)
skill_embeddings = model.encode(skills)
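# Get the k most relevant skills for the given essay
import numpy as np
from sentence_transformers.util import cos_sim
k = 3  # number of skills to return; choose any value up to len(skills)
# cos_sim returns a torch tensor of shape (1, len(skills)); flatten it to 1-D
similarities = cos_sim(essay_embedding, skill_embeddings).flatten().numpy()
# argsort is ascending, so take the last k indices and reverse them
top_indices = np.argsort(similarities)[-k:][::-1]
top_skills = [skills[i] for i in top_indices]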
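The ranking step above can also be isolated into a small helper that works on precomputed embeddings. The following is a minimal sketch, not part of the model's API: the function name `top_k_skills` and the toy 2-dimensional vectors are hypothetical and stand in for real `model.encode` output.

```python
import numpy as np

def top_k_skills(essay_embedding, skill_embeddings, skills, k=3):
    """Return the k (skill, similarity) pairs most cosine-similar to the essay."""
    essay = np.asarray(essay_embedding, dtype=float).reshape(-1)
    matrix = np.asarray(skill_embeddings, dtype=float)
    # Normalise so that the dot product equals the cosine similarity.
    essay = essay / np.linalg.norm(essay)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    similarities = matrix @ essay
    # Sort in descending order of similarity and keep the first k indices.
    top_indices = np.argsort(-similarities)[:k]
    return [(skills[i], float(similarities[i])) for i in top_indices]

# Toy vectors (hypothetical) standing in for real embeddings.
toy_skills = ["skill A", "skill B", "skill C"]
toy_skill_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_essay_embedding = [[2.0, 0.1]]
ranked = top_k_skills(toy_essay_embedding, toy_skill_embeddings, toy_skills, k=2)
print(ranked)
```

With real data, pass the arrays returned by `model.encode` in place of the toy vectors.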
Dataset columns: Essay text, Relevant skill, and Irrelevant skill
| | Essay text | Relevant skill | Irrelevant skill |
|---|---|---|---|
| type | string | string | string |
| Essay text | Relevant skill | Irrelevant skill |
|---|---|---|
| 2024 POETRY FEATURE ARTICLE – SCAFFOLD - blank | Emotionally Engaging Language: I can evoke an emotional response through emotive language. | Reference Formatting Skills: Formats the reference list/bibliography correctly. |
| Why is there no fuel for the next 500 kilometers? We need fuel and there is no way to turn back.This is such a bad time.We need fuel and i am gonna rage quit and drive us off the bridge if we can't get fuel any time soon pull over it's my turn, to drive you have been driving for the last hour and i want t go speeding, down this hill and get to the fuel station quicker, you drive way to slow and it is annoying me.Ok fine i'm pulling over.Finally ok i see that red car coming ,he wants to race and im racing him.ya i beat him but now we only have enough fuel for the next 200 km and the next fuel station is 250 km away i will drive until we run out of fuel then we will have to push and i'm paying for the fuel don't even think about paying for the fuel little brother.Ok time to push.No i am not pushing the car and you can not make me just because u are 1 year older than me does no mean can boss me around.Fine i will push lazy boy.What Why is the gas station shut down and the next one is 300k... | Essay Organization Skills: Essay Writing | Case Evaluation Skills: Does the student include discerning evaluation of ideas to support their case for positive change? |
| What is the artefact? the artefact is a gold armband. What are the features of the artefacts? the features on the arte fact it's a gold amband it looks like it beendigging to look like a snake rap around ur arm. you can see the snake scale's and and snake head on the amberd. Question 2 What aspect of Ancient Roman society does this artefact represent? the artefacts represent partion partion partian partian head tate were the richest people in human society it tells us that partions were the richest people in Aome Home society. patients were on of social What does the artefact tell us about Ancient Roman society? pyramid. they had all theexpertsn suf and they had Slaves How does this artefact give us an understanding about Ancient Roman society? the artefact gives us a understanding their were rich people and Cparthers) they had a late more money then all the others people in home society. 7 | Spelling Visuals: Spelling visual - 4 | Event Setting Visualization Skills: I can use technical vocabulary, contemporary language and images to create a sense of the event and the setting |
TripletLoss with these parameters:
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
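As a rough illustration of what these two parameters control, the sketch below assumes the standard triplet formulation, max(d(anchor, positive) - d(anchor, negative) + margin, 0), averaged over the batch, with d as Euclidean distance. It is a NumPy stand-in written for this card, not the library's implementation; the function name `triplet_loss` and the toy vectors are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings."""
    anchor, positive, negative = (
        np.asarray(x, dtype=float) for x in (anchor, positive, negative)
    )
    # Euclidean distance from each anchor to its positive and its negative.
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    # The loss is zero once the negative is at least `margin` farther than the positive.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy 2-dimensional embeddings (hypothetical): essay, relevant skill, irrelevant skill.
essay = [[0.0, 0.0]]
relevant = [[3.0, 4.0]]     # distance 5 from the essay
irrelevant = [[6.0, 8.0]]   # distance 10 from the essay

satisfied = triplet_loss(essay, relevant, irrelevant)  # 5 - 10 + 5 = 0
violated = triplet_loss(essay, irrelevant, relevant)   # 10 - 5 + 5 = 10
print(satisfied, violated)
```

Under this formulation, a margin of 5 means a triplet stops contributing to the loss only when the irrelevant skill sits at least 5 units farther from the essay than the relevant one.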
Non-default hyperparameters:
eval_strategy: steps
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
multi_dataset_batch_sampler: round_robin
All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0340 | 100 | - |
| 0.0679 | 200 | - |
| 0.1019 | 300 | - |
| 0.1358 | 400 | - |
| 0.1698 | 500 | 1.7346 |
| 0.2037 | 600 | - |
| 0.2377 | 700 | - |
| 0.2716 | 800 | - |
| 0.3056 | 900 | - |
| 0.3396 | 1000 | 0.8428 |
| 0.3735 | 1100 | - |
| 0.4075 | 1200 | - |
| 0.4414 | 1300 | - |
| 0.4754 | 1400 | - |
| 0.5093 | 1500 | 0.4421 |
| 0.5433 | 1600 | - |
| 0.5772 | 1700 | - |
| 0.6112 | 1800 | - |
| 0.6452 | 1900 | - |
| 0.6791 | 2000 | 0.3366 |
| 0.7131 | 2100 | - |
| 0.7470 | 2200 | - |
| 0.7810 | 2300 | - |
| 0.8149 | 2400 | - |
| 0.8489 | 2500 | 0.2568 |
| 0.8829 | 2600 | - |
| 0.9168 | 2700 | - |
| 0.9508 | 2800 | - |
| 0.9847 | 2900 | - |
| 1.0 | 2945 | - |
| 1.0187 | 3000 | 0.1666 |
| 1.0526 | 3100 | - |
| 1.0866 | 3200 | - |
| 1.1205 | 3300 | - |
| 1.1545 | 3400 | - |
| 1.1885 | 3500 | 0.1027 |
| 1.2224 | 3600 | - |
| 1.2564 | 3700 | - |
| 1.2903 | 3800 | - |
| 1.3243 | 3900 | - |
| 1.3582 | 4000 | 0.0657 |
| 1.3922 | 4100 | - |
| 1.4261 | 4200 | - |
| 1.4601 | 4300 | - |
| 1.4941 | 4400 | - |
| 1.5280 | 4500 | 0.0788 |
| 1.5620 | 4600 | - |
| 1.5959 | 4700 | - |
| 1.6299 | 4800 | - |
| 1.6638 | 4900 | - |
| 1.6978 | 5000 | 0.0648 |
| 1.7317 | 5100 | - |
| 1.7657 | 5200 | - |
| 1.7997 | 5300 | - |
| 1.8336 | 5400 | - |
| 1.8676 | 5500 | 0.0413 |
| 1.9015 | 5600 | - |
| 1.9355 | 5700 | - |
| 1.9694 | 5800 | - |
| 2.0 | 5890 | - |
| 2.0034 | 5900 | - |
| 2.0374 | 6000 | 0.0293 |
| 2.0713 | 6100 | - |
| 2.1053 | 6200 | - |
| 2.1392 | 6300 | - |
| 2.1732 | 6400 | - |
| 2.2071 | 6500 | 0.0158 |
| 2.2411 | 6600 | - |
| 2.2750 | 6700 | - |
| 2.3090 | 6800 | - |
| 2.3430 | 6900 | - |
| 2.3769 | 7000 | 0.0183 |
| 2.4109 | 7100 | - |
| 2.4448 | 7200 | - |
| 2.4788 | 7300 | - |
| 2.5127 | 7400 | - |
| 2.5467 | 7500 | 0.0079 |
| 2.5806 | 7600 | - |
| 2.6146 | 7700 | - |
| 2.6486 | 7800 | - |
| 2.6825 | 7900 | - |
| 2.7165 | 8000 | 0.007 |
| 2.7504 | 8100 | - |
| 2.7844 | 8200 | - |
| 2.8183 | 8300 | - |
| 2.8523 | 8400 | - |
| 2.8862 | 8500 | 0.0057 |
| 2.9202 | 8600 | - |
| 2.9542 | 8700 | - |
| 2.9881 | 8800 | - |
| 3.0 | 8835 | - |
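The Epoch column in the log above is the step count divided by the steps per epoch (2945, the step logged at epoch 1.0). The short sketch below reproduces a few of those values and derives an approximate upper bound on the number of training triplets. It assumes training ran on a single device, which the card does not state; the variable `num_devices` and the helper `epoch_at` are hypothetical.

```python
steps_per_epoch = 2945             # step logged at epoch 1.0 in the table
per_device_train_batch_size = 4    # from the hyperparameters
gradient_accumulation_steps = 1    # from the hyperparameters
num_devices = 1                    # assumption: not stated in the card

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch

# Upper bound on triplets per epoch; the final batch may be smaller
# because dataloader_drop_last is False.
max_triplets_per_epoch = (
    steps_per_epoch
    * per_device_train_batch_size
    * gradient_accumulation_steps
    * num_devices
)

print(round(epoch_at(500), 4))
print(round(epoch_at(3000), 4))
print(max_triplets_per_epoch)
```

Under the single-device assumption, the training set holds at most about 11,780 triplets.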
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}