metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:25128
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/distiluse-base-multilingual-cased-v1
widget:
  - source_sentence: >-
      Do we have your updated personal information on file? (Answer with Yes or
      No)
    sentences:
      - >-
        Employee: Offensive tweets were sent which caused me to feel anger and
        frustration.
      - >-
        Employee: Yes, I'll need help arranging for my vehicle to be
        transported.
      - >-
        Employee: The training will last for two days. The contact is Sofia
        Alvarez, her email is salvarez@lawfirm.com.
  - source_sentence: What level of access do you require? (e.g., Full, Read-Only, Limited)
    sentences:
      - 'Employee: The dates will be from 2023-06-15 to 2023-06-30.'
      - >-
        Employee: I'd say a 5, I really tried to fully invest myself in any
        collaborative work.
      - 'Employee: No other notes. I plan to return on June 15th, 2023.'
  - source_sentence: >-
      What type of time off are you requesting? (e.g., Vacation, Sick Leave,
      Personal Day)
    sentences:
      - >-
        Employee: I would like to select Plan A, and yes you should have my
        current information.
      - >-
        Employee: An apology from John and some workplace training would help.
        That's all I need to add.
      - >-
        Employee: The job transfer will take care of my employment, so no
        additional assistance is needed.
  - source_sentence: Describe the skill development or learning growth shown by the employee.
    sentences:
      - >-
        Employee: I would like my coverage to begin on 2023-03-01. I am looking
        to enroll in health insurance.
      - >-
        Employee: The date range for this review is January 1st, 2023 to
        December 31st, 2023. Alex has improved their digital art skills over the
        past year.
      - 'Employee: The incident was moderate and only affected me.'
  - source_sentence: Where did the incident occur? (Please provide the specific location)
    sentences:
      - 'Employee: I received first aid treatment in the office.'
      - 'Employee: My supervisor Principal Jones approved the request.'
      - 'Employee: Yes, I sprained my ankle. The incident occurred at 10:30.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/distiluse-base-multilingual-cased-v1
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: hr eval
          type: hr_eval
        metrics:
          - type: pearson_cosine
            value: 0.40439726027502476
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.38135944113883574
            name: Spearman Cosine

SentenceTransformer based on sentence-transformers/distiluse-base-multilingual-cased-v1

This is a sentence-transformers model fine-tuned from sentence-transformers/distiluse-base-multilingual-cased-v1. It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/distiluse-base-multilingual-cased-v1
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'DistilBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
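The stages above can be sketched numerically. The snippet below is an illustrative NumPy re-implementation of modules (1) Pooling and (2) Dense, not the library's internal code: token embeddings from the transformer (dim 768) are mean-pooled over non-padding tokens, then projected to 512 dimensions through a dense layer with Tanh activation. The weights and inputs are random stand-ins.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (module (1) above)."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid divide-by-zero
    return summed / counts                           # (batch, 768)

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 512)) * 0.02           # stand-in Dense weights
b = np.zeros(512)

tokens = rng.standard_normal((3, 128, 768))          # stand-in transformer output
mask = np.ones((3, 128))                             # no padding in this toy batch
pooled = mean_pool(tokens, mask)                     # (3, 768)
embeddings = np.tanh(pooled @ W + b)                 # module (2): Dense + Tanh
print(embeddings.shape)  # (3, 512)
```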

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace with this model's repo id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Where did the incident occur? (Please provide the specific location)',
    'Employee: I received first aid treatment in the office.',
    'Employee: Yes, I sprained my ankle. The incident occurred at 10:30.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5309, 0.4969],
#         [0.5309, 1.0000, 0.9365],
#         [0.4969, 0.9365, 1.0000]])
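Beyond the pairwise similarity matrix, the same embeddings support the semantic-search use case mentioned earlier: encode a query once, then rank candidate answers by cosine score. A minimal sketch with made-up 3-dimensional vectors standing in for real model.encode output:

```python
import numpy as np

def top_k(query_vec, corpus_vecs, k=2):
    """Rank corpus rows by cosine similarity to the query, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                                    # cosine similarities
    order = np.argsort(-scores)[:k]
    return [(int(i), round(float(scores[i]), 4)) for i in order]

corpus = np.array([[0.9, 0.1, 0.0],   # toy stand-ins for encoded answers
                   [0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.9]])
query = np.array([1.0, 0.0, 0.1])     # toy stand-in for an encoded question
results = top_k(query, corpus)
print(results)  # row 0 ranks first
```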

Evaluation

Metrics

Semantic Similarity

Metric           Value
pearson_cosine   0.4044
spearman_cosine  0.3814
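As a reminder of what these two numbers measure, here is a sketch with made-up scores and labels (not the actual evaluation data): predicted cosine similarities for sentence pairs are correlated against gold labels, linearly for Pearson and on ranks for Spearman.

```python
import numpy as np

def pearson(x, y):
    """Linear correlation between predictions and gold labels."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Rank correlation: Pearson on the ranks (no tie correction in this sketch)."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return pearson(rank(x), rank(y))

# Made-up cosine scores vs. made-up gold similarity labels.
scores = [0.91, 0.13, 0.55, 0.78, 0.20]
labels = [0.90, 0.10, 0.40, 0.80, 0.20]
print(round(spearman(scores, labels), 4))  # 1.0 — the orderings are identical
print(round(pearson(scores, labels), 4))   # high, but below 1.0
```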

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,128 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

             sentence_0          sentence_1          label
    type     string              string              float
    details  min: 10 tokens      min: 8 tokens       min: 0.0
             mean: 19.45 tokens  mean: 24.39 tokens  mean: 0.18
             max: 32 tokens      max: 56 tokens      max: 1.0
  • Samples:

    sentence_0: What format do you prefer for the training? (e.g., Online, In-person, Workshop, Seminar)
    sentence_1: Employee: It's just a planned vacation. There's nothing else to note.
    label: 0.0

    sentence_0: Who was involved in the incident? (Names or descriptions of individuals)
    sentence_1: Employee: It was verbal harassment that happened at work. The person involved was John Smith.
    label: 1.0

    sentence_0: What immediate actions were taken following the incident?
    sentence_1: Employee: My vacation will end on June 22nd, 2023. I have not taken any other time off lately.
    label: 0.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
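These settings determine the length of the run: with 25,128 training pairs, a per-device batch size of 16, and dataloader_drop_last left at False, one epoch is ceil(25128 / 16) = 1571 optimizer steps, which matches the final step in the training log.

```python
import math

dataset_size = 25128   # from the training dataset section
batch_size = 16        # per_device_train_batch_size
epochs = 1             # num_train_epochs

steps_per_epoch = math.ceil(dataset_size / batch_size)  # last partial batch kept
total_steps = steps_per_epoch * epochs
print(total_steps)  # 1571
```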

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step  Training Loss  hr_eval_spearman_cosine
0.0637    100  -              0.2010
0.1273    200  -              0.1586
0.1910    300  -              0.2509
0.2546    400  -              0.2886
0.3183    500  2.7402         0.3120
0.3819    600  -              0.2132
0.4456    700  -              0.2521
0.5092    800  -              0.2559
0.5729    900  -              0.3063
0.6365   1000  2.6680         0.3009
0.7002   1100  -              0.3297
0.7638   1200  -              0.3562
0.8275   1300  -              0.3723
0.8912   1400  -              0.3913
0.9548   1500  2.6426         0.3831
1.0      1571  -              0.3814

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.5.0
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}