
tsi_v2

This is a sentence-transformers model fine-tuned from cl-nagoya/ruri-large (a Japanese general text-embedding model). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cl-nagoya/ruri-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 0.3B parameters (F32 safetensors)
  • Language: ja
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
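
The pooling setup above can be checked programmatically once the model is loaded. A minimal sketch (the repository is gated, so the access conditions must be accepted first):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tdm503/tsi-finetuned-requirements_v1")
print(model[1].get_pooling_mode_str())           # 'mean'
print(model.get_sentence_embedding_dimension())  # 1024
print(model.max_seq_length)                      # 512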

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (the repository is gated; accept the access conditions first)
model = SentenceTransformer("tdm503/tsi-finetuned-requirements_v1")
# Run inference on Japanese examples: a requirement
# ("High-school civics license; able to teach politics and economics")
# followed by two license names ("Class-1 high-school license: Civics" and
# "Class-1 high-school license: Geography & History")
sentences = [
    '高校公民免許。政治経済の指導が可能な方。',
    '高校1種(公民)',
    '高校1種(地理歴史)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
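
The same embeddings support a simple semantic-search flow, for example ranking candidate qualification strings against a free-text requirement. A minimal sketch; the query and corpus strings below are illustrative, written in the style of the training data:

# Illustrative query: "Holder of a high-school mathematics license."
query_embedding = model.encode(['数学の高校免許をお持ちの方。'])
corpus = ['高校1種(数学)', '高校1種(理科)', '中学1種(社会)']
corpus_embeddings = model.encode(corpus)

# Rank the candidates by cosine similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())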

Evaluation

Metrics

Triplet

Metric            Value
cosine_accuracy   0.9949
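
The column name spcc_cosine_accuracy in the training logs suggests this score comes from a TripletEvaluator named "spcc". A hedged sketch of how such a number can be reproduced, assuming a held-out split eval_dataset with the same anchor/positive/negative columns as the training data:

from sentence_transformers.evaluation import TripletEvaluator

evaluator = TripletEvaluator(
    anchors=eval_dataset['anchor'],      # eval_dataset is an assumed held-out split
    positives=eval_dataset['positive'],
    negatives=eval_dataset['negative'],
    name='spcc',
)
print(evaluator(model))  # e.g. {'spcc_cosine_accuracy': 0.9949}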

Training Details

Training Dataset

Unnamed Dataset

  • Size: 784 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 784 samples:
      anchor:   string · min: 5 tokens · mean: 17.95 tokens · max: 31 tokens
      positive: string · min: 4 tokens · mean: 26.79 tokens · max: 65 tokens
      negative: string · min: 4 tokens · mean: 14.69 tokens · max: 33 tokens
  • Samples (Japanese teaching-license requirements paired with matching and non-matching qualifications):
    1. anchor:   高校の英語免許をお持ちの方。ネイティブレベルで英語で授業可能な方。
       (Holder of a high-school English license; able to teach in English at a native level.)
       positive: 高校1種(外国語), 高校専修(外国語), 英検1級, TOEIC 990, IELTS 9.0, ケンブリッジ英検CPE, 国連英検特A級
       negative: 小学校1種, 中学1種(国語), 書道教員
    2. anchor:   自動車整備士の資格と高校の技術免許をお持ちの方。
       (Holder of an automobile mechanic certification and a high-school technical teaching license.)
       positive: 高校1種(工業), 高校専修(工業), 自動車整備士1級, 自動車整備士2級, 二級電気工事士
       negative: 高校1種(商業), 簿記1級, 宅地建物取引士
    3. anchor:   英検準1級以上の英語力とICTスキルをお持ちの方。高校英語免許尚可。
       (English proficiency of Eiken Grade Pre-1 or above plus ICT skills; a high-school English license is a plus.)
       positive: 英検準1級, TOEIC 850, IELTS 7.5, ITパスポート, MOS (Excel Expert), MOS (PowerPoint Specialist)
       negative: 英検3級, 漢字検定3級, 日本語教育能力検定
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.25
    }
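
In code, this loss configuration corresponds to roughly the following (a sketch, not the original training script):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer('cl-nagoya/ruri-large')
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.COSINE,
    triplet_margin=0.25,
)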
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_drop_last: True
  • remove_unused_columns: False
  • batch_sampler: no_duplicates
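
These values map onto the trainer API roughly as follows (a sketch assuming the Sentence Transformers v3+ trainer; output_dir is a placeholder, while model, loss, evaluator, and the datasets are assumed from the sketches above):

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir='tsi_v2',  # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy='steps',
    dataloader_drop_last=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed triplet dataset
    eval_dataset=eval_dataset,    # assumed held-out split
    loss=loss,
    evaluator=evaluator,
)
trainer.train()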

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss spcc_cosine_accuracy
-1 -1 - 0.8622
0.2041 10 0.1924 -
0.4082 20 0.065 -
0.6122 30 0.0221 -
0.8163 40 0.0266 -
1.0204 50 0.0095 0.9796
1.2245 60 0.0056 -
1.4286 70 0.0023 -
1.6327 80 0.0021 -
1.8367 90 0.0042 -
2.0408 100 0.0007 0.9949
2.2449 110 0.0003 -
2.4490 120 0.0003 -
2.6531 130 0.001 -
2.8571 140 0.0009 -
0.2041 10 0.0009 -
0.4082 20 0.0 -
0.6122 30 0.0014 -
0.8163 40 0.0033 -
1.0204 50 0.0019 0.9949
1.2245 60 0.0003 -
1.4286 70 0.0007 -
1.6327 80 0.0005 -
1.8367 90 0.0004 -
2.0408 100 0.0 0.9949
2.2449 110 0.0004 -
2.4490 120 0.0005 -
2.6531 130 0.0001 -
2.8571 140 0.0003 -
0.2041 10 0.0002 -
0.4082 20 0.0 -
0.6122 30 0.0 -
0.8163 40 0.0 -
1.0204 50 0.0004 0.9949
1.2245 60 0.0 -
1.4286 70 0.0003 -
1.6327 80 0.0004 -
1.8367 90 0.0 -
2.0408 100 0.0001 0.9949
2.2449 110 0.0005 -
2.4490 120 0.0 -
2.6531 130 0.0 -
2.8571 140 0.0 -
-1 -1 - 0.9949

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2
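
To approximate this environment, the listed versions can be pinned at install time (a convenience sketch; the CUDA-specific PyTorch build may need the appropriate index URL):

pip install sentence-transformers==4.1.0 transformers==4.52.4 torch==2.6.0 accelerate==1.8.1 datasets==3.6.0 tokenizers==0.21.2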

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}