SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the finanical-rag-embedding-dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: finanical-rag-embedding-dataset

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
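
The Transformer module wraps a PeftModelForFeatureExtraction, indicating the base model was adapted with a PEFT adapter (e.g. LoRA; the card does not specify the adapter type) rather than fully finetuned. Because mean pooling is followed by a Normalize() module, every embedding is unit-length, so dot products and cosine similarities coincide. A minimal sketch (not part of the original card) verifying this:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("elnurgar/qwen3-embedding-finetuned")
emb = model.encode(["How did finance charge income change in 2023?"])

print(np.linalg.norm(emb[0]))  # ~1.0: embeddings are L2-normalized by the final module
print(float(emb[0] @ emb[0]))  # ~1.0: dot product equals cosine similarity here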

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("elnurgar/qwen3-embedding-finetuned")
# Run inference
sentences = [
    "How much did GM Financial's primary source of cash from finance charge income increase in 2023 compared to the previous year?",
    'In the year ended December 31, 2023, Net cash provided by operating activities increased primarily due to an increase in finance charge income of $1.7 billion.',
    "A corporate entity referred to as a management services organization (MSO) provides various management services and keeps the physician entity 'friendly' through a stock transfer restriction agreement and/or other relationships. The fees under the management services arrangement must comply with state fee splitting laws, which in some states may prohibit percentage-based fees.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5147, -0.0829],
#         [ 0.5147,  1.0000, -0.0916],
#         [-0.0829, -0.0916,  1.0000]])
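
Since the model was tuned for financial retrieval, the typical pattern is query-to-passage search. A small sketch using the same encode/similarity API as above; the corpus and query below are illustrative, not drawn from the card:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("elnurgar/qwen3-embedding-finetuned")

corpus = [
    "Net cash provided by operating activities increased primarily due to an increase in finance charge income of $1.7 billion.",
    "We manufacture and distribute our products through facilities in the United States, Europe and Asia.",
]
query = "What drove the increase in operating cash flow?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine scores between the query and every passage; the highest score wins
scores = model.similarity(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(f"{scores[best]:.4f}  {corpus[best]}")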

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6971
cosine_accuracy@3 0.8471
cosine_accuracy@5 0.8843
cosine_accuracy@10 0.9143
cosine_precision@1 0.6971
cosine_precision@3 0.2824
cosine_precision@5 0.1769
cosine_precision@10 0.0914
cosine_recall@1 0.6971
cosine_recall@3 0.8471
cosine_recall@5 0.8843
cosine_recall@10 0.9143
cosine_ndcg@10 0.8111
cosine_mrr@10 0.7774
cosine_map@100 0.7808
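
These metrics come from Sentence Transformers' InformationRetrievalEvaluator run over the 700-pair evaluation split, with each anchor serving as a query whose only relevant document is its own positive. That one-relevant-document setup is why accuracy@k and recall@k are identical in the table, and why precision@k equals accuracy@k divided by k. A hedged sketch of reproducing the evaluation; the dataset id, split handling, and raw column names are assumptions, since the card only names the dataset and revision e0b1781:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("elnurgar/qwen3-embedding-finetuned")

# Assumed dataset id, column names, and 6,300/700 split
ds = load_dataset("philschmid/finanical-rag-embedding-dataset", split="train")
ds = ds.rename_columns({"question": "anchor", "context": "positive"})  # assumed raw column names
eval_ds = ds.train_test_split(test_size=700, seed=42)["test"]          # hypothetical split

queries = {str(i): row["anchor"] for i, row in enumerate(eval_ds)}
corpus = {str(i): row["positive"] for i, row in enumerate(eval_ds)}
relevant_docs = {qid: {qid} for qid in queries}  # one relevant passage per query

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="ir-eval")
print(evaluator(model))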

Training Details

Training Dataset

finanical-rag-embedding-dataset

  • Dataset: finanical-rag-embedding-dataset at e0b1781
  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 7, mean 20.9, max 45 tokens
    • positive: string; min 9, mean 47.3, max 512 tokens
  • Samples:
    • anchor: What was the amount of cash generated from operations by the company in fiscal year 2023?
      positive: Highlights during fiscal year 2023 include the following: We generated $18,085 million of cash from operations.
    • anchor: How much were unrealized losses on U.S. government and agency securities for those held for 12 months or greater as of June 30, 2023?
      positive: U.S. government and agency securities | $ | 7,950 | | $ | (336 | ) | $ | 45,273 | $ | (3,534 | ) | $ | 53,223 | $ | (3,870 | )
    • anchor: How is the impairment of assets assessed for projects still under development?
      positive: For assets under development, assets are grouped and assessed for impairment by estimating the undiscounted cash flows, which include remaining construction costs, over the asset's remaining useful life. If cash flows do not exceed the carrying amount, impairment based on fair value versus carrying value is considered.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
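
MultipleNegativesRankingLoss trains with in-batch negatives: with a train batch of 16, each anchor must rank its own positive above the 15 positives belonging to the other pairs in the batch, which is also why the no_duplicates batch sampler (see Training Hyperparameters) matters, since a duplicate positive would act as a false negative. A minimal sketch of constructing the loss with the parameters listed above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,                   # temperature applied to the similarity scores
    similarity_fct=util.cos_sim,
    gather_across_devices=False,  # per the parameters above; available in recent releases
)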
    

Evaluation Dataset

finanical-rag-embedding-dataset

  • Dataset: finanical-rag-embedding-dataset at e0b1781
  • Size: 700 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 700 samples:
    • anchor: string; min 8, mean 21.52, max 54 tokens
    • positive: string; min 4, mean 50.92, max 512 tokens
  • Samples:
    • anchor: How much were the company's debt obligations as of December 31, 2023?
      positive: The company's debt obligations as of December 31, 2023, totaled $2,299,887 thousand.
    • anchor: What are the specific structures and legal considerations for a management services organization (MSO) in relation to its relationship with physician owners?
      positive: A corporate entity referred to as a management services organization (MSO) provides various management services and keeps the physician entity 'friendly' through a stock transfer restriction agreement and/or other relationships. The fees under the management services arrangement must comply with state fee splitting laws, which in some states may prohibit percentage-based fees.
    • anchor: Where does Eli Lilly and Company manufacture and distribute its products?
      positive: We manufacture and distribute our products through facilities in the United States (U.S.), including Puerto Rico, and in Europe and Asia.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 8
  • learning_rate: 0.0002
  • weight_decay: 0.1
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • fp16_full_eval: True
  • tf32: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
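
A hedged sketch of how these non-default hyperparameters map onto SentenceTransformerTrainingArguments and SentenceTransformerTrainer; the dataset id, column names, split, and the PEFT adapter setup implied by the architecture are assumptions, and the adapter wiring is elided:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Assumed dataset id, column names, and 6,300/700 split (see Training Dataset)
ds = load_dataset("philschmid/finanical-rag-embedding-dataset", split="train")
ds = ds.rename_columns({"question": "anchor", "context": "positive"})
ds = ds.train_test_split(test_size=700, seed=42)

args = SentenceTransformerTrainingArguments(
    output_dir="qwen3-embedding-finetuned",
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,   # effective batch size: 16 * 8 = 128 pairs
    learning_rate=2e-4,
    weight_decay=0.1,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    fp16_full_eval=True,
    tf32=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss=loss,
)
trainer.train()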

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.1
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: True
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss ir-eval_cosine_ndcg@10
0.0406 2 0.3124 - -
0.0812 4 0.2676 0.4594 0.7143
0.1218 6 0.2724 - -
0.1624 8 0.2373 0.2985 0.7329
0.2030 10 0.1475 - -
0.2437 12 0.0728 0.1531 0.7410
0.2843 14 0.0654 - -
0.3249 16 0.0655 0.1202 0.7349
0.3655 18 0.0604 - -
0.4061 20 0.1237 0.1128 0.7402
0.4467 22 0.0528 - -
0.4873 24 0.062 0.1033 0.7504
0.5279 26 0.0244 - -
0.5685 28 0.0432 0.0921 0.7601
0.6091 30 0.0301 - -
0.6497 32 0.0637 0.0839 0.7712
0.6904 34 0.0542 - -
0.7310 36 0.0256 0.0778 0.7810
0.7716 38 0.0397 - -
0.8122 40 0.0191 0.0736 0.7879
0.8528 42 0.0403 - -
0.8934 44 0.0354 0.0702 0.7903
0.9340 46 0.0291 - -
0.9746 48 0.0534 0.0696 0.7956
1.0 50 0.0229 - -
1.0406 52 0.0364 0.0718 0.7911
1.0812 54 0.0219 - -
1.1218 56 0.0347 0.0725 0.7903
1.1624 58 0.0303 - -
1.2030 60 0.0679 0.0725 0.7914
1.2437 62 0.0249 - -
1.2843 64 0.032 0.0733 0.7948
1.3249 66 0.0285 - -
1.3655 68 0.0226 0.0752 0.7939
1.4061 70 0.0176 - -
1.4467 72 0.0257 0.0749 0.7991
1.4873 74 0.0131 - -
1.5279 76 0.0362 0.0733 0.8033
1.5685 78 0.0199 - -
1.6091 80 0.0073 0.0726 0.8030
1.6497 82 0.0152 - -
1.6904 84 0.0078 0.0721 0.8042
1.7310 86 0.035 - -
1.7716 88 0.0267 0.0700 0.8029
1.8122 90 0.0114 - -
1.8528 92 0.0438 0.0674 0.8068
1.8934 94 0.0244 - -
1.9340 96 0.0125 0.0666 0.8051
1.9746 98 0.0463 - -
2.0 100 0.0095 0.0670 0.8100
2.0406 102 0.0251 - -
2.0812 104 0.0163 0.0670 0.8073
2.1218 106 0.0126 - -
2.1624 108 0.025 0.0666 0.8086
2.2030 110 0.0261 - -
2.2437 112 0.0313 0.0672 0.8073
2.2843 114 0.0197 - -
2.3249 116 0.022 0.0664 0.8055
2.3655 118 0.019 - -
2.4061 120 0.0121 0.0654 0.8071
2.4467 122 0.0093 - -
2.4873 124 0.022 0.0649 0.8059
2.5279 126 0.0125 - -
2.5685 128 0.0206 0.0647 0.8043
2.6091 130 0.012 - -
2.6497 132 0.0271 0.0646 0.8093
2.6904 134 0.0257 - -
2.7310 136 0.0097 0.0637 0.8066
2.7716 138 0.0348 - -
2.8122 140 0.0349 0.0637 0.8081
2.8528 142 0.0215 - -
2.8934 144 0.0106 0.0631 0.8067
2.9340 146 0.0421 - -
2.9746 148 0.0093 0.0625 0.8096
3.0 150 0.008 - -
3.0406 152 0.0144 0.0621 0.8079
3.0812 154 0.0531 - -
3.1218 156 0.0088 0.0622 0.8091
3.1624 158 0.0093 - -
3.2030 160 0.018 0.0619 0.8081
3.2437 162 0.0127 - -
3.2843 164 0.0091 0.0620 0.8101
3.3249 166 0.0121 - -
3.3655 168 0.0021 0.0618 0.8092
3.4061 170 0.0072 - -
3.4467 172 0.0178 0.0617 0.8090
3.4873 174 0.0256 - -
3.5279 176 0.0156 0.0619 0.8105
3.5685 178 0.0223 - -
3.6091 180 0.0215 0.0617 0.8112
3.6497 182 0.0084 - -
3.6904 184 0.0156 0.0617 0.8100
3.7310 186 0.0292 - -
3.7716 188 0.0138 0.0619 0.8105
3.8122 190 0.0072 - -
3.8528 192 0.0103 0.0614 0.8097
3.8934 194 0.0102 - -
3.9340 196 0.0176 0.0617 0.8096
3.9746 198 0.016 - -
4.0 200 0.0037 0.0618 0.8111
  • The saved checkpoint corresponds to the best evaluation checkpoint (load_best_model_at_end: True). Each epoch spans 50 optimizer steps: 6,300 samples at an effective batch size of 16 × 8 = 128.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.2
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}