SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model fine-tuned from Qwen/Qwen3-Embedding-0.6B on the telecom-technical-documents-retrieval-embedding-dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

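Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: telecom-technical-documents-retrieval-embedding-dataset
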
Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
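
The architecture above produces one vector per input in two steps. It takes the hidden state of the last real (non-padding) token (`pooling_mode_lasttoken: True`) and then scales that vector to unit length (`Normalize()`). The NumPy sketch below shows those two steps on made-up hidden states. It is a simplified stand-in, not the Sentence Transformers `Pooling` code, and the helper names `last_token_pool` and `l2_normalize` are hypothetical.

```python
import numpy as np

def last_token_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return the hidden state of the last non-padding token of each sequence.

    token_embeddings: [batch, seq_len, dim]; attention_mask: [batch, seq_len] of 0/1.
    Works for both left-padded and right-padded batches.
    """
    seq_len = attention_mask.shape[1]
    # Index of the last 1 in each mask row, found by scanning the reversed row.
    last_idx = seq_len - 1 - np.argmax(attention_mask[:, ::-1], axis=1)
    return token_embeddings[np.arange(token_embeddings.shape[0]), last_idx]

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit Euclidean norm (the role of the Normalize() module)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

# Toy batch: 2 sequences, 3 positions, 4-dim hidden states (illustrative values only).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 0],   # right-padded: last real token at position 1
                 [0, 1, 1]])  # left-padded: last real token at position 2
emb = l2_normalize(last_token_pool(hidden, mask))
cos = emb @ emb.T  # with unit-norm rows, the dot product is the cosine similarity
```

Because every output vector has unit norm, a plain dot product between two embeddings already equals their cosine similarity.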

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("KayaTechAI/Qwen3-0.6B-Fine-Tuned-Telecom-Technical-Documents-Retrieval-Embedding-With-Config")
# Run inference
queries = [
    "What is the provisioning scope for the eMLPP service?",
]
documents = [
    'eMLPP is provisioned per subscriber.',
    'The main objective is to verify that the User Equipment (UE) tracks channel variations and selects the optimal transport format for frequency non-selective scheduling.',
    'SDP is used in SIP communications to describe the parameters and media capabilities of a session, such as audio/video codecs, transport protocols, and IP addresses, enabling participants to agree on the media types to be used.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 1024) (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6303, -0.0008, -0.0340]])
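
Training used MatryoshkaLoss over 1024, 768, 512, 256, 128 and 64 dimensions, so the leading coordinates of each embedding are meant to be usable on their own. The Evaluation section reports retrieval quality at each of these sizes. Recent Sentence Transformers releases accept a `truncate_dim` argument when loading a model. The NumPy sketch below performs a truncation by hand on random stand-in vectors, so it runs without downloading the model. It also re-normalizes after truncating, because the `Normalize()` step ran on the full 1024-dimensional vector, and re-normalizing lets a plain dot product serve as cosine similarity again. The helper `truncate_and_renormalize` is hypothetical.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and rescale each row back to unit length."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Stand-ins for model outputs: random unit vectors of the model's width (1024).
rng = np.random.default_rng(42)
full = rng.normal(size=(3, 1024))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 256)  # 256 is one of the trained matryoshka_dims
scores = small[:1] @ small[1:].T             # cosine similarity of row 0 against rows 1 and 2
```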

Evaluation

Metrics

Information Retrieval

Each column is one evaluation at a different embedding dimensionality. The cosine_ndcg@10 values match the epoch 4.0 row of the training logs (dim_1024_cosine_ndcg@10 through dim_64_cosine_ndcg@10).

Metric dim_1024 dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.7988 0.7996 0.7968 0.7804 0.7696 0.75
cosine_accuracy@3 0.912 0.9148 0.9128 0.912 0.898 0.8816
cosine_accuracy@5 0.9404 0.9408 0.9388 0.9316 0.9268 0.9124
cosine_accuracy@10 0.9636 0.9624 0.962 0.9584 0.9524 0.9456
cosine_precision@1 0.7988 0.7996 0.7968 0.7804 0.7696 0.75
cosine_precision@3 0.304 0.3049 0.3043 0.304 0.2993 0.2939
cosine_precision@5 0.1881 0.1882 0.1878 0.1863 0.1854 0.1825
cosine_precision@10 0.0964 0.0962 0.0962 0.0958 0.0952 0.0946
cosine_recall@1 0.7988 0.7996 0.7968 0.7804 0.7696 0.75
cosine_recall@3 0.912 0.9148 0.9128 0.912 0.898 0.8816
cosine_recall@5 0.9404 0.9408 0.9388 0.9316 0.9268 0.9124
cosine_recall@10 0.9636 0.9624 0.962 0.9584 0.9524 0.9456
cosine_ndcg@10 0.886 0.8859 0.8844 0.8753 0.8663 0.8522
cosine_mrr@10 0.8606 0.8608 0.8589 0.848 0.8381 0.8218
cosine_map@100 0.8621 0.8625 0.8606 0.8496 0.8399 0.8236
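
The retrieval metrics above are averages of per-query scores. The sketch below computes them for a single ranked list under binary relevance, following the usual definitions (as used, for example, by Sentence Transformers' `InformationRetrievalEvaluator`). It is not that class's code, and the function `ir_metrics` and the document IDs are hypothetical. In the results above, accuracy@k equals recall@k and precision@k equals accuracy@k / k (e.g. 0.912 / 3 = 0.304). These definitions give exactly that pattern when each query has exactly one relevant document.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k=10, map_k=100):
    """Per-query IR metrics for one ranked list of document IDs (binary relevance)."""
    hits = [doc in relevant_ids for doc in ranked_ids[:k]]
    num_hits = sum(hits)
    metrics = {
        f"accuracy@{k}": float(num_hits > 0),          # any relevant doc in the top k
        f"precision@{k}": num_hits / k,                 # share of the top k that is relevant
        f"recall@{k}": num_hits / len(relevant_ids),    # share of relevant docs retrieved
    }
    # MRR@k: reciprocal rank of the first relevant document within the top k.
    metrics[f"mrr@{k}"] = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)
    # NDCG@k with binary gains and a log2(rank + 1) discount.
    dcg = sum(1.0 / math.log2(i + 2) for i, h in enumerate(hits) if h)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant_ids))))
    metrics[f"ndcg@{k}"] = dcg / idcg
    # AP@map_k: average of the precision values at each relevant hit.
    num_correct, precision_sum = 0, 0.0
    for i, doc in enumerate(ranked_ids[:map_k]):
        if doc in relevant_ids:
            num_correct += 1
            precision_sum += num_correct / (i + 1)
    metrics[f"map@{map_k}"] = precision_sum / min(map_k, len(relevant_ids))
    return metrics

# One query whose single relevant document is ranked third.
m = ir_metrics(["d7", "d2", "d5", "d9"], {"d5"}, k=10)
```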

Training Details

Training Dataset

telecom-technical-documents-retrieval-embedding-dataset

  • Dataset: telecom-technical-documents-retrieval-embedding-dataset at 3ebf34a
  • Size: 127,731 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column type min_tokens mean_tokens max_tokens
    anchor string 7 18.79 68
    positive string 4 26.09 77
  • Samples:
    • anchor: What is the estimated Transmit power considered sufficient for achieving 95% Downlink coverage with a single Base Station?
      positive: Approximately 14 dBm Transmit power is considered sufficient.
    • anchor: What is the primary goal of the Nominal Accuracy requirement?
      positive: The primary goal of the Nominal Accuracy requirement is to ensure good accuracy when signal conditions are ideal.
    • anchor: What happens on the mobile station side if contention resolution fails because the G-RNTI value in the network's acknowledgement message differs from what the mobile station sent?
      positive: If the mobile station receives a PACKET UPLINK ACK/NACK message with a G-RNTI value different from the one it included in its first RLC data blocks, it signifies a contention resolution failure, and the mobile station will not transmit a PACKET CONTROL ACKNOWLEDGEMENT.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
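
The loss configuration above combines two pieces. MultipleNegativesRankingLoss treats each (anchor, positive) pair as the correct match and every other positive in the same batch as a negative. MatryoshkaLoss applies that loss to the first 1024, 768, 512, 256, 128 and 64 embedding coordinates and adds the results with the listed weights (all 1). With `n_dims_per_step: -1`, every dimension is used at every step. The NumPy sketch below illustrates the forward computation only, with no gradients. The similarity scale of 20.0 is an assumption taken from the library's default, since the configuration above does not list it, and all function names are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mnrl_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    positives[i] is the target for anchors[i]; all other rows of `positives`
    serve as negatives for that anchor. scale=20.0 is an assumed default.
    """
    logits = scale * l2_normalize(anchors) @ l2_normalize(positives).T  # [B, B]
    logits -= logits.max(axis=1, keepdims=True)                          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # the correct pairs sit on the diagonal

def matryoshka_loss(anchors, positives, dims=(1024, 768, 512, 256, 128, 64), weights=None):
    """Weighted sum of the ranking loss computed on prefix-truncated embeddings."""
    if weights is None:
        weights = [1.0] * len(dims)
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))

# Toy batch of 4 pairs: positives are small perturbations of their anchors.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 1024))
p = a + 0.1 * rng.normal(size=(4, 1024))
loss_close = matryoshka_loss(a, p)
loss_random = matryoshka_loss(a, rng.normal(size=(4, 1024)))
```

Matched pairs give a loss near zero, while unrelated "positives" give roughly log(batch size) per dimension.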
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
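
The settings above determine the optimizer-step arithmetic, worked out below. The sketch assumes training ran on a single device. `num_devices = 1` is an assumption, but it agrees with the training logs, which reach epoch 1.0 at step 250 and end at step 1000 after 4 epochs. The `lr_at` helper is a hypothetical, hand-written version of a linear-warmup, half-cosine schedule, which is the shape `lr_scheduler_type: cosine` with `warmup_ratio: 0.1` is generally understood to produce. It is not the Transformers library's scheduler code.

```python
import math

train_samples = 127_731
per_device_batch = 32
grad_accum = 16
num_devices = 1          # assumption: single GPU, consistent with the logged step counts
epochs = 4
warmup_ratio = 0.1
base_lr = 2e-5

effective_batch = per_device_batch * grad_accum * num_devices   # pairs per optimizer step
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup to base_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

This gives an effective batch of 512 pairs, 250 steps per epoch, 1000 steps in total and 100 warmup steps. The peak rate of 2e-05 is reached at step 100 and decays to half of it at step 550. A large effective batch also matters for MultipleNegativesRankingLoss, whose in-batch negatives come from the rest of each mini-batch.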

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
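
`batch_sampler: no_duplicates` matters because MultipleNegativesRankingLoss uses in-batch negatives. If two pairs in one batch shared a text, a correct match could be counted as a negative. The simplified sketch below shows one way to build batches that avoid this. It is not the Sentence Transformers `NoDuplicatesBatchSampler` implementation, and `no_duplicate_batches` and the example pairs are hypothetical.

```python
import random

def no_duplicate_batches(pairs, batch_size, seed=0):
    """Group (anchor, positive) pairs into batches with no repeated text inside a batch.

    Pairs that would introduce a duplicate are deferred to a later batch.
    """
    remaining = list(range(len(pairs)))
    random.Random(seed).shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this pair in a later batch
        batches.append(batch)
        remaining = deferred
    return batches

pairs = [
    ("What is eMLPP?", "eMLPP is provisioned per subscriber."),
    ("What is SDP?", "SDP describes session media."),
    ("What is eMLPP?", "eMLPP is a priority service."),  # repeats the first anchor
    ("What is a UE?", "A UE is a user device."),
]
batches = no_duplicate_batches(pairs, batch_size=2)
```

Every pair still lands in exactly one batch. The first pair of each pass always fits, so the loop always terminates.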

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0401 10 1.5256 - - - - - -
0.0802 20 0.8247 - - - - - -
0.1202 30 0.4102 - - - - - -
0.1603 40 0.27 - - - - - -
0.2004 50 0.2182 - - - - - -
0.2405 60 0.1998 - - - - - -
0.2806 70 0.2017 - - - - - -
0.3206 80 0.1672 - - - - - -
0.3607 90 0.2029 - - - - - -
0.4008 100 0.1609 - - - - - -
0.4409 110 0.1565 - - - - - -
0.4810 120 0.1476 - - - - - -
0.5210 130 0.1278 - - - - - -
0.5611 140 0.1669 - - - - - -
0.6012 150 0.1642 - - - - - -
0.6413 160 0.1307 - - - - - -
0.6814 170 0.1487 - - - - - -
0.7214 180 0.1329 - - - - - -
0.7615 190 0.13 - - - - - -
0.8016 200 0.1393 - - - - - -
0.8417 210 0.1344 - - - - - -
0.8818 220 0.1184 - - - - - -
0.9218 230 0.1147 - - - - - -
0.9619 240 0.1283 - - - - - -
1.0 250 0.1228 0.8693 0.8683 0.8634 0.8535 0.8430 0.8082
1.0401 260 0.0613 - - - - - -
1.0802 270 0.0559 - - - - - -
1.1202 280 0.0704 - - - - - -
1.1603 290 0.0578 - - - - - -
1.2004 300 0.0588 - - - - - -
1.2405 310 0.079 - - - - - -
1.2806 320 0.0602 - - - - - -
1.3206 330 0.0553 - - - - - -
1.3607 340 0.0663 - - - - - -
1.4008 350 0.0513 - - - - - -
1.4409 360 0.0615 - - - - - -
1.4810 370 0.0462 - - - - - -
1.5210 380 0.0674 - - - - - -
1.5611 390 0.0558 - - - - - -
1.6012 400 0.0562 - - - - - -
1.6413 410 0.0688 - - - - - -
1.6814 420 0.0905 - - - - - -
1.7214 430 0.0463 - - - - - -
1.7615 440 0.0581 - - - - - -
1.8016 450 0.0586 - - - - - -
1.8417 460 0.0712 - - - - - -
1.8818 470 0.041 - - - - - -
1.9218 480 0.0578 - - - - - -
1.9619 490 0.063 - - - - - -
2.0 500 0.0505 0.8771 0.8780 0.8764 0.8690 0.8587 0.8353
2.0401 510 0.032 - - - - - -
2.0802 520 0.0239 - - - - - -
2.1202 530 0.029 - - - - - -
2.1603 540 0.0236 - - - - - -
2.2004 550 0.0381 - - - - - -
2.2405 560 0.028 - - - - - -
2.2806 570 0.0366 - - - - - -
2.3206 580 0.0372 - - - - - -
2.3607 590 0.0306 - - - - - -
2.4008 600 0.0294 - - - - - -
2.4409 610 0.0269 - - - - - -
2.4810 620 0.0411 - - - - - -
2.5210 630 0.0251 - - - - - -
2.5611 640 0.0299 - - - - - -
2.6012 650 0.0275 - - - - - -
2.6413 660 0.0267 - - - - - -
2.6814 670 0.0304 - - - - - -
2.7214 680 0.0246 - - - - - -
2.7615 690 0.025 - - - - - -
2.8016 700 0.037 - - - - - -
2.8417 710 0.0393 - - - - - -
2.8818 720 0.0405 - - - - - -
2.9218 730 0.0279 - - - - - -
2.9619 740 0.0243 - - - - - -
3.0 750 0.0284 0.8870 0.8858 0.8827 0.8745 0.8648 0.8499
3.0401 760 0.0166 - - - - - -
3.0802 770 0.024 - - - - - -
3.1202 780 0.0302 - - - - - -
3.1603 790 0.0263 - - - - - -
3.2004 800 0.0172 - - - - - -
3.2405 810 0.023 - - - - - -
3.2806 820 0.0313 - - - - - -
3.3206 830 0.0253 - - - - - -
3.3607 840 0.0189 - - - - - -
3.4008 850 0.0177 - - - - - -
3.4409 860 0.0187 - - - - - -
3.4810 870 0.0142 - - - - - -
3.5210 880 0.0281 - - - - - -
3.5611 890 0.0253 - - - - - -
3.6012 900 0.0184 - - - - - -
3.6413 910 0.0217 - - - - - -
3.6814 920 0.027 - - - - - -
3.7214 930 0.0192 - - - - - -
3.7615 940 0.0183 - - - - - -
3.8016 950 0.0242 - - - - - -
3.8417 960 0.0223 - - - - - -
3.8818 970 0.0161 - - - - - -
3.9218 980 0.0219 - - - - - -
3.9619 990 0.0236 - - - - - -
4.0 1000 0.0278 0.886 0.8859 0.8844 0.8753 0.8663 0.8522
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 4.55.4
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Safetensors

  • Model size: 0.6B params
  • Tensor type: F32