SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Snowflake/snowflake-arctic-embed-m
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
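Module (1) keeps only the [CLS] token's vector (`pooling_mode_cls_token: True`, every other pooling mode off) and module (2) rescales it to unit length. A minimal pure-Python sketch of those two steps on toy numbers; the helper names are hypothetical and are not sentence-transformers functions:

```python
import math

def cls_pool(token_embeddings):
    """pooling_mode_cls_token=True: keep only the first ([CLS]) token vector."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Normalize(): rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed(token_embeddings):
    """Pooling followed by Normalize, mirroring modules (1) and (2)."""
    return l2_normalize(cls_pool(token_embeddings))

# Toy per-token outputs (3 tokens x 4 dims) standing in for BertModel's
# 768-dim token embeddings; the values are invented for illustration.
tokens = [[3.0, 4.0, 0.0, 0.0],
          [1.0, 1.0, 1.0, 1.0],
          [0.5, 0.0, 2.0, 1.0]]

print(embed(tokens))  # [0.6, 0.8, 0.0, 0.0]
```

Only the first row influences the result; changing the other token vectors leaves the embedding unchanged.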

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("federicovolponi/Snowflake-snowflake-arctic-embed-m-space-sup")
# Run inference
sentences = [
    ': Block diagram of the 7-band CCD-in-CMOS TDI sensor. Each TX slice has two serializers and its own PLL.\nThe CCD bands operate continuously and time interleaved. The output stages for the CCD arrays are implemented both at the top and bottom of each band to support the bi-directional operation. All 14 output stages in one column are connected to one delta-sigma column-level ADC with digital CDS implemented in the digital decimator. The outputs of every 128 ADCs are serialized to one of 32 LVDS outputs. Two clock signals are also provided via LVDS to synchronize the channels. These outputs are capable of running at an aggregate data rate of >50Gb/s using on-chip PLLs.\nThe sensor has been processed for Back-Side Illumination and it has been packaged in a custom ceramic PGA package. Figure 15 shows a picture of the sensor with its 7 bands. The figure shows the front-side and back-side versions of the chip side by side.\n(a) (b) Figure 15: 7-band CCD-in-CMOS TDI chip photograph. FSI shown only for reference (a) and BSI version (b).\nAs a proof-of-concept, an RGB butcher-brick filter has been used as glass lid for the sensor, to enable multicolor TDI, although filters may be processed directly on the wafer as well [9]. The sensor,\ncamera system and a color image captured from the setup are depicted in Figure 16, providing evidence that multispectral TDI is viable with the sensor.\nFigure 16: Colour TDI image captured from the sensor, sensor with RGB color filter and camera set-up.\nTable 3 below shows a comparison of different TDI sensors, including the first iteration of our sensor.\nIntegrated drivers\nThe measurements on the first iteration of the SoC verified',
    'What is the aggregate data rate of the outputs of the 7-band CCD-in-CMOS TDI sensor?\n\n',
    'What is the primary objective of the Zodiac Pioneer Mission?\n\n',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```
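`model.similarity` scores every pair of embeddings with the configured similarity function (cosine), which yields the 3 × 3 matrix in the example. Because the Normalize() module already makes each embedding unit length, the same matrix is simply the pairwise dot products. A pure-Python sketch with toy 3-dimensional vectors (invented values) standing in for the real 768-dimensional ones:

```python
import math

def l2_normalize(vec):
    """Rescale to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(embeddings):
    """All pairwise dot products; equal to cosine similarity for unit-length rows."""
    return [[sum(x * y for x, y in zip(a, b)) for b in embeddings] for a in embeddings]

# Toy 3-dim vectors standing in for the (3, 768) array returned by model.encode().
embeddings = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0])]
sims = similarity_matrix(embeddings)

for row in sims:
    print([round(s, 4) for s in row])
```

The matrix is symmetric with ones on the diagonal (each text is maximally similar to itself).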

Evaluation

Metrics

Information Retrieval

Dataset: dim_768
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.8408
cosine_accuracy@10	0.8843
cosine_precision@5	0.1682
cosine_precision@10	0.0884
cosine_recall@5	0.8408
cosine_recall@10	0.8843
cosine_ndcg@5	0.7496
cosine_ndcg@10	0.7639
cosine_mrr@5	0.719
cosine_mrr@10	0.725
cosine_map@5	0.719
cosine_map@10	0.725
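The tables are consistent with each query having exactly one relevant passage: recall@k equals accuracy@k, MAP@k equals MRR@k, and precision@k is accuracy@k divided by k (for dim_768, 0.8408 / 5 ≈ 0.1682 and 0.8843 / 10 ≈ 0.0884). The sketch below assumes binary relevance and uses toy rankings invented for illustration. It computes these metrics in plain Python with the usual definitions so the relationships can be checked; it is not the InformationRetrievalEvaluator source.

```python
import math

def metrics_at_k(rankings, relevant, k):
    """Average IR metrics at cutoff k over all queries.

    rankings: one ranked list of document ids per query (best first).
    relevant: one set of relevant document ids per query (binary relevance).
    """
    totals = dict.fromkeys(("accuracy", "precision", "recall", "mrr", "ndcg", "map"), 0.0)
    for ranked, rel in zip(rankings, relevant):
        hits = [1 if doc in rel else 0 for doc in ranked[:k]]
        n_hits = sum(hits)
        totals["accuracy"] += 1.0 if n_hits else 0.0
        totals["precision"] += n_hits / k
        totals["recall"] += n_hits / len(rel)
        # Reciprocal rank of the first relevant document within the cutoff (0 if none).
        totals["mrr"] += next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)
        # Binary-gain DCG, normalised by the best achievable DCG.
        dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
        idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
        totals["ndcg"] += dcg / idcg
        # Average precision: precision at each hit, divided by min(|relevant|, k).
        found, ap = 0, 0.0
        for i, h in enumerate(hits):
            if h:
                found += 1
                ap += found / (i + 1)
        totals["map"] += ap / min(len(rel), k)
    return {name: value / len(rankings) for name, value in totals.items()}

# Toy example (invented): three queries, one relevant passage each.
rankings = [
    ["d1", "d2", "d3", "d4", "d5"],  # relevant passage at rank 1
    ["d2", "d3", "d1", "d4", "d5"],  # relevant passage at rank 3
    ["d2", "d3", "d4", "d5", "d6"],  # relevant passage not in the top 5
]
relevant = [{"d1"}, {"d1"}, {"d1"}]

for name, value in metrics_at_k(rankings, relevant, k=5).items():
    print(f"{name}@5 = {value:.4f}")
```

With a single relevant passage per query, NDCG rewards a hit at rank 3 with 1/log2(4) = 0.5, whereas MRR gives it 1/3, which is why NDCG sits above MRR in the tables.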

Information Retrieval

Dataset: dim_512
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.8346
cosine_accuracy@10	0.8781
cosine_precision@5	0.1669
cosine_precision@10	0.0878
cosine_recall@5	0.8346
cosine_recall@10	0.8781
cosine_ndcg@5	0.7384
cosine_ndcg@10	0.7524
cosine_mrr@5	0.7061
cosine_mrr@10	0.7118
cosine_map@5	0.7061
cosine_map@10	0.7118

Information Retrieval

Dataset: dim_256
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.8147
cosine_accuracy@10	0.8632
cosine_precision@5	0.1629
cosine_precision@10	0.0863
cosine_recall@5	0.8147
cosine_recall@10	0.8632
cosine_ndcg@5	0.7159
cosine_ndcg@10	0.7318
cosine_mrr@5	0.6827
cosine_mrr@10	0.6894
cosine_map@5	0.6827
cosine_map@10	0.6894

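Information Retrieval (second evaluation: the cosine_map@10 values in this table and the two that follow match the step-2000 row of the Training Logs)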

Dataset: dim_768
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.9199
cosine_accuracy@10	0.9551
cosine_precision@5	0.184
cosine_precision@10	0.0955
cosine_recall@5	0.9199
cosine_recall@10	0.9551
cosine_ndcg@5	0.786
cosine_ndcg@10	0.7975
cosine_mrr@5	0.7408
cosine_mrr@10	0.7455
cosine_map@5	0.7408
cosine_map@10	0.7455

Information Retrieval

Dataset: dim_512
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.9071
cosine_accuracy@10	0.9519
cosine_precision@5	0.1814
cosine_precision@10	0.0952
cosine_recall@5	0.9071
cosine_recall@10	0.9519
cosine_ndcg@5	0.7794
cosine_ndcg@10	0.7943
cosine_mrr@5	0.7363
cosine_mrr@10	0.7427
cosine_map@5	0.7363
cosine_map@10	0.7427

Information Retrieval

Dataset: dim_256
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@5	0.8846
cosine_accuracy@10	0.9455
cosine_precision@5	0.1769
cosine_precision@10	0.0946
cosine_recall@5	0.8846
cosine_recall@10	0.9455
cosine_ndcg@5	0.7548
cosine_ndcg@10	0.7748
cosine_mrr@5	0.7108
cosine_mrr@10	0.7193
cosine_map@5	0.7108
cosine_map@10	0.7193
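The dim_512 and dim_256 evaluations presumably score embeddings truncated to their first 512 or 256 components; sentence-transformers exposes this as the `truncate_dim` argument of `SentenceTransformer` and of `InformationRetrievalEvaluator`. A truncated vector is no longer unit length, so cosine similarity has to renormalize rather than reuse the raw dot product, and the scores themselves change, so rankings can shift. A toy sketch with 4-dimensional vectors (values invented for illustration; `truncate` is a hypothetical helper):

```python
import math

def cosine(a, b):
    """Cosine similarity; renormalises, so it also works for truncated vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def truncate(vec, dim):
    """Keep only the first `dim` components of an embedding."""
    return vec[:dim]

# Two unit-length 4-dim toy embeddings.
query = [0.5, 0.5, 0.5, 0.5]
doc = [0.5, 0.5, -0.5, 0.5]

for dim in (4, 2):
    q, d = truncate(query, dim), truncate(doc, dim)
    raw_dot = sum(x * y for x, y in zip(q, d))
    print(f"dim={dim}: cosine={cosine(q, d):.4f}, raw dot={raw_dot:.4f}")
```

At full size the cosine and dot product agree (0.5); after truncation to 2 dimensions the cosine becomes 1.0 while the raw dot product stays at 0.5.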

Training Details

Training Dataset

Unnamed Dataset

Size: 7,232 training samples
Columns: positive and anchor
Approximate statistics based on the first 1000 samples:

	positive	anchor
type	string	string
details	min: 5 tokens mean: 354.69 tokens max: 512 tokens	min: 9 tokens mean: 19.21 tokens max: 40 tokens

Samples:

positive	anchor
, using diverse software or hardware designs may double design and veriﬁcation costs due to having to build two different components for the same functionality. Hence, although DCLS execution also halves performance efﬁciency (the corresponding functionality is executed twice), it allows reusing the same design (e.g. the same core design) for the primary and the redundant paths (e.g. with staggered execution), thus containing design and veriﬁcation costs. Redundancy can be applied at different granularities accord- ing to the sphere of replication (SoR). Choosing the right SoR depends on several tradeoffs like area overheads, re- design costs, fault detection time, and overall system costs. In the context of DCLS, the SoR is placed at the level of the CPU (core), as done for the AURIX processors. This requires including two replicas of the same core and compare their memory transactions, which requires roughly duplicating com- putational resources in the chip and being able to ensure that replicas can provide independent behavior. On the other hand, storage (memories, caches) and communication means (buses, crossbars) do not need to be fully replicated and can build upon Error Correction Codes (ECC) and Cyclic Redundancy Check (CRC) as a form of lightweight redundancy with diversity. HPC ASIL-D capable platforms typically combine a low- performance microcontroller amenable for the automotive do- main (i.e. ASIL-D capable) and an HPC accelerator deliv- ering high computation throughput, but whose adherence to ISO26262 requirements is unknown, so its appropriate use for ASIL-C/D systems needs to be investigated. Without loss of generality, we consider an NVIDIA GPU accelerator, thus analogous to those in NVIDIA Drive and Xavier families for the automotive domain. However, the ﬁndings in this paper can easily be extrapolated to other products. 
Software faults and some hardware faults are regarded as systematic, and it must be proven that their failure risk is residual. However, random hardware faults cannot be avoided, and means are required to prevent them from causing hazards. Those faults can be caused by, for example, voltage droops	`What are the advantages of using the same design for the primary and redundant paths in DCLS execution?`
: First, the TT&C spectrum requirements of the new satellites shall be assessed. Second, the utilization of existing TT&C frequency allocations and their potential to incorporate the future number of satellites is studied. Only for the case that this study results in the need for new spectrum, the study groups were asked to investigate new potential TT&C frequency allocations in the frequency ranges 150.05-174 MHz and 400.15-420 MHz. The studies shall be completed for WRC-19. This paper presents the intermediate results of the study groups. A study of the spectrum requirements of small satellites has been completed. The required spectrum for TT&C is expected to be less than 2.5 MHz for downlink and less than 1 MHz for uplink. Consequently, the study groups conducted sharing studies in various bands which will be summarized and evaluated from a satellite developer’s perspective. After the Cubesat design standard was introduced in 1999 and first satellites of this new class have been launched in the subsequent years, small satellites have become increasingly popular in the past five years. Today not only universities use small satellite platforms for education and technology demonstration, but also commercial operators started to develop and deploy satellites with masses of typically less than 50 kg and reasonably short development times. Currently more than hundred new satellites are currently launched into space per year. The increase of launches was recognized by the International Telecommunication Union (ITU) which is responsible for the coordination of the shared use of frequencies. As the first Cubesats were mainly launched by new entrants into the space sector, mandatory regulatory procedures like frequency coordination were omitted or underestimated by the developers. Additionally, the new developers complaint that the existing regulatory procedures are too complicated and time-consuming for satellites with short development times. 
The ITU therefore decided at the WRC-12 to study the characteristics of picosatellites and nanosatellites and their current practice in filing satellites to the ITU. The studies were concluded in 2015 with two reports on the characteristics [1] and current filing practice [2]. In these reports it was identified that the characteristics that define small satellites (low mass, small dimensions, low power, …) are not relevant from a frequency coordination perspective and that the short development times are still long enough to properly file the systems to the ITU. As a result	`What are the spectrum requirements for TT&C of small satellites?`
:287–299, Dec 2019. [20] Tam´as Vink´o and Dario Izzo. Global optimi- sation heuristics and test problems for prelimi- nary spacecraft trajectory design. Technical re- port, 2008. [21] Matej Petkovic, Luke Lucas, Dragi Kocev, Saˇso Dˇzeroski, Redouane Boumghar, and Nikola Simidjievski. Quantifying the effects of gyro- less flying of the mars express spacecraft with machine learning. In 2019 IEEE International [22] Janhavi H. Borse, Dipti D. Patil, Vinod Kumar, and Sudhir Kumar. Soft landing parameter measurements for candidate navigation trajec- tories using deep learning and ai-enabled plan- etary descent. Mathematical Problems in Engi- neering, 2022	`What are some of the research topics and methods explored in the provided references?`

Loss: losses.WeightedMultipleNegativesRankingLoss with these parameters:
```
{
    "scale": 20,
    "similarity_fct": "cos_sim"
}
```
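The per-example weighting in WeightedMultipleNegativesRankingLoss is not documented in this card, so the sketch below illustrates only the standard, unweighted multiple negatives ranking objective (the idea behind sentence-transformers' `MultipleNegativesRankingLoss`), using the listed `scale` of 20 and cosine similarity. It is a plain-Python illustration with toy vectors, not this model's training code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity, matching similarity_fct = "cos_sim"."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors, positives, scale=20.0):
    """Unweighted multiple negatives ranking loss over one batch.

    For anchor i, positive i is the target and every other positive in the
    batch serves as an in-batch negative; the loss is the mean cross-entropy
    over the scaled similarity scores.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]  # each anchor paired with its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives deliberately mismatched
print(mnrl_loss(anchors, matched))  # close to 0
print(mnrl_loss(anchors, swapped))  # close to 20 (the scale)
```

The scale sharpens the softmax: with cosine scores bounded in [-1, 1], a factor of 20 lets a correct pair dominate its in-batch negatives.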

Evaluation Dataset

Unnamed Dataset

Size: 804 evaluation samples
Columns: positive and anchor
Approximate statistics based on the first 1000 samples (here, all 804 samples):

	positive	anchor
type	string	string
details	min: 4 tokens mean: 351.15 tokens max: 512 tokens	min: 8 tokens mean: 19.36 tokens max: 45 tokens

Samples:

positive	anchor
, the total number of test thermocouples has been rationalized taking into account redundancy needs, accommodation constraints and hardware passivation needs for flight. The test is subdivided into 19 phases (see Figure 12) with two phases before and after the test for the health check functional tests under room conditions. Functional tests demonstrate anomalies such as the PCDU Reset and operational malfunctions of the RAX instrument at its high temperatures. The PCDU Reset anomaly was solved during the test by a software patch and validated during the final hot and cold plateaus. To address the RAX anomaly at hot, various test configurations were simulated using the thermal numerical model during the test to actually perform RAX functional test at an intermediate plateau facilitating mission operational constraints for flight. Data collected from hot and cold thermal balance test phases, as well as the rover OFF transition from hot to cold, are the inputs for correlation activities conducted post-TV/TB test. The thermal numerical model updates mainly focus on conductive couplings	`What was the solution to the PCDU Reset anomaly during the test?`
, where +Z axis orients to the earth, and sun pointing attitude mode during day time orienting -Z axis to the sun. Therefore, attitude control subsystem is required to maneuver the satellite attitude twice per revolution around its pitch axis. Figure 6 shows concept of the attitude maneuverer. Another attitude maneuverer is necessary to perform SAR observation and SAR data download to a to ground station, because X-band transmit antenna is oriented to +Z, so the satellite has to offset its attitude to orient the X-band transmit antenna toward the ground station. 3.4 High pointing accuracy Disturbance torque and system momentum profiles during few revolutions were estimated as shown in Figure 7 and 8. Four micro reaction wheels, which can respond to these profiles were selected which enable attitude maneuvers within a short period of time. In order to perform a pitch attitude maneuver quickly, two wheels are located on pitch axis while one wheel was located on each of the remaining roll and yaw axes. Figure 9 shows the satellite attitudes during SAR observation. There are three kinds of attitude, strip map mode, sliding spot light mode, and spotlight mode. Large change of momentum is required for pitch axis when the satellite is in spotlight mode. However, two pitch reaction wheels do not generate enough momentum to execute spotlight mode. So, sliding spotlight mode was selected for high resolution SAR observation mode instead of spotlight mode, in order to relax the torque and momentum requirements to the pitch wheels. In addition, two pitch Figure 7. Disturbance torque profile Figure 8. System momentum profile reaction wheels are accelerated to plus direction or minus direction by using magnet torque before observation. In order to obtain a high resolution SAR data, high attitude control accuracy is required for spotlight mode observation. 
To achieve high pointing accuracy against a defined ground target point, the attitude control loop applied feed forward compensation with estimated attitude angle and rate. Figure 10 shows an example of dynamic error during a spotlight mode observation maneuver.[4] Equipment for SAR mission consumes total large power more than 1300W, therefore PCDU has a risk of causing electrical and RF influence to the bus power and signal line. In order to research the system, electrical interface check was performed using bread board model of PCDU, battery	`What is the reason for selecting sliding spotlight mode instead of spotlight mode for high resolution SAR observation?`
, body shape and motion assumptions. Then, ORSAT uses DCA to determine the reentry risk posed to the Earth’s population based on the year of reentry and orbit inclination. It also predicts impact kinetic energy (impact velocity and impact mass) of objects that survive reentry[18]. ORSAT has been in use for the last decade and currently in its 6.0 version. However, unlike DAS, OR- SAT is not readily available. Only personnel at the Johnson Space Center, Orbital Debris Program Oﬃce run ORSAT. ORSAT is limited to ballistic reentry, only tumbling motions or stable orientations of objects are allowed which produce no lift. Partial melting of objects is considered by a demise factor and almost all materials in the database are temperature de- pendent. Heating by oxidation is also considered [20]. Therefore, ORSAT determines when and if a reentry object demises by using integrated trajectory, atmospheric, aerodynamic, aero-thermodynamic, and thermal models as outlined in section 3.1 [17, 18, 20]. Reentry demisability analysis using DAS requires the spacecraft to be deﬁned to the level of each individual hardware part constituting the spacecraft. This step facilitates population of the DAS Spacecraft Deﬁnition Module . Section 3.2.1 illustrates a generic spacecraft subdivision approach that can be followed to itemize the individual parts spacecraft parts. Subsequently, non-demisable parts are identiﬁed before or by the actual reentry analysis as explained in section 3.2.2. Itemization of the demisable spacecraft basic parts can be best approached by decompos- ing the spacecraft according to the Hierarchical System Terminology deﬁned in the NASA Systems Engineering Handbook [14]. Tables 3.2, 3.3 and 3.4 illustrate a generic approach to decompose a spacecraft into basic parts [29, 30, 9] excluding the payload. Description of the speciﬁc product for the basic part identiﬁed completes the process. 
Though slight vari- ations are likely to occur in the decomposition of diﬀerent missions, the Generic Spacecraft Subsystems Hierarchical Subdivision approach is robust, hence	`What is the limitation of ORSAT in terms of object motion?`

Loss: losses.WeightedMultipleNegativesRankingLoss with these parameters:
```
{
    "scale": 20,
    "similarity_fct": "cos_sim"
}
```

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 3e-06
weight_decay: 0.001
num_train_epochs: 20
bf16: True
tf32: False
load_best_model_at_end: True
batch_sampler: no_duplicates
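`batch_sampler: no_duplicates` matters because the ranking loss treats every other positive in a batch as a negative for each anchor; if the same text appeared twice in one batch, a correct match would be scored as a negative. The function below is a simplified, hypothetical greedy version of that idea (`no_duplicate_batches` is not a sentence-transformers API); the library's own `NoDuplicatesBatchSampler` also shuffles and differs in detail.

```python
def no_duplicate_batches(samples, batch_size):
    """Group (anchor, positive) pairs into batches in which no text appears twice.

    Greedy illustration: a pair whose texts clash with the current batch (or that
    arrives once the batch is full) is deferred to a later batch, never dropped.
    """
    remaining = list(samples)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for anchor, positive in remaining:
            if len(batch) < batch_size and anchor not in seen and positive not in seen:
                batch.append((anchor, positive))
                seen.update((anchor, positive))
            else:
                deferred.append((anchor, positive))
        batches.append(batch)
        remaining = deferred  # every pass places at least one pair, so this terminates
    return batches

pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p3"), ("q4", "p4")]
for batch in no_duplicate_batches(pairs, batch_size=2):
    print(batch)
```

Here `("q2", "p1")` is pushed to the second batch so that `p1` never serves as a false negative for `q2`.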

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 3e-06
weight_decay: 0.001
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: False
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
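With `lr_scheduler_type: linear`, `warmup_steps: 0` and `learning_rate: 3e-06`, the learning rate decays linearly from 3e-6 to 0 over all scheduled optimizer steps. Assuming a single device and `gradient_accumulation_steps: 1`, 7,232 samples at batch size 32 give 226 steps per epoch and 4,520 steps over 20 epochs; this reproduces the Epoch column of the Training Logs (for example, step 1000 is epoch 4.4248). A sketch of the linear-with-warmup formula used by Hugging Face schedulers, with the card's values plugged in:

```python
import math

TRAIN_SAMPLES = 7232  # "Size: 7,232 training samples"
BATCH_SIZE = 32       # per_device_train_batch_size (single device assumed)
EPOCHS = 20           # num_train_epochs
PEAK_LR = 3e-6        # learning_rate
WARMUP_STEPS = 0      # warmup_steps and warmup_ratio are both 0

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 226
total_steps = steps_per_epoch * EPOCHS                   # 4520

def linear_lr(step):
    """Linear warmup (none here), then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - WARMUP_STEPS))

for step in (100, 1000, 2000, total_steps):
    epoch = step / steps_per_epoch
    print(f"step {step}: epoch {epoch:.4f}, lr {linear_lr(step):.3e}")
```

The Training Logs table stops at step 2000 (epoch ≈ 8.85), at which point this schedule would still be at roughly 56% of the peak learning rate.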

Training Logs

Epoch	Step	Training Loss	Validation Loss	dim_256_cosine_map@10	dim_512_cosine_map@10	dim_768_cosine_map@10
0.4425	100	0.5883	-	-	-	-
0.8850	200	0.2765	-	-	-	-
1.3274	300	0.2047	-	-	-	-
1.7699	400	0.1628	-	-	-	-
2.2124	500	0.1519	0.1204	0.7094	0.7271	0.7266
2.6549	600	0.1309	-	-	-	-
3.0973	700	0.1228	-	-	-	-
3.5398	800	0.1062	-	-	-	-
3.9823	900	0.097	-	-	-	-
4.4248	1000	0.0853	0.1026	0.7281	0.7409	0.7468
4.8673	1100	0.086	-	-	-	-
5.3097	1200	0.0723	-	-	-	-
5.7522	1300	0.0678	-	-	-	-
6.1947	1400	0.0655	-	-	-	-
6.6372	1500	0.0583	0.0970	0.7252	0.7479	0.7502
7.0796	1600	0.0586	-	-	-	-
7.5221	1700	0.0521	-	-	-	-
7.9646	1800	0.049	-	-	-	-
8.4071	1900	0.0437	-	-	-	-
8.8496	2000	0.0443	0.0974	0.7193	0.7427	0.7455
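`load_best_model_at_end: True` makes the Trainer reload the best saved checkpoint when training ends. The card does not list `metric_for_best_model`; assuming it was left at its default, the criterion is the evaluation loss (lower is better), and assuming a checkpoint was saved at each evaluation step. The snippet copies the four evaluation rows of the table above and shows which one each criterion would select:

```python
# Evaluation rows copied from the Training Logs table:
# (step, validation loss, dim_768_cosine_map@10)
eval_rows = [
    (500, 0.1204, 0.7266),
    (1000, 0.1026, 0.7468),
    (1500, 0.0970, 0.7502),
    (2000, 0.0974, 0.7455),
]

# Default Trainer criterion (assumed here): lowest evaluation loss wins.
best_by_loss = min(eval_rows, key=lambda row: row[1])
# Alternative criterion, for comparison: highest dim_768 MAP@10.
best_by_map = max(eval_rows, key=lambda row: row[2])

print("lowest eval loss:", best_by_loss)
print("highest dim_768 map@10:", best_by_map)
```

Under these assumptions both criteria point to step 1500.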

Framework Versions

Python: 3.12.0
Sentence Transformers: 3.0.1
Transformers: 4.41.2
PyTorch: 2.3.1+cu118
Accelerate: 0.31.0
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

WeightedMultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.1B params (Safetensors, tensor type F32)

Model tree for federicovolponi/arctic-embed-m-space-sup: finetuned from the base model Snowflake/snowflake-arctic-embed-m
