Matryoshka Representation Learning
Paper • 2205.13147 • Published • 27
How to use CalebMaresca/matrix-game-embeddings-ft-v1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")
sentences = [
" What is the proposed alternative to imposing random events purely by chance on players?",
"half a day). \n• They are perceived to be new and innovative (despite being around since 1987). \n• They are easy to transport, requiring only pen and paper – with perhaps a few maps and \ncounters. \n• They work well in multi-domain, multi-agency contexts allowing all Actors to participate \nequally. \nA few Words of Warning \n• The fact that a Matrix Game requires little infrastructure can be a problem – it just \ndoesn't look sexy and the strengths that it can be done quickly with the minimum of \nfuss, can be reduced by efforts to make it look cool/expensive. \n• The non-quantitative nature of the game can frustrate analysts. \n• Matrix Games require an experience facilitator to run them.",
"inform the other players of their stated intentions. In many cases these are not really \n\"arguments\" as part of the game, so shouldn't count as their action for the turn, unless they \nwish to specify a measurable effect (such as increasing their approval ratings). \nTrade Agreements \nIn some games, trade forms a very important part of the game narrative. In most cases this \ncan be treated simply as part of the normal ebb and flow of the argument process. \nHowever, in some circumstances, particularly when timescales are long, trade can require \ngreater attention as to the nuances of the economic benefits and impacts. In these cases, it \nmay be necessary to get the two sides to make additional arguments as to what they expect",
"possible throughout the game, having “random events” happen completely at random is \nproblematic. An Actor may be disadvantaged purely by chance, more than once during the \ngame, which can reduce their immersion and engagement. The narrative develops during \nthe game based on the decisions of the players and their reactions to the decisions of other \nplayers. Having random events imposed on them by chance breaks this “cause and effect” \ncycle and degrades the game flow. \nThe alternative is to give the random event to the participants. They will then make a \ndecision as to how this can contribute to the narrative being developed by the players. They"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")
# Run inference
sentences = [
' Why is it important for player roles in a Matrix Game to operate at broadly similar levels?',
'When you are designing a Matrix Game it is worth thinking about the level at which the players roles will be operating in the game. In is usually better, and produces a more balanced game, when the level on which the player roles are operating are broadly similar. It would be difficult to get a balanced game if 3 of the players are playing Generals in command of vast Armies, and another player is playing a simple individual soldier.\n\nLevels of Protection and Hidden Things.',
'Matrix Game Checklist .................................................................................... 38 \nSample Spendable Bonus Cards ...................................................................... 40 \nSample Random Events ................................................................................... 41 \nSample Voting Cards for Diceless Adjudication ............................................... 43 \nSample Estimative Probability Cards ............................................................... 44 \nSample Turn Order Cards ................................................................................ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........ 46',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
InformationRetrievalEvaluator| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.9348 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9348 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9348 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9702 |
| cosine_mrr@10 | 0.9601 |
| cosine_map@100 | 0.9601 |
sentence_0 and sentence_1| sentence_0 | sentence_1 | |
|---|---|---|
| type | string | string |
| details |
|
|
| sentence_0 | sentence_1 |
|---|---|
What distinguishes "established facts" from other types of facts in the game briefings or play? |
Forces soldiers are going to be much more effective in combat than untrained protestors; |
Why is it unreasonable to assume that unarmed protestors could fight off trained police according to the context? |
Forces soldiers are going to be much more effective in combat than untrained protestors; |
What was the outcome of the initial Russian attack against the German units, and how did it affect the ammunition status of both sides? |
The Russians succeed in pushing back one of the German units and forcing and already depleted unit to use up ammunition, (but are pushed back themselves and 2 units use a lot of ammo (one of which becomes combat ineffective on -3)). Overall, as the success is matched by failure, the line itself holds. The Russians attack again, the next day: |
MatryoshkaLoss with these parameters:{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
eval_strategy: stepsper_device_train_batch_size: 10per_device_eval_batch_size: 10num_train_epochs: 10multi_dataset_batch_sampler: round_robinoverwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 10per_device_eval_batch_size: 10per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1num_train_epochs: 10max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}tp_size: 0fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: round_robin| Epoch | Step | cosine_ndcg@10 |
|---|---|---|
| 1.0 | 37 | 0.9273 |
| 1.3514 | 50 | 0.9490 |
| 2.0 | 74 | 0.9462 |
| 2.7027 | 100 | 0.9527 |
| 3.0 | 111 | 0.9527 |
| 4.0 | 148 | 0.9783 |
| 4.0541 | 150 | 0.9811 |
| 5.0 | 185 | 0.9622 |
| 5.4054 | 200 | 0.9622 |
| 6.0 | 222 | 0.9702 |
| 6.7568 | 250 | 0.9622 |
| 7.0 | 259 | 0.9622 |
| 8.0 | 296 | 0.9702 |
| 8.1081 | 300 | 0.9702 |
| 9.0 | 333 | 0.9702 |
| 9.4595 | 350 | 0.9702 |
| 10.0 | 370 | 0.9702 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
Snowflake/snowflake-arctic-embed-l