How to use Culture-and-Morality-Lab/psyembedding-gte-large with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")
sentences = [
"\" My cousin said he share hoes with his brothers. He said sharing is caring and he love his brothers 😂\"",
"Clerical Error Led to Costa Rica's First Legal Gay Marriage",
"UPDATE: The leasing office is going to tape notes on ALL tenants doors in our building as to not single anyone out. However, we were not the first to complain. Our maintenance guy also caught them smoking in the breezeway of our building and told them to cut it out. Hopefully the note helps, if not, I don't care enough to make it a bigger issue and I'm only here for the next 8 months so smoke em if you got em I guess.\n\nAlright bare/bear with me here...\n\nMy husband and I live in a pretty decent apartment complex, we're on the top floor, and we are cool with most of our neighbors. We're 90% sure about which neighbor is smoking the devils lettuce, as it only really started when our next-door neighbors new roommate moved in.\n\nI have nothing against smoking, in any capacity, but it's literally all our apartment has smelled like the last few months. It's starting to permeate our clothes and furniture it's so bad. My husband and I both work for the government (and are drug tested) so this is not ideal. At first (before we realized what was actually going on) we thought a skunk had sprayed near our apartment and was just coming in through a window. Well, it's winter now, no windows are open, and every day we wake up to and come home to SKUNK. We have told the apartment complex about this, and nothing has changed.\n\nWIBTA if I left a note on this neighbors door? Something to the effect of \"Hey fellow apartment dwellers, pot is fun. However, consider blowing the smoke out into a paper towel roll that has a dryer sheet at the end. Or vape it. Or eat it. My entire apartment smells like skunky pot and I'm over it. Love, your neighbors\"\n\nThoughts?\n\nEdit: When we messaged the complex about it, we didn't single out them or their apartment, just said \"someone in our building.\" They actually have since responded saying that they'd \"send a message to our building about it,\" so we'll see what happens! "
"I'll hold off on the note/talking to them in person for now, but if it continues I'm definitely considering buying two smoke buddies and wrapping them up nicely in Christmas paper with a little (not aggressive) card.\n\nEdit 2: Enough people are asking so I wanted to clarify, **I do not live in a state where it is legal.**",
"And just as Kamala Harris is dedicated to building an America for all peoples, so too am I.\n\nThat is why the idea that freedom of religion somehow comes primary is so offensive to the very idea of an America for all.\n\nSome religions will cite their bigotry as having a basis in scripture. That will apply to bigotry hitting the LDS community as well.\n\nAccording to Christian doctrine, [the LDS church is fundamentally heretical](\n\nWithout discrimination protections, a catholic, muslim, jewish or christian business can discriminate against the members of the LDS church without recourse.\n\nThat is why the protection of anti-discrimination measures must inevitably step on the toes of religious bigotry. Religion is cited to justify bigotry. Just as it was by the LDS church."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a trained sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
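The module list above shows mean-token pooling followed by a Normalize() step. As a rough illustration of what those two stages compute, here is a minimal pure-Python sketch; the function names `mean_pool` and `l2_normalize` and the toy token vectors are hypothetical and are not part of the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)           # mean of the two real tokens
sentence_embedding = l2_normalize(pooled)  # unit-length sentence vector
```

Because the final vector has unit length, the cosine similarity of two such embeddings reduces to their dot product.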
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")
# Run inference
sentences = [
'Besides that which the men brought him that were over the tributes, and the merchants, and they that sold by retail, and all the kings of Arabia, and the governors of the country.',
'If this needs a federal mandate and 100% global consensus, than leaders like Macron should let us renegotiate. As it stands right now, this agreement is 100% toothless. There are no penalties for not following through with it.',
"I don't look for much to come out of government ownership as long as we have Democrats and Republicans.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5648, 0.5502],
#         [0.5648, 1.0000, 0.7965],
#         [0.5502, 0.7965, 1.0000]])
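The similarity matrix printed above is symmetric with ones on the diagonal, which is what pairwise cosine similarity produces. The sketch below reproduces that structure on toy 2-dimensional vectors without loading the model, assuming `model.similarity` is using its default cosine similarity function; `cosine_similarity` and `similarity_matrix` are illustrative helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities, one row per embedding."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for sentence embeddings (not real model output).
toy_embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)

# Rank the other items by similarity to item 0, most similar first.
neighbours_of_0 = sorted(range(1, len(toy_embeddings)),
                         key=lambda j: matrix[0][j], reverse=True)
```

Reading a row of the matrix and sorting it, as in the last statement, is the basic step behind semantic search over a small corpus.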
Evaluation on the similarity dataset with EmbeddingSimilarityEvaluator:

| Metric | Value |
|---|---|
| pearson_cosine | 0.3879 |
| spearman_cosine | 0.4048 |
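The two metrics above are correlations between the model's cosine similarity scores and the gold labels. As a hedged illustration of how such numbers are computed, here is a small pure-Python sketch of Pearson and Spearman correlation; the helper names and the toy scores are made up for the example and are not taken from the evaluation data.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: predicted cosine scores vs. gold similarity labels.
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [0.0, 0.5, 0.7, 1.0]

pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
```

Spearman only looks at the ordering of the scores, so it rewards a model that ranks pairs correctly even when the raw cosine values are not calibrated to the label scale.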
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | | | |
| sentence_0 | sentence_1 | label |
|---|---|---|
| He worked at Rothschild as an investment banker. Great. Am I supposed to be alarmed that France elected a technocrat who has worked in the private banking sector? | Chad runs over the raccoon since it's been bothering him anyway. | 0.3535533905932737 |
| Amazing effects for a movie of this time. A primer of the uselessness of war and how war becomes a nurturer of itself.A wonderful thing about this movie is it is now public domain and available at archive.org. No charge, no sign up necessary. Watch it in one sitting and you will be propelled.I plan to share this flick with as many people as possible as I had never heard of it before and I am a hard core sci fi fan.I would like to see how others react to this movie.Watch it.Rate it.Tell us what you think. | First off, I must say that I made the mistake of watching the Election films out of sequence. I say unfortunately, because after seeing Election 2 first, Election seems a bit of a disappointment. Both films are gangster epics that are similar in form. And while Election is an enjoyable piece of cinema... it's just not nearly as good as it's sequel.In the first Election installment, we are shown the two competitors for Chairman; Big D and Lok. After a few scenes of discussion amongst the "Uncle's" as to who should have the Chairman title, they (almost unanimously) decide That Lok (Simon Yam) will helm the Triads. Suffice to say this doesn't go over very well with competitor Big D (Tony Leung Ka Fai) and in a bid to influence the takeover, Big D kidnaps two of the uncles in order to sway the election board to his side. This has disastrous results and heads the triads into an all out war. Lok is determined to become Chairman but won't become official until he can recover the "Dragon Head ... | 0.7071067811865475 |
| MY SINCERE APOLOGIES 2U WHO I'VE OFFENDED WITH ALLEGATIONS OF COMPLACENT COWARDS & ASSHOLES FOR CLIMATE CHANGE INDIFFERENCE! | yeah man fucking disgusting. as if we didn't waste enough time at work | 1.0 |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
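With MSELoss as the loss function, the training objective amounts to the mean squared error between each pair's cosine similarity and its float label. The following is a minimal pure-Python sketch of that idea on toy vectors; it is an illustration under that reading and not the library's implementation, and `cosine_similarity_mse` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_mse(pairs, labels):
    """Mean squared error between pairwise cosine similarity and the labels."""
    errors = [(cosine_similarity(u, v) - label) ** 2
              for (u, v), label in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Toy embedding pairs standing in for (sentence_0, sentence_1) embeddings.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction, cosine 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine 0.0
]
labels = [1.0, 0.5]

loss = cosine_similarity_mse(pairs, labels)
```

The first pair already matches its label and contributes nothing; the second pair is penalised because its cosine similarity is 0.5 away from the target.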
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- fp16: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | similarity_spearman_cosine |
|---|---|---|---|
| 0.0286 | 10 | - | 0.2006 |
| 0.0571 | 20 | - | 0.2012 |
| 0.0857 | 30 | - | 0.2023 |
| 0.1143 | 40 | - | 0.2036 |
| 0.1429 | 50 | - | 0.2054 |
| 0.1714 | 60 | - | 0.2081 |
| 0.2 | 70 | - | 0.2098 |
| 0.2286 | 80 | - | 0.2115 |
| 0.2571 | 90 | - | 0.2128 |
| 0.2857 | 100 | - | 0.2149 |
| 0.3143 | 110 | - | 0.2177 |
| 0.3429 | 120 | - | 0.2207 |
| 0.3714 | 130 | - | 0.2243 |
| 0.4 | 140 | - | 0.2278 |
| 0.4286 | 150 | - | 0.2310 |
| 0.4571 | 160 | - | 0.2332 |
| 0.4857 | 170 | - | 0.2350 |
| 0.5143 | 180 | - | 0.2361 |
| 0.5429 | 190 | - | 0.2360 |
| 0.5714 | 200 | - | 0.2369 |
| 0.6 | 210 | - | 0.2423 |
| 0.6286 | 220 | - | 0.2533 |
| 0.6571 | 230 | - | 0.2691 |
| 0.6857 | 240 | - | 0.2808 |
| 0.7143 | 250 | - | 0.2889 |
| 0.7429 | 260 | - | 0.2960 |
| 0.7714 | 270 | - | 0.2939 |
| 0.8 | 280 | - | 0.3007 |
| 0.8286 | 290 | - | 0.3010 |
| 0.8571 | 300 | - | 0.3016 |
| 0.8857 | 310 | - | 0.3035 |
| 0.9143 | 320 | - | 0.3078 |
| 0.9429 | 330 | - | 0.3138 |
| 0.9714 | 340 | - | 0.3206 |
| 1.0 | 350 | - | 0.3234 |
| 1.0286 | 360 | - | 0.3299 |
| 1.0571 | 370 | - | 0.3367 |
| 1.0857 | 380 | - | 0.3267 |
| 1.1143 | 390 | - | 0.3307 |
| 1.1429 | 400 | - | 0.3359 |
| 1.1714 | 410 | - | 0.3417 |
| 1.2 | 420 | - | 0.3504 |
| 1.2286 | 430 | - | 0.3324 |
| 1.2571 | 440 | - | 0.3365 |
| 1.2857 | 450 | - | 0.3580 |
| 1.3143 | 460 | - | 0.3622 |
| 1.3429 | 470 | - | 0.3073 |
| 1.3714 | 480 | - | 0.3596 |
| 1.4 | 490 | - | 0.3473 |
| 1.4286 | 500 | 0.1278 | 0.3573 |
| 1.4571 | 510 | - | 0.3539 |
| 1.4857 | 520 | - | 0.3355 |
| 1.5143 | 530 | - | 0.3299 |
| 1.5429 | 540 | - | 0.3559 |
| 1.5714 | 550 | - | 0.3285 |
| 1.6 | 560 | - | 0.3435 |
| 1.6286 | 570 | - | 0.3654 |
| 1.6571 | 580 | - | 0.3824 |
| 1.6857 | 590 | - | 0.3426 |
| 1.7143 | 600 | - | 0.3413 |
| 1.7429 | 610 | - | 0.3395 |
| 1.7714 | 620 | - | 0.3492 |
| 1.8 | 630 | - | 0.3664 |
| 1.8286 | 640 | - | 0.3634 |
| 1.8571 | 650 | - | 0.3392 |
| 1.8857 | 660 | - | 0.3686 |
| 1.9143 | 670 | - | 0.3722 |
| 1.9429 | 680 | - | 0.3557 |
| 1.9714 | 690 | - | 0.3896 |
| 2.0 | 700 | - | 0.3908 |
| 2.0286 | 710 | - | 0.3859 |
| 2.0571 | 720 | - | 0.3536 |
| 2.0857 | 730 | - | 0.3606 |
| 2.1143 | 740 | - | 0.3638 |
| 2.1429 | 750 | - | 0.3713 |
| 2.1714 | 760 | - | 0.3704 |
| 2.2 | 770 | - | 0.3441 |
| 2.2286 | 780 | - | 0.3435 |
| 2.2571 | 790 | - | 0.3668 |
| 2.2857 | 800 | - | 0.3735 |
| 2.3143 | 810 | - | 0.3373 |
| 2.3429 | 820 | - | 0.3474 |
| 2.3714 | 830 | - | 0.3560 |
| 2.4 | 840 | - | 0.3028 |
| 2.4286 | 850 | - | 0.3485 |
| 2.4571 | 860 | - | 0.3604 |
| 2.4857 | 870 | - | 0.3769 |
| 2.5143 | 880 | - | 0.3600 |
| 2.5429 | 890 | - | 0.3916 |
| 2.5714 | 900 | - | 0.3957 |
| 2.6 | 910 | - | 0.3797 |
| 2.6286 | 920 | - | 0.3875 |
| 2.6571 | 930 | - | 0.3978 |
| 2.6857 | 940 | - | 0.3951 |
| 2.7143 | 950 | - | 0.3831 |
| 2.7429 | 960 | - | 0.3912 |
| 2.7714 | 970 | - | 0.3800 |
| 2.8 | 980 | - | 0.3955 |
| 2.8286 | 990 | - | 0.3976 |
| 2.8571 | 1000 | 0.1036 | 0.4048 |
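The hyperparameters list a linear learning-rate scheduler with a base rate of 5e-05 and no warmup, and the log reaches epoch 1.0 at step 350. The sketch below shows how a linear decay schedule would evolve under those settings. The total of 1050 steps is an inference from 350 steps per epoch times 3 epochs, not a figure stated in the card, and `linear_lr` is a hypothetical helper rather than a library function.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


STEPS_PER_EPOCH = 350  # the log shows step 350 at epoch 1.0
NUM_EPOCHS = 3         # num_train_epochs from the hyperparameters
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # assumed total: 1050 steps

lr_start = linear_lr(0, TOTAL_STEPS)        # full base learning rate
lr_midway = linear_lr(525, TOTAL_STEPS)     # half the base rate
lr_at_1000 = linear_lr(1000, TOTAL_STEPS)   # last step shown in the log
```

Under these assumptions the learning rate at step 1000 is already a small fraction of its starting value, so later evaluation steps would change the model only slightly.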
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}