psyembedding-gte-large

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Model Size: ~0.3B parameters (F32 safetensors)
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
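
The three modules above correspond to a BERT encoder, mean pooling over non-padding tokens, and L2 normalization of the pooled vector. As a minimal sketch of what that pipeline computes without the sentence-transformers wrapper (assuming plain transformers and the Hub id named in this card):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "Culture-and-Morality-Lab/psyembedding-gte-large"  # id taken from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

encoded = tokenizer(["An example sentence."], padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 1024)

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: L2-normalize, so dot products equal cosine similarities
sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])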

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")
# Run inference
sentences = [
    'Besides that which the men brought him that were over the tributes, and the merchants, and they that sold by retail, and all the kings of Arabia, and the governors of the country.',
    'If this needs a federal mandate and 100% global consensus, than leaders like Macron should let us renegotiate. As it stands right now, this agreement is 100% toothless. There are no penalties for not following through with it.',
    "I don't look for much to come out of government ownership as long as we have Democrats and Republicans.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5648, 0.5502],
#         [0.5648, 1.0000, 0.7965],
#         [0.5502, 0.7965, 1.0000]])
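
Because the Normalize module makes every embedding unit-length, cosine similarity reduces to a dot product, which also makes the model convenient for semantic search. A minimal sketch using the library's semantic_search utility (the query and corpus strings below are made up for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Hypothetical corpus and query, for illustration only
corpus = [
    "The agreement has no enforcement mechanism.",
    "He worked as an investment banker before entering politics.",
    "The raccoon kept knocking over the trash cans.",
]
query = "There are no penalties for breaking the deal."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Top-k nearest corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")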

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.3879
spearman_cosine 0.4048
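
The column name similarity_spearman_cosine in the training logs below suggests these scores come from an EmbeddingSimilarityEvaluator named "similarity", which reports the Pearson and Spearman correlation between predicted cosine similarities and gold labels. A sketch of how comparable numbers can be computed on a labeled pair set (the pairs below are hypothetical; the actual evaluation split is not published in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Hypothetical labeled pairs with gold similarities in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["The deal has no penalties.", "He was a banker."],
    sentences2=["The agreement is toothless.", "The raccoon was a nuisance."],
    scores=[0.9, 0.1],
    name="similarity",
)
results = evaluator(model)
print(results)  # includes similarity_pearson_cosine and similarity_spearman_cosine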

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,180 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

                sentence_0            sentence_1            label
    type        string                string                float
    details     min: 6 tokens         min: 4 tokens         min: 0.0
                mean: 104.26 tokens   mean: 118.5 tokens    mean: 0.53
                max: 512 tokens       max: 512 tokens       max: 1.0
  • Samples:

    Sample 1
      sentence_0: He worked at Rothschild as an investment banker. Great. Am I supposed to be alarmed that France elected a technocrat who has worked in the private banking sector?
                  I also don't give a shit about what macron does in his personal life. Clearly the French people don't either.
      sentence_1: Chad runs over the raccoon since it's been bothering him anyway.
      label: 0.3535533905932737

    Sample 2
      sentence_0: Amazing effects for a movie of this time. A primer of the uselessness of war and how war becomes a nurturer of itself.A wonderful thing about this movie is it is now public domain and available at archive.org. No charge, no sign up necessary. Watch it in one sitting and you will be propelled.I plan to share this flick with as many people as possible as I had never heard of it before and I am a hard core sci fi fan.I would like to see how others react to this movie.Watch it.Rate it.Tell us what you think.
      sentence_1: First off, I must say that I made the mistake of watching the Election films out of sequence. I say unfortunately, because after seeing Election 2 first, Election seems a bit of a disappointment. Both films are gangster epics that are similar in form. And while Election is an enjoyable piece of cinema... it's just not nearly as good as it's sequel.In the first Election installment, we are shown the two competitors for Chairman; Big D and Lok. After a few scenes of discussion amongst the "Uncle's" as to who should have the Chairman title, they (almost unanimously) decide That Lok (Simon Yam) will helm the Triads. Suffice to say this doesn't go over very well with competitor Big D (Tony Leung Ka Fai) and in a bid to influence the takeover, Big D kidnaps two of the uncles in order to sway the election board to his side. This has disastrous results and heads the triads into an all out war. Lok is determined to become Chairman but won't become official until he can recover the "Dragon Head ...
      label: 0.7071067811865475

    Sample 3
      sentence_0: MY SINCERE APOLOGIES 2U WHO I'VE OFFENDED WITH ALLEGATIONS OF COMPLACENT COWARDS & ASSHOLES FOR CLIMATE CHANGE INDIFFERENCE!
      sentence_1: yeah man fucking disgusting. as if we didn't waste enough time at work
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters (see the sketch below):
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
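
CosineSimilarityLoss encodes both sentences, computes the cosine similarity of the two embeddings, and regresses it onto the gold label with the configured loss_fct; with MSELoss this is roughly loss = MSE(cos_sim(u, v), label). A minimal construction sketch (the Hub id is taken from this card):

import torch.nn as nn
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# For each pair (sentence_0, sentence_1) with label in [0, 1]:
#   loss = MSELoss(cos_sim(embed(sentence_0), embed(sentence_1)), label)
loss = losses.CosineSimilarityLoss(model, loss_fct=nn.MSELoss())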
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
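
A minimal training sketch that matches the non-default hyperparameters above (the one-row dataset is a placeholder for the unnamed 11,180-pair dataset; fp16 assumes a CUDA device):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Placeholder rows; the real dataset has columns sentence_0, sentence_1, label
train_dataset = Dataset.from_dict({
    "sentence_0": ["first text"],
    "sentence_1": ["second text"],
    "label": [0.5],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
    # eval_strategy="steps" was used in training; it additionally requires
    # an eval_dataset or evaluator, which this sketch omits
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),
)
trainer.train()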

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss similarity_spearman_cosine
("-" means the training loss was not logged at that step)
0.0286 10 - 0.2006
0.0571 20 - 0.2012
0.0857 30 - 0.2023
0.1143 40 - 0.2036
0.1429 50 - 0.2054
0.1714 60 - 0.2081
0.2 70 - 0.2098
0.2286 80 - 0.2115
0.2571 90 - 0.2128
0.2857 100 - 0.2149
0.3143 110 - 0.2177
0.3429 120 - 0.2207
0.3714 130 - 0.2243
0.4 140 - 0.2278
0.4286 150 - 0.2310
0.4571 160 - 0.2332
0.4857 170 - 0.2350
0.5143 180 - 0.2361
0.5429 190 - 0.2360
0.5714 200 - 0.2369
0.6 210 - 0.2423
0.6286 220 - 0.2533
0.6571 230 - 0.2691
0.6857 240 - 0.2808
0.7143 250 - 0.2889
0.7429 260 - 0.2960
0.7714 270 - 0.2939
0.8 280 - 0.3007
0.8286 290 - 0.3010
0.8571 300 - 0.3016
0.8857 310 - 0.3035
0.9143 320 - 0.3078
0.9429 330 - 0.3138
0.9714 340 - 0.3206
1.0 350 - 0.3234
1.0286 360 - 0.3299
1.0571 370 - 0.3367
1.0857 380 - 0.3267
1.1143 390 - 0.3307
1.1429 400 - 0.3359
1.1714 410 - 0.3417
1.2 420 - 0.3504
1.2286 430 - 0.3324
1.2571 440 - 0.3365
1.2857 450 - 0.3580
1.3143 460 - 0.3622
1.3429 470 - 0.3073
1.3714 480 - 0.3596
1.4 490 - 0.3473
1.4286 500 0.1278 0.3573
1.4571 510 - 0.3539
1.4857 520 - 0.3355
1.5143 530 - 0.3299
1.5429 540 - 0.3559
1.5714 550 - 0.3285
1.6 560 - 0.3435
1.6286 570 - 0.3654
1.6571 580 - 0.3824
1.6857 590 - 0.3426
1.7143 600 - 0.3413
1.7429 610 - 0.3395
1.7714 620 - 0.3492
1.8 630 - 0.3664
1.8286 640 - 0.3634
1.8571 650 - 0.3392
1.8857 660 - 0.3686
1.9143 670 - 0.3722
1.9429 680 - 0.3557
1.9714 690 - 0.3896
2.0 700 - 0.3908
2.0286 710 - 0.3859
2.0571 720 - 0.3536
2.0857 730 - 0.3606
2.1143 740 - 0.3638
2.1429 750 - 0.3713
2.1714 760 - 0.3704
2.2 770 - 0.3441
2.2286 780 - 0.3435
2.2571 790 - 0.3668
2.2857 800 - 0.3735
2.3143 810 - 0.3373
2.3429 820 - 0.3474
2.3714 830 - 0.3560
2.4 840 - 0.3028
2.4286 850 - 0.3485
2.4571 860 - 0.3604
2.4857 870 - 0.3769
2.5143 880 - 0.3600
2.5429 890 - 0.3916
2.5714 900 - 0.3957
2.6 910 - 0.3797
2.6286 920 - 0.3875
2.6571 930 - 0.3978
2.6857 940 - 0.3951
2.7143 950 - 0.3831
2.7429 960 - 0.3912
2.7714 970 - 0.3800
2.8 980 - 0.3955
2.8286 990 - 0.3976
2.8571 1000 0.1036 0.4048

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 5.1.0
  • Transformers: 4.49.0
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.0
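
To approximate this environment, the listed versions can be pinned at install time; a sketch (the +cu128 PyTorch build additionally requires the matching CUDA wheel index):

pip install sentence-transformers==5.1.0 transformers==4.49.0 torch==2.8.0 accelerate==1.10.0 datasets==2.14.4 tokenizers==0.21.0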

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}