metadata
base_model: manuel-couto-pintos/roberta_erisk
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:30288
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
Looks like a small cockroach, but much more colorful, 0.75" long.
[Atlanta, Georgia]
sentences:
- >-
Help me win a bet: What size gi does Marcelo Garcia wear? I suspect he
uses different size pants relative to the gi-top because of his epic
thighs relative to stature. My buddy just says A2 all around (on
average, recognizing that it varies by brand). What do you say?
- 'What little things about the Star Wars Universe do you love? '
- >-
Looks like a small cockroach, but much more colorful, 0.75" long.
[Atlanta, Georgia]
- source_sentence: >-
Clogged Construction on my brand new condo finished this summer. Not
wasting a second, I broke lease on my musky apartment, and moved in as
soon as possible. I rather enjoyed knowing I was the first resident living
here: there was no wear and tear, no smoke stains on the walls, and no
damage to the structure. The only issue was a light clattering sound
whenever I used the commercial sink in my laundry room. I rarely used it,
so I didn't bring up the problem to the contractors. Everything else
worked perfectly, and my home was as sterile as an operating table.
nbsp;
After a few months, I began noticing water pooling at the foot of my
shower. The drain must have been clogged. I took to my tools, unscrewed
the shower drain, and peered inside. I could see a collection of fibers
bunched up in the pipes. Reaching in with an unfolded coat hanger, I
pulled out mountains of dirty blond hair clogging the pipes. I live alone,
I don't have any pets, I haven't entertained a lady in over a year, and
I've been bald since I was 27.
nbsp;
The odd phenomena got me thinking about the sink in the laundry room. I
detached the aerator, placed my hand under the faucet, and turned on the
water. Dozens of molars came flying out, slipping through my fingers and
into the sink, bouncing up and down until ultimately falling down the
drain.
nbsp;
On a completely unrelated note: I have a beautiful, fully furnished,
barely-used condo for sale. Located in downtown Detroit. Anyone
interested?
sentences:
- >-
3-2 defense cannot stop corner 3s? Does anyone else have this problem?
My down low guys won't kick out to even try to defend an open 3 shot,
and the computer just spams this on me all day when I play offline.
- >-
tw.being suicidal but knowing someone whos commit is the worst thing in
the world. bc you see both sides. you see how it affects the people that
love that person. including yourself. you see how it doesnt end the pain
but it just passes it on to all the people who are left to deal with it.
but then it also makes it so much more understandable as to why someone
did it. you know what its like to want the pain to end. the feeling of
your brain sabotaging you and your happiness constantly. to stop feeling
like youre drowning in yourself. you get each and every point to it. and
in a sense it makes me feel even more guilty for ever having the thought
in the first place. for it becoming my safe space. knowing that if
things dont fall into place that im okay with not being here anymore but
not being okay leaving the people you love to clean up the mess / carry
it with them for the rest of their lives. sorry. end rant.
- >-
Clogged Construction on my brand new condo finished this summer. Not
wasting a second, I broke lease on my musky apartment, and moved in as
soon as possible. I rather enjoyed knowing I was the first resident
living here: there was no wear and tear, no smoke stains on the walls,
and no damage to the structure. The only issue was a light clattering
sound whenever I used the commercial sink in my laundry room. I rarely
used it, so I didn't bring up the problem to the contractors. Everything
else worked perfectly, and my home was as sterile as an operating table.
nbsp;
After a few months, I began noticing water pooling at the foot of my
shower. The drain must have been clogged. I took to my tools, unscrewed
the shower drain, and peered inside. I could see a collection of fibers
bunched up in the pipes. Reaching in with an unfolded coat hanger, I
pulled out mountains of dirty blond hair clogging the pipes. I live
alone, I don't have any pets, I haven't entertained a lady in over a
year, and I've been bald since I was 27.
nbsp;
The odd phenomena got me thinking about the sink in the laundry room. I
detached the aerator, placed my hand under the faucet, and turned on the
water. Dozens of molars came flying out, slipping through my fingers and
into the sink, bouncing up and down until ultimately falling down the
drain.
nbsp;
On a completely unrelated note: I have a beautiful, fully furnished,
barely-used condo for sale. Located in downtown Detroit. Anyone
interested?
- source_sentence: 'Top 10 Movies Trailers of 2017 Must watch It '
sentences:
- >-
Im on coke n 2 mg kpin and im anxious as fuckIdk what i can do to get
rid of this i know coke doesnt last long but the anxietys lingering n
the kpins are keeping me borderline okay, but I've never been this
anxious on coke i feel like im on a psychedelic having a bad trip but im
not tripping its just the anxiety. Can anyone help me thru this
- '[Giveaway] 10 BTS for new users '
- 'Top 10 Movies Trailers of 2017 Must watch It '
- source_sentence: >-
Vet says he nearly operated on himself when VA wouldn't pay medical
bill.
sentences:
- 'What kind of soap is best to get glitter off your skin? '
- 'Alvvays is nearly done tracking their next album '
- >-
Vet says he nearly operated on himself when VA wouldn't pay medical
bill.
- source_sentence: Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)
sentences:
- >-
GUYS I MIGHT HAVE TO DELETE THIS ACCOUNT BECAUSE MY BF KNOWS MY ACC BUT
I DON'T WANT TO IT'S A MASSIVE URGENCE I'VE HAD THIS 3 YEARS So
basically me and my boyfriend was messing around but he decided to go
onto my reddit app and he "accidently" saw my reddit account name and he
said that he's not going to look cause he knows he won't like what he
sees but GUYS my post history is fucked i'm fucked it makes me look more
fucked then I am what the fuck do i dooooo D:
I don't wanna start over and there's a couple of subreddits that are
suscriber only so how the fuck am i gonna get back
he's said he's been curious about this before but he knows the sorta
stuff i post and he said it would really upset him but when he's curios
he usally won't stop wondering but I like to think that i can trust him
but I''m complety FUCKED.
apparently he forgot it too but he has good memory
- Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)
- >-
Who else is in a opposite gender dominated industry? What have been your
experiences? I am a female in IT. I chose this field because I enjoy it,
and it turns out I am good at it. I am not concerned about the gender
bias because I feel my qualifications and experience speak for
themselves, and so far that has been the case (the only time I have been
discriminated against it has not affected my career progress). However,
I'm relatively inexperienced and I would love to know other people's
experiences in similar environments.
SentenceTransformer based on manuel-couto-pintos/roberta_erisk
This is a sentence-transformers model fine-tuned from manuel-couto-pintos/roberta_erisk. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: manuel-couto-pintos/roberta_erisk
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
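The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings. The sketch below illustrates that idea in NumPy, assuming padded positions are excluded through an attention mask. The function name `mean_pool` and the toy arrays are illustrative and are not part of the library.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: shape (batch, seq_len, hidden)
    attention_mask:   shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for fully padded rows.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch: 2 sequences, 3 token slots, hidden size 2 (the real model uses 768).
tokens = np.array(
    [
        [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last slot is padding
        [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
    ]
)
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

The padded slot in the first sequence does not affect its pooled vector, which is the point of applying the mask before averaging.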
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("manuel-couto-pintos/roberta_erisk_simcse")
# Run inference
sentences = [
'Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)',
'Age old questions[View Poll](https://www.reddit.com/poll/m89hf3)',
"Who else is in a opposite gender dominated industry? What have been your experiences? I am a female in IT. I chose this field because I enjoy it, and it turns out I am good at it. I am not concerned about the gender bias because I feel my qualifications and experience speak for themselves, and so far that has been the case (the only time I have been discriminated against it has not affected my career progress). However, I'm relatively inexperienced and I would love to know other people's experiences in similar environments. ",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
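Once a similarity matrix is available, a common next step is to rank the other sentences against one query. The sketch below does this with a hypothetical helper, `top_k_matches`, on a small hand-written matrix. The values are made up for illustration and are not outputs of this model. It assumes the matrix has been converted to a NumPy array first.

```python
import numpy as np


def top_k_matches(similarities, query_index: int, k: int = 1):
    """Return (index, score) pairs of the k rows most similar to the query."""
    scores = np.asarray(similarities, dtype=float)[query_index].copy()
    scores[query_index] = -np.inf  # exclude the query itself
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Hand-written 3x3 similarity matrix standing in for a real one.
toy_similarities = np.array(
    [
        [1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
)
matches = top_k_matches(toy_similarities, query_index=0, k=2)
print(matches)
```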
Training Details
Training Dataset
Unnamed Dataset
- Size: 30,288 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 9 tokens, mean: 84.36 tokens, max: 512 tokens | min: 9 tokens, mean: 84.36 tokens, max: 512 tokens |
- Samples:

| sentence_0 | sentence_1 |
|---|---|
| Actor Cory Monteith, Who Played Finn Hudson On 'Glee,' Found Dead | Actor Cory Monteith, Who Played Finn Hudson On 'Glee,' Found Dead |
| Is the AW3420DW worth double the cost of a $500 monitor?I've been researching ultrawides and wanted to know people's opinion if the extra cost for the Alienware AW3420DW ($999) was worth the extra over say a AOC CU34G2X ($449) or BenQ EX3501R ($649) or another monitor in that range? If I'm willing to spend the cash for the Alienware, should I just make the leap? | Is the AW3420DW worth double the cost of a $500 monitor?I've been researching ultrawides and wanted to know people's opinion if the extra cost for the Alienware AW3420DW ($999) was worth the extra over say a AOC CU34G2X ($449) or BenQ EX3501R ($649) or another monitor in that range? If I'm willing to spend the cash for the Alienware, should I just make the leap? |
| My first time making it to a week! Awesome! Nothing to say, just felt like sharing(: Have a good day! EDIT: Oh my gosh, I meant to say month... Woops. | My first time making it to a week! Awesome! Nothing to say, just felt like sharing(: Have a good day! EDIT: Oh my gosh, I meant to say month... Woops. |

- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
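For each sentence_0 in a batch, MultipleNegativesRankingLoss treats its own sentence_1 as the positive and every other sentence_1 in the batch as a negative. It then applies cross-entropy over the scaled cosine similarities. The NumPy sketch below illustrates that computation with the `scale` of 20.0 listed above. It is a simplified illustration, not the library implementation, and the names `cos_sim` and `mnr_loss` and the toy vectors are assumptions made for this example.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy where row i's correct 'class' is column i."""
    scores = scale * cos_sim(anchors, positives)  # shape (batch, batch)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Toy embeddings: three mutually orthogonal anchors.
anchors = np.eye(3)
matched_loss = mnr_loss(anchors, anchors)  # each positive equals its anchor
shuffled_loss = mnr_loss(anchors, np.roll(anchors, 1, axis=0))  # wrong pairing
print(matched_loss, shuffled_loss)
```

In this toy setup, correctly paired and well-separated embeddings give a loss close to zero, while mismatched pairs give a loss close to the scale factor.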
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
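With `lr_scheduler_type: linear`, `learning_rate: 5e-05` and `warmup_steps: 0`, the learning rate decays in a straight line from its initial value to zero over the run. The sketch below shows that schedule. The function `linear_lr` is illustrative, and the total of 3029 optimizer steps is an assumption (30,288 samples at batch size 10 on a single device), not a value stated in this card.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 3029  # assumption: ceil(30288 / 10) on one device

lr_start = linear_lr(0, TOTAL_STEPS)
lr_mid = linear_lr(1500, TOTAL_STEPS)
lr_end = linear_lr(TOTAL_STEPS, TOTAL_STEPS)
print(lr_start, lr_mid, lr_end)
```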
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.1651 | 500 | 0.8614 |
| 0.3301 | 1000 | 0.0012 |
| 0.4952 | 1500 | 0.0007 |
| 0.6603 | 2000 | 0.0002 |
| 0.8254 | 2500 | 0.0002 |
| 0.9904 | 3000 | 0.0 |
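The Epoch column can be reproduced from the dataset size and batch size. The sketch below assumes a single device and no gradient accumulation, which is consistent with the hyperparameters listed but is not stated explicitly. The helper name `steps_per_epoch` is illustrative.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, num_devices: int = 1, drop_last: bool = False) -> int:
    """Number of optimizer steps needed to see every sample once."""
    per_step = batch_size * num_devices
    if drop_last:
        return num_samples // per_step
    return math.ceil(num_samples / per_step)


total = steps_per_epoch(num_samples=30288, batch_size=10)
logged_steps = [500, 1000, 1500, 2000, 2500, 3000]
epochs = [round(step / total, 4) for step in logged_steps]
print(total, epochs)
```

Under these assumptions one epoch is 3029 steps, and the computed fractions match the logged Epoch values.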
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.0.1+cu117
- Accelerate: 0.32.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}