---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:11180
- loss:CosineSimilarityLoss
widget:
- source_sentence: 'I really shy away from early voting narratives, which I got into
    in another question thats posted here (I think!) The polls could absolutely be
    missing a meaningful chunk of Trumps support, as was the case in 16 and 20. Though
    its worth noting that he is polling much better this time around than the last
    two times. That could be an indication that the issue has been resolved, through
    some combination of new polling methods and more willingness of Trump supporters
    to answer polls and state their support for him. (I talked with one pollster who
    believes he missed in 20 because theyd get Trump voters on the phone but, for
    whatever reason, they wouldnt say they were voting for him. Now when voters hesitate,
    this pollster nudges them a little to pick a candidate. He thinks its fixed the
    problem. Well see.)




    I did write about some of the polling issues here:


    ['
  sentences:
  - Those are ways to cope individually but to actually abolish the institution of
    capitalist employment relationships there has to be collective action. You can
    call it however you want, socialism/communism happens to be a banner that people
    who held these ideas have been fighting under for many many years.
  - everyones scared shitless of trump winning. every single one of us is voting again
    and then add in new voters and R voters going D and its a landslide win.
  - I know, that's why I said one of the reasons. The main reason is his uselessness
    when your team is bad
- source_sentence: This movie is exactly the same as Ridley Scott's, "Someone To Watch
    Over Me", which is a classic. It stars Tom Berenger and Mimi Rogers and the bad
    guy from the Fugitive. It's a quality movie, with good direction and great acting.
    This movie is the polar opposite. It's the same plot, just minus any good acting
    and adding TV movie direction. I do have to say I like the interaction between
    Lowe and the woman. She's hot and their romance is believable on many levels.
    Unfortunately, that alone cannot save this carbon copy of the classic Ridley Scott
    film.Usually crappy TV movies have some redeeming quality that makes them watchable
    at the very least. I think for me, this film's redeeming value is that Rob Lowe
    and the lead female have chemistry. That and I was able to watch and compare the
    plot to another quality film.
  sentences:
  - The real war will never get in the books.
  - It's amazing to see how Nikhil Advani manages to attract people to the theater
    till the very day of the release. I mean..... look at the cast here , the promotion
    is superb, good enough songs and the trailers are fine. This makes it a house
    full on the first day, but it's only when people go and see the film they realize
    that there is no way their money is refundable. House full the first day , the
    movie is out the next week. This film, inspired by 'Love Actually' is what they
    say, didn't manage to handle the whole cast well. They tried to put in big stars
    but ended up by not even managing to bring out even an average performance by
    any one. The stories are hollow and cheesy, so the audience can't connect with
    any single one of them. It's a big disappointment to all those who like big stars
    or for that matter Nikhil Advani after his big success of 'Kal Ho Na Ho'.
  - I can't believe this movie has an average rating of 7.0! It is a fiendishly bad
    movie, and I saw it when it was fairly new, and I was in the age group that is
    supposed to like it!
- source_sentence: Nationalism is a silly cock crowing on his own dunghill.
  sentences:
  - "I can't even go to a PC hardware sub and not see political stuff. Hell even dog\
    \ subs have politics now.\n\n There was a thread with a female BLM supporter subtitling\
    \ her dog's barks to saying that it hates racists and bigots and supports equality.\
    \ Bloody ridiculous."
  - Nov. 10, 2015 - Gaily Grind - Number of gay Americans in same-sex marriages closing
    in on 1 million -
  - 'The 1970''s saw a rise and fall of what we have come to know as "Blacksploitation"
    Films. The term is a reference to kind of broad catch-all, rather than a true
    Genre of Film. In short, any comedy, drama, adventure, western or urban cops &
    robbers shoot-em-up, that are so constructed and so cast as to appeal to the large
    Urban Black population of the Mid 20th Century. That indeed could embrace the
    widest type of films, as long as the had a slant toward the inner-city black population.It
    appears that the idea of producing these films of particularly keen interest to
    Black Americans had its genesis with the Eastertime Release of 100 RIFLES (Marvin
    Schwartz Prod./20th Century-Fox, 1969). In it, former Syracuse University All-American
    Footballer and Several Times All-Pro Fullback for the Cleveland Browns, Jim Brown,
    had a Co-Starring Billing. Having appeared in a number of films already, as for
    example, RIO CONCHOS (1964),THE DIRTY DOZEN (1967), (ICE STTION ZEBRA (1968)*
    and others, it was beginning to make more sense to the Studios'' "Suits" that
    Jim was a hot property.Now this 100 RIFLES brings record numbers of Black patrons
    to the Big Cities'' central business districts on Easter Sunday to view Mr. Brown.
    Why not start to film more of these adventure epics and other types of film with
    more Black Players and Stars? Why not, indeed.** So we saw a succession of Cops
    & Robbers, Bad-ass Private Detective Films, Comedies, all going the route. Along
    the way, we eventually got to some more family oriented, wider appealing films.
    The movie goers were treated to SOUNDER (1972), THE TAKE (1974), CONRACK (1974)and,
    ultimately, CLAUDINE (1974).In CLAUDINE, we find no stigma nor easy classification
    as being "Blackploitation", as the story is universal, and could easily have been
    done as a story about people of any descent, any where, and not just in the 1970''s
    USA.That the story was done of a SINGLE mother, Claudine (Dianne Carroll), struggling
    to keep a family together after "....two marriages and two almost marriages.",
    is a far cry from a shoot-em-up Harlem Style. The problems that plague the everyday
    citizens of our nation are confronted and examined under the ol'' sociological
    microscope.But we also consider Claudine''s psychological and physical needs as
    a female. For "Woman Needs Man and Man Must Have His MATE",***and we do concede
    this point. (That''s S-E-X that we''re talking about, Schultz!) Claudine meets
    up with a very masculine, broad shouldered, athletic type in Private Scavanger
    Garbage Man, Ruppert B. Marshall (James Earl Jones) and they go on a date.The
    Great Welfare State intervenes with the Couple as Claudine''s Welfare Case Worker,
    Miss Tayback (Elisa Loti), comes snooping around to see just who is this unattached
    Male, who is suddenly paying so much attention to Claudine''s family.After a humiliating
    experience with the Welfare Bureau''s auditing and "deducting" binge, which would
    be the norm for the family, the two decide to get married with or without the
    blessing of Big Brother.Meanwhile, Claudine''s elder son has gotten involved with
    some big talking but little doing Black Activist group. But, with Ruppert''s help,
    he and they all come through it A.O.K.It ends on a Happy, Upbeat and Hopeful note.
    We know that it may not be exactly "...Happily Ever After!", but rather the''ll
    make it all together! If there is a single criticism that we must state it is
    that sometimes in a movie like this, a misconception is spread to a large portion
    of Urban Blacks. And that is, the apparent implied myth that all Whites are wealthy,
    having none of their kind ever in need of a helping hand, out of work or suffering
    any disabilities.Well, folks, it just ain''t true! NOTE: * At one point, Jim Brown''s
    career was a real hit as a rugged actioner. He was even being tauted as "...The
    Black John Wayne." NOTE: ** The idea of producing films with All-Black Casts,
    filmed for All-Black consumption was not a new idea. In the 1920''s, ''30''s and
    ''40''s, we saw productions from people like Noble Johnson, Spencer Williams,
    Jr. and Rex Ingram.NOTE: *** That''s "As Time Goes By", you know, Schultz, it''s
    from CASABLANCA (Warner Brothers, 1942).'
- source_sentence: I think this movie has got it all. It has really cool music that
    I can never get out of my head. It has cool looking characters. IS REALLY funny(you
    know, the kind that you'll crack up on the ground and you'll keep saying the funny
    parts over every day for three weeks).Despite the bad acting, bad cgi, and bad
    story(about cops going after a robot), its really cool. Its one of those movies
    you and all of your family can watch, get together, eat pizza, laugh like crazy,
    and watch it two more times.There are so many funny parts, like when Kurt was
    trying to get Edison's attention and gave him the finger, and then threw a paint
    ball gun at him so they could play paint ball. On that part, I kept saying "Remember,
    Remember?"to my cousins who saw it and showed them what happened. There was also
    a really funny part when Edision ran into the room and Kurt was there(just before
    they fought) and Kurt was talking about his "Strange dream" and how he was "Superman".
    I LOVED that part, although it has been a while since I saw it, so I don't remember
    that part. Everything the actors said were funny, like how Kurt says, "I worship
    you, like a GOD!" to the robot.Although there was some bad things, in all it was
    a GREAT movie. Man, I can't stop laughing. I wish I had that movie. );
  sentences:
  - As I looked at this movie once again, I think it belongs among Hitchcock's greatest
    films. The first time I saw it I was just blown away by the suspense, action and
    imagery. It has the gripping ending, the deranged murderer, the innocent man framed
    or victimized by circumstances, some great on-location shots, e.g. the Jefferson
    Memorial in Washington and Penn Central in New York. It also has great supporting
    actors with Hitch's daughter Patricia in the role of the younger sister to Ruth
    Roman and the stalwart Leo G. Carroll in another of his Hitchcock movies. The
    merry-go-round episode near the end is one of the most nerve-wracking in Hitchcock's
    body of work.Robert Walker as Bruno Anthony (his last full film) gives a great
    performance as the deranged stranger on the train, who worms himself into the
    life of the unsuspecting tennis star, Guy Haines (Farley Granger). Granger plays
    the nice guy who is caught up in a messy divorce. The movie opens with the camera
    showing the shoes of two separate men as they leave their taxis to board the train.
    Eventually, they meet and the story takes over. The stranger takes an unusual
    interest in the tennis star and as the movie continues,the stranger becomes a
    stalker. The action shifts from place to place, including Washington, the fictional
    small town of Metcalf, the Forest Hills tennis championship, and a passenger train
    taking the two leading men back and forth on separate missions. Towards the end,
    the pace of a tennis game is woven into the plot as they race against time. The
    camera cuts away to the faces of the athletes as they volley and serve in a remarkable
    series of shots. When the closely-fought contest is over, the climactic chase
    takes place. Hitchcock has a love for trains and it is great to see Penn Station,
    long since gone. Trains are featured in the 39 Steps, the Lady Vanishes, Shadow
    of a Doubt, Spellbound, North by Northwest and this movie. This classic Hitchcock
    thriller took place at the start of a period of great creativity for the master
    of suspense - the 1950's and I am convinced that one day it will be given its
    due in the Hitchcock hall of fame.
  - "My RfA (reprise) \n\nWell, it's been a week now that I've been an administrator\
    \ and I'd like to take this moment to once again thank everyone who supported\
    \ my RfA, and to let you all know that I don't think I've screwed anything up\
    \ yet so I hope I'm living up to everyone's expectations for me. But if I ever\
    \ fall short of those expectations, I'd certainly welcome folks telling me about\
    \ it!"
  - The first few minutes of this movie don't do it justice!For me, its not funny
    until they board the sub and those hilarious characters begin to gel. I was born
    and raised in Norfolk Virginia and met my share of "different" sailors- I even
    married one! Most of my favorite movies are just funny, not topical, not dependent
    on sex or violence and funny every time I see them. Groundhog Day, Bruce Almighty
    and Down Periscope are still funny even after I know the dialog by heart. Kelsey
    Grammar with his "God I LOVE this job!"was sincere, genuine and lovable. Rob Schneider
    is hysterical as the crew gets back at him for being annoying. I am still amazed
    at the size of that fishing boat next to a sub! I can see why folks who live this
    life would notice the uh-oh's but its not a documentary after all its a comedy
    and I just love it!
- source_sentence: Every American poet feels that the whole responsibility for contemporary
    poetry has fallen upon his shoulders, that he is a literary aristocracy of one.
  sentences:
  - 'Many thousands of youth have been deprived of the benefit of education thereby,
    their morals ruined, and talents irretrievably lost to society, for want of cultivation:
    while two parties have been idly contending who should bestow it.'
  - I'm not sure why Spike Lee made this train wreck of a movie and conned poor Stevie
    Wonder into eternally pairing his beautiful music with this theatrical mess. I
    also resent the way he uses profanity as a part of the normal prose of professional
    Blacks. The abuse of his hold on ethnic movie goers is a shame. Scenes which seem
    to be contrived out the blue and have nothing to do with the theme or sub themes,
    play as if some college kid wrote this. I especially detest the ludicrous scene
    where the two leads are playfully sparring for no reason at all and the cops come
    and rough up Snipes. The overacting of the leads makes one feel as if Spike has
    no respect for his viewers or he has no clue what a movie is all about. The final
    scene appears to be thrown in to justify the use of a sledge hammer to tack a
    point in. This movie also supports the myth that all people of culture use the
    F-word in casual conversation. I am hoping he will realize that the rest of his
    movies are in the same pool as this one where he is not growing as a film maker.
    I think his union with Scorcesee in Clockers was a wise move. He should stick
    to making documentaries like the Four Little Colored Girls. Shock movies do not
    an Oscar make.
  - I was a barman in the UK when it came out and loved it so much I'd pop out of
    the bar to feed the Jukebox to keep it playing over and over. Never knew she was
    only 14 or 15 at the time.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: similarity
      type: similarity
    metrics:
    - type: pearson_cosine
      value: 0.36937547630571615
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.39006034741410334
      name: Spearman Cosine
---

# SentenceTransformer

This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
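
The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged over the positions the attention mask marks as real tokens, so padding does not dilute the sentence vector. A minimal NumPy sketch of that step, with made-up toy values (not this model's weights):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # sum of real tokens
    count = np.clip(mask.sum(), 1e-9, None)                        # avoid division by zero
    return summed / count

# Toy example: 3 tokens (the last one is padding), dim 4
emb = np.array([[1.0, 2.0, 3.0, 4.0],
                [3.0, 4.0, 5.0, 6.0],
                [9.0, 9.0, 9.0, 9.0]])  # padding row is ignored
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))  # [2. 3. 4. 5.]
```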

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Every American poet feels that the whole responsibility for contemporary poetry has fallen upon his shoulders, that he is a literary aristocracy of one.',
    "I'm not sure why Spike Lee made this train wreck of a movie and conned poor Stevie Wonder into eternally pairing his beautiful music with this theatrical mess. I also resent the way he uses profanity as a part of the normal prose of professional Blacks. The abuse of his hold on ethnic movie goers is a shame. Scenes which seem to be contrived out the blue and have nothing to do with the theme or sub themes, play as if some college kid wrote this. I especially detest the ludicrous scene where the two leads are playfully sparring for no reason at all and the cops come and rough up Snipes. The overacting of the leads makes one feel as if Spike has no respect for his viewers or he has no clue what a movie is all about. The final scene appears to be thrown in to justify the use of a sledge hammer to tack a point in. This movie also supports the myth that all people of culture use the F-word in casual conversation. I am hoping he will realize that the rest of his movies are in the same pool as this one where he is not growing as a film maker. I think his union with Scorcesee in Clockers was a wise move. He should stick to making documentaries like the Four Little Colored Girls. Shock movies do not an Oscar make.",
    'Many thousands of youth have been deprived of the benefit of education thereby, their morals ruined, and talents irretrievably lost to society, for want of cultivation: while two parties have been idly contending who should bestow it.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6358, 0.5940],
#         [0.6358, 1.0000, 0.4347],
#         [0.5940, 0.4347, 1.0000]])
```
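
`model.similarity` defaults to the cosine similarity function listed under Model Details: each embedding is normalized to unit length, and similarities are the pairwise dot products. A self-contained sketch of that computation on toy 2-D vectors (illustrative values, not real embeddings):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: L2-normalize rows, then take dot products."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

vecs = np.array([[1.0, 0.0],   # points along x
                 [1.0, 1.0],   # 45 degrees from both neighbors
                 [0.0, 1.0]])  # points along y
sims = cosine_similarity_matrix(vecs)
# Diagonal is 1.0 (each vector vs itself); vectors 45 degrees apart
# score cos(45°) = 1/sqrt(2) ≈ 0.7071; orthogonal vectors score 0.
print(np.round(sims, 4))
```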

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `similarity`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.3694     |
| **spearman_cosine** | **0.3901** |
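
`spearman_cosine` is the Spearman rank correlation between the model's cosine similarity scores and the gold labels: it measures how well the model's *ranking* of pairs matches the labels, ignoring the absolute values. For tie-free data it reduces to the Pearson correlation of the ranks; a minimal NumPy sketch with toy scores (real evaluators also handle ties via average ranks):

```python
import numpy as np

def spearman_no_ties(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation for tie-free data: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)  # rank of each element in x
    ry = np.argsort(np.argsort(y)).astype(float)  # rank of each element in y
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

predicted = np.array([0.9, 0.1, 0.5, 0.7])  # toy cosine similarities
gold      = np.array([1.0, 0.0, 0.4, 0.8])  # toy labels with the same ordering
print(spearman_no_ties(predicted, gold))    # 1.0: the two rankings agree perfectly
```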

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 11,180 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                          | sentence_1                                                                          | label                                                          |
  |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:---------------------------------------------------------------|
  | type    | string                                                                              | string                                                                              | float                                                          |
  | details | <ul><li>min: 6 tokens</li><li>mean: 107.53 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 111.53 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.53</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | <code>I AM ANGRY AT YOU BILLJ! YOU GOT PEOPLE BLOCKED FOR AS LONG AS YOU LIVE! I ASKED YOU TO STOP DELETING MY EDITS OR I WILL BLOCK YOU FOR ALl EONS YOU ASSHOLE! WIKIPEDIA IS NOT CENSORED SO STOP REMOVING MY FUCKING MESSAGES OR I WILL BEAT YOU UP SILLY!</code> | <code>The thing is i don't see any shyness from people supporting far right anymore. The life of avg Joe became signicantly shittier after covid and global conflicts. They are very vocal about their distaste. And they blame lefties and immigrants for their problems. So they are vocal and very organized.<br><br>Also most of the public already act demented and noone remembers all the moronic stuff Trump pulled during his presidency.<br><br>I don't see much reason for them to be shy about.</code> | <code>0.4082482904638631</code> |
  | <code>I understand that you may be confused, but you still shouldn't judge someone's sexual identity. Just because they haven't acted on all of their sexual inclinations, doesn't mean that they don't still have those feelings. Accept others as they present themselves.</code> | <code>The head of the Mormon church has married same sex couples in the temple because they were close family. It's all about $$$.</code> | <code>0.5773502691896258</code> |
  | <code>Ugh there is so many bad decisions by conservative judges that need to be undone.</code> | <code>you say waste a draft pick on Manziel when we have Mallet. that's why I'm telling you to delete your account. You're retarded</code> | <code>0.5773502691896258</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
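
`CosineSimilarityLoss` embeds both sentences, takes their cosine similarity, and regresses it onto the label with the configured `loss_fct` (MSE here). A NumPy sketch of that objective for a single pair, with illustrative vectors standing in for sentence embeddings:

```python
import numpy as np

def cosine_similarity_mse_loss(u: np.ndarray, v: np.ndarray, label: float) -> float:
    """Squared error between cos(u, v) and the target similarity label."""
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - label) ** 2

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])  # cos(u, v) = 1/sqrt(2) ≈ 0.7071
print(cosine_similarity_mse_loss(u, v, label=1.0))  # ≈ 0.0858
```

Training then pushes the embeddings of pairs with high labels closer together (cosine toward the label) and pairs with low labels apart.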

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | similarity_spearman_cosine |
|:------:|:----:|:-------------:|:--------------------------:|
| 0.0286 | 10 | - | -0.0664 |
| 0.0571 | 20 | - | -0.0621 |
| 0.0857 | 30 | - | -0.0581 |
| 0.1143 | 40 | - | -0.0516 |
| 0.1429 | 50 | - | -0.0444 |
| 0.1714 | 60 | - | -0.0334 |
| 0.2 | 70 | - | -0.0194 |
| 0.2286 | 80 | - | -0.0061 |
| 0.2571 | 90 | - | 0.0177 |
| 0.2857 | 100 | - | 0.0317 |
| 0.3143 | 110 | - | 0.0510 |
| 0.3429 | 120 | - | 0.0667 |
| 0.3714 | 130 | - | 0.0892 |
| 0.4 | 140 | - | 0.1206 |
| 0.4286 | 150 | - | 0.1584 |
| 0.4571 | 160 | - | 0.1821 |
| 0.4857 | 170 | - | 0.1716 |
| 0.5143 | 180 | - | 0.1749 |
| 0.5429 | 190 | - | 0.2192 |
| 0.5714 | 200 | - | 0.2473 |
| 0.6 | 210 | - | 0.2399 |
| 0.6286 | 220 | - | 0.2419 |
| 0.6571 | 230 | - | 0.2637 |
| 0.6857 | 240 | - | 0.2672 |
| 0.7143 | 250 | - | 0.2754 |
| 0.7429 | 260 | - | 0.2942 |
| 0.7714 | 270 | - | 0.3079 |
| 0.8 | 280 | - | 0.3079 |
| 0.8286 | 290 | - | 0.3077 |
| 0.8571 | 300 | - | 0.3012 |
| 0.8857 | 310 | - | 0.3148 |
| 0.9143 | 320 | - | 0.3199 |
| 0.9429 | 330 | - | 0.3306 |
| 0.9714 | 340 | - | 0.3363 |
| 1.0 | 350 | - | 0.3419 |
| 1.0286 | 360 | - | 0.3402 |
| 1.0571 | 370 | - | 0.3366 |
| 1.0857 | 380 | - | 0.3402 |
| 1.1143 | 390 | - | 0.3360 |
| 1.1429 | 400 | - | 0.3371 |
| 1.1714 | 410 | - | 0.3536 |
| 1.2 | 420 | - | 0.3268 |
| 1.2286 | 430 | - | 0.3443 |
| 1.2571 | 440 | - | 0.3011 |
| 1.2857 | 450 | - | 0.3549 |
| 1.3143 | 460 | - | 0.3321 |
| 1.3429 | 470 | - | 0.3505 |
| 1.3714 | 480 | - | 0.3412 |
| 1.4 | 490 | - | 0.3337 |
| 1.4286 | 500 | 0.1211 | 0.3488 |
| 1.4571 | 510 | - | 0.3486 |
| 1.4857 | 520 | - | 0.3508 |
| 1.5143 | 530 | - | 0.3561 |
| 1.5429 | 540 | - | 0.3592 |
| 1.5714 | 550 | - | 0.2950 |
| 1.6 | 560 | - | 0.3287 |
| 1.6286 | 570 | - | 0.3369 |
| 1.6571 | 580 | - | 0.3407 |
| 1.6857 | 590 | - | 0.3283 |
| 1.7143 | 600 | - | 0.3547 |
| 1.7429 | 610 | - | 0.3665 |
| 1.7714 | 620 | - | 0.3459 |
| 1.8 | 630 | - | 0.3614 |
| 1.8286 | 640 | - | 0.3514 |
| 1.8571 | 650 | - | 0.3714 |
| 1.8857 | 660 | - | 0.3647 |
| 1.9143 | 670 | - | 0.3601 |
| 1.9429 | 680 | - | 0.3292 |
| 1.9714 | 690 | - | 0.3321 |
| 2.0 | 700 | - | 0.3624 |
| 2.0286 | 710 | - | 0.3605 |
| 2.0571 | 720 | - | 0.3702 |
| 2.0857 | 730 | - | 0.3783 |
| 2.1143 | 740 | - | 0.3788 |
| 2.1429 | 750 | - | 0.3813 |
| 2.1714 | 760 | - | 0.3736 |
| 2.2 | 770 | - | 0.3762 |
| 2.2286 | 780 | - | 0.3804 |
| 2.2571 | 790 | - | 0.3805 |
| 2.2857 | 800 | - | 0.3755 |
| 2.3143 | 810 | - | 0.3647 |
| 2.3429 | 820 | - | 0.3654 |
| 2.3714 | 830 | - | 0.3767 |
| 2.4 | 840 | - | 0.3727 |
| 2.4286 | 850 | - | 0.3824 |
| 2.4571 | 860 | - | 0.3660 |
| 2.4857 | 870 | - | 0.3791 |
| 2.5143 | 880 | - | 0.3723 |
| 2.5429 | 890 | - | 0.3818 |
| 2.5714 | 900 | - | 0.3861 |
| 2.6 | 910 | - | 0.3861 |
| 2.6286 | 920 | - | 0.3857 |
| 2.6571 | 930 | - | 0.3825 |
| 2.6857 | 940 | - | 0.3680 |
| 2.7143 | 950 | - | 0.3750 |
| 2.7429 | 960 | - | 0.3815 |
| 2.7714 | 970 | - | 0.3851 |
| 2.8 | 980 | - | 0.3879 |
| 2.8286 | 990 | - | 0.3863 |
| 2.8571 | 1000 | 0.1033 | 0.3818 |
| 2.8857 | 1010 | - | 0.3882 |
| 2.9143 | 1020 | - | 0.3896 |
| 2.9429 | 1030 | - | 0.3899 |
| 2.9714 | 1040 | - | 0.3901 |

</details>

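The `similarity_spearman_cosine` column above is the Spearman rank correlation between the model's cosine similarities and the gold similarity scores on the evaluation pairs. For score lists without ties, it reduces to the classic rank-difference formula, sketched here in plain Python (an illustration of the metric, not the evaluator's actual code):

```python
def spearman_no_ties(xs, ys):
    # Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    # valid when neither list contains tied values.
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Perfectly concordant rankings score 1.0; perfectly reversed rankings score -1.0.
rho_up = spearman_no_ties([0.1, 0.5, 0.9], [1.0, 2.0, 3.0])
rho_down = spearman_no_ties([0.1, 0.5, 0.9], [3.0, 2.0, 1.0])
```

Because the metric only uses ranks, it rewards ordering the evaluation pairs correctly rather than matching the gold scores' absolute values, which is why it is the standard evaluator score for similarity models.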
### Framework Versions
- Python: 3.11.9
- Sentence Transformers: 5.1.0
- Transformers: 4.53.3
- PyTorch: 2.5.1
- Accelerate: 1.10.0
- Datasets: 2.14.4
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
config.json ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.53.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
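This config describes a standard BERT-large shape (24 layers, hidden size 1024, FFN size 4096). As a sanity check, the parameter count it implies can be estimated from these dimensions and compared against the `model.safetensors` size of 1,340,612,432 bytes (float32 weights, so roughly size / 4 parameters). A back-of-the-envelope sketch; the small remainder is accounted for by the pooler head and the safetensors header, which the estimate ignores:

```python
def bert_param_count(vocab=30522, hidden=1024, layers=24, inter=4096,
                     max_pos=512, type_vocab=2):
    # Embeddings: token, position, and token-type tables plus one LayerNorm.
    emb = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Per encoder layer: Q/K/V/output projections (weights + biases),
    # the two feed-forward projections, and two LayerNorms.
    attn = 4 * (hidden * hidden + hidden)
    ffn = hidden * inter + inter + inter * hidden + hidden
    norms = 2 * 2 * hidden
    return emb + layers * (attn + ffn + norms)

params = bert_param_count()      # ~334M parameters for the config above
from_file = 1_340_612_432 / 4    # float32 bytes -> rough parameter count
```

The two numbers agree to within about 0.3%, confirming the checkpoint is the full float32 BERT-large encoder rather than a quantized or truncated variant.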
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.1.0",
    "transformers": "4.53.3",
    "pytorch": "2.5.1"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74ea83c2a7cbe4fa6b038fb4c95fc95a58aa9ec7a86f18ad0cb86eea42b983b3
size 1340612432
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
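The module list pins down the architecture: a Transformer encoder followed by a Pooling layer that collapses per-token vectors into a single sentence embedding. Assuming the usual Sentence-BERT choice of mean pooling over non-padding tokens (the exact mode lives in `1_Pooling/config.json`, not shown in this diff), the pooling step looks roughly like:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, hidden); attention_mask: (seq_len,)
    # Average only over real tokens, so [PAD] positions do not dilute
    # the sentence embedding.
    mask = np.asarray(attention_mask, dtype=float)[:, None]
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(float(mask.sum()), 1e-9)  # guard against all-padding input
    return summed / count

tokens = np.array([[1.0, 3.0],
                   [3.0, 5.0],
                   [100.0, 100.0]])       # last row is a padding position
sentence = mean_pool(tokens, [1, 1, 0])   # padding row ignored -> [2.0, 4.0]
```

Masked averaging matters because batches are padded to a common length; without the mask, padding vectors would shift every embedding in the batch.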
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
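`tokenizer_class: BertTokenizer` with `do_lower_case: true` means input text is lowercased and then segmented by greedy longest-match-first WordPiece against `vocab.txt`, with words that cannot be segmented mapped to `[UNK]`. A toy sketch of the matching loop (the vocabulary below is made up for illustration; the real one has 30,522 entries):

```python
def wordpiece(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first segmentation: repeatedly take the longest
    # vocabulary entry that prefixes the remaining characters. Continuation
    # pieces carry the "##" prefix used in BERT's vocab.txt.
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no prefix matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

toy_vocab = {"un", "##aff", "##able", "##ord"}
result = wordpiece("unaffable", toy_vocab)  # -> ['un', '##aff', '##able']
```

Greedy segmentation keeps the vocabulary small while still covering arbitrary words, which is why the 512-token `model_max_length` is counted in WordPiece tokens rather than words.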
vocab.txt ADDED
The diff for this file is too large to render. See raw diff