B0ketto committed
Commit 538028a · verified · 1 Parent(s): f008a91

hmm404/siamese2.0-20251802

README.md CHANGED
@@ -4,75 +4,85 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:57485
  - loss:ContrastiveLoss
- base_model: B0ketto/tmp_trainer
  widget:
- - source_sentence: Russia is acting no differently than other countries.
  sentences:
- - Animals are not as self aware as humans, so they would not be aware of their coming
- death until it's already happened. Especially as the methods of doing so are quick
- and painless.
- - 'The US has regularly interfered in the elections and governments of foreign states
- consistently throughout its history: in Latin America, the Ukraine and Russia
- to name just a few.'
- - Pedophiles can have abusive ex-partners, parents they ran away from, or generally
- other people that they want to stay hidden from. Publicising their identities
- and locations allows them to be found by those who could inflict harm upon them.
- - source_sentence: Stem cell donations are allowed by MSMs, so should blood.
  sentences:
- - Some crimes impact people worldwide, such as hacking and virus creation. The cost
- of restitution in these cases would take a lifetime to pay back.
- - Studies indicate that the Muslim nature of countries does not influence the likelihood
- of conflict to occur. Factors such as a lack of economic growth, a demographic
- youth bulge, political repression and ethnic dominance have far more impact in
- this context \(Sørli et al, p. 141, 155; Karakaya, p. 533\).
- - Stem cell donation does not pose a risk of AIDs transmission to the recipient.
- - source_sentence: Global sustainable energy production could be accomplished with
- nuclear fusion.
  sentences:
- - In the long-term, nuclear fusion will be cheaper than nuclear fission
- - Maybe P=NP is a hoax?
- - It is impossible to meaningfully consent to being harmed.
- - source_sentence: European nations have contradictory needs and interests. Making
- those countries agree on crucial points, like economical politics, is impossible
- due to their differences.
  sentences:
- - The different interests of the nations are actually an argument FOR a USE, because
- then the majority could force the minority to accept changes that are good for
- the EU as a whole, but bad for special interests.
- - Children with vaginal hypoplasia can have vaginoplasty to remedy the diagnosis.
- After this procedure the patients are required to have their vaginas dilated three
- times a week, a very painful procedure, until they become sexually active. \(pp.394\)
- - Prisons would not be "too full" to hold \(violent\) offenders.
- - source_sentence: This argument has the same flaw as the argument for banning alcohol
- \(illuminated by the disaster of prohibition\) - driving things underground makes
- them more not less dangerous.
  sentences:
- - Bi is the prefix meaning two. If in fact bisexuals claim more than two genders
- than the term to use is not bisexual. Therefore the B in LGBTQ should be changed
- to an M for multi sexual
- - It would be possible for teachers to write supplementary exams for students without
- including The Adventures of Huckleberry Finn.
- - The difference is that prohibition 'prohibited' alcohol entirely, rather than
- licensing and controlling it. Consider cigarettes - they are heavily restricted
- and taxed in many countries, and smoking rates have dropped precipitously, but
- its still more common to buy them at a local store than on the black market. The
- proposal does not suggest 'prohibition' of guns, just restrictions to important
- and appropriate uses by responsible owners.
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

- # SentenceTransformer based on B0ketto/tmp_trainer

- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [B0ketto/tmp_trainer](https://huggingface.co/B0ketto/tmp_trainer). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
  - **Model Type:** Sentence Transformer
- - **Base model:** [B0ketto/tmp_trainer](https://huggingface.co/B0ketto/tmp_trainer) <!-- at revision 4cc993aec9623bb30f9892e2f9b29f6af685ddc0 -->
  - **Maximum Sequence Length:** 384 tokens
  - **Output Dimensionality:** 768 dimensions
  - **Similarity Function:** Cosine Similarity
@@ -114,9 +124,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
- 'This argument has the same flaw as the argument for banning alcohol \\(illuminated by the disaster of prohibition\\) - driving things underground makes them more not less dangerous.',
- "The difference is that prohibition 'prohibited' alcohol entirely, rather than licensing and controlling it. Consider cigarettes - they are heavily restricted and taxed in many countries, and smoking rates have dropped precipitously, but its still more common to buy them at a local store than on the black market. The proposal does not suggest 'prohibition' of guns, just restrictions to important and appropriate uses by responsible owners.",
- 'Bi is the prefix meaning two. If in fact bisexuals claim more than two genders than the term to use is not bisexual. Therefore the B in LGBTQ should be changed to an M for multi sexual',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -170,19 +180,19 @@ You can finetune this model on your own dataset.

  #### Unnamed Dataset

- * Size: 57,485 training samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2 | label |
- |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------|
- | type | string | string | int |
- | details | <ul><li>min: 3 tokens</li><li>mean: 30.47 tokens</li><li>max: 137 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 35.81 tokens</li><li>max: 162 tokens</li></ul> | <ul><li>0: ~54.70%</li><li>1: ~45.30%</li></ul> |
  * Samples:
- | sentence1 | sentence2 | label |
- |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
- | <code>Large stockpiles might not be available and what little is available will only serve to fuel corruption.</code> | <code>It might serve as a deterrent.</code> | <code>1</code> |
- | <code>Conditional aid works as an excuse to stop providing support for those in need.</code> | <code>Making aid conditional will simply redirect foreign aid funds from corrupt states to non-corrupt states. This is a perfectly reasonable move for donor states to take, and there is no reason why they would need an excuse.</code> | <code>0</code> |
- | <code>Any obstruction in the predictability of their choices \(given all the relevant data\) would have to result from some external source \(randomness, demonic possession, etc\), not determined by the individual's will. The interjection of such an external source, though making the individual less predictable, would arguably make them even less free, since it would restrict their ability to choose according to their nature.</code> | <code>Under the branching model of the universe, every time there is an opportunity for a declension, the individual makes all decisions available to them, each choice making a new universe where the events are carried out, no external force needed.</code> | <code>0</code> |
  * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
  ```json
  {
@@ -318,49 +328,45 @@ You can finetune this model on your own dataset.
  ### Training Logs
  | Epoch | Step | Training Loss |
  |:------:|:-----:|:-------------:|
- | 0.0696 | 500 | 0.0238 |
- | 0.1392 | 1000 | 0.0251 |
- | 0.2087 | 1500 | 0.0261 |
- | 0.2783 | 2000 | 0.0257 |
- | 0.3479 | 2500 | 0.026 |
- | 0.4175 | 3000 | 0.0271 |
- | 0.4871 | 3500 | 0.054 |
- | 0.5566 | 4000 | 0.0384 |
- | 0.6262 | 4500 | 0.036 |
- | 0.6958 | 5000 | 0.0347 |
- | 0.7654 | 5500 | 0.0342 |
- | 0.8350 | 6000 | 0.0331 |
- | 0.9045 | 6500 | 0.033 |
- | 0.9741 | 7000 | 0.0327 |
- | 1.0437 | 7500 | 0.033 |
- | 1.1133 | 8000 | 0.0324 |
- | 1.1829 | 8500 | 0.0325 |
- | 1.2524 | 9000 | 0.0319 |
- | 1.3220 | 9500 | 0.032 |
- | 1.3916 | 10000 | 0.0325 |
- | 1.4612 | 10500 | 0.0321 |
- | 1.5308 | 11000 | 0.032 |
- | 1.6003 | 11500 | 0.0317 |
- | 1.6699 | 12000 | 0.0319 |
- | 1.7395 | 12500 | 0.0316 |
- | 1.8091 | 13000 | 0.0316 |
- | 1.8787 | 13500 | 0.0318 |
- | 1.9482 | 14000 | 0.0317 |
- | 2.0178 | 14500 | 0.0319 |
- | 2.0874 | 15000 | 0.0319 |
- | 2.1570 | 15500 | 0.0315 |
- | 2.2266 | 16000 | 0.0317 |
- | 2.2961 | 16500 | 0.0315 |
- | 2.3657 | 17000 | 0.0316 |
- | 2.4353 | 17500 | 0.0315 |
- | 2.5049 | 18000 | 0.0311 |
- | 2.5745 | 18500 | 0.0297 |
- | 2.6440 | 19000 | 0.0294 |
- | 2.7136 | 19500 | 0.0289 |
- | 2.7832 | 20000 | 0.0286 |
- | 2.8528 | 20500 | 0.0287 |
- | 2.9223 | 21000 | 0.0288 |
- | 2.9919 | 21500 | 0.0286 |


  ### Framework Versions
@@ -369,7 +375,7 @@ You can finetune this model on your own dataset.
  - Transformers: 4.48.3
  - PyTorch: 2.5.1+cu124
  - Accelerate: 1.3.0
- - Datasets: 3.3.0
  - Tokenizers: 0.21.0

  ## Citation
 
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
+ - dataset_size:164244
  - loss:ContrastiveLoss
+ base_model: sentence-transformers/all-mpnet-base-v2
  widget:
+ - source_sentence: Un modelo "desde cero" no es necesariamente mejor, es una opinión
+ subjetiva. Otra opinión igualmente válida sería que iterar sobre el modelo existente
+ es más eficiente.
  sentences:
+ - Un modelo desde cero, creado usando el tiempo que sea necesario, es mejor que
+ iterar sobre uno existente. Iterar sobre uno existente implica mantener ciertas
+ estructuras o pilares que se crearon hace tiempo y que probablemente ya no sirvan
+ a los intereses actuales de los ciudadanos.
+ - La razón principal de la insatisfacción en Cataluña es la economía y el autogobierno.
+ Se puede solventar con un pacto fiscal y/o un estado federal.
+ - Pero de esa "Cataluña insatisfecha", primero hay que remarcar que ni un 40% de
+ los ciudadanos son independentistas. Y, segundo, la mayor parte de los insatisfechos
+ son extremistas que prácticamente siempre han estado en contra de que Cataluña
+ formase parte de España. No por la opinión de una minoría ruidosa se debe meter
+ en el mismo saco a todos.
+ - source_sentence: Cataluña puede formar otras alianzas.
  sentences:
+ - Por ejemplo, uno de los principales impuestos del estado, el impuesto sobre la
+ renta de las personas físicas \(IRPF)\), también funciona de forma redistributiva.
+ - En ningún caso se ha hablado de erradicar el Español en Cataluña por parte de
+ los partidos partidarios al sí, más bien al contrario; ya se ha comentado abiertamente
+ que habrá una convivencia entre ambas lenguas.
+ - Hay varios países del tamaño de Cataluña fuera de la UE y no por ello son económicamente
+ inviables, se negociarán los tratados y se llegará a los acuerdos necesarios,
+ ya sean de unión o de libre movimiento.
+ - source_sentence: Cataluña tiene derecho de autodeterminación, es decir el derecho
+ de un pueblo a decidir sus propias formas de gobierno, perseguir su desarrollo
+ económico, social y cultural, y estructurarse libremente, sin injerencias externas
+ y de acuerdo con el principio de igualdad.
  sentences:
+ - El estado Español se ha opuesto siempre a que las selecciones Catalanas de distintos
+ deportes formasen parte de las federaciones internacionales, Incluso que los pilotos
+ de formula-1 compitiesen con bandera Catalana, a diferencia de por ejemplo el
+ escocés David Coulthard.
+ - La comisión de le UE ya ha reiterado en varias ocasiones que si Cataluña se independiza,
+ saldría fuera de la UE. Volver a entrar en la UE podría durar unos cuantos años.
+ Por otro, las regiones de Europa tienen sus lobbies y representacion en la UE
+ con fondos de la UE.
+ - El argumento de que no habrán más derechos se refiere a los derechos de cada individuo
+ como catalán, no a los derechos del "pueblo catalán" como ente abstracto.
+ - source_sentence: Cataluña podría tener un sistema fiscal y una financiación mejores.
  sentences:
+ - Si bien es cierto que Cataluña es pionera en la abolición de la tauromaquia, sería
+ pecar de inocentes pensar que eso se deba a un pensamiento más avanzado. En toda
+ España hay mucha gente en contra de la tauromaquia, pero que en Cataluña se viese
+ como un símbolo de identidad española favoreció su abolición, no por motivos puramente
+ animalistas, sino nacionalistas.
+ - Para tener el control de la solidaridad emitida a otras regiones de España. Al
+ controlar el 100% de las finanzas y al ser un sujeto internacional, la nueva República
+ Catalana podrá realizar acuerdos de cooperación con las regiones que realmente
+ necesiten de esta. Hoy en día, en el Estado Español, la solidaridad entre regiones
+ es controlada desde Madrid, obviando año tras año la capacidad solidaria de las
+ regiones emisoras y al mismo tiempo las necesidades reales de las regiones receptoras.
+ - Para demostrar al mundo que el pueblo puede mandar sobre el gobierno si se va
+ a una.
+ - source_sentence: Cataluña podría acabar con la monarquía.
  sentences:
+ - Porque Cataluña sería una República independiente de la monarquía Española.
+ - Por definición, cualquier nacionalismo es malo ya que crea divisiones artificiales
+ y es fuente de conflictos.
+ - No existen ejemplos de fusiones entre estados soberanos en la historia reciente.
+ Solamente de uniones regionales \(o estratégicas como el BRICS\).
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
  - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 9a3225965996d404b775526de6dbfe85d3368642 -->
  - **Maximum Sequence Length:** 384 tokens
  - **Output Dimensionality:** 768 dimensions
  - **Similarity Function:** Cosine Similarity
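
Reviewer note: the card lists cosine similarity as the similarity function for the 768-dimensional embeddings. The sketch below shows only what that measure computes, using plain Python and toy 3-dimensional vectors as stand-ins. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not part of this repository or of the sentence-transformers API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for real embeddings (assumption: real ones have 768 dimensions).
anchor = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]  # scaled copy of anchor
orthogonal = [0.0, 0.0, 3.0]      # shares no component with anchor

print(cosine_similarity(anchor, same_direction))
print(cosine_similarity(anchor, orthogonal))
```

Because the measure ignores vector length, a scaled copy of a vector scores the same as the vector itself; only direction matters.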
 
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
+ 'Cataluña podría acabar con la monarquía.',
+ 'Porque Cataluña sería una República independiente de la monarquía Española.',
+ 'Por definición, cualquier nacionalismo es malo ya que crea divisiones artificiales y es fuente de conflictos.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
 

  #### Unnamed Dataset

+ * Size: 164,244 training samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 | label |
+ |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 12 tokens</li><li>mean: 55.71 tokens</li><li>max: 173 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 70.67 tokens</li><li>max: 180 tokens</li></ul> | <ul><li>0: ~60.00%</li><li>1: ~40.00%</li></ul> |
  * Samples:
+ | sentence1 | sentence2 | label |
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>La soberanía y la decisión sobre la unidad de España residen en el conjunto de España.</code> | <code>Apostar por un proceso de secesión es ir en contra de la globalización, la corriente histórica que vivimos.</code> | <code>1</code> |
+ | <code>Apostar por un proceso de secesión es ir en contra de la globalización, la corriente histórica que vivimos.</code> | <code>La independencia de Cataluña choca contra el ideal consistente en que la humanidad como especie debe evolucionar a estar más unida, favoreciendo el intercambio científico y tecnológico.</code> | <code>1</code> |
+ | <code>La independencia de Cataluña choca contra el ideal consistente en que la humanidad como especie debe evolucionar a estar más unida, favoreciendo el intercambio científico y tecnológico.</code> | <code>Los pueblos deben estar unidos y favorecer el diálogo para solucionar problemas que importan y mejorar así la convivencia.</code> | <code>1</code> |
  * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
  ```json
  {
 
  ### Training Logs
  | Epoch | Step | Training Loss |
  |:------:|:-----:|:-------------:|
+ | 0.0244 | 500 | 0.0306 |
+ | 0.0487 | 1000 | 0.0278 |
+ | 0.0731 | 1500 | 0.0286 |
+ | 0.0974 | 2000 | 0.0274 |
+ | 0.1218 | 2500 | 0.0416 |
+ | 0.1461 | 3000 | 0.0352 |
+ | 0.1705 | 3500 | 0.0337 |
+ | 0.1948 | 4000 | 0.0329 |
+ | 0.2192 | 4500 | 0.0331 |
+ | 0.2435 | 5000 | 0.0323 |
+ | 0.2679 | 5500 | 0.0327 |
+ | 0.2922 | 6000 | 0.0319 |
+ | 0.3166 | 6500 | 0.0314 |
+ | 0.3409 | 7000 | 0.032 |
+ | 0.3653 | 7500 | 0.0326 |
+ | 0.3897 | 8000 | 0.0316 |
+ | 0.4140 | 8500 | 0.0318 |
+ | 0.4384 | 9000 | 0.0321 |
+ | 0.4627 | 9500 | 0.0325 |
+ | 0.4871 | 10000 | 0.0322 |
+ | 0.5114 | 10500 | 0.0326 |
+ | 0.5358 | 11000 | 0.0324 |
+ | 0.5601 | 11500 | 0.0324 |
+ | 0.5845 | 12000 | 0.0326 |
+ | 0.6088 | 12500 | 0.0325 |
+ | 0.6332 | 13000 | 0.0323 |
+ | 0.6575 | 13500 | 0.0317 |
+ | 0.6819 | 14000 | 0.0322 |
+ | 0.7062 | 14500 | 0.032 |
+ | 0.7306 | 15000 | 0.0316 |
+ | 0.7550 | 15500 | 0.0321 |
+ | 0.7793 | 16000 | 0.032 |
+ | 0.8037 | 16500 | 0.0314 |
+ | 0.8280 | 17000 | 0.0316 |
+ | 0.8524 | 17500 | 0.0319 |
+ | 0.8767 | 18000 | 0.0319 |
+ | 0.9011 | 18500 | 0.0317 |
+ | 0.9254 | 19000 | 0.0321 |
+ | 0.9498 | 19500 | 0.0322 |
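
Reviewer note: the Epoch column in both log tables can be cross-checked against the dataset sizes stated in the card (57,485 before, 164,244 after). The sketch below is a sanity check only. The batch size is not visible in this diff, so `ASSUMED_BATCH_SIZE = 8` is an assumption, chosen because it reproduces the logged values; the helper name `logged_epoch` is likewise hypothetical.

```python
import math

# Assumption: the batch size is not shown in this diff. 8 is used here only
# because it reproduces the Epoch values in both tables.
ASSUMED_BATCH_SIZE = 8


def logged_epoch(step: int, dataset_size: int, batch_size: int) -> float:
    """Fractional epoch reached at `step`, rounded to 4 places like the log."""
    # The last partial batch still counts as a step, hence the ceiling.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return round(step / steps_per_epoch, 4)


# Previous run: 57,485 samples, logged through step 21,500.
print(logged_epoch(500, 57_485, ASSUMED_BATCH_SIZE))
print(logged_epoch(21_500, 57_485, ASSUMED_BATCH_SIZE))

# New run: 164,244 samples, logged through step 19,500.
print(logged_epoch(500, 164_244, ASSUMED_BATCH_SIZE))
print(logged_epoch(19_500, 164_244, ASSUMED_BATCH_SIZE))
```

Under that assumption the previous log covers almost three epochs, while the new log ends at step 19,500, just short of one full pass over the larger dataset.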
 
 
 
 


  ### Framework Versions
 
  - Transformers: 4.48.3
  - PyTorch: 2.5.1+cu124
  - Accelerate: 1.3.0
+ - Datasets: 3.3.1
  - Tokenizers: 0.21.0

  ## Citation
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "B0ketto/tmp_trainer",
  "architectures": [
  "MPNetModel"
  ],

  {
+ "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
  "architectures": [
  "MPNetModel"
  ],
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b6cbb02e57c6275c10e778bfdf112231d9de8dde06d5e50989ef9f2dd6ba0f22
  size 437967672

  version https://git-lfs.github.com/spec/v1
+ oid sha256:850a77da1411adf05dc654619047254d73d5ee807988db47b6575634fe735f72
  size 437967672
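
Reviewer note: the `model.safetensors` entry above is a Git LFS pointer file, not the weights themselves. It holds three `key value` lines (`version`, `oid`, `size`). The sketch below parses the new pointer text copied from this diff; the helper name `parse_lfs_pointer` is a hypothetical illustration, not a Git LFS or Hugging Face API.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of an LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# New pointer content, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:850a77da1411adf05dc654619047254d73d5ee807988db47b6575634fe735f72
size 437967672
"""

info = parse_lfs_pointer(pointer)
algorithm, _, digest = info["oid"].partition(":")
size_bytes = int(info["size"])
print(algorithm, len(digest), size_bytes)
```

In this commit only the `oid` changes while `size` stays at 437967672 bytes, which is what one would expect when the weights are retrained without changing the architecture.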
runs/Feb19_08-26-55_e81f908b4990/events.out.tfevents.1739953616.e81f908b4990.2272.0 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97ab9224ecfa0e9c36e6d27d1b2f98569971675add9e0da217052cccd7619ef1
+ size 4375
runs/Feb19_08-30-25_e81f908b4990/events.out.tfevents.1739953825.e81f908b4990.2272.1 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71e6605ef6f10bcf7a51fef989ba0f7e5433f15aa10022a5d5ac01478bd38887
+ size 12632
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c398cd2d5e4d21698eaa16ec0e94ca18789de8b4a972809ccfc16c63280bdc18
  size 5560

  version https://git-lfs.github.com/spec/v1
+ oid sha256:89f378966bb3104d5addaea8c27b7be092c3b85068beb7e0af3ff45ee1f6b910
  size 5560