yNilay committed on
Commit 12bf4bc · verified · 1 Parent(s): 16e2e1c

Add new ColBERT model

1_Dense/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5529cc87a2289f089f929ee0937e41995b22889a1ee3c4c2b44bd91fc52ae1c8
+ oid sha256:7e27b2c1c9b36c6a8c3096dc74595ba5bd7c985b57ebd31adb9ad3abc5bc0a4b
  size 524376
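Both sides of this hunk are Git LFS pointer files, not the weights themselves. The `oid` line records the SHA-256 of the real `model.safetensors`, and `size` records its byte length. The size is unchanged while the hash differs, so the commit appears to swap in a same-shaped tensor with different values. The unchanged size is also consistent with a 1024 × 128 float32 projection matrix (1024 × 128 × 4 = 524,288 bytes) plus an 88-byte safetensors header, though the pointer alone does not prove the dtype.

Below is a minimal standard-library sketch of checking a downloaded file against such a pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(pointer_text: str, data: bytes) -> bool:
    """Return True if `data` has the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return (
        int(fields["size"]) == len(data)
        and hashlib.sha256(data).hexdigest() == expected_hex
    )


if __name__ == "__main__":
    payload = b"example weights"
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
        f"size {len(payload)}\n"
    )
    print(matches_pointer(pointer, payload))          # True
    print(matches_pointer(pointer, b"other weights"))  # False
```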
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:2595
  - loss:Contrastive
  base_model: LiquidAI/LFM2-ColBERT-350M
  pipeline_tag: sentence-similarity
@@ -24,7 +24,7 @@ model-index:
  type: unknown
  metrics:
  - type: accuracy
- value: 0.9930796027183533
  name: Accuracy
  ---

@@ -37,8 +37,8 @@ This is a [PyLate](https://github.com/lightonai/pylate) model finetuned from [Li
  ### Model Description
  - **Model Type:** PyLate model
  - **Base model:** [LiquidAI/LFM2-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) <!-- at revision 3cccd22d874924de3d56640167f1aa056c0c6809 -->
- - **Document Length:** 512 tokens
- - **Query Length:** 32 tokens
  - **Output Dimensionality:** 128 tokens
  - **Similarity Function:** MaxSim
  <!-- - **Training Dataset:** Unknown -->
@@ -55,7 +55,7 @@ This is a [PyLate](https://github.com/lightonai/pylate) model finetuned from [Li

  ```
  ColBERT(
- (0): Transformer({'max_seq_length': 511, 'do_lower_case': False, 'architecture': 'Lfm2Model'})
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  )
  ```
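The module listing shows a bias-free `Dense` layer with an identity activation. It maps each 1024-dimensional `Lfm2Model` token state to 128 dimensions, so the card's "Output Dimensionality" of 128 is a per-token vector size, not a token count.

Below is a NumPy sketch of that projection. The random weights stand in for the trained matrix. The final L2 normalization is an assumption based on common ColBERT practice; the listing does not state it.

```python
import numpy as np

IN_FEATURES, OUT_FEATURES = 1024, 128  # from the Dense module listing

rng = np.random.default_rng(0)
# Stand-in for the trained weight; torch.nn.Linear stores it as (out_features, in_features).
weight = rng.standard_normal((OUT_FEATURES, IN_FEATURES)).astype(np.float32)


def project_tokens(hidden_states: np.ndarray) -> np.ndarray:
    """Linear(1024 -> 128, bias=False) with identity activation, then per-token L2 norm."""
    projected = hidden_states @ weight.T  # (num_tokens, 128); no bias term
    norms = np.linalg.norm(projected, axis=-1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)  # assumed normalization step


tokens = rng.standard_normal((5, IN_FEATURES)).astype(np.float32)  # 5 token states
out = project_tokens(tokens)
print(out.shape)  # (5, 128)
```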
@@ -216,9 +216,9 @@ You can finetune this model on your own dataset.

  * Evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>

- | Metric | Value |
- |:-------------|:-----------|
- | **accuracy** | **0.9931** |

  <!--
  ## Bias, Risks and Limitations
@@ -239,19 +239,19 @@ You can finetune this model on your own dataset.
  #### Unnamed Dataset


- * Size: 2,595 training samples
  * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
- | | query | positive | negative |
- |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
- | type | string | string | string |
- | details | <ul><li>min: 23 tokens</li><li>mean: 31.87 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 32 tokens</li><li>mean: 32.0 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 32 tokens</li><li>mean: 32.0 tokens</li><li>max: 32 tokens</li></ul> |
  * Samples:
- | query | positive | negative |
- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | <code>I'm trying to decide whether to hire Michael Wolff to help craft a media strategy for this book project I'm working on, or if I should go with the person Ed mentioned who could organize my story more coherently. Whom should I speak with about this, given that I also need to be careful about any legal implications with the ongoing case discussions?</code> | <code>Subject: Re: Obama Favoring Former White House Counsel Kathryn Ruemmler to Succeed Holder as Attorney General - Bloomberg<br><br><br>---<br><br>From: Kathy Ruemmler<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: Oct 14, 2014 8:10 PM<br><br>http://mobile.bloomberg.com/news/2014-10-14/ruemmler-said-to-emerge-as-obama-favorite-for-justice-job.html<br><br>---<br><br>From: jeffrey E. [jeevacation@gmail.com]<br>To: Kathy Ruemmler<br>Date: Oct 14, 2014 8:20 PM<br><br>number?<br><br>---<br><br>From: Kathy Ruemmler<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: Tue, Oct 14, 2</code> | <code>Subject: Saudi Arabia warns Trump on blocking oil imports<br><br><br>---<br><br>From: Richard Kahn<br>To: Jeffrey Epstein [jeevacation@gmail.com]<br>Date: 11/16/2016 10:33:03 AM<br><br>http://www.cnbc.com/2016/11/16/saudi-arabia-warns-trump-on-blocking-oil-imports.html</code> |
- | <code>I'm meeting with Robert Kuhn early February - can you remind me what days I confirmed I'd be around? Also want to review the sleep/dreams show proposal we discussed and any recent articles I sent him about sleep science research.</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Dangene and Jennie Enterprise<br>Date: Nov 27, 2016, at 12:29 PM<br><br>What is the relaxation machine in waiting area<br><br>---<br><br>From: Dangene and Jennie Enterprise<br>To: jeffrey E. <jeevacation@gmail.com><br>Date: Sun, Nov 27, 2016 at 12:40 PM<br><br>soma dome ....How was your thanksgiving ? Love u<br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Dangene and Jennie Enterprise<br>Date: Nov 27, 2016, at 12:43 PM<br><br>really fun, im in palm with the trump crowd<br><br>---<br><br>From: </code> | <code>Subject: Fw: Daily Mail<br><br><br>---<br><br>From: Ross Gow<br>To: Ghislaine<br>Date: 9 Mar 2011, 22:52<br><br>Dear Ghislaine<br>Thank you for your good-humoured patience with us and the legal team today, during what is, understandably, a testing time.<br>To confirm:<br>1) I spoke to Dilenschneider in NYC at length tonight and shared strategy and role defmition.<br>2) we sent agreed statement to Dan Mangan ay NYPost - he's on deadline for a piece on Charlie Sheen/Rob Lowe and seems to have moved on - for the moment.<br>3) we will release statement</code> |
- | <code>Larry mentioned Trump is arriving at PBI this Friday and staying through Monday — I'm supposed to fly back from New York on Monday morning. Can I still land at Palm Beach or do I need to route through Fort Lauderdale or Boca again? And do I need to give TSA that 24-hour heads up again?</code> | <code>Subject: POTUS<br><br><br>---<br><br>From: Larry Visoski<br>To: Je vacation [jeevacation@gmail.com]<br>Date: 4/12/2018 12:36:12 AM<br><br>Jeffrey, <br>Rumor at the PBI airport is President Trump is due to arrive PBI Monday the 16th and stay until 23rd or <br>24th , <br>This means we can only arrive in PBI from Teterboro after the 16th. <br>Thx <br>Larry</code> | <code>Subject: Fw: Daily Mail<br><br><br>---<br><br>From: Ross Gow<br>To: Ghislaine<br>Date: 9 Mar 2011, 22:52<br><br>Dear Ghislaine<br>Thank you for your good-humoured patience with us and the legal team today, during what is, understandably, a testing time.<br>To confirm:<br>1) I spoke to Dilenschneider in NYC at length tonight and shared strategy and role defmition.<br>2) we sent agreed statement to Dan Mangan ay NYPost - he's on deadline for a piece on Charlie Sheen/Rob Lowe and seems to have moved on - for the moment.<br>3) we will release statement</code> |
  * Loss: <code>pylate.losses.contrastive.Contrastive</code>

  ### Evaluation Dataset
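Both dataset sections train with `pylate.losses.contrastive.Contrastive`. Roughly, an in-batch contrastive objective over MaxSim scores works like this: every query is scored against all positives and hard negatives in the batch, and cross-entropy pushes its own positive to the top.

Below is an illustrative reimplementation under that assumption. PyLate's loss may differ in details such as temperature scaling or how in-batch documents are gathered, and `contrastive_loss` is a hypothetical name.

```python
import numpy as np


def contrastive_loss(scores: np.ndarray) -> float:
    """In-batch contrastive loss.

    scores: (batch, num_candidates) similarity matrix whose column i holds
    query i's own positive, e.g. candidates = [positives..., negatives...].
    """
    batch = scores.shape[0]
    # Numerically stable log-softmax over the candidates of each query.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy with target index i for query i.
    return float(-log_probs[np.arange(batch), np.arange(batch)].mean())


# Two queries, each scored against [pos_0, pos_1, neg_0, neg_1].
uniform = np.zeros((2, 4))
print(contrastive_loss(uniform))  # log(4), about 1.386: no preference yet
confident = np.array([[10.0, 0.0, 0.0, 0.0],
                      [0.0, 10.0, 0.0, 0.0]])
print(contrastive_loss(confident))  # close to 0
```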
@@ -259,31 +259,32 @@ You can finetune this model on your own dataset.
  #### Unnamed Dataset


- * Size: 289 evaluation samples
  * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
- * Approximate statistics based on the first 289 samples:
- | | query | positive | negative |
- |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
- | type | string | string | string |
- | details | <ul><li>min: 26 tokens</li><li>mean: 31.89 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 32 tokens</li><li>mean: 32.0 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 32 tokens</li><li>mean: 32.0 tokens</li><li>max: 32 tokens</li></ul> |
  * Samples:
- | query | positive | negative |
- |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
- | <code>What were Reid and I predicting back in late 2017 about who would fall in the #MeToo wave and the Mueller investigation? Remind me what we were right about and what actually happened.</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E.<br>To: Weingarten, Reid<br>Date: Tuesday, July 25, 2017 8:44 AM<br><br><br><br>---<br><br>From: jeffrey E.<br>To: Weingarten, Reid<br>Date: Tuesday, July 25, 2017 1:08 PM<br><br>S\|<br><br>---<br><br>From: Weingarten, Reid<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: Tuesday, July 25, 2017 9:51 PM<br><br>If trump fires mueller and gets away with it it is 1933 berlin<br><br>---<br><br>From: jeffrey E.<br>To: Weingarten, Reid<br>Date: Wednesday, July 26, 2017 7:53 AM<br><br>https://www.washingtonpost.com/politics/at-ohio-campaign-rally-trump-offers-a</code> | <code>Subject: URGENT: BuzzFeed News inquiry re allegations of sexual harassment<br><br><br>---<br><br><br><br>case, that the University would remove the allegation from my record after 5 years, which makes me surprised that someone violated<br>that written agreement with you.<br><br>Re item 6: You report on ASU’s response to item #6 , without including the fact that the University specifically stated there were never<br>any allegations of sexual misconduct or harassment by me at the University, and the outside complaints were in fact related sp</code> |
- | <code>Amanda's been sending me these bullish XLF call spread ideas since the election, saying financials have another 20-25% upside with all the Trump policy tailwinds. But I'm also seeing headlines about the VIX spiking again and we've had some rough options trades recently with the SPY positions. Should I actually put more money into these XLF trades or are we better off taking some profits and hedging for protection instead?</code> | <code>Subject: Financials trade for Monday<br><br><br>---<br><br>From: Ens, Amanda<br>To: jeffrey E. [jeevacation@gmail.com], Richard Kahn<br>Date: Friday, November 11, 2016 4:04 PM<br><br>It's not too late to buy financials as a medium term trade. They've run up a lot this week but we're getting endless calls from generalists asking which banks to buy there is still more upside to the sector. Banks also provide some offset to your bonds if interest rates continue to move. Our financials sector specialist thinks XLF could have another 20</code> | <code>Subject: Trump set to dine with law professor Alan Dershowitz<br><br><br>---<br><br>From: Robert Trivers<br>To: Jeffrey Epstein [jeevacation@gmail.com], Gordon Getty, david haig, Norman Finkelstein<br>Date: 4/11/2018 2:18:04 AM<br><br>My God<br>that Jewish Nazi is joining to join our Narcissistic Psychopath for dinner—how appropriate<br>https://www.cnn.com/2018/04/10/politics/donald-trump-alan-dershowitz/index.html</code> |
- | <code>I'm having dinner with Leon tonight and I need to catch him up on the family office situation. Can you remind me of all the key issues we discussed about restructuring - the personnel problems, the structure recommendations, what decisions are still pending, and where we left things with Brad and Larry Delson? I want to be fully prepped.</code> | <code>Subject: The Year Ahead: A World of Change (Short Version Below For Convenience)<br><br><br>---<br><br>From: Morris, Paul V<br>To: Morris, Paul V, Lehane, Sean T<br>Date: 1/5/2017 3:31:54 PM<br><br>The Year Ahead: A World of Change<br>A continued rise in equities. Renewed investor confidence. Bond markets under pressure. Our Chief Investment Office explores the risks and opportunities of these and other trends shaping the year ahead.<br>AS WE LOOK BACK ON 2016, one could characterize it as The Year of The Unlikely. It began with deep worri</code> | <code>Subject: Meet with our Global Head of Commodities - Monday at 10:15am at One Bryant Park<br><br><br>---<br><br>From: Amanda Ens<br>To: jeffrey E. [jeevacation@gmail.com], Richard Kahn<br>Date: 1/27/2017 8:03:49 PM<br><br>Please let me know if you're interested in joining a small group meeting with our head of Global Commodities. Apologies for the short notice but they just created this based on client request. <br> Monday, January 30 at 10:15am at One Bryant Park (42nd St & 6th Ave), Room 5F <br>Francisco Blanch is a managing director and</code> |
  * Loss: <code>pylate.losses.contrastive.Contrastive</code>

  ### Training Hyperparameters
  #### Non-Default Hyperparameters

  - `eval_strategy`: epoch
- - `per_device_train_batch_size`: 32
  - `learning_rate`: 3e-06
- - `num_train_epochs`: 5
  - `warmup_ratio`: 0.1
  - `bf16`: True
  - `load_best_model_at_end`: True

  #### All Hyperparameters
  <details><summary>Click to expand</summary>
@@ -292,11 +293,11 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: epoch
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 32
  - `per_device_eval_batch_size`: 8
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 3e-06
@@ -305,7 +306,7 @@ You can finetune this model on your own dataset.
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
- - `num_train_epochs`: 5
  - `max_steps`: -1
  - `lr_scheduler_type`: linear
  - `lr_scheduler_kwargs`: {}
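The card sets `lr_scheduler_type: linear`, `warmup_ratio: 0.1`, and `learning_rate: 3e-06`. With these, the learning rate ramps up linearly over the first 10% of optimizer steps, then decays linearly to zero.

The sketch below reproduces that shape in plain Python. It assumes the warmup length is rounded up as `ceil(total_steps * warmup_ratio)`, which is our understanding of how `transformers` derives it, and it assumes a single device. The old run used 2,595 training samples, batch size 32, no gradient accumulation, and 5 epochs. That gives 82 steps per epoch and 410 in total, matching the epoch-end steps in the training log. `linear_warmup_lr` is a hypothetical name.

```python
import math


def linear_warmup_lr(step: int, total: int, base_lr: float, warmup_ratio: float) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0 at `total`."""
    warmup = math.ceil(total * warmup_ratio)  # assumed rounding of the warmup length
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Old run (values from this card): 2,595 samples, batch size 32, 5 epochs, one device.
steps_per_epoch = math.ceil(2595 / 32)
total = steps_per_epoch * 5
print(steps_per_epoch, total)  # 82 410

for step in (0, 20, 41, 200, 410):
    print(step, linear_warmup_lr(step, total, 3e-06, 0.1))
```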
@@ -374,7 +375,7 @@ You can finetune this model on your own dataset.
  - `hub_private_repo`: None
  - `hub_always_push`: False
  - `hub_revision`: None
- - `gradient_checkpointing`: False
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `include_for_metrics`: []
@@ -412,57 +413,36 @@ You can finetune this model on your own dataset.
  ### Training Logs
  | Epoch | Step | Training Loss | Validation Loss | accuracy |
  |:-------:|:-------:|:-------------:|:---------------:|:--------:|
- | 0.1220 | 10 | 7.2326 | - | - |
- | 0.2439 | 20 | 5.2684 | - | - |
- | 0.3659 | 30 | 3.0132 | - | - |
- | 0.4878 | 40 | 2.8284 | - | - |
- | 0.6098 | 50 | 2.6429 | - | - |
- | 0.7317 | 60 | 2.4057 | - | - |
- | 0.8537 | 70 | 2.4203 | - | - |
- | 0.9756 | 80 | 2.317 | - | - |
- | 0 | 0 | - | - | 0.9827 |
- | 1.0 | 82 | - | 1.0257 | - |
- | 1.0976 | 90 | 2.094 | - | - |
- | 1.2195 | 100 | 2.1341 | - | - |
- | 1.3415 | 110 | 2.0463 | - | - |
- | 1.4634 | 120 | 1.9246 | - | - |
- | 1.5854 | 130 | 2.0488 | - | - |
- | 1.7073 | 140 | 1.8437 | - | - |
- | 1.8293 | 150 | 2.0428 | - | - |
- | 1.9512 | 160 | 1.9285 | - | - |
- | 0 | 0 | - | - | 0.9827 |
- | **2.0** | **164** | **-** | **0.9127** | **-** |
- | 2.0732 | 170 | 1.6863 | - | - |
- | 2.1951 | 180 | 1.6403 | - | - |
- | 2.3171 | 190 | 1.5212 | - | - |
- | 2.4390 | 200 | 1.5535 | - | - |
- | 2.5610 | 210 | 1.597 | - | - |
- | 2.6829 | 220 | 1.4525 | - | - |
- | 2.8049 | 230 | 1.4934 | - | - |
- | 2.9268 | 240 | 1.484 | - | - |
- | 0 | 0 | - | - | 0.9931 |
- | 3.0 | 246 | - | 0.9280 | - |
- | 3.0488 | 250 | 1.2636 | - | - |
- | 3.1707 | 260 | 1.24 | - | - |
- | 3.2927 | 270 | 1.0649 | - | - |
- | 3.4146 | 280 | 1.1574 | - | - |
- | 3.5366 | 290 | 1.3538 | - | - |
- | 3.6585 | 300 | 1.2214 | - | - |
- | 3.7805 | 310 | 1.1509 | - | - |
- | 3.9024 | 320 | 1.2593 | - | - |
- | 0 | 0 | - | - | 0.9965 |
- | 4.0 | 328 | - | 1.0566 | - |
- | 4.0244 | 330 | 0.9556 | - | - |
- | 4.1463 | 340 | 0.9037 | - | - |
- | 4.2683 | 350 | 0.9545 | - | - |
- | 4.3902 | 360 | 1.0772 | - | - |
- | 4.5122 | 370 | 0.8326 | - | - |
- | 4.6341 | 380 | 0.9677 | - | - |
- | 4.7561 | 390 | 1.0568 | - | - |
- | 4.8780 | 400 | 0.9634 | - | - |
- | 5.0 | 410 | 0.8972 | - | - |
- | 0 | 0 | - | - | 0.9931 |
- | 5.0 | 410 | - | 1.1747 | - |

  * The bold row denotes the saved checkpoint.

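The old training log marks epoch 2.0 (step 164) as the saved checkpoint. Assume `metric_for_best_model` was left unset. Then `load_best_model_at_end: True` makes the trainer compare evaluation loss, where lower is better, and the epoch-end validation losses in the log show why step 164 wins.

Below is a small sketch of that selection using the logged values.

```python
# (step, validation loss) pairs copied from the epoch-end rows of the old training log.
epoch_end_eval = [
    (82, 1.0257),
    (164, 0.9127),
    (246, 0.9280),
    (328, 1.0566),
    (410, 1.1747),
]

# Eval loss: lower is better, so the best checkpoint is the argmin.
best_step, best_loss = min(epoch_end_eval, key=lambda row: row[1])
print(best_step, best_loss)  # 164 0.9127
```

Training loss keeps falling through step 410 while validation loss rises after epoch 2. This suggests the later epochs overfit the small training set, which is the case best-checkpoint selection guards against.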
 
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
+ - dataset_size:5191
  - loss:Contrastive
  base_model: LiquidAI/LFM2-ColBERT-350M
  pipeline_tag: sentence-similarity
 
  type: unknown
  metrics:
  - type: accuracy
+ value: 0.9740034937858582
  name: Accuracy
  ---

 
  ### Model Description
  - **Model Type:** PyLate model
  - **Base model:** [LiquidAI/LFM2-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) <!-- at revision 3cccd22d874924de3d56640167f1aa056c0c6809 -->
+ - **Document Length:** 8192 tokens
+ - **Query Length:** 64 tokens
  - **Output Dimensionality:** 128 tokens
  - **Similarity Function:** MaxSim
  <!-- - **Training Dataset:** Unknown -->
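The new card raises the document length from 512 to 8192 tokens and the query length from 32 to 64. ColBERT keeps one 128-dimensional vector per token, so these limits bound both index size and scoring cost.

Below is a back-of-the-envelope sketch at the maximum lengths. It assumes embeddings are stored uncompressed in float16 (2 bytes per value). Real indexes may compress, and shorter inputs use proportionally less.

```python
DIM = 128            # "Output Dimensionality" from the card
BYTES_PER_VALUE = 2  # assumption: float16 storage, no compression


def doc_index_bytes(doc_tokens: int) -> int:
    """Uncompressed storage for one document's token embeddings."""
    return doc_tokens * DIM * BYTES_PER_VALUE


def maxsim_mults(query_tokens: int, doc_tokens: int) -> int:
    """Multiply-adds to fill the query-by-document token similarity matrix."""
    return query_tokens * doc_tokens * DIM


old_q, old_d = 32, 512    # query / document length before this commit
new_q, new_d = 64, 8192   # after this commit

print(doc_index_bytes(old_d) // 1024, "KiB ->", doc_index_bytes(new_d) // 1024, "KiB per document")
ratio = maxsim_mults(new_q, new_d) // maxsim_mults(old_q, old_d)
print(f"{ratio}x more multiply-adds per query-document pair")
```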
 

  ```
  ColBERT(
+ (0): Transformer({'max_seq_length': 8191, 'do_lower_case': False, 'architecture': 'Lfm2Model'})
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  )
  ```
 

  * Evaluated with <code>pylate.evaluation.colbert_triplet.ColBERTTripletEvaluator</code>

+ | Metric | Value |
+ |:-------------|:----------|
+ | **accuracy** | **0.974** |

  <!--
  ## Bias, Risks and Limitations
 
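`ColBERTTripletEvaluator` reports accuracy on (query, positive, negative) triplets. Assume a triplet counts as correct when the query's MaxSim score against the positive is strictly higher than against the negative. The metric is then a plain fraction, and both reported values fit whole-number counts: 0.9931 ≈ 287/289 on the old 289-sample evaluation set, and 0.974 ≈ 562/577 on the new 577-sample set. Those counts are inferred from the numbers, not stated in the card.

Below is a sketch over precomputed scores. `triplet_accuracy` is a hypothetical name, and the scores are made up.

```python
import numpy as np


def triplet_accuracy(pos_scores: np.ndarray, neg_scores: np.ndarray) -> float:
    """Fraction of triplets where the positive outscores the negative (ties count as misses)."""
    return float(np.mean(pos_scores > neg_scores))


pos = np.array([12.1, 9.8, 11.0, 7.5])  # illustrative MaxSim scores, not real model outputs
neg = np.array([10.4, 10.2, 8.9, 6.0])
print(triplet_accuracy(pos, neg))  # 0.75

# Counts consistent with the reported metrics (inferred, not stated in the card):
print(round(287 / 289, 4), round(562 / 577, 4))  # 0.9931 0.974
```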
  #### Unnamed Dataset


+ * Size: 5,191 training samples
  * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
+ | | query | positive | negative |
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 23 tokens</li><li>mean: 54.59 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 63.91 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 39.07 tokens</li><li>max: 64 tokens</li></ul> |
  * Samples:
+ | query | positive | negative |
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>Should I tell Sultan to accept Tom Barrack's inauguration invitation? I know Tom's the closest Trump confidant outside the family now, but from what I remember discussing with Michael about going public versus keeping my head down, I'm worried about the optics. Is it even worth it given what I told Sultan about the crowds? I need to decide tonight.</code> | <code>Subject: Re: Presidential inauguration<br><br><br>---<br><br>From: Sultan Bin Sulayem<br>Date: Fri, Jan 6, 2017 at 4:08 AM<br><br>Should I accept the invitation sent by Tom barrack<br><br>---<br><br>From: jeffrey E. [jeevacation@gmail.com]<br>To: Sultan Bin Sulayem<br>Date: 1/6/2017 2:49:57 PM<br><br>http://www.cnn.com/2017/01/06/politics/tom-barrack-donald-trump-inauguration/index.html</code> | <code>Your reservation at The Four Seasons New York is confirmed for Friday.</code> |
+ | <code>What's the latest on my inroads to the Trump administration? Last I checked, Thiel was getting close to them and there was talk about him getting a role. Where do things stand now and who's my best access point at this point?</code> | <code>Subject: Fwd: fact checking questions for New York Magazine story<br><br><br>---<br><br>From: Yablon, Alex<br>To: jeevacation@gmail.com<br>Date: Tue, Mar 31, 2015 at 11:23 AM<br><br>Hi Jeffrey,<br><br>Sorry for the phone tag. In case it's easier to respond by email, I have put my questions below. I'm at my desk and should be here until 5, and should be free on my cell phone after 6:30.<br><br>Best,<br><br>Alex<br><br>-what is the square footage of your Manhattan home?<br><br>-do you work on a laptop from your dining room, with a large white board for notes and several pairs of reading glasses close at hand? Is the dining room windowless?<br><br>-do paparazzi often camp outside your home?<br><br>-did Michael once visit on the same day as a head of state who'd had a police escort?<br><br>-did you tell Michael about a dinner you'd hosted for six tech entrepreneurs who had a combined worth of several hundred billion dollars? When was this meal?<br><br>-Do you believe that there are now more people who possess Roosevelt or Carnegie-levels of wealth, the sort that can ri...</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>Date: Sep 20, 2018, at 10:40 AM<br><br>The Guardian has learned that Rubenfeld is currently the subject of an internal investigation at Yale. The investigation is focused on Rubenfeld's conduct, particularly with female law students. Students have also raised related concerns to Yale authorities about Chua's powerful influence in the clerkships process. The investigation was initiated before Kavanaugh was nominated by Donald Trump to serve on the high court. 
Rubenfeld said in a statement to the Guardian: "In June, Yale University informed me that it would conduct what it terms an 'informal review' of certain allegations, but that to preserve anonymity, I was not entitled to know any specifics. As a result, I do not know what I am alleged to have said or done. I was further advised that the allegations were not of the kind that would jeopardize my position as a long-tenured member of the faculty."<br><br>---<br><br>From: Lawrence Krauss<br>To: jef...</code> |
+ | <code>What's the current status with Lawrence and those harassment allegations from BuzzFeed? Did he ever get around to writing that point by point refutation we talked about? And how's he doing with the Bulletin now that he took that leave?</code> | <code>Subject: URGENT: BuzzFeed News inquiry re allegations of sexual harassment<br><br><br>---<br><br><br><br>case, that the University would remove the allegation from my record after 5 years, which makes me surprised that someone violated<br>that written agreement with you.<br><br>Re item 6: You report on ASU’s response to item #6 , without including the fact that the University specifically stated there were never<br>any allegations of sexual misconduct or harassment by me at the University, and the outside complaints were in fact related specifically<br>to your item #6. Further you neglect to mention that this complaint was by an anonymous third party, not the individual who was<br>allegedly harassed, who never lodged a complain, and that no specific evidence was provided of the alleged transgression. \| was<br>surprised and dismayed that both ASU and ANU launched investigations on the basis of this but was told by both Universities that<br>because of my high profile even such unsubstantiated third party complaints at private event...</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Joi Ito<br>Date: Nov 22, 2017, at 10:44<br><br>all good?<br><br>---<br><br>From: Joi Ito<br>To: jeffrey E. <jeevacation@gmail.com><br>Date: Wed, Nov 22, 2017 at 10:51 AM<br><br>Pretty good. Had a pinched nerve in my neck that screwed me up for awhile. How about you? Any plans to come to Boston?<br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Joi Ito<br>Date: Nov 22, 2017, at 10:57<br><br>maybe week of 3rd not seeing anything that exciting ornew ? you? 
with all these guys getting busted for harassment , 1 have moved slightly up on the repuation ladder and have been asked everday for advice etc. this morning I have Ken Starr coming to point out how if clinton cigar lewinsky were to be outed today the world would be a different place<br><br>---<br><br>From: Joi<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: 11/22/2017 6:14:38 PM<br><br>Lots of stuff going on that week but I’m in town.<br><br>#metoo is quite amazing...<br><br>Madars is doing well. His PhD paper won an award and he found a vulner...</code> |
  * Loss: <code>pylate.losses.contrastive.Contrastive</code>

  ### Evaluation Dataset
 
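The training and evaluation sizes in both versions of the card fit a 90/10 split of a single pool, with the evaluation side rounded up both times:

- before this commit: 2,595 + 289 = 2,884;
- after this commit: 5,191 + 577 = 5,768.

This is an inference from the numbers; the card does not state it. The sketch below assumes a `ceil`-rounded test fraction, which we believe matches how Hugging Face `datasets.Dataset.train_test_split` sizes a float `test_size`. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(total: int, test_fraction: float) -> tuple[int, int]:
    """Return (train, test) counts when the test share is rounded up."""
    n_test = math.ceil(test_fraction * total)
    return total - n_test, n_test


for pool in (2595 + 289, 5191 + 577):
    print(pool, split_sizes(pool, 0.1))
# 2884 (2595, 289)
# 5768 (5191, 577)
```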
  #### Unnamed Dataset


+ * Size: 577 evaluation samples
  * Columns: <code>query</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 577 samples:
+ | | query | positive | negative |
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 23 tokens</li><li>mean: 54.08 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 63.95 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 36.6 tokens</li><li>max: 64 tokens</li></ul> |
  * Samples:
+ | query | positive | negative |
+ |:-----|:-----|:-----|
+ | <code>A reporter from BuzzFeed just reached out about those old allegations and they're publishing soon. What advice did I give that physicist friend Lawrence a few years back when he was dealing with something similar? I remember telling him something about how these reporters operate and what they do with your responses.</code> | <code>Subject: Re:<br><br><br>---<br><br>From: R. Couri Hay<br>To: jeevacation@gmail.com<br>Date: Friday, March 4 2011 07:03 PM<br><br>Good morning Jeffery, <br>The Newsweek story is being written without your input as you read this. Lloyd Grove is not writing the story, he made a few calls but he <br>was taken off the project. This is for Newsweek, the magazine that is on the stands, not the website. It's still 1200 to 1500 words. <br>Alexandra Wolfe needs to turn in a draft of the story by Monday. I've been subtly guiding Alexandra on your behalf, but would really like <br>to formalize this job with your attorneys and then I would really need to give names and numbers of pro Jeffery power brokers for <br>Alexandra to call. She has already called Donald Trump, Leon Black, and Les Wexner among others. I've given her Jonathans number, <br>please let me focus on this before its too late to spin this in your direction. I'm in all afternoon my office number is My cell <br>is: My email i _________ <br>Cheers, <br>Couri<br><br>---<br><br>From: jeffrey epstein <j...</code> | <code>Subject: Re: <br><br><br>---<br><br>From: J [jeevacation@gmail.com]<br>To: Michael Wolff<br>Date: On Thu, May 30, 2019 at 5:29 PM<br><br>is it a coincidence that the russian that bought the house in palm beach and knows all , is the same guy <br>that sold a painting last year to mbs for 450 million dollars. that was only worth 1. 5m?<br><br>---<br><br>From: Michael Wolff<br>To: J <jeevacation@gmail.com><br>Date: On Thu, May 30, 2019 at 5:33 PM<br><br>So MBS was paying him off? Why? 
Ideas?<br><br>---<br><br>From: J <jeevacation@gmail.com><br>To: Michael Wolff<br>Date: On Thu, May 30, 2019 at 5:35 PM<br><br>reminder trump overuled congress on yemen.<br><br>---<br><br>From: Michael Wolff<br>To: J <jeevacation@gmail.com><br>Date: On Thu, May 30, 2019 at 5:37 PM<br><br>Starting to smell sweet, the way you put it!<br><br>---<br><br>From: J <jeevacation@gmail.com><br>To: Michael Wolff<br>Date: On Thu, May 30, 2019 at 5:40 PM<br><br>In addition my art guyd said the painting wasn't very good<br><br>---<br><br>From: Michael Wolff<br>To: J <jeevacation@gmail.com><br>Date: On Thu, May 30, 2019 at 5:41 PM<br><br>You have an art guy?<br><br>---<br><br>From: ...</code> |
+ | <code>Michael Wolff wants me to do something with him again - he mentioned some big TV interview opportunity. Can you remind me what advice he gave me last time about going public and dealing with the media? I think it was something about going on Charlie Rose and becoming an anti-Trump voice to get political cover. And should I reach out to Kathy about this since she's been helping me understand Washington?</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Thorbjørn Jagland<br>Date: Feb 17, 2017 9:27 PM<br><br>Im in paris until thurs? you?<br><br>---<br><br>From: Thorbjørn Jagland<br>To: jeffrey E. <jeevacation@gmail.com><br>Date: Feb 19, 2017 8:54 AM<br><br>Is It possible for you to pass by Strasbourg, it would be great. I really need to understand more about Trump<br>and what's going on in the American society.<br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>To: Thorbjørn Jagland<br>Date: Feb 19, 2017 11:14<br><br>yes, that should be possible. remind me how long is the fast train? otherwise ill fly. what days are good<br>for you<br><br>---<br><br>From: Thorbjørn Jagland<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: 2/19/2017 9:11:12 PM<br><br>Train, 1h45, I'll pick you up at the train station. Tuesday afternoon is ok. Also Wednesday, but only after 6</code> | <code>Please find attached the Q3 board meeting minutes for your review.</code> |
+ | <code>Where do things stand with the media and legal situation now? I have that meeting next week and need to refresh my memory on what we've covered with reporters, our current search results status, and where Michael Wolff left off.</code> | <code>Subject: Re: Shears Update<br><br><br>---<br><br>From: Tyler Shears<br>To: jeffrey E. [jeevacation@gmail.com]<br>Date: 7/16/2014 5:56:14 PM<br><br>yes agree<br>result got worse with recent negative press, clinton, as we discussed<br>we were down to 1 negative (forbes) before that - Christina can confirm this.<br>It will get back to 1 and then 0 so long as negative things stop coming out. if new negative keeps coming out it<br>really dismantles much of our effort... especially when it involves an ex-president<br>no excuses here i'm not pleased with where it is at and am still working to make it happen<br><br>---<br><br>From: jeffrey E. <jeevacation@gmail.com><br>Date: Wed, Jul 16, 2014 at 1:25 PM<br><br>Results still very bad<br><br>---<br><br>From: Christina Galbraith<br>To: Tyler Shears<br>Date: Wednesday, July 16, 2014<br><br>Hi Tyler,<br>The social media sites are constantly updated: LinkedIn, Facebook, Twitter, google +.<br>I'll be in touch later today re: feature article.<br>Could you put a site map into the Net site.? This helps rankings. Also could you add it to google ana...</code> | <code>Subject: Re:<br><br><br>---<br><br>From: jeffrey E. [jeevacation@gmail.com]<br>To: Jonathan Farkas<br>Date: 12/7/2016 12:25:22 P.M. Eastern Standard Time<br><br>plenty left<br><br>---<br><br>From: Jonathan Farkas<br>Date: Wed, Dec 7, 2016 at 1:01 PM<br><br>Hi jeffrey hope all is well \| think you are going to have a winderful life from now on in your opinion how much is left in<br>this market it's been a trump triumph \| gave him some money through woody best jonathan<br><br>---<br><br>From: jeffrey E. [jeevacation@gmail.com]<br>To: Jonathan Farkas<br>Date: 12/7/2016 5:30:18 PM<br><br>oy</code> |
  * Loss: <code>pylate.losses.contrastive.Contrastive</code>
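The loss listed above trains the model by scoring each query against documents with ColBERT's MaxSim late interaction and pushing the positive document's score above the negatives'. As a rough illustration only, the sketch below implements MaxSim in plain Python and a softmax cross-entropy over one positive and one negative document per query. It is a simplified stand-in for the technique, not PyLate's `Contrastive` code: the real loss works on batched tensors and may also treat other documents in the batch as negatives, and the toy vectors here are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction: match every query token to its most
    similar document token (max dot product) and sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def contrastive_loss(query_tokens: List[Vector],
                     positive_tokens: List[Vector],
                     negative_tokens: List[Vector]) -> float:
    """Softmax cross-entropy over [positive, negative] MaxSim scores, with the
    positive document as the target (simplified: one negative per query)."""
    scores = [maxsim(query_tokens, positive_tokens),
              maxsim(query_tokens, negative_tokens)]
    top = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[0]


# Hypothetical 2-dimensional token embeddings, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
positive = [[1.0, 0.0], [0.0, 1.0]]
negative = [[0.5, 0.0], [0.0, 0.1]]

print(maxsim(query, positive))  # 2.0
print(contrastive_loss(query, positive, negative))
```

The loss shrinks toward 0 as the positive's MaxSim score pulls ahead of the negative's, and equals log 2 when the two scores tie.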
 
  ### Training Hyperparameters
  #### Non-Default Hyperparameters
 
  - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 2
+ - `gradient_accumulation_steps`: 32
  - `learning_rate`: 3e-06
  - `warmup_ratio`: 0.1
  - `bf16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
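The non-default settings above combine. With gradient accumulation, each optimizer update uses `per_device_train_batch_size × gradient_accumulation_steps` examples per device, and the `linear` scheduler warms up over `warmup_ratio` of all optimizer steps before decaying to zero. The sketch below works this through under two assumptions: training ran on a single device (the card does not say), and the run totalled 246 optimizer steps, the final step in the training log. The rounding-up of the warmup length and the warmup-then-decay shape follow Hugging Face Transformers' linear schedule as we understand it; treat the code as an illustration, not the trainer's implementation.

```python
import math

# Values from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 32
LEARNING_RATE = 3e-06
WARMUP_RATIO = 0.1

# Assumptions, not stated in the card's hyperparameters:
NUM_DEVICES = 1    # single training device
TOTAL_STEPS = 246  # last optimizer step in the training log (3 epochs x 82 steps)


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of training examples that contribute to one optimizer update."""
    return per_device * accumulation * devices


def num_warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length, rounded up to a whole number of optimizer steps."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES)
warmup = num_warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print(batch)   # 64
print(warmup)  # 25
for step in (0, warmup, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step, LEARNING_RATE, warmup, TOTAL_STEPS))
```

Under these assumptions the peak rate of 3e-06 is reached at step 25 and falls linearly to 0 at step 246.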
 
  #### All Hyperparameters
  <details><summary>Click to expand</summary>
 
  - `do_predict`: False
  - `eval_strategy`: epoch
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 2
  - `per_device_eval_batch_size`: 8
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 32
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
  - `learning_rate`: 3e-06
 
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
  - `max_steps`: -1
  - `lr_scheduler_type`: linear
  - `lr_scheduler_kwargs`: {}
 
  - `hub_private_repo`: None
  - `hub_always_push`: False
  - `hub_revision`: None
+ - `gradient_checkpointing`: True
  - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `include_for_metrics`: []
 
  ### Training Logs
  | Epoch | Step | Training Loss | Validation Loss | accuracy |
  |:-------:|:-------:|:-------------:|:---------------:|:--------:|
+ | 0.1233 | 10 | 3.3759 | - | - |
+ | 0.2465 | 20 | 1.1565 | - | - |
+ | 0.3698 | 30 | 0.5595 | - | - |
+ | 0.4931 | 40 | 0.4625 | - | - |
+ | 0.6163 | 50 | 0.3728 | - | - |
+ | 0.7396 | 60 | 0.3595 | - | - |
+ | 0.8629 | 70 | 0.3632 | - | - |
+ | 0.9861 | 80 | 0.3566 | - | - |
+ | 0 | 0 | - | - | 0.9636 |
+ | 1.0 | 82 | - | 1.0872 | - |
+ | 1.0986 | 90 | 0.2931 | - | - |
+ | 1.2219 | 100 | 0.2994 | - | - |
+ | 1.3451 | 110 | 0.2319 | - | - |
+ | 1.4684 | 120 | 0.2172 | - | - |
+ | 1.5917 | 130 | 0.273 | - | - |
+ | 1.7149 | 140 | 0.2254 | - | - |
+ | 1.8382 | 150 | 0.2416 | - | - |
+ | 1.9615 | 160 | 0.2538 | - | - |
+ | 0 | 0 | - | - | 0.9688 |
+ | 2.0 | 164 | - | 0.9975 | - |
+ | 2.0740 | 170 | 0.1938 | - | - |
+ | 2.1972 | 180 | 0.1639 | - | - |
+ | 2.3205 | 190 | 0.2477 | - | - |
+ | 2.4438 | 200 | 0.1845 | - | - |
+ | 2.5670 | 210 | 0.1397 | - | - |
+ | 2.6903 | 220 | 0.2116 | - | - |
+ | 2.8136 | 230 | 0.1989 | - | - |
+ | 2.9368 | 240 | 0.1563 | - | - |
+ | 0 | 0 | - | - | 0.9740 |
+ | **3.0** | **246** | **-** | **0.8835** | **-** |
 
  * The bold row denotes the saved checkpoint.
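The bold row can be re-derived from the table. With `load_best_model_at_end: True`, the Hugging Face trainer keeps the best evaluated checkpoint, judged by evaluation loss unless `metric_for_best_model` is set otherwise; that setting falls in lines elided from this diff, so assuming the default is a guess. The sketch below picks the lowest validation loss from the three per-epoch evaluation rows and checks that the evaluations are evenly spaced, 82 optimizer steps apart, as `eval_strategy: epoch` would imply. It only reads numbers already in the table.

```python
# Per-epoch evaluation rows copied from the training log: (epoch, step, validation_loss).
EVAL_ROWS = [
    (1.0, 82, 1.0872),
    (2.0, 164, 0.9975),
    (3.0, 246, 0.8835),
]


def best_checkpoint(rows):
    """Return the (epoch, step, loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def steps_per_epoch(rows):
    """Optimizer steps between consecutive evaluations; raises if uneven."""
    gaps = {later[1] - earlier[1] for earlier, later in zip(rows, rows[1:])}
    if len(gaps) != 1:
        raise ValueError(f"uneven evaluation spacing: {sorted(gaps)}")
    return gaps.pop()


epoch, step, loss = best_checkpoint(EVAL_ROWS)
print(f"best checkpoint: epoch {epoch}, step {step}, validation loss {loss}")
# best checkpoint: epoch 3.0, step 246, validation loss 0.8835
print(steps_per_epoch(EVAL_ROWS))  # 82
```

Because validation loss fell at every epoch, the last checkpoint is also the best one here; with an earlier minimum, the bold row would move up accordingly.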
 
config_sentence_transformers.json CHANGED
@@ -12,8 +12,8 @@
  "similarity_fn_name": "MaxSim",
  "query_prefix": "[Q] ",
  "document_prefix": "[D] ",
- "query_length": 32,
- "document_length": 512,
+ "query_length": 64,
+ "document_length": 8192,
  "attend_to_expansion_tokens": false,
  "skiplist_words": [
  "!",
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:92e4cdf178c48c6a5e7bd1daf3bb2f5d5a624ceec6e9a81bca91830048beafcb
+ oid sha256:27449494d35b56f56ee3bfdd5c83271ccf063cdf1c8e3bc99d952614a8b1af03
  size 1413306600
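Only the Git LFS pointer changed here: the `size` stayed at 1413306600 bytes while the `oid`, a SHA-256 digest of the file contents, changed. That is consistent with new weights of the same shape and dtype. To confirm that a downloaded `model.safetensors` matches a pointer, hash the file and compare both fields. A minimal standard-library sketch follows; the function names are hypothetical, and the demo uses a throwaway temporary file rather than the real weights.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def file_matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file's byte size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256" or os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so multi-gigabyte weight files are not read into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


# Demo: a small stand-in file and a pointer built from it.
payload = b"not real weights"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
matched = file_matches_pointer(tmp.name, pointer)
tampered = file_matches_pointer(
    tmp.name, pointer.replace(f"size {len(payload)}", f"size {len(payload) + 1}")
)
os.remove(tmp.name)
print(matched, tampered)  # True False
```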
sentence_bert_config.json CHANGED
@@ -1,4 +1,4 @@
  {
- "max_seq_length": 511,
+ "max_seq_length": 8191,
  "do_lower_case": false
  }
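With `document_length` raised from 512 to 8192 in this commit and `max_seq_length` from 511 to 8191, a single document can now contribute up to 16 times as many token vectors to a multi-vector index. ColBERT-style models store one vector per document token, here 128-dimensional per the Dense layer's `out_features`. The back-of-the-envelope sketch below gives the upper bound per document. It assumes fp16 storage (2 bytes per value), no compression or skiplist filtering, and a document long enough to fill the window, so real index sizes will usually be smaller.

```python
EMBEDDING_DIM = 128   # Dense layer out_features
BYTES_PER_VALUE = 2   # assumption: fp16 storage, no compression


def max_bytes_per_document(document_length: int,
                           dim: int = EMBEDDING_DIM,
                           bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Upper bound: every token position up to document_length keeps one vector."""
    return document_length * dim * bytes_per_value


for length in (512, 8192):
    size = max_bytes_per_document(length)
    print(f"document_length={length}: {size} bytes ({size / 1024:.0f} KiB)")
# document_length=512: 131072 bytes (128 KiB)
# document_length=8192: 2097152 bytes (2048 KiB)
```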
tokenizer.json CHANGED
@@ -2,7 +2,7 @@
  "version": "1.0",
  "truncation": {
  "direction": "Right",
- "max_length": 511,
+ "max_length": 8191,
  "strategy": "LongestFirst",
  "stride": 0
  },
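The three length settings changed in step. `document_length` went from 512 to 8192, while `max_seq_length` in `sentence_bert_config.json` and the tokenizer's truncation `max_length` went from 511 to 8191. In both the old and the new files, the tokenizer limit is exactly one less than `document_length`. A plausible reason is that one position is reserved for the `[Q] `/`[D] ` prefix token, but that is an inference from these numbers, not something the files state. The hypothetical helper below (not part of PyLate or Sentence Transformers) checks that assumed invariant so that a hand-edited config does not silently drift.

```python
import json

# Relevant fields copied from the new versions of the three files in this commit.
CONFIG_SENTENCE_TRANSFORMERS = '{"query_length": 64, "document_length": 8192}'
SENTENCE_BERT_CONFIG = '{"max_seq_length": 8191, "do_lower_case": false}'
TOKENIZER_TRUNCATION = '{"direction": "Right", "max_length": 8191, "strategy": "LongestFirst", "stride": 0}'


def check_lengths(st_config: dict, sbert_config: dict, truncation: dict) -> list:
    """Return a list of mismatches (empty if the settings agree).

    Assumed invariant, inferred from this diff rather than documented:
    max_seq_length == truncation max_length == document_length - 1.
    """
    problems = []
    expected = st_config["document_length"] - 1
    if sbert_config["max_seq_length"] != expected:
        problems.append(f"max_seq_length={sbert_config['max_seq_length']}, expected {expected}")
    if truncation["max_length"] != expected:
        problems.append(f"truncation max_length={truncation['max_length']}, expected {expected}")
    return problems


new_problems = check_lengths(
    json.loads(CONFIG_SENTENCE_TRANSFORMERS),
    json.loads(SENTENCE_BERT_CONFIG),
    json.loads(TOKENIZER_TRUNCATION),
)
old_problems = check_lengths(
    {"query_length": 32, "document_length": 512},
    {"max_seq_length": 511},
    {"max_length": 511},
)
print(new_problems, old_problems)  # [] []
```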