Running loglikelihood requests: 100%|█████████████████████████████████████████████████████████████| 130186/130186 [1:01:21<00:00, 35.36it/s]
hf-mlm (pretrained=../../checkpoints/electra-tiny/,backend=mlm), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|                                Tasks                                |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|blimp_supplement                                                     |N/A    |none  |     0|acc   |0.4979|±  |0.0068|
| - blimp_supplement_hypernym                                         |      1|none  |     0|acc   |0.5238|±  |0.0172|
| - blimp_supplement_qa_congruence_easy                               |      1|none  |     0|acc   |0.4531|±  |0.0627|
| - blimp_supplement_qa_congruence_tricky                             |      1|none  |     0|acc   |0.5818|±  |0.0385|
| - blimp_supplement_subject_aux_inversion                            |      1|none  |     0|acc   |0.3949|±  |0.0079|
| - blimp_supplement_turn_taking                                      |      1|none  |     0|acc   |0.5357|±  |0.0299|
|blimp_filtered                                                       |N/A    |none  |     0|acc   |0.5065|±  |0.0019|
| - blimp_adjunct_island_filtered                                     |      1|none  |     0|acc   |0.4321|±  |0.0163|
| - blimp_anaphor_gender_agreement_filtered                           |      1|none  |     0|acc   |0.4830|±  |0.0160|
| - blimp_anaphor_number_agreement_filtered                           |      1|none  |     0|acc   |0.6423|±  |0.0157|
| - blimp_animate_subject_passive_filtered                            |      1|none  |     0|acc   |0.5140|±  |0.0167|
| - blimp_animate_subject_trans_filtered                              |      1|none  |     0|acc   |0.1603|±  |0.0121|
| - blimp_causative_filtered                                          |      1|none  |     0|acc   |0.4315|±  |0.0173|
| - blimp_complex_NP_island_filtered                                  |      1|none  |     0|acc   |0.5130|±  |0.0172|
| - blimp_coordinate_structure_constraint_complex_left_branch_filtered|      1|none  |     0|acc   |0.4570|±  |0.0166|
| - blimp_coordinate_structure_constraint_object_extraction_filtered  |      1|none  |     0|acc   |0.4826|±  |0.0162|
| - blimp_determiner_noun_agreement_1_filtered                        |      1|none  |     0|acc   |0.4930|±  |0.0164|
| - blimp_determiner_noun_agreement_2_filtered                        |      1|none  |     0|acc   |0.4952|±  |0.0164|
| - blimp_determiner_noun_agreement_irregular_1_filtered              |      1|none  |     0|acc   |0.5477|±  |0.0191|
| - blimp_determiner_noun_agreement_irregular_2_filtered              |      1|none  |     0|acc   |0.4878|±  |0.0175|
| - blimp_determiner_noun_agreement_with_adj_2_filtered               |      1|none  |     0|acc   |0.4708|±  |0.0163|
| - blimp_determiner_noun_agreement_with_adj_irregular_1_filtered     |      1|none  |     0|acc   |0.5432|±  |0.0186|
| - blimp_determiner_noun_agreement_with_adj_irregular_2_filtered     |      1|none  |     0|acc   |0.5976|±  |0.0169|
| - blimp_determiner_noun_agreement_with_adjective_1_filtered         |      1|none  |     0|acc   |0.4845|±  |0.0164|
| - blimp_distractor_agreement_relational_noun_filtered               |      1|none  |     0|acc   |0.5127|±  |0.0178|
| - blimp_distractor_agreement_relative_clause_filtered               |      1|none  |     0|acc   |0.4914|±  |0.0169|
| - blimp_drop_argument_filtered                                      |      1|none  |     0|acc   |0.5109|±  |0.0165|
| - blimp_ellipsis_n_bar_1_filtered                                   |      1|none  |     0|acc   |0.4539|±  |0.0176|
| - blimp_ellipsis_n_bar_2_filtered                                   |      1|none  |     0|acc   |0.3986|±  |0.0170|
| - blimp_existential_there_object_raising_filtered                   |      1|none  |     0|acc   |0.5456|±  |0.0175|
| - blimp_existential_there_quantifiers_1_filtered                    |      1|none  |     0|acc   |0.4140|±  |0.0162|
| - blimp_existential_there_quantifiers_2_filtered                    |      1|none  |     0|acc   |0.4643|±  |0.0165|
| - blimp_existential_there_subject_raising_filtered                  |      1|none  |     0|acc   |0.5855|±  |0.0162|
| - blimp_expletive_it_object_raising_filtered                        |      1|none  |     0|acc   |0.4822|±  |0.0181|
| - blimp_inchoative_filtered                                         |      1|none  |     0|acc   |0.5287|±  |0.0171|
| - blimp_intransitive_filtered                                       |      1|none  |     0|acc   |0.4689|±  |0.0169|
| - blimp_irregular_past_participle_adjectives_filtered               |      1|none  |     0|acc   |0.2799|±  |0.0145|
| - blimp_irregular_past_participle_verbs_filtered                    |      1|none  |     0|acc   |0.5255|±  |0.0163|
| - blimp_irregular_plural_subject_verb_agreement_1_filtered          |      1|none  |     0|acc   |0.5100|±  |0.0176|
| - blimp_irregular_plural_subject_verb_agreement_2_filtered          |      1|none  |     0|acc   |0.5415|±  |0.0167|
| - blimp_left_branch_island_echo_question_filtered                   |      1|none  |     0|acc   |0.3939|±  |0.0159|
| - blimp_left_branch_island_simple_question_filtered                 |      1|none  |     0|acc   |0.5037|±  |0.0162|
| - blimp_matrix_question_npi_licensor_present_filtered               |      1|none  |     0|acc   |0.0000|±  |0.0000|
| - blimp_npi_present_1_filtered                                      |      1|none  |     0|acc   |0.4356|±  |0.0165|
| - blimp_npi_present_2_filtered                                      |      1|none  |     0|acc   |0.3950|±  |0.0162|
| - blimp_only_npi_licensor_present_filtered                          |      1|none  |     0|acc   |0.4739|±  |0.0168|
| - blimp_only_npi_scope_filtered                                     |      1|none  |     0|acc   |0.1493|±  |0.0123|
| - blimp_passive_1_filtered                                          |      1|none  |     0|acc   |0.5024|±  |0.0173|
| - blimp_passive_2_filtered                                          |      1|none  |     0|acc   |0.5017|±  |0.0166|
| - blimp_principle_A_c_command_filtered                              |      1|none  |     0|acc   |0.3446|±  |0.0155|
| - blimp_principle_A_case_1_filtered                                 |      1|none  |     0|acc   |0.3827|±  |0.0161|
| - blimp_principle_A_case_2_filtered                                 |      1|none  |     0|acc   |0.5148|±  |0.0165|
| - blimp_principle_A_domain_1_filtered                               |      1|none  |     0|acc   |0.3698|±  |0.0160|
| - blimp_principle_A_domain_2_filtered                               |      1|none  |     0|acc   |0.5213|±  |0.0165|
| - blimp_principle_A_domain_3_filtered                               |      1|none  |     0|acc   |0.4899|±  |0.0163|
| - blimp_principle_A_reconstruction_filtered                         |      1|none  |     0|acc   |0.3909|±  |0.0157|
| - blimp_regular_plural_subject_verb_agreement_1_filtered            |      1|none  |     0|acc   |0.5180|±  |0.0168|
| - blimp_regular_plural_subject_verb_agreement_2_filtered            |      1|none  |     0|acc   |0.5196|±  |0.0163|
| - blimp_sentential_negation_npi_licensor_present_filtered           |      1|none  |     0|acc   |1.0000|±  |0.0000|
| - blimp_sentential_negation_npi_scope_filtered                      |      1|none  |     0|acc   |0.5281|±  |0.0169|
| - blimp_sentential_subject_island_filtered                          |      1|none  |     0|acc   |0.4755|±  |0.0161|
| - blimp_superlative_quantifiers_1_filtered                          |      1|none  |     0|acc   |0.6016|±  |0.0157|
| - blimp_superlative_quantifiers_2_filtered                          |      1|none  |     0|acc   |0.5213|±  |0.0159|
| - blimp_tough_vs_raising_1_filtered                                 |      1|none  |     0|acc   |0.4757|±  |0.0162|
| - blimp_tough_vs_raising_2_filtered                                 |      1|none  |     0|acc   |0.4913|±  |0.0165|
| - blimp_transitive_filtered                                         |      1|none  |     0|acc   |0.5253|±  |0.0170|
| - blimp_wh_island_filtered                                          |      1|none  |     0|acc   |0.4896|±  |0.0161|
| - blimp_wh_questions_object_gap_filtered                            |      1|none  |     0|acc   |0.9569|±  |0.0069|
| - blimp_wh_questions_subject_gap_filtered                           |      1|none  |     0|acc   |0.9944|±  |0.0025|
| - blimp_wh_questions_subject_gap_long_distance_filtered             |      1|none  |     0|acc   |0.9673|±  |0.0061|
| - blimp_wh_vs_that_no_gap_filtered                                  |      1|none  |     0|acc   |1.0000|±  |0.0000|
| - blimp_wh_vs_that_no_gap_long_distance_filtered                    |      1|none  |     0|acc   |0.9954|±  |0.0023|
| - blimp_wh_vs_that_with_gap_filtered                                |      1|none  |     0|acc   |0.5484|±  |0.0164|
| - blimp_wh_vs_that_with_gap_long_distance_filtered                  |      1|none  |     0|acc   |0.0088|±  |0.0031|

|     Groups     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------------|-------|------|-----:|------|-----:|---|-----:|
|blimp_supplement|N/A    |none  |     0|acc   |0.4979|±  |0.0068|
|blimp_filtered  |N/A    |none  |     0|acc   |0.5065|±  |0.0019|
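
For working with these numbers programmatically, the rows above can be read back out of the log text. The following is a minimal sketch, not part of the evaluation harness: the names `ROW`, `parse_rows`, and `flag_extremes` and the 0.05 / 0.95 thresholds are hypothetical, chosen only to surface subtasks whose accuracy sits near 0 or 1 (for example `blimp_wh_vs_that_no_gap_filtered` at 1.0000 and `blimp_wh_vs_that_with_gap_long_distance_filtered` at 0.0088).

```python
import re

# Matches one data row of the results table. Header and separator rows
# are rejected because their Value / Stderr cells are not numeric.
ROW = re.compile(
    r"^\|\s*(?:-\s*)?(?P<task>[A-Za-z0-9_]+)\s*\|"  # task name, optional "- " prefix
    r"[^|]*\|[^|]*\|[^|]*\|"                        # Version, Filter, n-shot
    r"\s*(?P<metric>\w+)\s*\|"                      # Metric
    r"\s*(?P<value>[0-9.]+)\s*\|[^|]*\|"            # Value, then the plus-minus column
    r"\s*(?P<stderr>[0-9.]+)\s*\|\s*$"              # Stderr
)


def parse_rows(text):
    """Return (task, accuracy, stderr) tuples for every data row in text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m["task"], float(m["value"]), float(m["stderr"])))
    return rows


def flag_extremes(rows, low=0.05, high=0.95):
    """List tasks whose accuracy is at or beyond the (hypothetical) thresholds."""
    return [task for task, acc, _ in rows if acc <= low or acc >= high]


# Three rows copied from the table above, plus a header and separator line.
sample = """\
|     Groups     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------------|-------|------|-----:|------|-----:|---|-----:|
| - blimp_only_npi_scope_filtered                                     |      1|none  |     0|acc   |0.1493|±  |0.0123|
| - blimp_wh_vs_that_no_gap_filtered                                  |      1|none  |     0|acc   |1.0000|±  |0.0000|
| - blimp_wh_vs_that_with_gap_long_distance_filtered                  |      1|none  |     0|acc   |0.0088|±  |0.0031|
"""

rows = parse_rows(sample)
extremes = flag_extremes(rows)
print(extremes)
```

Passing the full log text to `parse_rows` instead of `sample` would return every task and group row; the thresholds are arbitrary and should be adjusted to whatever counts as suspicious for the model under test.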