Commit f6fe748 by permutans (verified) · 1 parent: 3775141

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +100 -0
README.md CHANGED
@@ -282,6 +282,106 @@ I-literate_third_person_reference 0.714 0.833 0.769 6
 
  </details>
 
+ <details><summary>Click to show per-marker split counts</summary>
+
+ ```
+ bio_train.jsonl: 3460 markers across 72 types
+ bio_val.jsonl:    514 markers across 70 types
+ bio_test.jsonl:   500 markers across 70 types
+
+ ======================================================================
+ Marker                                   Train     Val    Test   Total
+ ======================================================================
+ oral_inclusive_we                          207      26      29     262
+ oral_second_person                         160      25      25     210
+ literate_agentless_passive                 158      22      24     204
+ oral_named_individual                      157      26      20     203
+ literate_relative_chain                    146       8      22     176
+ literate_epistemic_hedge                   125      23      24     172
+ oral_vocative                              118      17      27     162
+ oral_rhetorical_question                   132      16       2     150
+ oral_anaphora                              115      10      15     140
+ oral_imperative                            104      16      14     134
+ literate_nested_clauses                    103       4      22     129
+ literate_abstract_noun                      95      20      14     129
+ oral_discourse_formula                      93      15       6     114
+ literate_conditional                        85      10      14     109
+ oral_specific_place                         81      22       3     106
+ literate_contrastive                        65      11       8      84
+ literate_causal_explicit                    69       3      11      83
+ oral_temporal_anchor                        66      14       3      83
+ oral_parallelism                            66      10       7      83
+ oral_lexical_repetition                     48      12      10      70
+ literate_technical_term                     56       8       3      67
+ literate_aside                              51       6       9      66
+ literate_nominalization                     44       3      10      57
+ oral_tricolon                               43       8       2      53
+ literate_concessive                         37       6       2      45
+ oral_epithet                                36       5       2      43
+ literate_additive_formal                    29       4       3      36
+ oral_polysyndeton                           15      10      10      35
+ literate_list_structure                     28       5       1      34
+ oral_embodied_action                        19       6       6      31
+ literate_metadiscourse                      22       5       4      31
+ oral_binomial_expression                    23       3       5      31
+ oral_alliteration                           23       5       3      31
+ literate_causal_chain                       22       5       3      30
+ oral_epistrophe                             23       4       3      30
+ oral_refrain                                25       4       1      30
+ oral_audience_response                      25       1       4      30
+ oral_self_correction                        23       4       3      30
+ literate_methodological_framing             21       5       4      30
+ oral_rhythm                                 21       3       6      30
+ oral_conflict_frame                         24       1       5      30
+ literate_footnote_reference                 25       2       3      30
+ literate_definitional_move                  25       4       1      30
+ literate_evidential                         13       6      11      30
+ oral_phatic_filler                          24       1       5      30
+ oral_phatic_check                           25       4       1      30
+ literate_agent_demoted                      21       5       4      30
+ literate_enumeration                        24       3       3      30
+ literate_conceptual_metaphor                21       3       6      30
+ oral_everyday_example                       22       5       3      30
+ oral_us_them                                24       3       3      30
+ oral_intensifier_doubling                   25       2       3      30
+ literate_institutional_subject              22       4       3      29
+ literate_temporal_embedding                 23       2       4      29
+ literate_concessive_connector               22       2       5      29
+ literate_third_person_reference             21       5       3      29
+ literate_probability                        21       3       5      29
+ literate_citation                           12       7      10      29
+ oral_religious_formula                      24       3       2      29
+ literate_technical_abbreviation             24       3       2      29
+ literate_qualified_assertion                23       1       5      29
+ literate_categorical_statement              24       1       4      29
+ oral_first_person                           22       2       5      29
+ oral_simple_conjunction                     21       5       3      29
+ literate_paradox                            18       7       3      28
+ oral_proverb                                22       0       6      28  ⚠️
+ literate_objectifying_stance                21       3       4      28
+ oral_asyndeton                              24       3       1      28
+ oral_sensory_detail                         21       5       1      27
+ oral_dramatic_pause                         20       4       2      26
+ literate_cross_reference                    21       5       0      26  ⚠️
+ oral_paradox                                 2       0       0       2  ⚠️
+ ======================================================================
+ TOTAL                                     3460     514     500    4474
+
+ --- Long Tail Summary ---
+
+ Markers with < 10 examples:  1 (1%)
+ Markers with < 20 examples:  1 (1%)
+ Markers with < 30 examples: 20 (28%)
+ Markers with < 50 examples: 48 (67%)
+ Markers with <100 examples: 57 (79%)
+ ```
+
+ - **Note**: ⚠️ flags a marker type with at least one empty split:
+   - `oral_proverb`: no validation examples
+   - `literate_cross_reference`: no test examples
+   - `oral_paradox`: no validation or test examples
+
+ </details>
 
  **Macro F1 (all 145 labels):** 0.487
  **Weighted F1:** 0.645
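The following is a minimal Python sketch that reproduces the per-marker table, the ⚠️ flags, the long-tail summary and the 145-label count. It is not the script used for this commit, and it rests on several assumptions:

- Each example's tags are a list of strings such as `B-oral_vocative`, `I-oral_vocative` or `O`, and each `B-` tag counts as one marker.
- Reading `bio_*.jsonl` is left out because the schema of those files is not shown here.
- The function names (`count_markers`, `split_table`, `zero_splits`, `long_tail`, `num_bio_labels`) are hypothetical.

The long-tail percentages above match division by the 72 training marker types (for example, 20 / 72 ≈ 28%). Under BIO tagging, 72 types give 2 × 72 + 1 = 145 labels, which agrees with the Macro F1 line.

```python
from collections import Counter

SPLITS = ("train", "val", "test")


def count_markers(sequences):
    """Count marker spans per type in BIO-tagged sequences.

    Assumption: each sequence is a list of tag strings such as
    "B-oral_vocative", "I-oral_vocative" or "O", and every "B-" tag
    opens exactly one marker span.
    """
    counts = Counter()
    for tags in sequences:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


def split_table(per_split):
    """Merge per-split Counters into {marker: (train, val, test, total)}."""
    markers = set().union(*per_split.values())
    table = {}
    for marker in markers:
        row = tuple(per_split[s].get(marker, 0) for s in SPLITS)
        table[marker] = row + (sum(row),)
    return table


def zero_splits(table):
    """Markers with at least one empty split (the rows flagged with a warning)."""
    return {
        marker: [s for s, n in zip(SPLITS, row[:3]) if n == 0]
        for marker, row in table.items()
        if 0 in row[:3]
    }


def long_tail(table, thresholds=(10, 20, 30, 50, 100)):
    """For each threshold: (types whose total is below it, % of all types)."""
    n_types = len(table)
    summary = {}
    for t in thresholds:
        below = sum(1 for row in table.values() if row[3] < t)
        summary[t] = (below, round(100 * below / n_types))
    return summary


def num_bio_labels(n_types):
    """One B- and one I- label per marker type, plus the shared O label."""
    return 2 * n_types + 1


if __name__ == "__main__":
    # Toy tag sequences for illustration only; not taken from the dataset.
    per_split = {
        "train": count_markers([["B-oral_proverb", "I-oral_proverb", "O"],
                                ["B-oral_paradox", "O"]]),
        "val": count_markers([["O", "O"]]),
        "test": count_markers([["B-oral_proverb", "O"]]),
    }
    table = split_table(per_split)
    print(table)
    print(zero_splits(table))
    print(long_tail(table))
    print(num_bio_labels(72))  # 145
```

The thresholds use a strict `<` on the total across all three splits, matching the summary above. For example, the 19 marker types with exactly 30 examples are not counted under "< 30".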