Upload folder using huggingface_hub
README.md
@@ -282,6 +282,106 @@ I-literate_third_person_reference 0.714 0.833 0.769 6
</details>
<details><summary>Click to show split proportions per marker</summary>

```
bio_train.jsonl: 3460 markers across 72 types
bio_val.jsonl: 514 markers across 70 types
bio_test.jsonl: 500 markers across 70 types

======================================================================
Marker                                   Train     Val    Test   Total
======================================================================
oral_inclusive_we                          207      26      29     262
oral_second_person                         160      25      25     210
literate_agentless_passive                 158      22      24     204
oral_named_individual                      157      26      20     203
literate_relative_chain                    146       8      22     176
literate_epistemic_hedge                   125      23      24     172
oral_vocative                              118      17      27     162
oral_rhetorical_question                   132      16       2     150
oral_anaphora                              115      10      15     140
oral_imperative                            104      16      14     134
literate_nested_clauses                    103       4      22     129
literate_abstract_noun                      95      20      14     129
oral_discourse_formula                      93      15       6     114
literate_conditional                        85      10      14     109
oral_specific_place                         81      22       3     106
literate_contrastive                        65      11       8      84
literate_causal_explicit                    69       3      11      83
oral_temporal_anchor                        66      14       3      83
oral_parallelism                            66      10       7      83
oral_lexical_repetition                     48      12      10      70
literate_technical_term                     56       8       3      67
literate_aside                              51       6       9      66
literate_nominalization                     44       3      10      57
oral_tricolon                               43       8       2      53
literate_concessive                         37       6       2      45
oral_epithet                                36       5       2      43
literate_additive_formal                    29       4       3      36
oral_polysyndeton                           15      10      10      35
literate_list_structure                     28       5       1      34
oral_embodied_action                        19       6       6      31
literate_metadiscourse                      22       5       4      31
oral_binomial_expression                    23       3       5      31
oral_alliteration                           23       5       3      31
literate_causal_chain                       22       5       3      30
oral_epistrophe                             23       4       3      30
oral_refrain                                25       4       1      30
oral_audience_response                      25       1       4      30
oral_self_correction                        23       4       3      30
literate_methodological_framing             21       5       4      30
oral_rhythm                                 21       3       6      30
oral_conflict_frame                         24       1       5      30
literate_footnote_reference                 25       2       3      30
literate_definitional_move                  25       4       1      30
literate_evidential                         13       6      11      30
oral_phatic_filler                          24       1       5      30
oral_phatic_check                           25       4       1      30
literate_agent_demoted                      21       5       4      30
literate_enumeration                        24       3       3      30
literate_conceptual_metaphor                21       3       6      30
oral_everyday_example                       22       5       3      30
oral_us_them                                24       3       3      30
oral_intensifier_doubling                   25       2       3      30
literate_institutional_subject              22       4       3      29
literate_temporal_embedding                 23       2       4      29
literate_concessive_connector               22       2       5      29
literate_third_person_reference             21       5       3      29
literate_probability                        21       3       5      29
literate_citation                           12       7      10      29
oral_religious_formula                      24       3       2      29
literate_technical_abbreviation             24       3       2      29
literate_qualified_assertion                23       1       5      29
literate_categorical_statement              24       1       4      29
oral_first_person                           22       2       5      29
oral_simple_conjunction                     21       5       3      29
literate_paradox                            18       7       3      28
oral_proverb                                22       0       6      28  ⚠️
literate_objectifying_stance                21       3       4      28
oral_asyndeton                              24       3       1      28
oral_sensory_detail                         21       5       1      27
oral_dramatic_pause                         20       4       2      26
literate_cross_reference                    21       5       0      26  ⚠️
oral_paradox                                 2       0       0       2  ⚠️
======================================================================
TOTAL                                     3460     514     500    4474

--- Long Tail Summary ---

Markers with < 10 examples: 1 (1%)
Markers with < 20 examples: 1 (1%)
Markers with < 30 examples: 20 (28%)
Markers with < 50 examples: 48 (67%)
Markers with <100 examples: 57 (79%)
```

- **Note**: ⚠️ marks a marker that has zero examples in at least one split.
  - `oral_proverb`: no validation examples
  - `literate_cross_reference`: no test examples
  - `oral_paradox`: no validation or test examples (2 training examples only)

</details>
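
The long-tail summary and the ⚠️ flags can be recomputed from per-marker split counts. The sketch below is illustrative only: the function names `empty_splits` and `long_tail_summary` are hypothetical, it is not the script that produced the table, and the `counts` dictionary holds just four rows copied from the table, so its percentages differ from the full 72-marker summary.

```python
from typing import Dict, List, Tuple

# (train, val, test) counts for one marker
SplitCounts = Tuple[int, int, int]


def empty_splits(counts: Dict[str, SplitCounts]) -> Dict[str, List[str]]:
    """Return, for each marker, the names of the splits with zero examples."""
    split_names = ("train", "val", "test")
    flagged: Dict[str, List[str]] = {}
    for marker, per_split in counts.items():
        missing = [name for name, n in zip(split_names, per_split) if n == 0]
        if missing:
            flagged[marker] = missing
    return flagged


def long_tail_summary(
    counts: Dict[str, SplitCounts],
    thresholds: Tuple[int, ...] = (10, 20, 30, 50, 100),
) -> List[Tuple[int, int, float]]:
    """For each threshold, count markers whose total across splits is below it."""
    totals = [sum(per_split) for per_split in counts.values()]
    n_markers = len(totals)
    summary = []
    for threshold in thresholds:
        below = sum(1 for total in totals if total < threshold)
        share = below / n_markers if n_markers else 0.0
        summary.append((threshold, below, share))
    return summary


# Four rows taken from the table above (a subset, not the full data).
counts: Dict[str, SplitCounts] = {
    "oral_inclusive_we": (207, 26, 29),
    "oral_proverb": (22, 0, 6),
    "literate_cross_reference": (21, 5, 0),
    "oral_paradox": (2, 0, 0),
}

for threshold, below, share in long_tail_summary(counts):
    print(f"Markers with < {threshold} examples: {below} ({share:.0%})")

for marker, missing in empty_splits(counts).items():
    print(f"{marker}: no examples in {', '.join(missing)}")
```

A marker flagged by `empty_splits` cannot be scored on the affected split, which is worth keeping in mind when reading per-label metrics.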
**Macro F1 (all 145 labels):** 0.487
**Weighted F1:** 0.645
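
The gap between the two averages is what a long-tailed label distribution tends to produce. The sketch below assumes the usual definitions (macro F1 is the unweighted mean of per-label F1, weighted F1 is the support-weighted mean, as in scikit-learn's `average="macro"` and `average="weighted"`). The evaluation code is not shown here, so treat this as an assumption. The helper name `macro_and_weighted_f1` and the example scores are hypothetical.

```python
from typing import List, Tuple


def macro_and_weighted_f1(per_label: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Compute (macro F1, weighted F1) from (f1, support) pairs, one per label."""
    if not per_label:
        raise ValueError("need at least one label")
    total_support = sum(support for _, support in per_label)
    if total_support == 0:
        raise ValueError("total support must be positive")
    # Macro: every label counts equally, however rare it is.
    macro = sum(f1 for f1, _ in per_label) / len(per_label)
    # Weighted: each label counts in proportion to its support.
    weighted = sum(f1 * support for f1, support in per_label) / total_support
    return macro, weighted


# Hypothetical scores: one frequent label learned well, two rare labels learned poorly.
example = [(0.90, 200), (0.40, 5), (0.30, 3)]
macro, weighted = macro_and_weighted_f1(example)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

When rare labels score poorly, they pull the macro average down while barely moving the weighted one, which is consistent with a macro F1 below the weighted F1.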