File size: 3,384 Bytes
7a0e7b6
 
 
 
 
 
 
0cac774
7a0e7b6
 
7af5112
71e2845
1cc5038
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7af5112
 
71e2845
7af5112
 
0cac774
7af5112
 
 
 
 
 
71e2845
1555ad5
7af5112
 
 
 
 
 
 
 
 
 
 
 
 
 
71e2845
1555ad5
7af5112
71e2845
7af5112
 
 
 
 
 
 
 
71e2845
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
language: 
  - en
tags:
- classics
- citation mining
widget:
- text: "Homer's Iliad opens with an invocation to the muse (1. 1)."
---

### Model and entities
`roberta_classics_ner` is a domain-specific RoBERTa-based model for named entity recognition in Classical Studies. It recognises bibliographical entities, such as:


| id  | label         | desciption                                  | Example               |
| --- | ------------- | ------------------------------------------- | --------------------- |
| 0   | 'O'           | Out of entity                               |                       | 
| 1   | 'B-AAUTHOR'   | Ancient authors                             | *Herodotus*           |
| 2   | 'I-AAUTHOR'   |                                             |                       |
| 3   | 'B-AWORK'     | The title of an ancient work                | *Symposium*, *Aeneid* |
| 4   | 'I-AWORK'     |                                             |                       |
| 5   | 'B-REFAUWORK' | A structured reference to an ancient work   | *Homer, Il.*          |
| 6   | 'I-REFAUWORK' |                                             |                       |
| 7   | 'B-REFSCOPE'  | The scope of a reference                    | *II.1.993a30–b11*     |
| 8   | 'I-REFSCOPE'  |                                             |                       |
| 9   | 'B-FRAGREF'   | A reference to fragmentary texts or scholia | *Frag. 19. West*      |
| 10  | 'I-FRAGREF'   |                                             |                       |


### Example


```
B-AAUTHOR   B-AWORK                                      B-REFSCOPE
Homer  's   Iliad opens with an invocation to the muse ( 1. 1).
```



### Dataset


`roberta_classics_ner` was fine-tuned and evaluated on `EpiBau`, a dataset which has not been released publicly yet. It is composed of four volumes of [Structures of Epic Poetry](https://www.epische-bauformen.uni-rostock.de/), a compendium on the narrative patterns and structural elements in ancient epic. 
Entity counts of the `Epibau` dataset are the following: 

|                | train-set | dev-set | test-set |
| -------------- | --------- | ------- | -------- |
| word count     | 712462    | 125729  | 122324   |
| AAUTHOR        | 4436      | 1368    | 1511     |
| AWORK          | 3145      | 780     | 670      |
| REFAUWORK      | 5102      | 988     | 1209     |
| REFSCOPE       | 14768     | 3193    | 2847     |
| FRAGREF        | 266       | 29      | 33       |
| total entities | 13822     | 1415    | 2419     |


### Results

The model was developed in the context of experiments reported [here](http://infoscience.epfl.ch/record/291236?&ln=en).Trained and tested on `EpiBau` with a 85-15 split, the model yields a general F1 score of **.82** (micro-averages). Detailed scores are displayed below. Evaluation was performed with the [CLEF-HIPE-scorer](https://github.com/impresso/CLEF-HIPE-2020-scorer), in strict mode) 


| metric    | AAUTHOR | AWORK | REFSCOPE | REFAUWORK |
| --------- | ------- | ----- | -------- | --------- |
| F1        | .819    | .796  | .863     | .756      |
| Precision | .842    | .818  | .860     | .755      |
| Recall    | .797    | .766  | .756     | .866      | 




Questions, remarks, help or contribution ? Get in touch [here](https://github.com/AjaxMultiCommentary), we'll be happy to chat !