daelba committed on
Commit
82af289
·
verified ·
1 Parent(s): 2ace45f

evaluation, acknowledgement

Files changed (1)
  1. README.md +20 -1
README.md CHANGED
@@ -14,7 +14,7 @@ inference:
14
  max_new_tokens: 100
15
  ---
16
 
17
- A model for annotating entries in biographical dictionaries using Wikidata entities. Based on <a href="https://huggingface.co/google/mt5-small">Google's mT5</a>.
18
 
19
  Example input text:
20
 
@@ -24,4 +24,23 @@ Example output text:
24
 
25
  <code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>
26
 
27
  Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
 
14
  max_new_tokens: 100
15
  ---
16
 
17
+ A model for annotating entries in biographical dictionaries using Wikidata entities. Based on <a href="https://huggingface.co/google/mt5-small">Google's mT5</a>.
18
 
19
  Example input text:
20
 
 
24
 
25
  <code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>
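The annotated output above can be post-processed with a small parser. The following is a minimal sketch, assuming the raw model output is plain text (without the hyperlinks added by the page rendering) and that each `{{WD|...}}` template holds pipe-separated fields. The helper name `parse_wd_templates` is hypothetical, and the reading of the fields (property or `label`, then a value, then optionally the original surface text) is inferred from this single example rather than from documentation.

```python
import re

# Matches one {{WD|...}} template; the captured group holds the pipe-separated fields.
WD_TEMPLATE = re.compile(r"\{\{WD\|(.*?)\}\}")


def parse_wd_templates(text):
    """Return the list of pipe-separated fields for each {{WD|...}} template in text."""
    return [match.group(1).split("|") for match in WD_TEMPLATE.finditer(text)]


# Plain-text version of the example output shown above (links stripped by assumption).
sample = (
    "{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, "
    "* {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}, "
    "† {{WD|P570|1873-12-17|17. 12. 1873}} {{WD|P20|Q146351|Reichenberg (Liberec)}}."
)

for fields in parse_wd_templates(sample):
    print(fields)
```

Templates with two fields (such as the birth year) and with three fields (such as the occupation) are both returned as plain lists, so the caller decides how to interpret each length.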
26
 
27
+ <h2>Evaluation</h2>
28
+
29
+ After training on the BLGBL, vol. I dataset, the model reached a loss of **0.3878**.
30
+
31
+ A more informative measure is how many valid statements the model extracts from the input. The evaluation was run on 100 unseen entries from BLGBL, vol. II.
32
+
33
+ | | Basic statements | Qualifier statements | Total |
34
+ |-|------------------|----------------------|-------|
35
+ | Ground truth | 1,209 | 572 | 1,781 |
36
+ | Valid statements by the model | 714 | 120 | 834 |
37
+ | Accuracy | 0.5906 | 0.2098 | 0.4683 |
38
+ | **Loss** | **0.4094** | **0.7902** | **0.5317** |
39
+
40
+ In other words, the model correctly retrieves about 59% of the basic statements and 21% of the qualifier statements, or about 47% of all statements combined.
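The accuracy and loss rows of the table can be reproduced from the statement counts. The sketch below assumes that accuracy means valid model statements divided by ground-truth statements, and that loss means one minus that accuracy, which matches the published figures. The function names are illustrative only.

```python
# Statement counts copied from the evaluation table (100 unseen entries, BLGBL vol. II).
ground_truth = {"basic": 1209, "qualifier": 572}
valid = {"basic": 714, "qualifier": 120}


def accuracy(valid_count, truth_count):
    """Share of ground-truth statements that the model produced as valid statements."""
    return valid_count / truth_count


def summarize(valid, ground_truth):
    """Return {kind: (accuracy, loss)} rounded to 4 places, plus a combined 'total' row."""
    rows = {}
    for kind in ground_truth:
        acc = accuracy(valid[kind], ground_truth[kind])
        rows[kind] = (round(acc, 4), round(1 - acc, 4))
    total_acc = accuracy(sum(valid.values()), sum(ground_truth.values()))
    rows["total"] = (round(total_acc, 4), round(1 - total_acc, 4))
    return rows


for kind, (acc, loss) in summarize(valid, ground_truth).items():
    print(f"{kind:<10} accuracy={acc:.4f} loss={loss:.4f}")
```

Note that the total is computed from the summed counts (834 of 1,781), not as an average of the two per-category accuracies.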
41
+
42
+ <h2>Acknowledgement</h2>
43
+
44
+ The model is a result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the <a href="https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_Fund/Wikimedia_versus_traditional_biographical_encyclopedias._Overlaps,_gaps,_quality_and_future_possibilities">Wikimedia Research Fund</a>.
45
+
46
  Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.