---
license: cc-by-nc-sa-4.0
language:
- cs
- de
pipeline_tag: translation
tags:
- code
- wikidata
- biography
- semantic_annotation
- entity_linking
base_model: google/mt5-small
inference:
  parameters:
    max_new_tokens: 100
---
A model for annotating entries in biographical dictionaries using Wikidata entities. Based on <a href="https://huggingface.co/google/mt5-small">Google's mT5</a>.
Example input text:
<code>Anschiringer, Anton, Publizist, * 1812 Wien, † 17. 12. 1873 Reichenberg (Liberec). Erzieher im Hause des Großindustriellen...</code>
Example output text:
<code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>
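The annotated output can be post-processed with a small parser. The following is a minimal sketch, assuming the raw model output contains plain Wikidata identifiers (for example <code>{{WD|P106|Q6051619|Publizist}}</code>) rather than the HTML links shown in this card, and that the last field of each template holds the original surface text. The function names are hypothetical and not part of the model.

```python
import re

# Matches one {{WD|...}} template; the captured fields are separated by "|".
WD_TEMPLATE = re.compile(r"\{\{WD\|(.*?)\}\}")


def parse_wd_templates(text):
    """Return one list of fields per {{WD|...}} template found in text."""
    return [m.group(1).split("|") for m in WD_TEMPLATE.finditer(text)]


def strip_wd_templates(text):
    """Replace each template with its last field (assumed to be the source text)."""
    return WD_TEMPLATE.sub(lambda m: m.group(1).split("|")[-1], text)


# Shortened version of the example output, with links removed (assumption).
sample = (
    "{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, "
    "* {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}"
)

for fields in parse_wd_templates(sample):
    print(fields)
print(strip_wd_templates(sample))
```

Stripping the templates should give back the unannotated entry text, which offers a simple sanity check that the model did not alter the input while annotating it.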
<h2>Evaluation</h2>
After training on the BLGBL, vol. I dataset, the model reaches a loss of **0.3878**.
A more informative measure is how many valid statements the model extracts from the input. The evaluation was performed on 100 unseen entries from BLGBL, vol. II.
| | Basic statements | Qualifier statements | Total |
|-|------------------|----------------------|-------|
| Ground truth | 1,209 | 572 | 1,781 |
| Valid statements by the model | 714 | 120 | 834 |
| Accuracy | 0.5906 | 0.2098 | 0.4683 |
| **Loss** | **0.4094** | 0.7902 | **0.5317** |
In other words, the model correctly retrieves about 59% of the basic statements and 21% of the qualifiers, or about 47% of all statements combined.
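The ratios in the table follow directly from the statement counts. As a short sketch, the snippet below recomputes them, assuming that "Accuracy" means valid statements divided by ground-truth statements and that the "Loss" row is simply one minus that ratio (it is unrelated to the training loss). The helper name is hypothetical.

```python
# (valid statements by the model, ground-truth statements), taken from the table.
counts = {
    "basic": (714, 1209),
    "qualifier": (120, 572),
}


def accuracy(valid, total):
    """Share of ground-truth statements that the model produced correctly."""
    return valid / total


# The "Total" column is the sum of the basic and qualifier counts.
total_valid = sum(valid for valid, _ in counts.values())
total_truth = sum(total for _, total in counts.values())
counts["total"] = (total_valid, total_truth)

for name, (valid, total) in counts.items():
    acc = accuracy(valid, total)
    print(f"{name}: accuracy={acc:.4f} loss={1 - acc:.4f}")
```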
<h2>Acknowledgement</h2>
The model is a result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the <a href="https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_Fund/Wikimedia_versus_traditional_biographical_encyclopedias._Overlaps,_gaps,_quality_and_future_possibilities">Wikimedia Research Fund</a>.
Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.