---
license: cc-by-nc-sa-4.0
language:
- cs
- de
pipeline_tag: translation
tags:
- code
- wikidata
- biography
- semantic_annotation
- entity_linking
base_model: google/mt5-small
inference:
  parameters:
    max_new_tokens: 100
---

A model for annotating entries in biographical dictionaries with Wikidata properties and entities. It is based on <a href="https://huggingface.co/google/mt5-small">Google's mT5</a> (the <code>mt5-small</code> checkpoint).

Example input text:

<code>Anschiringer, Anton, Publizist, * 1812 Wien, † 17. 12. 1873 Reichenberg (Liberec). Erzieher im Hause des Großindustriellen...</code>

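An input like the one above could be passed to the model roughly as follows. This is a minimal sketch, not the authors' documented usage: it assumes the checkpoint loads through the generic `transformers` sequence-to-sequence classes (as mT5 checkpoints normally do) and that `transformers`, `sentencepiece`, and PyTorch are installed. The function name `annotate` and its `model_id` parameter are hypothetical; the repository ID of this model is not stated here, so it has to be supplied by the caller.

```python
# Value taken from the `max_new_tokens` inference parameter in the card metadata.
MAX_NEW_TOKENS = 100


def annotate(text: str, model_id: str, max_new_tokens: int = MAX_NEW_TOKENS) -> str:
    """Return the model's annotated version of `text`.

    `model_id` is the Hugging Face repository ID (or local path) of this
    checkpoint; it is an assumption of this sketch and must be provided.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokenize, generate with the same length limit as the hosted widget, decode.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With a limit of 100 new tokens, long entries are likely to be cut off, so splitting an entry into shorter segments before calling `annotate` may be necessary.
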
Example output text:

<code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>

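The annotated output can be turned into structured data with a small parser. The sketch below infers the template grammar from the single example above and is an assumption, not a specification: `{{WD|property|surface text}}` or `{{WD|property|value|surface text}}`, with the links in the example treated as display-only so that the raw output contains plain IDs. The names `Annotation`, `parse_annotations`, and `strip_annotations` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one {{WD|...}} template; assumes templates are not nested.
WD_TEMPLATE = re.compile(r"\{\{WD\|([^{}]*)\}\}")


class Annotation(NamedTuple):
    prop: str             # Wikidata property ID, or "label" for the entry name
    value: Optional[str]  # entity ID or normalized value; None if not given
    text: str             # surface text as it appears in the dictionary entry


def parse_annotations(annotated: str) -> List[Annotation]:
    """Extract all well-formed WD templates from the model output."""
    annotations = []
    for match in WD_TEMPLATE.finditer(annotated):
        fields = match.group(1).split("|")
        if len(fields) == 2:
            prop, text = fields
            annotations.append(Annotation(prop, None, text))
        elif len(fields) == 3:
            prop, value, text = fields
            annotations.append(Annotation(prop, value, text))
        # Templates with any other field count are skipped as invalid.
    return annotations


def strip_annotations(annotated: str) -> str:
    """Replace each template by its surface text (the last field)."""
    return WD_TEMPLATE.sub(lambda m: m.group(1).split("|")[-1], annotated)


EXAMPLE_OUTPUT = (
    "{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, "
    "* {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}, "
    "† {{WD|P570|1873-12-17|17. 12. 1873}} "
    "{{WD|P20|Q146351|Reichenberg (Liberec)}}. "
    "Erzieher im Hause des Großindustriellen..."
)

for annotation in parse_annotations(EXAMPLE_OUTPUT):
    print(annotation)
print(strip_annotations(EXAMPLE_OUTPUT))
```

Comparing `strip_annotations(output)` with the original input is a cheap sanity check: if the two differ, the model altered the entry text rather than only annotating it.
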
<h2>Evaluation</h2>

After training on the dataset from BLGBL, vol. I, the model reaches a loss of **0.3878**.

A more informative measure is how many valid statements the model extracts from the input. The evaluation was performed on 100 unseen entries from BLGBL, vol. II.

| | Basic statements | Qualifier statements | Total |
|-|------------------|----------------------|-------|
| Ground truth | 1,209 | 572 | 1,781 |
| Valid statements by the model | 714 | 120 | 834 |
| Accuracy (valid / ground truth) | 0.5906 | 0.2098 | 0.4683 |
| **Loss (1 - accuracy)** | **0.4094** | **0.7902** | **0.5317** |

In other words, the model correctly retrieves about 60% of the basic statements and about 20% of the qualifier statements, or about 47% of all statements combined.

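The accuracy and loss rows follow directly from the statement counts. The short sketch below reproduces them; the variable and function names are illustrative, and the counts are the ones reported in the table.

```python
# Statement counts from the evaluation on 100 unseen BLGBL vol. II entries.
ground_truth = {"basic": 1209, "qualifier": 572}
valid = {"basic": 714, "qualifier": 120}

# The "Total" column is the sum over both statement types.
ground_truth["total"] = ground_truth["basic"] + ground_truth["qualifier"]
valid["total"] = valid["basic"] + valid["qualifier"]


def accuracy(kind: str) -> float:
    """Share of ground-truth statements of this kind that the model produced validly."""
    return valid[kind] / ground_truth[kind]


for kind in ("basic", "qualifier", "total"):
    acc = accuracy(kind)
    print(f"{kind:<10} accuracy={acc:.4f} loss={1 - acc:.4f}")
```

Note that the loss in the table is simply one minus the accuracy and is unrelated to the training loss of 0.3878 reported above.
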
<h2>Acknowledgement</h2>

The model is a result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the <a href="https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_Fund/Wikimedia_versus_traditional_biographical_encyclopedias._Overlaps,_gaps,_quality_and_future_possibilities">Wikimedia Research Fund</a>.

Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID: 90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.