evaluation, acknowledgement
README.md
CHANGED
|
@@ -14,7 +14,7 @@ inference:
|
|
| 14 |
max_new_tokens: 100
|
| 15 |
---
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
Example input text:
|
| 20 |
|
|
@@ -24,4 +24,23 @@ Example output text:
|
|
| 24 |
|
| 25 |
<code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>
|
| 26 |
|
| 27 |
Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
|
|
|
|
| 14 |
max_new_tokens: 100
|
| 15 |
---
|
| 16 |
|
| 17 |
+
A model for annotating entries in biographical dictionaries using Wikidata entities. Based on <a href="https://huggingface.co/google/mt5-small">Google's mT5</a>.
|
| 18 |
|
| 19 |
Example input text:
|
| 20 |
|
|
|
|
| 24 |
|
| 25 |
<code>{{WD|label|Anschiringer, Anton}}, {{WD|<a href="https://www.wikidata.org/entity/P106">P106</a>|<a href="https://www.wikidata.org/entity/Q6051619">Q6051619</a>|Publizist}}, * {{WD|<a href="https://www.wikidata.org/entity/P569">P569</a>|1812}} {{WD|<a href="https://www.wikidata.org/entity/P19">P19</a>|<a href="https://www.wikidata.org/entity/Q1741">Q1741</a>|Wien}}, † {{WD|<a href="https://www.wikidata.org/entity/P570">P570</a>|1873-12-17|17. 12. 1873}} {{WD|<a href="https://www.wikidata.org/entity/P20">P20</a>|<a href="https://www.wikidata.org/entity/Q146351">Q146351</a>|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...</code>
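The annotated output above can be post-processed into structured statements. The following is a minimal sketch, assuming the raw model output uses plain `{{WD|property|value}}` or `{{WD|property|value|surface text}}` templates (the links shown in the example are taken to be page rendering, not part of the output). The names `Annotation` and `parse_annotations` are hypothetical and not part of the model's own tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one {{WD|...}} template; nested braces are not supported.
WD_TEMPLATE = re.compile(r"\{\{WD\|([^{}]*)\}\}")


class Annotation(NamedTuple):
    prop: str               # e.g. "P106", or the literal "label"
    value: str              # e.g. "Q6051619" or "1873-12-17"
    surface: Optional[str]  # original wording in the entry, if given


def parse_annotations(text: str) -> List[Annotation]:
    """Extract WD templates from annotated text (assumed field layout)."""
    annotations = []
    for match in WD_TEMPLATE.finditer(text):
        parts = match.group(1).split("|")
        if len(parts) == 2:
            annotations.append(Annotation(parts[0], parts[1], None))
        elif len(parts) == 3:
            annotations.append(Annotation(parts[0], parts[1], parts[2]))
        # Templates with any other number of fields are skipped as malformed.
    return annotations


sample = "{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, * {{WD|P569|1812}}"
for annotation in parse_annotations(sample):
    print(annotation)
```

Skipping malformed templates rather than raising is a design choice for this sketch: generated text may contain truncated templates when `max_new_tokens` cuts the output short.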
|
| 26 |
|
| 27 |
+
<h2>Evaluation</h2>
|
| 28 |
+
|
| 29 |
+
After training on the BLGBL, vol. I dataset, the model reaches a loss value of **0.3878**.
|
| 30 |
+
|
| 31 |
+
More relevant is how many valid statements the model extracts from the input. The evaluation was performed on 100 unseen entries from BLGBL, vol. II.
|
| 32 |
+
|
| 33 |
+
| | Basic statements | Qualifier statements | Total |
|
| 34 |
+
|-|------------------|----------------------|-------|
|
| 35 |
+
| Ground truth | 1,209 | 572 | 1,781 |
|
| 36 |
+
| Valid statements by the model | 714 | 120 | 834 |
|
| 37 |
+
| Accuracy | 0.5906 | 0.2098 | 0.4683 |
|
| 38 |
+
| **Loss (1 − accuracy)** | **0.4094** | **0.7902** | **0.5317** |
|
| 39 |
+
|
| 40 |
+
In other words, the model correctly retrieves about 59% of the basic statements and 21% of the qualifiers, or about 47% of all statements combined.
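The ratios in the table can be reproduced from the raw counts. This is a minimal sketch of the arithmetic only; the function name `retrieval_rate` and the dictionary layout are illustrative assumptions, not the evaluation script used for the model.

```python
def retrieval_rate(valid: int, ground_truth: int) -> float:
    """Share of ground-truth statements that the model produced validly."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be positive")
    return valid / ground_truth


# (valid statements by the model, ground-truth statements), from the table
counts = {"basic": (714, 1209), "qualifier": (120, 572)}

rates = {name: round(retrieval_rate(v, g), 4) for name, (v, g) in counts.items()}

# The total is computed over the summed counts, not as a mean of the two rates.
total_valid = sum(v for v, _ in counts.values())
total_truth = sum(g for _, g in counts.values())
rates["total"] = round(retrieval_rate(total_valid, total_truth), 4)

# The "Loss" row of the table is the complement of each rate.
losses = {name: round(1 - rate, 4) for name, rate in rates.items()}

print(rates)
print(losses)
```

Because basic statements outnumber qualifiers roughly two to one in the ground truth, the total sits closer to the basic-statement rate than a simple average of the two columns would.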
|
| 41 |
+
|
| 42 |
+
<h2>Acknowledgement</h2>
|
| 43 |
+
|
| 44 |
+
The model is a result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the <a href="https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_Fund/Wikimedia_versus_traditional_biographical_encyclopedias._Overlaps,_gaps,_quality_and_future_possibilities">Wikimedia Research Fund</a>.
|
| 45 |
+
|
| 46 |
Computational resources were provided by the <a href="https://www.e-infra.cz/">e-INFRA CZ project</a> (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
|