Update README.md
README.md
CHANGED
@@ -7,7 +7,9 @@ base_model:
 pipeline_tag: text-classification
 widget:
 - text: >-
-
 ---

@@ -19,7 +21,7 @@ widget:

 This model is designed to classify geographic encyclopedia articles describing places.
 It is a fine-tuned version of the bert-base-multilingual-cased model.
-It has been trained on a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).

@@ -38,37 +40,37 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
 ## Class labels


-The tagset is as follows:
-- **
--
-- **
-- **
-- **
-- **
-- **
-- **
-- **
-- **


 ## Dataset

-
-The
-The datasets have the following distribution of entries among datasets and classes:

 | | Train | Validation | Test|
 |---|:---:|:---:|:---:|
-|
-|
-|
-|
-|
-|
-|
-|
-|
-|


 ## Evaluation

@@ -78,30 +80,30 @@ The datasets have the following distribution of entries among datasets and class
|
|
| 78 |
|
| 79 |
| Precision | Recall | F-score |
|
| 80 |
|:---:|:---:|:---:|
|
| 81 |
-
|0.
|
| 82 |
|
| 83 |
|
| 84 |
* Overall weighted-average model performances
|
| 85 |
|
| 86 |
| Precision | Recall | F-score |
|
| 87 |
|:---:|:---:|:---:|
|
| 88 |
-
|0.
|
| 89 |
|
| 90 |
|
| 91 |
* Model performances (Test set)
|
| 92 |
|
| 93 |
| | Precision | Recall | F-score | Support |
|
| 94 |
|---|:---:|:---:|:---:|:---:|
|
| 95 |
-
|
|
| 96 |
-
|
|
| 97 |
-
|
|
| 98 |
-
|
|
| 99 |
-
|
|
| 100 |
-
|
|
| 101 |
-
|
|
| 102 |
-
|
|
| 103 |
-
|
|
| 104 |
-
|
|
| 105 |
|
| 106 |
|
| 107 |
|
|
@@ -132,6 +134,11 @@ samples = [
 for sample in samples:
     print(pipe(sample))

 ```

@@ -146,4 +153,4 @@ This model was trained entirely on French encyclopaedic entries classified as Ge
 ## Acknowledgement

 The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
-Data courtesy the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.

@@ -7,7 +7,9 @@ base_model:
 pipeline_tag: text-classification
 widget:
 - text: >-
+    * ALBI, (Géog.) ville de France, capitale de l'Albigeois, dans le haut Languedoc : elle est sur le Tarn. Long. 19. 49. lat. 43. 55. 44.
+datasets:
+- GEODE/GeoEDdA-TopoRel
 ---

@@ -19,7 +21,7 @@ widget:

 This model is designed to classify geographic encyclopedia articles describing places.
 It is a fine-tuned version of the bert-base-multilingual-cased model.
+It has been trained on [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel), a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).

@@ -38,37 +40,37 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
 ## Class labels


+The tagset is as follows (with examples from the dataset):
+- **City**: villes, bourgs, villages, etc.
+- **Island**: îles, presqu'îles, etc.
+- **Region**: régions, contrées, provinces, cercles, etc.
+- **River**: rivières, fleuves, etc.
+- **Mountain**: montagnes, vallées, etc.
+- **Country**: pays, royaumes, etc.
+- **Sea**: mer, golphe, baie, etc.
+- **Other**: promontoires, caps, rivages, déserts, etc.
+- **Human-made**: ports, châteaux, forteresses, abbayes, etc.
+- **Lake**: lacs, étangs, marais, etc.


 ## Dataset

+The model was trained using the [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel) dataset.
+The dataset is split into train, validation and test sets, with the following distribution of entries among classes:

 | | Train | Validation | Test|
 |---|:---:|:---:|:---:|
+| City | 921 | 33 | 40 |
+| Island | 216 | 20 | 27 |
+| Region | 138 | 40 | 28 |
+| River | 133 | 20 | 28 |
+| Mountain | 63 | 29 | 22 |
+| Human-made | 38 | 10 | 9 |
+| Other | 27 | 12 | 12 |
+| Sea | 26 | 13 | 12 |
+| Lake | 22 | 9 | 9 |
+| Country | 16 | 14 | 13 |
+


 ## Evaluation

@@ -78,30 +80,30 @@ The datasets have the following distribution of entries among datasets and class

 | Precision | Recall | F-score |
 |:---:|:---:|:---:|
+|0.95 | 0.92 | 0.93 |


 * Overall weighted-average model performances

 | Precision | Recall | F-score |
 |:---:|:---:|:---:|
+|0.94 | 0.94 | 0.94 |


 * Model performances (Test set)

 | | Precision | Recall | F-score | Support |
 |---|:---:|:---:|:---:|:---:|
+| City | 0.91 | 1.00 | 0.95 | 40|
+| Island | 0.96 | 0.96 | 0.96 | 27|
+| River | 0.97 | 1.00 | 0.98 | 28|
+| Region | 0.86 | 0.89 | 0.88 | 28|
+| Mountain | 1.00 | 0.95 | 0.98 | 22|
+| Country | 1.00 | 0.85 | 0.92 | 13|
+| Sea | 1.00 | 0.92 | 0.96 | 12|
+| Other | 0.90 | 0.75 | 0.82 | 12|
+| Human-made | 0.90 | 1.00 | 0.95 | 9|
+| Lake | 1.00 | 0.89 | 0.94 | 9|

@@ -132,6 +134,11 @@ samples = [
 for sample in samples:
     print(pipe(sample))

+
+# Output
+
+[{'label': 'City', 'score': 0.9969543218612671}]
+[{'label': 'Region', 'score': 0.9811353087425232}]
 ```

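The output added in this hunk is a list holding one dict per prediction, each with a `label` and a `score` key. A minimal sketch of consuming that structure is given below. The helper name `top_label` and the 0.5 threshold are hypothetical choices made for illustration and are not part of the model card; the two predictions are copied from the output shown in the hunk rather than produced by running the model.

```python
from typing import Dict, List, Optional


def top_label(prediction: List[Dict], threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring label, or None when its score is below threshold."""
    if not prediction:
        return None
    best = max(prediction, key=lambda item: item["score"])
    return best["label"] if best["score"] >= threshold else None


# Predictions copied from the README output (not recomputed here).
outputs = [
    [{"label": "City", "score": 0.9969543218612671}],
    [{"label": "Region", "score": 0.9811353087425232}],
]

labels = [top_label(p) for p in outputs]
print(labels)  # ['City', 'Region']
```

Returning `None` for low-confidence predictions is one possible design; a caller might instead map such cases to the `Other` class, depending on the application.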
@@ -146,4 +153,4 @@ This model was trained entirely on French encyclopaedic entries classified as Ge
 ## Acknowledgement

 The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
+Data courtesy the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.
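
The per-class test-set figures added in the Evaluation hunk can be cross-checked against the two overall tables. The sketch below recomputes an unweighted (macro) mean and a support-weighted mean from the rounded per-class values. Treating the first overall table as a macro average is an assumption, since its caption falls outside the diff context; `rows` is simply the test-set table retyped by hand.

```python
# (label, precision, recall, f_score, support), copied from the test-set table.
rows = [
    ("City", 0.91, 1.00, 0.95, 40),
    ("Island", 0.96, 0.96, 0.96, 27),
    ("River", 0.97, 1.00, 0.98, 28),
    ("Region", 0.86, 0.89, 0.88, 28),
    ("Mountain", 1.00, 0.95, 0.98, 22),
    ("Country", 1.00, 0.85, 0.92, 13),
    ("Sea", 1.00, 0.92, 0.96, 12),
    ("Other", 0.90, 0.75, 0.82, 12),
    ("Human-made", 0.90, 1.00, 0.95, 9),
    ("Lake", 1.00, 0.89, 0.94, 9),
]


def macro(col: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted(col: int) -> float:
    """Mean of one metric column weighted by per-class support."""
    total = sum(r[4] for r in rows)
    return sum(r[col] * r[4] for r in rows) / total


macro_avg = tuple(round(macro(c), 2) for c in (1, 2, 3))
weighted_avg = tuple(round(weighted(c), 2) for c in (1, 2, 3))
print(macro_avg)     # (0.95, 0.92, 0.93)
print(weighted_avg)  # (0.94, 0.94, 0.94)
```

Both results agree with the overall tables to two decimals, and the support column sums to 200, which matches the Test column of the dataset split table.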