lmoncla committed on
Commit
a8a6a6a
·
verified ·
1 Parent(s): ed28a88

Update README.md

  1. README.md +46 -39
README.md CHANGED
@@ -7,7 +7,9 @@ base_model:
7
  pipeline_tag: text-classification
8
  widget:
9
  - text: >-
10
- MAEATAE, (Géogr. anc.) anciens peuples de l'île de la grande Bretagne ; ils étoient auprès du mur qui coupoit l'île en deux parties.
 
 
11
  ---
12
 
13
 
@@ -19,7 +21,7 @@ widget:
19
 
20
  This model is designed to classify geographic encyclopedia articles describing places.
21
  It is a fine-tuned version of the bert-base-multilingual-cased model.
22
- It has been trained on a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).
23
 
24
 
25
 
@@ -38,37 +40,37 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
38
  ## Class labels
39
 
40
 
41
- The tagset is as follows:
42
- - **Ville**: villes, bourgs, villages, etc.
43
- - **Île**: îles, presqu'îles, etc.
44
- - **Région**: régions, contrées, provinces, cercles, etc.
45
- - **Rivière**: rivières, fleuves, etc.
46
- - **Montagne**: montagnes, vallées, etc.
47
- - **Pays**: pays, royaumes, etc.
48
- - **Mer**: mer, golphe, baie, etc.
49
- - **Autre**: promontoires, caps, rivages, déserts, etc.
50
- - **ConstructionHumaine**: ports, châteaux, forteresses, abbayes, etc.
51
- - **Lac**: lacs, étangs, marais, etc.
52
 
53
 
54
  ## Dataset
55
 
56
-
57
- The model was trained using a set of 8665 entries classified as 'Place' (using this model: https://huggingface.co/GEODE/bert-base-multilingual-cased-geography-entry-classification) among entries classified as 'Geography' (using this model: https://huggingface.co/GEODE/bert-base-multilingual-cased-edda-domain-classification).
58
- The datasets have the following distribution of entries among datasets and classes:
59
 
60
  | | Train | Validation | Test|
61
  |---|:---:|:---:|:---:|
62
- | Ville | 3786 | 811 | 811 |
63
- | Île | 543 | 116 | 117 |
64
- | Rivière | 342 | 73 | 74 |
65
- | Région | 329 | 70 | 71 |
66
- | Montagne | 138 | 29 | 30 |
67
- | Pays | 64 | 14 | 13 |
68
- | Mer | 57 | 13 | 12 |
69
- | Autre | 55 | 12 | 12 |
70
- | ConstructionHumaine | 43 | 10 | 9 |
71
- | Lac | 44 | 9 | 9 |
 
72
 
73
 
74
  ## Evaluation
@@ -78,30 +80,30 @@ The datasets have the following distribution of entries among datasets and class
78
 
79
  | Precision | Recall | F-score |
80
  |:---:|:---:|:---:|
81
- |0.94 | 0.92 | 0.92 |
82
 
83
 
84
  * Overall weighted-average model performances
85
 
86
  | Precision | Recall | F-score |
87
  |:---:|:---:|:---:|
88
- |0.99 | 0.99 | 0.99 |
89
 
90
 
91
  * Model performances (Test set)
92
 
93
  | | Precision | Recall | F-score | Support |
94
  |---|:---:|:---:|:---:|:---:|
95
- | Ville | 0.99 | 1.00 | 1.00 | 811|
96
- | Île | 0.99 | 0.98 | 0.99 | 117|
97
- | Rivière | 1.00 | 0.99 | 0.99 | 74|
98
- | Région | 0.97 | 0.94 | 0.96 | 71|
99
- | Montagne | 0.97 | 0.97 | 0.97 | 30|
100
- | Pays | 0.87 | 1.00 | 0.93 | 13|
101
- | Mer | 1.00 | 1.00 | 1.00 | 12|
102
- | Autre | 0.60 | 0.75 | 0.67 | 12|
103
- |ConstructionHumaine | 1.00 | 0.67 | 0.80 | 9|
104
- | Lac | 1.00 | 0.89 | 0.94 | 9|
105
 
106
 
107
 
@@ -132,6 +134,11 @@ samples = [
132
  for sample in samples:
133
  print(pipe(sample))
134
 
 
 
 
 
 
135
  ```
136
 
137
 
@@ -146,4 +153,4 @@ This model was trained entirely on French encyclopaedic entries classified as Ge
146
  ## Acknowledgement
147
 
148
  The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
149
- Data courtesy of the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.
 
7
  pipeline_tag: text-classification
8
  widget:
9
  - text: >-
10
+ * ALBI, (Géog.) ville de France, capitale de l'Albigeois, dans le haut Languedoc : elle est sur le Tarn. Long. 19. 49. lat. 43. 55. 44.
11
+ datasets:
12
+ - GEODE/GeoEDdA-TopoRel
13
  ---
14
 
15
 
 
21
 
22
  This model is designed to classify geographic encyclopedia articles describing places.
23
  It is a fine-tuned version of the bert-base-multilingual-cased model.
24
+ It has been trained on [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel), a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).
25
 
26
 
27
 
 
40
  ## Class labels
41
 
42
 
43
+ The tagset is as follows (with examples from the dataset):
44
+ - **City**: villes, bourgs, villages, etc.
45
+ - **Island**: îles, presqu'îles, etc.
46
+ - **Region**: régions, contrées, provinces, cercles, etc.
47
+ - **River**: rivières, fleuves, etc.
48
+ - **Mountain**: montagnes, vallées, etc.
49
+ - **Country**: pays, royaumes, etc.
50
+ - **Sea**: mer, golphe, baie, etc.
51
+ - **Other**: promontoires, caps, rivages, déserts, etc.
52
+ - **Human-made**: ports, châteaux, forteresses, abbayes, etc.
53
+ - **Lake**: lacs, étangs, marais, etc.
54
 
55
 
56
  ## Dataset
57
 
58
+ The model was trained using the [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel) dataset.
59
+ The dataset is split into train, validation and test sets, with the following distribution of entries among classes:
 
60
 
61
  | | Train | Validation | Test|
62
  |---|:---:|:---:|:---:|
63
+ | City | 921 | 33 | 40 |
64
+ | Island | 216 | 20 | 27 |
65
+ | Region | 138 | 40 | 28 |
66
+ | River | 133 | 20 | 28 |
67
+ | Mountain | 63 | 29 | 22 |
68
+ | Human-made | 38 | 10 | 9 |
69
+ | Other | 27 | 12 | 12 |
70
+ | Sea | 26 | 13 | 12 |
71
+ | Lake | 22 | 9 | 9 |
72
+ | Country | 16 | 14 | 13 |
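
As a quick sanity check on the table above, the per-class counts can be summed to recover the size of each split. The sketch below simply hard-codes the counts from the table; the names `SPLIT_COUNTS` and `split_totals` are illustrative and are not part of the dataset or the model code, and this is not necessarily how the authors produced the split.

```python
# Per-class entry counts copied from the table above: (train, validation, test).
SPLIT_COUNTS = {
    "City": (921, 33, 40),
    "Island": (216, 20, 27),
    "Region": (138, 40, 28),
    "River": (133, 20, 28),
    "Mountain": (63, 29, 22),
    "Human-made": (38, 10, 9),
    "Other": (27, 12, 12),
    "Sea": (26, 13, 12),
    "Lake": (22, 9, 9),
    "Country": (16, 14, 13),
}


def split_totals(counts):
    """Return the total number of entries per split as (train, validation, test)."""
    return tuple(sum(column) for column in zip(*counts.values()))


totals = split_totals(SPLIT_COUNTS)
grand_total = sum(totals)
# Share of each split in the whole dataset, rounded to two decimals.
shares = [round(t / grand_total, 2) for t in totals]

print(totals)  # (1600, 200, 200)
print(shares)  # [0.8, 0.1, 0.1]
```

Under these counts the split is 80/10/10, and City alone accounts for 921 of the 1600 training entries, so the training set is strongly imbalanced toward that class.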
73
+
74
 
75
 
76
  ## Evaluation
 
80
 
81
  | Precision | Recall | F-score |
82
  |:---:|:---:|:---:|
83
+ |0.95 | 0.92 | 0.93 |
84
 
85
 
86
  * Overall weighted-average model performances
87
 
88
  | Precision | Recall | F-score |
89
  |:---:|:---:|:---:|
90
+ |0.94 | 0.94 | 0.94 |
91
 
92
 
93
  * Model performances (Test set)
94
 
95
  | | Precision | Recall | F-score | Support |
96
  |---|:---:|:---:|:---:|:---:|
97
+ | City | 0.91 | 1.00 | 0.95 | 40|
98
+ | Island | 0.96 | 0.96 | 0.96 | 27|
99
+ | River | 0.97 | 1.00 | 0.98 | 28|
100
+ | Region | 0.86 | 0.89 | 0.88 | 28|
101
+ | Mountain | 1.00 | 0.95 | 0.98 | 22|
102
+ | Country | 1.00 | 0.85 | 0.92 | 13|
103
+ | Sea | 1.00 | 0.92 | 0.96 | 12|
104
+ | Other | 0.90 | 0.75 | 0.82 | 12|
105
+ | Human-made | 0.90 | 1.00 | 0.95 | 9|
106
+ | Lake | 1.00 | 0.89 | 0.94 | 9|
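
The two summary rows can be related to the per-class table with a short calculation. The sketch below recomputes an unweighted (macro) mean and a support-weighted mean from the rounded per-class values; it assumes the first summary row, whose label falls outside the lines shown in this diff, is a macro-average. Because the inputs are already rounded to two decimals, this is only an approximate cross-check, and `PER_CLASS`, `macro_average` and `weighted_average` are illustrative names.

```python
# Test-set rows copied from the table above: (precision, recall, f_score, support).
PER_CLASS = {
    "City": (0.91, 1.00, 0.95, 40),
    "Island": (0.96, 0.96, 0.96, 27),
    "River": (0.97, 1.00, 0.98, 28),
    "Region": (0.86, 0.89, 0.88, 28),
    "Mountain": (1.00, 0.95, 0.98, 22),
    "Country": (1.00, 0.85, 0.92, 13),
    "Sea": (1.00, 0.92, 0.96, 12),
    "Other": (0.90, 0.75, 0.82, 12),
    "Human-made": (0.90, 1.00, 0.95, 9),
    "Lake": (1.00, 0.89, 0.94, 9),
}


def macro_average(rows):
    """Unweighted mean of precision, recall and F-score over classes."""
    n = len(rows)
    return tuple(round(sum(r[i] for r in rows) / n, 2) for i in range(3))


def weighted_average(rows):
    """Mean of precision, recall and F-score weighted by class support."""
    total = sum(r[3] for r in rows)
    return tuple(round(sum(r[i] * r[3] for r in rows) / total, 2) for i in range(3))


rows = list(PER_CLASS.values())
macro = macro_average(rows)
weighted = weighted_average(rows)

print(macro)     # (0.95, 0.92, 0.93)
print(weighted)  # (0.94, 0.94, 0.94)
```

Both results agree with the summary rows reported above (0.95 / 0.92 / 0.93 and 0.94 / 0.94 / 0.94).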
107
 
108
 
109
 
 
134
  for sample in samples:
135
  print(pipe(sample))
136
 
137
+
138
+ # Output
139
+
140
+ [{'label': 'City', 'score': 0.9969543218612671}]
141
+ [{'label': 'Region', 'score': 0.9811353087425232}]
142
  ```
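
Each call in the snippet above returns a list with one `{'label', 'score'}` dictionary. A minimal sketch of post-processing that output, assuming one wants to keep a prediction only when its score reaches a chosen threshold, could look like this. The helper `confident_label` and the threshold values are hypothetical and are not part of the model card.

```python
def confident_label(outputs, threshold=0.5):
    """Return the best-scoring label if its score reaches the threshold, else None."""
    best = max(outputs, key=lambda item: item["score"])
    return best["label"] if best["score"] >= threshold else None


# The two outputs shown in the snippet above.
city_output = [{"label": "City", "score": 0.9969543218612671}]
region_output = [{"label": "Region", "score": 0.9811353087425232}]

print(confident_label(city_output))          # City
print(confident_label(region_output, 0.99))  # None
```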
143
 
144
 
 
153
  ## Acknowledgement
154
 
155
  The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
156
+ Data courtesy of the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.