lmoncla committed on
Commit dec3ccd · verified · 1 Parent(s): 5a01ff6

Update README.md

Files changed (1)
  1. README.md +20 -17
README.md CHANGED
@@ -7,7 +7,10 @@ base_model:
  pipeline_tag: text-classification
  widget:
  - text: >-
- MAEATAE, (Géogr. anc.) anciens peuples de l'île de la grande Bretagne ; ils étoient auprès du mur qui coupoit l'île en deux parties.
  ---
 
 
@@ -28,9 +31,9 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
 
  <!-- Provide a longer summary of what this model is. -->
 
- - **Developed by:** [Ludovic Moncla](https://ludovicmoncla.github.io) in the framework of the [GEODE](https://geode-project.github.io) project.
  - **Model type:** Text classification
- - **Repository:** [https://github.com/GEODE-project/semantic-entity-detection-encyclopedia](https://github.com/GEODE-project/semantic-entity-detection-encyclopedia)
  - **Language(s) (NLP):** French
  - **License:** cc-by-nc-4.0
 
@@ -41,20 +44,20 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
  The tagset is as follows:
  - **Place**: encyclopedia entry describing the name of a place (such as a city, a river, a country, etc.)
  - **Person**: encyclopedia entry describing the name of a people or community
- - **Misc**: encyclopedia entry describing any other type of entity (such as abstract geographic concepts, cross-references to other entries, etc.)
 
 
  ## Dataset
 
 
- The model was trained using a set of 1423 entries (only first paragraphs) classified as 'Geography' (using this model: https://huggingface.co/GEODE/bert-base-multilingual-cased-edda-domain-classification). First paragraphs
- The datasets have the following distribution of entries among datasets and classes:
 
  | | Train | Validation | Test|
  |---|:---:|:---:|:---:|
- | Place | 707 | 125 | 147|
- | Person | 123 | 22 | 26 |
- | Misc | 197 | 35 | 41 |
 
 
  ## Evaluation
@@ -65,7 +68,7 @@ The datasets have the following distribution of entries among datasets and class
 
  | | Precision | Recall | F-score |
  |---|:---:|:---:|:---:|
- | | 0.95 | 0.95 | 0.95 |
 
 
 
@@ -73,9 +76,9 @@ The datasets have the following distribution of entries among datasets and class
 
  | | Precision | Recall | F-score | Support |
  |---|:---:|:---:|:---:|:---:|
- | Place | 0.97 | 0.97 | 0.97 | 147 |
- | Person | 0.92 | 0.92 | 0.92 | 26 |
- | Misc | 0.90 | 0.90 | 0.90 | 41 |
 
 
 
@@ -106,9 +109,9 @@ for sample in samples:
  print(pipe(sample))
 
  # Output
- [{'label': 'Place', 'score': 0.9956912398338318}]
- [{'label': 'Person', 'score': 0.9895496368408203}]
- [{'label': 'Misc', 'score': 0.993893563747406}]
 
  ```
 
@@ -124,4 +127,4 @@ This model was trained entirely on French encyclopaedic entries classified as Ge
  ## Acknowledgement
 
  The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
- Data courtesy the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.
 
  pipeline_tag: text-classification
  widget:
  - text: >-
+ MAEATAE, (Géogr. anc.) anciens peuples de l'île de la grande Bretagne ; ils
+ étoient auprès du mur qui coupoit l'île en deux parties.
+ datasets:
+ - GEODE/GeoEDdA-TopoRel
  ---
 
 
 
 
  <!-- Provide a longer summary of what this model is. -->
 
+ - **Authors:** Bin Yang, [Ludovic Moncla](https://ludovicmoncla.github.io), [Fabien Duchateau](https://perso.liris.cnrs.fr/fabien.duchateau/) and [Frédérique Laforest](https://perso.liris.cnrs.fr/flaforest/) in the framework of the [ECoDA](https://liris.cnrs.fr/projet-institutionnel/fil-2025-projet-ecoda) and [GEODE](https://geode-project.github.io) projects
  - **Model type:** Text classification
+ - **Repository:** [https://gitlab.liris.cnrs.fr/ecoda/encyclopedia2geokg](https://gitlab.liris.cnrs.fr/ecoda/encyclopedia2geokg)
  - **Language(s) (NLP):** French
  - **License:** cc-by-nc-4.0
 
 
  The tagset is as follows:
  - **Place**: encyclopedia entry describing the name of a place (such as a city, a river, a country, etc.)
  - **Person**: encyclopedia entry describing the name of a people or community
+ - **Other**: encyclopedia entry describing any other type of entity (such as abstract geographic concepts, cross-references to other entries, etc.)
 
 
  ## Dataset
 
 
+ The model was trained using the [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel) dataset.
+ The dataset is split into train, validation and test sets, with the following distribution of entries among classes:
 
  | | Train | Validation | Test|
  |---|:---:|:---:|:---:|
+ | Place | 1,800 | 225 | 225|
+ | Person | 200 | 25 | 25 |
+ | Other | 200 | 25 | 25 |
 
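The README does not say how the split was produced. However, the counts are consistent with splitting each class 80/10/10: for example, 1,800 / 225 / 225 out of 2,250 Place entries. The sketch below assumes such a per-class (stratified) split. `stratified_split`, the seed and the rounding are hypothetical illustration choices, not the project's code, and the third class is named Other, as in the tagset.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical helper: split item indices class by class so every class
    keeps the same train/validation/test proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        splits["train"] += idxs[:n_train]
        splits["validation"] += idxs[n_train:n_train + n_val]
        # Whatever remains goes to test, so no entry is lost to rounding.
        splits["test"] += idxs[n_train + n_val:]
    return splits


# Class totals implied by the table (train + validation + test per row).
labels = ["Place"] * 2250 + ["Person"] * 250 + ["Other"] * 250
splits = stratified_split(labels)
for name, idxs in splits.items():
    counts = {c: sum(labels[i] == c for i in idxs) for c in ("Place", "Person", "Other")}
    print(name, counts)
```

Splitting inside each class, rather than over the whole pool, keeps the rare Person and Other classes represented in validation and test.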
 
  ## Evaluation
 
 
  | | Precision | Recall | F-score |
  |---|:---:|:---:|:---:|
+ | | 0.980 | 0.978 | 0.979 |
 
 
 
 
 
  | | Precision | Recall | F-score | Support |
  |---|:---:|:---:|:---:|:---:|
+ | Place | 0.99 | 0.98 | 0.99 | 225 |
+ | Person | 1.00 | 0.96 | 0.98 | 25 |
+ | Other | 0.83 | 0.96 | 0.89 | 25 |
 
 
 
 
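The README does not say how the overall row is aggregated from the per-class rows. As a rough check, the sketch below computes two common schemes from the per-class values and supports. The first is a support-weighted average, which is what scikit-learn's `average="weighted"` computes. The second is a plain (macro) average. This is an illustration under that assumption, not the authors' evaluation script.

```python
# Per-class rows from the evaluation table: (precision, recall, f1, support).
per_class = {
    "Place": (0.99, 0.98, 0.99, 225),
    "Person": (1.00, 0.96, 0.98, 25),
    "Other": (0.83, 0.96, 0.89, 25),
}


def weighted_average(rows, column):
    """Support-weighted mean of one metric column (0=precision, 1=recall, 2=F1)."""
    total = sum(r[3] for r in rows.values())
    return sum(r[column] * r[3] for r in rows.values()) / total


def macro_average(rows, column):
    """Unweighted mean of one metric column over classes."""
    return sum(r[column] for r in rows.values()) / len(rows)


for i, name in enumerate(("precision", "recall", "f1")):
    print(f"{name}: weighted={weighted_average(per_class, i):.3f} "
          f"macro={macro_average(per_class, i):.3f}")
```

With these two-decimal per-class values, the weighted averages come out at about 0.976 / 0.976 / 0.980 and the macro averages at about 0.940 / 0.967 / 0.953. The reported overall row is therefore much closer to a support-weighted average. The small remaining gap is plausibly due to rounding.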
  print(pipe(sample))
 
  # Output
+ [{'label': 'Place', 'score': 0.9984742999076843}]
+ [{'label': 'Person', 'score': 0.9927592277526855}]
+ [{'label': 'Other', 'score': 0.9885557293891907}]
 
  ```
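The `score` values in the output above are class probabilities. For a single-label classifier, the transformers text-classification pipeline applies a softmax to the model's logits by default and returns the top label with its probability. The sketch below imitates that post-processing in plain Python. The logits are made up, and the `ID2LABEL` order is a hypothetical stand-in for the `id2label` mapping stored in the model's config.

```python
import math

# Hypothetical label order; the real mapping is the model config's id2label.
ID2LABEL = {0: "Place", 1: "Person", 2: "Other"}


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return a pipeline-style [{'label', 'score'}] for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": ID2LABEL[best], "score": probs[best]}]


# Made-up logits for illustration only (not produced by the model).
print(top_prediction([7.1, 0.3, -1.2]))
```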
 
 
  ## Acknowledgement
 
  The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
+ Data courtesy the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.