Update README.md
README.md
@@ -7,7 +7,7 @@ base_model:
 pipeline_tag: text-classification
 widget:
 - text: >-
-
 ---
@@ -39,18 +39,19 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
 The tagset is as follows:
-- **single**:
-- **multiple**:

 ## Dataset


-The model was trained using a set of
 The datasets have the following distribution of entries among datasets and classes:

 | | Train | Validation | Test|
 |---|:---:|:---:|:---:|
-


 ## Evaluation
@@ -58,25 +59,24 @@ The datasets have the following distribution of entries among datasets and class
 * Overall macro-average model performances

-
 |:---:|:---:|:---:|
-


 * Overall weighted-average model performances

 | Precision | Recall | F-score |
 |:---:|:---:|:---:|
-


 * Model performances (Test set)

 | | Precision | Recall | F-score | Support |
 |---|:---:|:---:|:---:|:---:|
-
-
-
@@ -98,7 +98,7 @@ pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, truncat
 samples = [
 "* ALBI, (Géog.) ville de France, capitale de l'Albigeois, dans le haut Languedoc : elle est sur le Tarn. Long. 19. 49. lat. 43. 55. 44.",
-"
 ]
@@ -7,7 +7,7 @@ base_model:
 pipeline_tag: text-classification
 widget:
 - text: >-
+PEGOE, (Géog. anc.) 1°. ville de l'Achaie, dans la Mégaride ; 2°. ville de l'Hellespont, selon Ortelius ; 3°. ville de l'île de Cypre ou de la Cyrénie, selon Etienne le géographe.
 ---
@@ -39,18 +39,19 @@ It has been trained on a manually annotated subset of the French *Encyclopédie
 The tagset is as follows:
+- **single**: only one place is described
+- **multiple**: several places are described (a single name with multiple locations)

 ## Dataset


+The model was trained using a set of 8658 entries classified as 'Place' (using this model: https://huggingface.co/GEODE/bert-base-multilingual-cased-geography-entry-classification) among entries classified as 'Geography' (using this model: https://huggingface.co/GEODE/bert-base-multilingual-cased-edda-domain-classification).
 The datasets have the following distribution of entries among datasets and classes:

 | | Train | Validation | Test|
 |---|:---:|:---:|:---:|
+| Single | 5760 | 1235 | 1234 |
+| Multiple | 300 | 64 | 65 |


 ## Evaluation
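The split sizes in the added Dataset table can be cross-checked against the 8658-entry total stated in the added sentence. The short Python sketch below copies the counts from the table and computes per-split totals and the share of **multiple** entries in each split. The dictionary layout and variable names are illustrative assumptions, not part of the model card.

```python
# Counts copied from the "Dataset" table added in this commit.
# The dictionary layout and names are illustrative only.
counts = {
    "Train": {"Single": 5760, "Multiple": 300},
    "Validation": {"Single": 1235, "Multiple": 64},
    "Test": {"Single": 1234, "Multiple": 65},
}

# Total number of entries in each split, then across all splits.
split_totals = {split: sum(c.values()) for split, c in counts.items()}
grand_total = sum(split_totals.values())

for split, total in split_totals.items():
    share_of_data = total / grand_total              # fraction of all entries
    share_multiple = counts[split]["Multiple"] / total  # class imbalance
    print(f"{split:<10} {total:>5} entries ({share_of_data:.1%} of data), "
          f"{share_multiple:.1%} 'multiple'")

print("Total:", grand_total)
```

The splits come to 6060, 1299 and 1299 entries, which sum to the stated 8658. That is roughly a 70/15/15 split, and each split keeps about 5% of **multiple** entries.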
@@ -58,25 +59,24 @@ The datasets have the following distribution of entries among datasets and class
 * Overall macro-average model performances

+| Precision | Recall | F-score |
 |:---:|:---:|:---:|
+| 0.92 | 0.92 | 0.92 |


 * Overall weighted-average model performances

 | Precision | Recall | F-score |
 |:---:|:---:|:---:|
+| 0.98 | 0.98 | 0.98 |


 * Model performances (Test set)

 | | Precision | Recall | F-score | Support |
 |---|:---:|:---:|:---:|:---:|
+| Multiple | 0.85 | 0.85 | 0.85 | 65 |
+| Single | 0.99 | 0.99 | 0.99 | 1234 |
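The macro- and weighted-average rows added here can be reproduced from the per-class test rows:

- **Macro average:** the unweighted mean over the two classes.
- **Weighted average:** each class is weighted by its support (65 and 1234).

These match the definitions behind the "macro avg" and "weighted avg" rows of scikit-learn's `classification_report`. The commit does not say which tool produced the numbers. Below is a minimal sketch that uses the rounded per-class F-scores from the table, so small rounding differences from the true averages are possible. The helper names are hypothetical.

```python
# Per-class test-set F-scores and supports, copied from the added table.
# Precision and recall have identical values in the table, so the same
# computation applies to them as well.
f_scores = {"Multiple": 0.85, "Single": 0.99}
support = {"Multiple": 65, "Single": 1234}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Mean over classes, weighting each class by its support."""
    total = sum(weights.values())
    return sum(scores[c] * weights[c] for c in scores) / total


macro = macro_average(f_scores)
weighted = weighted_average(f_scores, support)
print(f"macro-average F-score:    {macro:.2f}")
print(f"weighted-average F-score: {weighted:.2f}")
```

The rare **multiple** class has a lower F-score (0.85), which pulls the macro average down to 0.92. The weighted average (0.98) is dominated by the 1234 **single** entries.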
@@ -98,7 +98,7 @@ pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, truncat
 samples = [
 "* ALBI, (Géog.) ville de France, capitale de l'Albigeois, dans le haut Languedoc : elle est sur le Tarn. Long. 19. 49. lat. 43. 55. 44.",
+"PEGOE, (Géog. anc.) 1°. ville de l'Achaie, dans la Mégaride ; 2°. ville de l'Hellespont, selon Ortelius ; 3°. ville de l'île de Cypre ou de la Cyrénie, selon Etienne le géographe. "
 ]