---
license: cc-by-nc-4.0
language:
- fr
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
widget:
- text: >-
    PEGOE, (Géog. anc.) 1°. ville de l'Achaie, dans la Mégaride ; 2°. ville de l'Hellespont, selon Ortelius ; 3°. ville de l'île de Cypre ou de la Cyrénie, selon Etienne le géographe. 
---



# bert-base-multilingual-cased-single-multiple-place-classification


<!-- Provide a quick summary of what the model is/does. -->

This model classifies geographic encyclopedia articles that describe places, according to whether an article describes a single place or several places sharing the same name.
It is a fine-tuned version of the `bert-base-multilingual-cased` model.
It was trained on a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)*, edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).




## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Bin Yang, [Ludovic Moncla](https://ludovicmoncla.github.io), [Fabien Duchateau](https://perso.liris.cnrs.fr/fabien.duchateau/) and [Frédérique Laforest](https://perso.liris.cnrs.fr/flaforest/)
- **Model type:** Text classification
- **Repository:** 
- **Language(s) (NLP):** French
- **License:** cc-by-nc-4.0


## Class labels


The tagset is as follows:
- **Single**: only one place is described
- **Multiple**: several places are described (a single name with multiple locations)

## Dataset


The model was trained on a set of 8658 entries classified as 'Place' (using [this model](https://huggingface.co/GEODE/bert-base-multilingual-cased-geography-entry-classification)) among entries classified as 'Geography' (using [this model](https://huggingface.co/GEODE/bert-base-multilingual-cased-edda-domain-classification)).
The entries are distributed across splits and classes as follows:

|   | Train | Validation | Test|
|---|:---:|:---:|:---:|
| Single     | 5760 | 1235 | 1234 | 
| Multiple   | 300 | 64 | 65 |
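
The class balance implied by this table can be checked with a few lines of arithmetic. The sketch below only restates the counts from the table; the variable names are illustrative and are not part of the model or its training code.

```python
# Entry counts copied from the table above.
counts = {
    "Train": {"Single": 5760, "Multiple": 300},
    "Validation": {"Single": 1235, "Multiple": 64},
    "Test": {"Single": 1234, "Multiple": 65},
}

# Total number of annotated entries across all splits and classes.
total = sum(sum(split.values()) for split in counts.values())

# Share of 'Multiple' entries within each split.
multiple_share = {
    name: split["Multiple"] / sum(split.values())
    for name, split in counts.items()
}

print(f"Total entries: {total}")
for name, share in multiple_share.items():
    print(f"{name}: {share:.1%} Multiple")
```

The total matches the 8658 entries stated above, and 'Multiple' makes up roughly 5% of every split, so the classes are strongly imbalanced.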


## Evaluation


* Overall macro-averaged model performance

| Precision | Recall | F-score |
|:---:|:---:|:---:|
| 0.92 | 0.92 | 0.92 | 


* Overall weighted-average model performance

| Precision | Recall | F-score |
|:---:|:---:|:---:|
| 0.98 | 0.98 | 0.98 | 


* Per-class model performance (test set)

|   | Precision | Recall | F-score | Support |
|---|:---:|:---:|:---:|:---:|
| Multiple | 0.85 | 0.85 | 0.85 | 65 | 
| Single | 0.99 | 0.99 | 0.99 | 1234 |
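
The gap between the macro and weighted averages follows from the class imbalance. The sketch below recomputes both F-score averages from the rounded per-class values in the table, assuming the overall averages were computed on the same test set; because the inputs are rounded, the results are only approximate.

```python
# Per-class test-set F-scores and supports copied from the table above.
per_class = {
    "Multiple": {"f1": 0.85, "support": 65},
    "Single": {"f1": 0.99, "support": 1234},
}

n = sum(c["support"] for c in per_class.values())

# Macro average: every class counts equally, whatever its size.
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / n

print(f"macro F-score: {macro_f1:.2f}")
print(f"weighted F-score: {weighted_f1:.2f}")
```

The macro average is pulled down by the rarer 'Multiple' class, while the weighted average is dominated by 'Single', which accounts for 1234 of the 1299 test entries.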




## How to Get Started with the Model

Use the code below to get started with the model.


```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Pick the best available device: Apple MPS, then CUDA, then CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else ("cuda" if torch.cuda.is_available() else "cpu"))

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("GEODE/bert-base-multilingual-cased-single-multiple-place-classification")
model = AutoModelForSequenceClassification.from_pretrained("GEODE/bert-base-multilingual-cased-single-multiple-place-classification")

# truncation=True cuts entries that exceed the model's maximum input length.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, truncation=True, device=device)

samples = [
    "* ALBI, (Géog.) ville de France, capitale de l'Albigeois, dans le haut Languedoc : elle est sur le Tarn. Long. 19. 49. lat. 43. 55. 44.",
    "PEGOE, (Géog. anc.) 1°. ville de l'Achaie, dans la Mégaride ; 2°. ville de l'Hellespont, selon Ortelius ; 3°. ville de l'île de Cypre ou de la Cyrénie, selon Etienne le géographe. "
]


for sample in samples:
    print(pipe(sample))

```
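
A text-classification pipeline returns one dictionary with a `label` and a `score` per input. As a minimal sketch of one possible post-processing step, the hypothetical helper below separates confident predictions from those worth checking by hand. The `triage` function, the 0.9 threshold, and the mock predictions (including their label names) are assumptions for illustration, not part of the model or of `transformers`.

```python
def triage(predictions, threshold=0.9):
    """Split pipeline outputs into confident and to-review lists.

    `predictions` is a list of {"label": str, "score": float} dicts,
    the shape produced by a text-classification pipeline.
    """
    confident, to_review = [], []
    for index, pred in enumerate(predictions):
        target = confident if pred["score"] >= threshold else to_review
        target.append((index, pred["label"], pred["score"]))
    return confident, to_review


# Hypothetical outputs standing in for the result of pipe(samples).
mock_predictions = [
    {"label": "Single", "score": 0.998},
    {"label": "Multiple", "score": 0.71},
]

confident, to_review = triage(mock_predictions)
print("confident:", confident)
print("to review:", to_review)
```

In practice the threshold would need to be tuned on the validation split, and the label strings should be read from the model's own output rather than assumed.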


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

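This model was trained entirely on French encyclopaedic entries classified as 'Geography' and then as 'Place', so it will likely not perform well on text in other languages or from other corpora. The 'Multiple' class is also much rarer than 'Single' in the training data (300 vs. 5760 entries), and its test scores are lower (0.85 vs. 0.99 F-score).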



## Acknowledgement

The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future", operated by the National Research Agency (ANR).
Data courtesy of the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.