---
license: cc-by-nc-4.0
language:
- fr
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
datasets:
- GEODE/GeoEDdA-TopoRel
---



# bert-base-multilingual-cased-classification-relation


<!-- Provide a quick summary of what the model is/does. -->

This model classifies spatial relations detected in geographic encyclopedia articles.
It is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased).
It was trained on [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel), a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).




## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Authors:** Bin Yang, [Ludovic Moncla](https://ludovicmoncla.github.io), [Fabien Duchateau](https://perso.liris.cnrs.fr/fabien.duchateau/) and [Frédérique Laforest](https://perso.liris.cnrs.fr/flaforest/) in the framework of the [ECoDA](https://liris.cnrs.fr/projet-institutionnel/fil-2025-projet-ecoda) and [GEODE](https://geode-project.github.io) projects
- **Model type:** Text classification
- **Repository:** [https://gitlab.liris.cnrs.fr/ecoda/encyclopedia2geokg](https://gitlab.liris.cnrs.fr/ecoda/encyclopedia2geokg)
- **Language(s) (NLP):** French
- **License:** cc-by-nc-4.0


## Class labels


The tagset is as follows:
- **Adjacency**
- **Crosses**
- **Distance-Orientation**
- **Inclusion**
- **Movement**
- **Other**


## Dataset


The model was trained using the [GeoEDdA-TopoRel](https://huggingface.co/datasets/GEODE/GeoEDdA-TopoRel) dataset.
The dataset is split into train, validation and test sets, with the following distribution of entries among classes:

| Class | Train | Validation | Test |
|---|:---:|:---:|:---:|
| Adjacency | 498 | 59 | 75 |
| Crosses | 397 | 50 | 29 |
| Distance-Orientation | 1,065 | 163 | 115 |
| Inclusion | 1,319 | 131 | 156 |
| Movement | 184 | 15 | 35 |
| Other | 195 | 30 | 42 |
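
The class imbalance in this table can be made explicit with a little arithmetic. The sketch below only re-uses the counts listed above (the variable names are illustrative, not part of the dataset) and shows that Inclusion and Distance-Orientation together account for about 65% of the training entries, while Movement and Other each account for about 5%.

```python
# Counts copied from the table above: (train, validation, test) per class.
counts = {
    "Adjacency": (498, 59, 75),
    "Crosses": (397, 50, 29),
    "Distance-Orientation": (1065, 163, 115),
    "Inclusion": (1319, 131, 156),
    "Movement": (184, 15, 35),
    "Other": (195, 30, 42),
}
split_names = ("train", "validation", "test")

# Total number of entries in each split.
split_totals = {
    name: sum(c[i] for c in counts.values()) for i, name in enumerate(split_names)
}

# Share of each class within the training split.
train_share = {label: c[0] / split_totals["train"] for label, c in counts.items()}

print(split_totals)
for label, share in sorted(train_share.items(), key=lambda kv: -kv[1]):
    print(f"{label:<22}{share:6.1%}")
```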


## Evaluation


* Overall weighted-average model performance


|   | Precision | Recall | F-score |
|---|:---:|:---:|:---:|
| Weighted average | 0.92 | 0.92 | 0.92 |



* Per-class model performance (test set)

| Class | Precision | Recall | F-score | Support |
|---|:---:|:---:|:---:|:---:|
| Adjacency | 0.85 | 0.84 | 0.85 | 75 |
| Crosses | 0.78 | 0.86 | 0.82 | 29 |
| Distance-Orientation | 0.93 | 0.99 | 0.96 | 115 |
| Inclusion | 0.97 | 0.98 | 0.97 | 156 |
| Movement | 0.89 | 0.69 | 0.77 | 35 |
| Other | 0.95 | 0.88 | 0.91 | 42 |
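
As a sanity check, the overall weighted averages can be recomputed from the per-class rows, weighting each class by its support. This is a minimal sketch that assumes the overall figures were computed on the same test set; it works from the rounded values in the table, so it can only confirm the result to two decimals.

```python
# Per-class (precision, recall, f_score, support), copied from the table above.
rows = {
    "Adjacency": (0.85, 0.84, 0.85, 75),
    "Crosses": (0.78, 0.86, 0.82, 29),
    "Distance-Orientation": (0.93, 0.99, 0.96, 115),
    "Inclusion": (0.97, 0.98, 0.97, 156),
    "Movement": (0.89, 0.69, 0.77, 35),
    "Other": (0.95, 0.88, 0.91, 42),
}
total_support = sum(r[3] for r in rows.values())


def weighted_average(metric_index):
    """Average one metric over classes, weighted by class support."""
    return sum(r[metric_index] * r[3] for r in rows.values()) / total_support


precision, recall, f_score = (weighted_average(i) for i in range(3))
print(f"{precision:.2f} {recall:.2f} {f_score:.2f}")  # 0.92 0.92 0.92
```

Because the average is weighted by support, the lower recall on Movement (0.69, 35 entries) has little effect on the overall score, which is dominated by Inclusion and Distance-Orientation.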





## How to Get Started with the Model

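The example below first detects spatial relation spans with the [GEODE/camembert-base-edda-span-classification](https://huggingface.co/GEODE/camembert-base-edda-span-classification) model, then classifies each span, together with up to five words of context on each side, with this model.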


```python
import torch
from transformers import pipeline

# Prefer Apple MPS, then CUDA, and fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else ("cuda" if torch.cuda.is_available() else "cpu"))

# Span detection model (finds "Relation" spans) and the relation classifier described in this card.
ner = pipeline("token-classification", model="GEODE/camembert-base-edda-span-classification", aggregation_strategy="simple", device=device)
relation_classifier = pipeline("text-classification", model="GEODE/bert-base-multilingual-cased-classification-relation", truncation=True, device=device)

def get_context(text, span, ngram_context_size=5):
    """Build the classifier input: the relation span plus a few words on each side."""
    word = span["word"]
    start = span["start"]
    end = span["end"]
    label = span["entity_group"]

    # Keep up to `ngram_context_size` whitespace-separated words on each side of the span
    previous_text = text[:start].strip()
    next_text = text[end:].strip()
    previous_words = previous_text.split()[-ngram_context_size:]
    next_words = next_text.split()[:ngram_context_size]

    # Build context string
    context = f"[{word}]: {' '.join(previous_words)} {word} {' '.join(next_words)}"
    return word, context, label

content = "WINCHESTER, (Géog. mod.) ou plutôt Wintchester, ville d'Angleterre, capitale du Hampshire, sur le bord de l'Itching, à dix-huit milles au sud-est de Salisbury, & à soixante sud-ouest de Londres. Long. 16. 20. latit. 51. 3."

spans = ner(content)
for span in spans:
    if span["entity_group"] == "Relation":
        word, context, _ = get_context(content, span, ngram_context_size=5)
        print(f"Relation: {word}")

        prediction = relation_classifier(context)
        print(f"Predicted label: {prediction}")


# Output
# Relation: sur le bord de
# Predicted label: [{'label': 'Crosses', 'score': 0.9778845906257629}]
# Relation: à dix-huit milles au sud-est de
# Predicted label: [{'label': 'Distance-Orientation', 'score': 0.9959626793861389}]
# Relation: à soixante sud-ouest de
# Predicted label: [{'label': 'Distance-Orientation', 'score': 0.9963018894195557}]
```
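
To see exactly what text the classifier receives, the context-building step can be run on its own, without downloading either model. In the sketch below the span dictionary is written by hand to mimic the keys used above (`word`, `start`, `end`, `entity_group`); the shortened sentence and the offsets obtained with `str.find` are illustrative assumptions, not output of the span detection model.

```python
def get_context(text, span, ngram_context_size=5):
    """Return the span text, the classifier input string, and the span label."""
    word = span["word"]
    # Up to `ngram_context_size` whitespace-separated words on each side of the span.
    previous_words = text[:span["start"]].strip().split()[-ngram_context_size:]
    next_words = text[span["end"]:].strip().split()[:ngram_context_size]
    context = f"[{word}]: {' '.join(previous_words)} {word} {' '.join(next_words)}"
    return word, context, span["entity_group"]


text = (
    "Winchester, ville d'Angleterre, capitale du Hampshire, sur le bord de "
    "l'Itching, à dix-huit milles au sud-est de Salisbury."
)

# Hand-written span standing in for one item of the span detection output.
relation = "sur le bord de"
start = text.find(relation)
span = {
    "word": relation,
    "start": start,
    "end": start + len(relation),
    "entity_group": "Relation",
}

_, context, _ = get_context(text, span, ngram_context_size=3)
print(context)
# [sur le bord de]: capitale du Hampshire, sur le bord de l'Itching, à dix-huit
```

The relation expression appears twice in the string: once in brackets as a marker of the span to classify, and once in place within its surrounding words.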


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model was trained entirely on French encyclopaedic entries classified as Geography and will likely not perform well on text in other languages or other corpora. 



## Acknowledgement

The authors are grateful to the [ASLAN project](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
Data courtesy of the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu), University of Chicago.