GEODE/bert-base-multilingual-cased-geography-entry-classification (Text Classification)

lmoncla committed on Apr 15, 2025

Commit f9c30fa (verified) · 1 parent: 656b104

Update README.md

Files changed (1)

README.md +27 -9

@@ -17,9 +17,9 @@ widget:
 <!-- Provide a quick summary of what the model is/does. -->
-This model is designed to classify encyclopedia articles into
 It is a fine-tuned version of the bert-base-multilingual-cased model.
-It has been trained on the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).
@@ -47,13 +47,8 @@ The tagset is as follows:
 ## Dataset
-The model was trained using a set of 2200 paragraphs randomly selected out of 2001 Encyclopédie's entries.
-All paragraphs were written in French and are distributed as follows among the Encyclopédie knowledge domains:
-The spans/entities were labeled by the project team along with using pre-labelling with early models to speed up the labelling process.
-A train/val/test split was used.
-Validation and test sets are composed of 200 paragraphs each: 100 classified as 'Géographie' and 100 from another knowledge domain.
-The datasets have the following breakdown of tokens and spans/entities.
 |   | Train | Validation | Test|
 |---|:---:|:---:|:---:|
@@ -62,6 +57,29 @@ The datasets have the following breakdown of tokens and spans/entities.
 | Misc | 197 | 35 | 41 |
 ## How to Get Started with the Model

 <!-- Provide a quick summary of what the model is/does. -->
+This model classifies geographic encyclopedia entries into one of three classes: Place, Person, or Misc.
 It is a fine-tuned version of the bert-base-multilingual-cased model.
+It has been trained on a manually annotated subset of the French *Encyclopédie ou dictionnaire raisonné des sciences des arts et des métiers par une société de gens de lettres (1751-1772)* edited by Diderot and d'Alembert (provided by the [ARTFL Encyclopédie Project](https://artfl-project.uchicago.edu)).
 ## Dataset
+The model was trained using a set of 1423 entries (first paragraphs only) classified as 'Geography' by the [GEODE domain classification model](https://huggingface.co/GEODE/bert-base-multilingual-cased-edda-domain-classification). First paragraphs
+The entries are distributed across the splits and classes as follows:
 |   | Train | Validation | Test|
 |---|:---:|:---:|:---:|
 | Misc | 197 | 35 | 41 |
+## Evaluation
+* Overall weighted-average model performance
+|   | Precision | Recall | F-score |
+|---|:---:|:---:|:---:|
+|    | 0.95   | 0.95   | 0.95 |
+* Per-class model performance (test set)
+|   | Precision | Recall | F-score | Support |
+|---|:---:|:---:|:---:|:---:|
+| Place    |  0.97  |  0.97  |  0.97 | 147 |
+| Person   |  0.92  |  0.92  |  0.92 | 26 |
+| Misc     |  0.90  |  0.90  |  0.90 | 41 |
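
As a quick consistency check, the overall weighted-average row can be approximately reproduced from the per-class test-set scores above by weighting each class by its support. This is a minimal sketch, not the authors' evaluation code: it assumes the overall figures were computed on the same test set, and it works from the rounded two-decimal values in the table, so it can only confirm agreement to two decimals. The names `per_class` and `weighted_average` are illustrative.

```python
# Per-class test-set scores copied from the table above:
# class -> (precision, recall, f_score, support)
per_class = {
    "Place": (0.97, 0.97, 0.97, 147),
    "Person": (0.92, 0.92, 0.92, 26),
    "Misc": (0.90, 0.90, 0.90, 41),
}


def weighted_average(scores, index):
    """Support-weighted mean of the metric stored at position `index`."""
    total_support = sum(row[3] for row in scores.values())
    weighted_sum = sum(row[index] * row[3] for row in scores.values())
    return weighted_sum / total_support


# Assumption: the "overall" row is a support-weighted average over the test set.
metric_names = ("precision", "recall", "f_score")
weighted = {
    name: round(weighted_average(per_class, i), 2)
    for i, name in enumerate(metric_names)
}

print(weighted)
```

With the table's values, each metric comes out at 0.95 after rounding, which matches the reported overall row. The three supports sum to 214 test entries.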
 ## How to Get Started with the Model