Update README.md
README.md
CHANGED
@@ -44,6 +44,24 @@ The tagset is as follows:
 - **Misc**: encyclopedia entry describing any other type of entity (such as abstract geographic concepts, cross-references to other entries, etc.)
 
 
+## Dataset
+
+
+The model was trained using a set of 2200 paragraphs randomly selected out of 2001 Encyclopédie's entries.
+All paragraphs were written in French and are distributed as follows among the Encyclopédie knowledge domains:
+
+The spans/entities were labeled by the project team along with using pre-labelling with early models to speed up the labelling process.
+A train/val/test split was used.
+Validation and test sets are composed of 200 paragraphs each: 100 classified as 'Géographie' and 100 from another knowledge domain.
+The datasets have the following breakdown of tokens and spans/entities.
+
+| | Train | Validation | Test|
+|---|:---:|:---:|:---:|
+| Place | 707 | 125 | 147|
+| Person | 123 | 22 | 26 |
+| Misc | 197 | 35 | 41 |
+
+
 
 ## How to Get Started with the Model
 
@@ -66,7 +84,6 @@ samples = [
 for sample in samples:
     print(pipe(sample))
 
-
 # Output
 [{'label': 'Place', 'score': 0.9984947443008423}]
 [{'label': 'Person', 'score': 0.9661000370979309}]
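
The table added under `## Dataset` gives per-label counts for each split but no column totals. As a rough sanity check, the sketch below recomputes the totals and the share of each label per split. The counts are copied from the table as-is; the names `counts`, `totals`, and `shares` are illustrative and not part of the README, and since the table does not state what unit it counts, the results are plain row and column sums and nothing more.

```python
# Counts copied from the table added in this diff (rows: label, columns: split).
counts = {
    "Place":  {"Train": 707, "Validation": 125, "Test": 147},
    "Person": {"Train": 123, "Validation": 22,  "Test": 26},
    "Misc":   {"Train": 197, "Validation": 35,  "Test": 41},
}

splits = ["Train", "Validation", "Test"]

# Column sums: how many labeled items the table lists for each split.
totals = {s: sum(row[s] for row in counts.values()) for s in splits}

# Share of each label within a split, useful for spotting class imbalance.
shares = {
    s: {label: row[s] / totals[s] for label, row in counts.items()}
    for s in splits
}

for s in splits:
    line = ", ".join(f"{label} {shares[s][label]:.1%}" for label in counts)
    print(f"{s}: {totals[s]} total ({line})")
```

A check like this makes it easy to compare the table against the split sizes stated in the prose of the same section.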