Update README.md
README.md
CHANGED
|
@@ -20,7 +20,19 @@ model-index:
|
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.776119403
|
| 23 |
---
|
| 24 |
| Feature | Description |
|
| 25 |
| --- | --- |
|
| 26 |
| **Name** | `fr_lexical_death` |
|
|
@@ -53,4 +65,18 @@ model-index:
|
|
| 53 |
| `ENTS_P` | 82.54 |
|
| 54 |
| `ENTS_R` | 73.24 |
|
| 55 |
| `TRANSFORMER_LOSS` | 51778.17 |
|
| 56 |
-
| `NER_LOSS` | 41163.78 |
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.776119403
|
| 23 |
+
license: agpl-3.0
|
| 24 |
+
widget:
|
| 25 |
+
- example 1: "Il faut pas sortir, vous reviendrez pas vivantes."
|
| 26 |
+
- example 2: "Les morts ne parlents pas."
|
| 27 |
+
- example 3: "Les Ambulances garés, les cortèges de defunts, les cadavres qu'on sortait des décombres"
|
| 28 |
---
|
| 29 |
+
|
| 30 |
+
## Description
|
| 31 |
+
|
| 32 |
+
This model was built to detect the lexical field of death. Its main purpose was to automate annotation on a specific dataset.
|
| 33 |
+
There is no warranty that it will work on any other dataset. We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
|
| 34 |
+
|
| 35 |
+
|
| 36 |
| Feature | Description |
|
| 37 |
| --- | --- |
|
| 38 |
| **Name** | `fr_lexical_death` |
|
|
|
|
| 65 |
| `ENTS_P` | 82.54 |
|
| 66 |
| `ENTS_R` | 73.24 |
|
| 67 |
| `TRANSFORMER_LOSS` | 51778.17 |
|
| 68 |
+
| `NER_LOSS` | 41163.78 |
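
As a sanity check, the NER F score in the metadata (0.776119403) can be recomputed from the `ENTS_P` and `ENTS_R` values in this table. This is a minimal sketch that assumes the score is the usual F1 harmonic mean of precision and recall; the helper name `f_score` is ours, not part of the training code.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table, expressed as percentages.
ents_p = 82.54
ents_r = 73.24

ents_f = f_score(ents_p, ents_r)
print(f"ENTS_F is approximately {ents_f:.2f}")
```

Because the table values are rounded to two decimals, the result only matches the metadata value to about four significant figures.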
|
| 69 |
+
|
| 70 |
+
### Training
|
| 71 |
+
|
| 72 |
+
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
|
| 73 |
+
The models were trained on 200-word sequences: 70% of the data was used for training, 20% to test and fine-tune hyperparameters,
|
| 74 |
+
and 10% to evaluate the performance of the model. To ensure a correct performance evaluation,
|
| 75 |
+
the evaluation sequences were taken from documents that were not used during the training.
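
The split described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the linked repository: the function names `chunk_words` and `split_by_document`, the whitespace tokenization, and the toy corpus are all hypothetical. The sketch cuts each document into 200-word sequences and assigns whole documents to a partition, so no evaluation sequence comes from a document seen during training.

```python
import random
from typing import Dict, List, Tuple


def chunk_words(text: str, size: int = 200) -> List[str]:
    """Cut a text into consecutive sequences of at most `size` words."""
    words = text.split()  # assumption: simple whitespace tokenization
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(
    docs: Dict[str, str],
    ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
    seed: int = 0,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assign whole documents to train/test/valid, then chunk each one."""
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)  # reproducible shuffle
    n_train = round(len(doc_ids) * ratios[0])
    n_test = round(len(doc_ids) * ratios[1])
    groups = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],  # remaining documents
    }
    return {
        name: [(doc_id, seq) for doc_id in ids for seq in chunk_words(docs[doc_id])]
        for name, ids in groups.items()
    }


# Toy corpus: 10 documents of 450 words each (hypothetical data).
toy_docs = {f"doc{i}": " ".join(["mot"] * 450) for i in range(10)}
splits = split_by_document(toy_docs)
for name, items in splits.items():
    print(name, len(items), "sequences")
```

Note that splitting by document applies the ratios to documents rather than to sequences or labels, so the resulting proportions are only approximately 70/20/10.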
|
| 76 |
+
|
| 77 |
+
Train dataset is 147 labels for MORT_EXPLICITE
|
| 78 |
+
|
| 79 |
+
Test dataset is 35 labels for MORT_EXPLICITE
|
| 80 |
+
|
| 81 |
+
Valid dataset is 18 labels for MORT_EXPLICITE
|
| 82 |
+
|