Update README.md
README.md
CHANGED
|
@@ -20,6 +20,27 @@ model-index:
|
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.8873563218
|
| 23 |
---
|
| 24 |
| Feature | Description |
|
| 25 |
| --- | --- |
|
|
@@ -30,7 +51,7 @@ model-index:
|
|
| 30 |
| **Components** | `transformer`, `ner` |
|
| 31 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
| 32 |
| **Sources** | n/a |
|
| 33 |
-
| **License** |
|
| 34 |
| **Author** | [n/a]() |
|
| 35 |
|
| 36 |
### Label Scheme
|
|
@@ -52,5 +73,17 @@ model-index:
|
|
| 52 |
| `ENTS_F` | 88.74 |
|
| 53 |
| `ENTS_P` | 85.40 |
|
| 54 |
| `ENTS_R` | 92.34 |
|
| 55 |
-
|
| 56 |
-
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.8873563218
|
| 23 |
+
widget:
|
| 24 |
+
- text: "On m'a attrapé par la main !"
|
| 25 |
+
example_title: "on quelqu'un"
|
| 26 |
+
- text: "En France, on parle français."
|
| 27 |
+
example_title: "on générique"
|
| 28 |
+
- text: "On est allé manger des glaces puis on est allé à la plage."
|
| 29 |
+
example_title: "on nous"
|
| 30 |
+
|
| 31 |
+
license: agpl-3.0
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## Description
|
| 35 |
+
|
| 36 |
+
This model was built to detect the different values of *on* in French (them). Its main purpose was to automate annotation on a specific dataset.
|
| 37 |
+
There is no warranty that it will work on any other dataset.
|
| 38 |
+
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
|
| 39 |
+
Some pronouns can have different meanings depending on their context, and the generic pronoun plays an important role in trauma narratives.
|
| 40 |
+
In our study, we distinguish the different values of the *on* pronoun. It can be used as *we*, for example: “On est entré au Bataclan à 20h45” (“We entered the Bataclan at 8:45 pm”).
|
| 41 |
+
It can also be used as a synonym for *someone*: “On m’a marché dessus” (“Someone stepped on me”).
|
| 42 |
+
Finally, it can be used generically: “on est jamais mieux servi que par soi même” (“you are never better served than by yourself”).
|
| 43 |
+
|
| 44 |
---
|
| 45 |
| Feature | Description |
|
| 46 |
| --- | --- |
|
|
|
|
| 51 |
| **Components** | `transformer`, `ner` |
|
| 52 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
| 53 |
| **Sources** | n/a |
|
| 54 |
+
| **License** | agpl-3.0 |
|
| 55 |
| **Author** | [n/a]() |
|
| 56 |
|
| 57 |
### Label Scheme
|
|
|
|
| 73 |
| `ENTS_F` | 88.74 |
|
| 74 |
| `ENTS_P` | 85.40 |
|
| 75 |
| `ENTS_R` | 92.34 |
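
As a quick sanity check on the scores above, the F-score is the harmonic mean of precision and recall. The sketch below is only an illustration (the helper name `f_score` is hypothetical and not part of the model's code); recomputing from the rounded `ENTS_P` and `ENTS_R` values gives a result close to the reported `ENTS_F`, with any small gap consistent with rounding of the inputs.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R as reported in the table, in percent.
recomputed = f_score(85.40, 92.34)
print(f"recomputed F-score: {recomputed:.2f} (reported ENTS_F: 88.74)")
```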
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
### Training
|
| 79 |
+
|
| 80 |
+
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
|
| 81 |
+
The models were trained on sequences of 200 words. 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model.
|
| 82 |
+
In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
|
| 83 |
+
|
| 84 |
+
| label | train | test | valid |
|
| 85 |
+
| --- | --- | --- | --- |
|
| 86 |
+
| `ON_GENERIQUE` | 189 | 57 | 49 |
|
| 87 |
+
| `ON_NOUS` | 1006 | 320 | 229 |
|
| 88 |
+
| `ON_QUELQU_UN` | 90 | 42 | 19 |
|
| 89 |
+
|
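
The splitting procedure described in the Training section can be sketched roughly as follows. This is a minimal illustration, not the code from the training repository: the function names (`chunk_words`, `split_by_document`), the whitespace tokenization, and the choice to apply the 70/20/10 ratios at the document level for all three splits are assumptions made here to show how keeping whole documents in a single split prevents evaluation sequences from overlapping with training documents.

```python
import random


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Cut a document into consecutive sequences of at most `size` words."""
    words = text.split()  # assumption: plain whitespace tokenization
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(docs: dict[str, str], ratios=(0.7, 0.2, 0.1), seed: int = 0):
    """Assign whole documents to train/test/valid, then chunk each document.

    `docs` maps a document id to its text. Because a document goes to
    exactly one split, no split shares a source document with another.
    """
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)

    n_train = round(len(doc_ids) * ratios[0])
    n_test = round(len(doc_ids) * ratios[1])
    parts = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],  # remainder, roughly 10%
    }
    return {
        name: [(doc_id, seq) for doc_id in ids for seq in chunk_words(docs[doc_id])]
        for name, ids in parts.items()
    }


if __name__ == "__main__":
    # Toy corpus: 10 documents of 450 words each (3 sequences per document).
    toy_docs = {f"doc{i}": " ".join(["mot"] * 450) for i in range(10)}
    splits = split_by_document(toy_docs)
    for name, items in splits.items():
        print(name, len({doc_id for doc_id, _ in items}), "documents,", len(items), "sequences")
```

Splitting on documents rather than on individual sequences means the realized proportions of sequences and labels only approximate 70/20/10, which is the usual trade-off for avoiding leakage between splits.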