binbin83 committed on
Commit
5e4cd86
·
1 Parent(s): cd043d6

Update README.md

Files changed (1)
  1. README.md +36 -3
README.md CHANGED
@@ -20,6 +20,27 @@ model-index:
20
  - name: NER F Score
21
  type: f_score
22
  value: 0.8873563218
23
  ---
24
  | Feature | Description |
25
  | --- | --- |
@@ -30,7 +51,7 @@ model-index:
30
  | **Components** | `transformer`, `ner` |
31
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
32
  | **Sources** | n/a |
33
- | **License** | n/a |
34
  | **Author** | [n/a]() |
35
 
36
  ### Label Scheme
@@ -52,5 +73,17 @@ model-index:
52
  | `ENTS_F` | 88.74 |
53
  | `ENTS_P` | 85.40 |
54
  | `ENTS_R` | 92.34 |
55
- | `TRANSFORMER_LOSS` | 41130.92 |
56
- | `NER_LOSS` | 43689.79 |
20
  - name: NER F Score
21
  type: f_score
22
  value: 0.8873563218
23
+ widget:
24
+ - text: "On m'a attrapé par la main !"
25
+ example_title: "on quelqu'un"
26
+ - text: "En France, on parle français."
27
+ example_title: "on générique"
28
+ - text: "On est allé manger des glaces puis on est allé à la plage."
29
+ example_title: "on nous"
30
+
31
+ license: agpl-3.0
32
+ ---
33
+
34
+ ## Description
35
+
36
+ This model was built to detect the different values of the French pronoun *on*. Its main purpose was to automate annotation on a specific dataset.
37
+ There is no warranty that it will work on any other dataset.
38
+ We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
39
+ Some pronouns can have different meanings depending on their context, and the generic pronoun plays an important role in trauma narratives.
40
+ In our study, we distinguish between the different values of the pronoun *on*. It can be used as *we*, for example: “On est entré au Bataclan à 20h45” (“We entered the Bataclan at 8:45 pm”).
41
+ It can also be used as a synonym for *someone*: “On m’a marché dessus” (“Someone stepped on me”).
42
+ Finally, it can be used generically: “On est jamais mieux servi que par soi-même” (“You are never better served than by yourself”).
43
+
44
  ---
45
  | Feature | Description |
46
  | --- | --- |
 
51
  | **Components** | `transformer`, `ner` |
52
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
53
  | **Sources** | n/a |
54
+ | **License** | agpl-3.0 |
55
  | **Author** | [n/a]() |
56
 
57
  ### Label Scheme
 
73
  | `ENTS_F` | 88.74 |
74
  | `ENTS_P` | 85.40 |
75
  | `ENTS_R` | 92.34 |
76
+
77
+
78
+ ### Training
79
+
80
+ We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
81
+ The models were trained on sequences of 200 words. 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model.
82
+ To ensure a sound performance evaluation, the evaluation sequences were taken from documents that were not used during training.
83
+
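The chunking and document-level split described above can be sketched as follows. This is a minimal illustration and not the code from the train_NER repository: the function names `chunk_words` and `split_by_document`, the `(doc_id, text)` input shape, and the choice to keep all three subsets document-disjoint are assumptions made for the example (the README only states that evaluation documents were unseen during training).

```python
import random
from collections import defaultdict


def chunk_words(text, size=200):
    """Cut a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(sequences, percents=(70, 20, 10), seed=0):
    """Assign whole documents to train/test/valid.

    `sequences` is a list of (doc_id, text) pairs (hypothetical format).
    Splitting on document ids, not on sequences, guarantees that no
    document contributes sequences to more than one subset.
    """
    by_doc = defaultdict(list)
    for doc_id, text in sequences:
        by_doc[doc_id].append(text)

    doc_ids = sorted(by_doc)
    random.Random(seed).shuffle(doc_ids)

    # Integer arithmetic avoids floating-point surprises in the cut points.
    n_train = len(doc_ids) * percents[0] // 100
    n_test = len(doc_ids) * percents[1] // 100
    groups = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],
    }
    return {
        name: [(doc_id, text) for doc_id in ids for text in by_doc[doc_id]]
        for name, ids in groups.items()
    }
```

Because whole documents are assigned to a subset, the resulting proportions of sequences only approximate 70/20/10 when documents differ in length.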
84
+ | label | train | test | valid |
85
+ | --- | --- | --- | --- |
86
+ | `ON_GENERIQUE` | 189 | 57 | 49 |
87
+ | `ON_NOUS` | 1006 | 320 | 229 |
88
+ | `ON_QUELQU_UN` | 90 | 42 | 19 |
89
+
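The label counts in the table above are unevenly distributed across the three classes. The short sketch below computes the share of each label per split. The counts are copied from the table; the helper name `label_shares` is hypothetical and only serves this illustration.

```python
# Annotation counts per label and split, copied from the table above.
counts = {
    "ON_GENERIQUE": {"train": 189, "test": 57, "valid": 49},
    "ON_NOUS": {"train": 1006, "test": 320, "valid": 229},
    "ON_QUELQU_UN": {"train": 90, "test": 42, "valid": 19},
}


def label_shares(counts, split):
    """Return each label's fraction of all annotations in one split."""
    total = sum(per_split[split] for per_split in counts.values())
    return {label: per_split[split] / total for label, per_split in counts.items()}


if __name__ == "__main__":
    for split in ("train", "test", "valid"):
        shares = label_shares(counts, split)
        print(split, {label: f"{share:.1%}" for label, share in shares.items()})
```

In the training split, `ON_NOUS` accounts for roughly 78% of the annotations, so per-label scores may be more informative than the aggregate metrics for the two rarer classes.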