binbin83/fr_on_value

Token Classification

binbin83 committed on Oct 5, 2023

Commit 5e4cd86 · 1 Parent(s): cd043d6

Update README.md

Files changed (1)

README.md +36 -3

@@ -20,6 +20,27 @@ model-index:
     - name: NER F Score
       type: f_score
       value: 0.8873563218
 ---
 | Feature | Description |
 | --- | --- |
@@ -30,7 +51,7 @@ model-index:
 | **Components** | `transformer`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
-| **License** | n/a |
 | **Author** | [n/a]() |
 ### Label Scheme
@@ -52,5 +73,17 @@ model-index:
 | `ENTS_F` | 88.74 |
 | `ENTS_P` | 85.40 |
 | `ENTS_R` | 92.34 |
-| `TRANSFORMER_LOSS` | 41130.92 |
-| `NER_LOSS` | 43689.79 |

     - name: NER F Score
       type: f_score
       value: 0.8873563218
+widget:
+- text: "On m'a attrapé par la main !"
+  example_title: "on quelqu'un"
+- text: "En France, on parle français."
+  example_title: "on générique"
+- text: "On est allé manger des glaces puis on est allé à la plage."
+  example_title: "on nous"
+license: agpl-3.0
+---
+## Description
+This model was built to detect the different values of the pronoun *on* in French. Its main purpose was to automate annotation on a specific dataset.
+There is no warranty that it will work on any other dataset.
+We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
+Some pronouns can have different meanings depending on their context, and the generic pronoun plays an important role in trauma narratives.
+In our study, we differentiate the different values of the *on* pronoun. It can be used as *we*, for example: “On est entré au Bataclan à 20h45” (“We entered the Bataclan at 8:45 pm”).
+But it can also be used as a synonym for *someone*: “On m’a marché dessus” (“Someone stepped on me”).
+Finally, it can be used generically: “on est jamais mieux servi que par soi-même” (“you are never better served than by yourself”).
 ---
 | Feature | Description |
 | --- | --- |
 | **Components** | `transformer`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
+| **License** | agpl-3.0 |
 | **Author** | [n/a]() |
 ### Label Scheme
 | `ENTS_F` | 88.74 |
 | `ENTS_P` | 85.40 |
 | `ENTS_R` | 92.34 |
+### Training
+We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
+The models were trained on 200-word sequences. 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance.
+In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
+| label | train | test | valid |
+| --- | --- | --- | --- |
+| `ON_GENERIQUE` | 189 | 57 | 49 |
+| `ON_NOUS` | 1006 | 320 | 229 |
+| `ON_QUELQU_UN` | 90 | 42 | 19 |
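
The training notes describe 200-word sequences, a 70/20/10 split, and evaluation sequences drawn from documents not seen during training. The following is a minimal sketch of one way such a split could be made, not the code from the linked repository. The helper names `chunk_words` and `split_by_document`, the `(doc_id, sequence)` input shape, and the choice to assign whole documents to each split are all assumptions made for illustration. Splitting by document means the proportions apply to documents and only approximately to sequences.

```python
import random
from collections import defaultdict


def chunk_words(text, size=200):
    """Cut a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(sequences, train=0.7, test=0.2, seed=0):
    """Assign (doc_id, sequence) pairs to train/test/valid by document.

    Hypothetical helper: every sequence of a given document lands in the
    same split, so evaluation data never shares a document with training.
    """
    doc_ids = sorted({doc_id for doc_id, _ in sequences})
    random.Random(seed).shuffle(doc_ids)

    n_train = round(len(doc_ids) * train)
    n_test = round(len(doc_ids) * test)

    assignment = {}
    for i, doc_id in enumerate(doc_ids):
        if i < n_train:
            assignment[doc_id] = "train"
        elif i < n_train + n_test:
            assignment[doc_id] = "test"
        else:
            assignment[doc_id] = "valid"  # remaining documents, held out

    splits = defaultdict(list)
    for doc_id, seq in sequences:
        splits[assignment[doc_id]].append((doc_id, seq))
    return dict(splits)
```

With ten documents of three sequences each, this assigns seven documents to `train`, two to `test`, and one to `valid`, and no document appears in more than one split.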