Update README.md
README.md
CHANGED
|
@@ -20,6 +20,28 @@ model-index:
|
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.7862429256
|
| 23 |
---
|
| 24 |
| Feature | Description |
|
| 25 |
| --- | --- |
|
|
@@ -30,7 +52,7 @@ model-index:
|
|
| 30 |
| **Components** | `transformer`, `ner` |
|
| 31 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
| 32 |
| **Sources** | n/a |
|
| 33 |
-
| **License** |
|
| 34 |
| **Author** | [n/a]() |
|
| 35 |
|
| 36 |
### Label Scheme
|
|
@@ -52,5 +74,17 @@ model-index:
|
|
| 52 |
| `ENTS_F` | 78.62 |
|
| 53 |
| `ENTS_P` | 77.58 |
|
| 54 |
| `ENTS_R` | 79.70 |
|
| 55 |
-
|
| 56 |
-
|
| 20 |
- name: NER F Score
|
| 21 |
type: f_score
|
| 22 |
value: 0.7862429256
|
| 23 |
+
|
| 24 |
+
widget:
|
| 25 |
+
- text: "Le 2 décembre, c'est un vendredi, on avait un concert. On se retrouve avec des amis chez moi."
|
| 26 |
+
example_title: "présent historique"
|
| 27 |
+
- text: "On danse toute la nuit et là vous vous dites que c'est la meilleure manière de vivre."
|
| 28 |
+
example_title: "présent générique"
|
| 29 |
+
- text: "Je me souviens d'avoir vu un enfant danser sur le toit du monde !"
|
| 30 |
+
example_title: "présent d'énonciation"
|
| 31 |
+
|
| 32 |
+
license: agpl-3.0
|
| 33 |
+
---
|
| 34 |
+
|
| 35 |
+
## Description
|
| 36 |
+
|
| 37 |
+
This model was built to detect the different values of the *present tense* in French. Its main purpose was to automate annotation on a specific dataset.
|
| 38 |
+
There is no warranty that it will work on any other dataset.
|
| 39 |
+
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
|
| 40 |
+
The present tense can have different meanings depending on the context. It can have a historical value, referring to the past while making the narration more vivid.
|
| 41 |
+
Another meaning is generic, to express general truths like definitions or properties. Finally, it can have an enunciation value by referring to the present moment, to describe an ongoing action.
|
| 42 |
+
These different values of the present tense can only be differentiated by the context.
|
| 43 |
+
This is why models based on contextual embeddings (BERT-like) should be well suited to differentiating them.
|
| 44 |
+
|
| 45 |
---
|
| 46 |
| Feature | Description |
|
| 47 |
| --- | --- |
|
|
|
|
| 52 |
| **Components** | `transformer`, `ner` |
|
| 53 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
| 54 |
| **Sources** | n/a |
|
| 55 |
+
| **License** | agpl-3.0 |
|
| 56 |
| **Author** | [n/a]() |
|
| 57 |
|
| 58 |
### Label Scheme
|
|
|
|
| 74 |
| `ENTS_F` | 78.62 |
|
| 75 |
| `ENTS_P` | 77.58 |
|
| 76 |
| `ENTS_R` | 79.70 |
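As a quick sanity check, the F-score can be recomputed from the precision and recall reported above. This is a minimal sketch that assumes `ENTS_F` is the usual harmonic mean of `ENTS_P` and `ENTS_R` (F1); the helper name `f_score` is ours, not part of the model's code. Because the table values are rounded, the result may differ from the reported 78.62 in the last digit.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the accuracy table, in percent.
ents_p = 77.58
ents_r = 79.70

ents_f = f_score(ents_p, ents_r)
print(f"ENTS_F recomputed from rounded P and R: {ents_f:.2f}")
```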
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
### Training
|
| 80 |
+
|
| 81 |
+
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
|
| 82 |
+
The models were trained on 200-word sequences: 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model.
|
| 83 |
+
In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
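The splitting procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code used to train the model: the function names `chunk_words` and `split_by_document`, the whitespace tokenization, and the mapping of the 10% evaluation share to a split called `valid` are all hypothetical. Splitting at the document level (rather than the sequence level) is what keeps evaluation sequences out of documents seen during training; the 70/20/10 proportions then hold only approximately at the sequence level.

```python
import random


def chunk_words(text, size=200):
    """Cut a text into consecutive sequences of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(docs, percents=(70, 20, 10), seed=0):
    """Assign whole documents to train/test/valid, then chunk each document.

    docs: dict mapping a document id to its raw text.
    Returns a dict mapping split name to a list of (doc_id, sequence) pairs.
    """
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)  # reproducible shuffle

    n = len(doc_ids)
    n_train = n * percents[0] // 100
    n_test = n * percents[1] // 100
    groups = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],  # remaining documents
    }
    return {
        name: [(doc_id, seq) for doc_id in ids for seq in chunk_words(docs[doc_id])]
        for name, ids in groups.items()
    }


# Toy corpus: 10 documents of 450 words each (placeholder content).
toy_docs = {f"doc{i}": " ".join(["mot"] * 450) for i in range(10)}
toy_splits = split_by_document(toy_docs)
for name, pairs in toy_splits.items():
    print(name, len(pairs), "sequences")
```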
|
| 84 |
+
|
| 85 |
+
| label | train | test | valid |
|
| 86 |
+
| --- | --- | --- | --- |
|
| 87 |
+
| `PRESENT_ENNONCIATION`| 2069 | 673 | 438 |
|
| 88 |
+
| `PRESENT_GENERIQUE`| 704 | 177 | 147 |
|
| 89 |
+
| `PRESENT_HISTORIQUE` | 1005 | 289 | 285 |
|
| 90 |
+
|
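To get a feel for how the annotated labels are distributed, the counts in the table above can be summed per split and per label. The snippet below is only a small bookkeeping sketch over the numbers copied from the table; the variable names are ours.

```python
# Label counts copied from the table above.
counts = {
    "PRESENT_ENNONCIATION": {"train": 2069, "test": 673, "valid": 438},
    "PRESENT_GENERIQUE": {"train": 704, "test": 177, "valid": 147},
    "PRESENT_HISTORIQUE": {"train": 1005, "test": 289, "valid": 285},
}
split_names = ("train", "test", "valid")

# Total number of labeled spans in each split, and overall.
totals = {s: sum(row[s] for row in counts.values()) for s in split_names}
grand_total = sum(totals.values())

for s in split_names:
    share = 100 * totals[s] / grand_total
    print(f"{s}: {totals[s]} labels ({share:.1f}% of all labels)")

# Share of each label within the training split.
for label, row in counts.items():
    print(f"{label}: {100 * row['train'] / totals['train']:.1f}% of train labels")
```

The label totals come out to 3778 (train), 1139 (test), and 870 (valid), or roughly 65%, 20%, and 15% of all labels. This need not match the 70/20/10 proportions exactly, presumably because the split was made over sequences and documents rather than over individual labels.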