This model was built to detect the lexical field of the body, physical sensations, and perception.
Its main purpose was to automate annotation on a specific dataset.
There is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
| Feature | Description |
| --- | --- |
| Name | fr_sensations_and_body |
| Version | 0.0.1 |
| spaCy | >=3.4.4,<3.5.0 |
| Default Pipeline | transformer, ner |
| Components | transformer, ner |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | n/a |
Label Scheme
View label scheme (4 labels for 1 components)
| Component | Labels |
| --- | --- |
| ner | CORPS, MOTS_PERCEPTIONS_SENSORIELLES, SENSATIONS_PHYSIQUES, VERB_PERCEPTIONS_SENSORIELLES |
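To illustrate how the four labels above could be consumed, here is a minimal sketch that groups the entities found in a text by label. It assumes the pipeline is installed as a package loadable under the name `fr_sensations_and_body` and that a compatible spaCy version with `spacy-transformers` is available; the helper names `group_entities_by_label` and `annotate` are hypothetical and not part of the model.

```python
from collections import defaultdict


def group_entities_by_label(doc):
    """Group entity texts by NER label for any object exposing `.ents`."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def annotate(text, model_name="fr_sensations_and_body"):
    """Load the pipeline and return its entities grouped by label.

    Assumption: the package is installed under `model_name`.
    """
    import spacy  # imported lazily so the helper above works without spaCy

    nlp = spacy.load(model_name)
    return group_entities_by_label(nlp(text))
```

Calling `annotate("J'ai senti une douleur dans la poitrine.")` would return a dictionary keyed by whichever of the four labels the model predicts for that sentence.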
Accuracy
| Type | Score |
| --- | --- |
| ENTS_F | 85.46 |
| ENTS_P | 85.37 |
| ENTS_R | 85.56 |
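As a quick consistency check on the scores above, the sketch below recomputes the F-score from the reported precision and recall. It assumes ENTS_F is the standard F1 (harmonic mean of ENTS_P and ENTS_R); the function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ents_p, ents_r = 85.37, 85.56  # values reported in the Accuracy table
print(round(f_score(ents_p, ents_r), 2))  # 85.46
```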
Training
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
The models were trained on sequences of 200 words. 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating model performance.
In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
| label | train | test | valid |
| --- | --- | --- | --- |
| CORPS | 523 | 152 | 106 |
| MOTS_PERCEPTIONS_SENSORIELLES | 250 | 108 | 82 |
| SENSATIONS_PHYSIQUES | 91 | 38 | 31 |
| VERB_PERCEPTIONS_SENSORIELLES | 617 | 162 | 137 |
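The document-level split described in the Training section could be sketched as follows. This is not the authors' code: it is a minimal illustration that assigns whole documents to a partition before cutting them into 200-word sequences, so that no document contributes to more than one partition. The function names, the `seed` parameter, and splitting by document count (which only approximates 70/20/10 at the sequence level) are all assumptions.

```python
import random


def chunk_words(text, size=200):
    """Split a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(doc_ids, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle document ids and partition them into train/test/valid."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_test = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],  # remainder, roughly ratios[2]
    }


def build_sequences(documents, split):
    """Chunk each document inside its own partition only."""
    return {
        name: [seq for doc_id in ids for seq in chunk_words(documents[doc_id])]
        for name, ids in split.items()
    }
```

Here `documents` is expected to be a dictionary mapping a document id to its raw text, for example `build_sequences(documents, split_documents(documents))`.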