---
license: apache-2.0

widget:
- text: "Ayer dormí la siesta durante 3 horas"
- text: "Recuerda tu cita con el médico el lunes a las 8 de la tarde"
- text: "Recuerda tomar la medicación cada noche"
- text: "Last day I slept for three hours"
- text: "Remember your doctor's appointment on Monday at 6am"

tags:
- LABEL-0 = NONE  
- LABEL-1 = B-DATE  
- LABEL-2 = I-DATE
- LABEL-3 = B-TIME
- LABEL-4 = I-TIME
- LABEL-5 = B-DURATION
- LABEL-6 = I-DURATION
- LABEL-7 = B-SET  
- LABEL-8 = I-SET

metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: Bio-RoBERTime
  results: []
---


# Bio-RoBERTime

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-biomedical-clinical-es](https://huggingface.co/PlanTL-GOB-ES/roberta-base-biomedical-clinical-es) on the [E3C](https://github.com/hltfbk/E3C-Corpus) and Timebank datasets.

On the [E3C corpus](https://github.com/hltfbk/E3C-Corpus) test set, it achieves the following results under the TempEval-3 evaluation metrics:

| System     |   Strict   |  Relaxed   |    Type    |
|------------|:----------:|:----------:|:----------:|
| RoBERTime  | **0.7606** | **0.9108** | **0.8357** |
| Heideltime |   0.5945   |   0.7558   |   0.6083   |
| Annotador  |   0.6006   |   0.7347   |   0.5598   |
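
The strict and relaxed columns differ in how a predicted span is matched against a gold span: strict requires identical boundaries, while relaxed accepts any overlap. The sketch below illustrates that difference only; it is not the official TempEval-3 scorer. It assumes spans are `(start, end)` character offsets with an exclusive end, and the names `overlaps` and `span_f1` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive (assumption)


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """F1 over spans using exact match (strict) or any overlap (relaxed)."""
    match = overlaps if relaxed else (lambda g, p: g == p)
    # Precision counts predicted spans that match some gold span;
    # recall counts gold spans that are matched by some predicted span.
    pred_hits = sum(any(match(g, p) for g in gold) for p in pred)
    gold_hits = sum(any(match(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second prediction starts three characters late.
gold = [(0, 4), (10, 25)]
pred = [(0, 4), (13, 25)]

strict = span_f1(gold, pred)                 # only the first span matches exactly
relaxed = span_f1(gold, pred, relaxed=True)  # both predictions overlap a gold span
print(strict, relaxed)
```

A prediction with slightly wrong boundaries is penalized under strict matching but still counts under relaxed matching, which is why the relaxed scores in the table are higher for every system.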

RoBERTime is a token classification model: it assigns each token one of 9 possible labels. We follow the BIO labeling scheme, so each of the four classes (DATE, TIME, DURATION, SET) has two labels, Beginning (B-) and Inside (I-), and one further label (NONE) marks tokens outside any temporal expression. For more details on the implementation and evaluation, refer to the paper: ["RoBERTime: A novel model for the detection of temporal expressions in Spanish"](https://rua.ua.es/dspace/handle/10045/133235)
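
As a rough illustration of the BIO scheme, the sketch below groups per-token labels into typed expressions using the label mapping listed in the `tags` section of this card. The helper names (`to_tag`, `group_spans`) are hypothetical, the example labels are written by hand rather than produced by the model, and the tokens are whole words for readability, whereas the model itself operates on subword tokens.

```python
import re
from typing import List, Tuple

# Label mapping copied from the `tags` section of this card.
ID2TAG = {
    0: "NONE",
    1: "B-DATE", 2: "I-DATE",
    3: "B-TIME", 4: "I-TIME",
    5: "B-DURATION", 6: "I-DURATION",
    7: "B-SET", 8: "I-SET",
}


def to_tag(raw_label: str) -> str:
    """Translate 'LABEL-3' or 'LABEL_3' into 'B-TIME'."""
    m = re.fullmatch(r"LABEL[-_](\d+)", raw_label)
    if m is None:
        raise ValueError(f"unexpected label: {raw_label!r}")
    return ID2TAG[int(m.group(1))]


def group_spans(tokens: List[str], raw_labels: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None
    for token, raw in zip(tokens, raw_labels):
        prefix, _, kind = to_tag(raw).partition("-")
        if prefix == "I" and kind == current_type:
            current.append(token)  # continue the open expression
            continue
        if current:  # any other tag closes the open expression
            spans.append((current_type, " ".join(current)))
            current, current_type = [], None
        if prefix in ("B", "I"):  # a stray I- tag is treated as a new start
            current, current_type = [token], kind
    if current:
        spans.append((current_type, " ".join(current)))
    return spans


tokens = ["cita", "el", "lunes", "a", "las", "8", "de", "la", "tarde"]
# Hand-written labels for illustration only (not real model output).
labels = ["LABEL-0", "LABEL-0", "LABEL-1", "LABEL-0",
          "LABEL-3", "LABEL-4", "LABEL-4", "LABEL-4", "LABEL-4"]
spans = group_spans(tokens, labels)
print(spans)
```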

## Model description

- **Developed by**: Alejandro Sánchez de Castro, Juan Martínez Romo, Lourdes Araujo

This model was introduced in the paper "RoBERTime: A novel model for the detection of temporal expressions in Spanish".

- **Cite as**:

      @article{sanchez2023robertime,
        title={RoBERTime: A novel model for the detection of temporal expressions in Spanish},
        author={Sánchez-de-Castro-Fernández, Alejandro and Araujo Serna, Lourdes and Martínez Romo, Juan},
        year={2023},
        publisher={Sociedad Española para el Procesamiento del Lenguaje Natural}
      }


## Intended uses & limitations

This model detects the extent of temporal expressions in Spanish text. It may partly transfer to other languages, but the base model is a Spanish RoBERTa, so performance outside Spanish is not guaranteed. The model does not normalize the value of the expressions it detects; normalization is treated as a separate task.
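
Because the model returns extents only, a typical use is to recover the surface text of each tagged token from its character offsets. The sketch below assumes the model is loaded through the `transformers` token-classification pipeline, whose unaggregated output includes `entity`, `start` and `end` fields. The `model_id` argument is a placeholder for this model's Hub identifier, the helper names are hypothetical, and the predictions in the example are written by hand rather than produced by the model.

```python
from typing import Dict, List

OUTSIDE = {"LABEL_0", "LABEL-0"}  # label 0 is NONE in this card


def tag_text(text: str, model_id: str) -> List[Dict]:
    """Run the tagger on `text`. Needs `transformers` and access to the model."""
    from transformers import pipeline  # imported lazily; only needed here

    tagger = pipeline("token-classification", model=model_id)
    return tagger(text)


def temporal_tokens(text: str, predictions: List[Dict]) -> List[Dict]:
    """Keep tokens tagged as temporal and attach their surface text."""
    return [
        {"label": p["entity"], "text": text[p["start"]:p["end"]]}
        for p in predictions
        if p["entity"] not in OUTSIDE
    ]


def fake_prediction(text: str, word: str, label: str) -> Dict:
    """Build a hand-written prediction in the pipeline's output shape."""
    start = text.index(word)
    return {"entity": label, "start": start, "end": start + len(word)}


text = "Recuerda tomar la medicación cada noche"
# Illustrative predictions only: B-SET on "cada", I-SET on "noche".
fake = [
    fake_prediction(text, "Recuerda", "LABEL_0"),
    fake_prediction(text, "cada", "LABEL_7"),
    fake_prediction(text, "noche", "LABEL_8"),
]
found = temporal_tokens(text, fake)
print(found)
```

The result contains the expression text and its label but no normalized value; a separate normalization step would be needed to turn the expression into a machine-readable form.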

## Training and evaluation data

This model was trained on the Spanish Timebank corpus and the E3C corpus.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 72
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 24
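
With `lr_scheduler_type: linear`, the learning rate decays linearly from its initial value to zero over the course of training. The sketch below shows that decay for the listed `learning_rate` and `num_epochs`. It assumes no warmup, since none is listed above, and the number of steps per epoch is a made-up value; `linear_lr` is a hypothetical helper, not part of any library.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 8e-05) -> float:
    """Learning rate after `step` optimizer steps under linear decay to zero."""
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)


steps_per_epoch = 100               # hypothetical; depends on dataset size and batch size
total_steps = 24 * steps_per_epoch  # num_epochs: 24

start_lr = linear_lr(0, total_steps)
halfway_lr = linear_lr(total_steps // 2, total_steps)
final_lr = linear_lr(total_steps, total_steps)
print(start_lr, halfway_lr, final_lr)
```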

### Framework versions

- Transformers 4.24.0
- Pytorch 1.12.1+cu113
- Datasets 2.7.0
- Tokenizers 0.13.2