mrovera
/

eventnet-ita

Italian

Frame Parsing

Event Extraction


mrovera committed on Feb 9, 2024

Commit

12c487d

verified ·

1 Parent(s): 530b7a1

Updated README

Files changed (1)

README.md +107 -13


@@ -2,46 +2,140 @@
 license: agpl-3.0
 language:
 - it
 ---
 # EventNet-ITA
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
-### Model Sources
-<!-- Provide the basic links for the model. -->
-## Uses
-### Direct Use
-Multi-label text classification of Italian legislative acts.
-## Training Details
 ### Training Data
 ## Evaluation
-### Results
-## Citation
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-```
 ```

 license: agpl-3.0
 language:
 - it
+task_categories:
+- token-classification
+datasets:
+- mrovera/eventnet-ita
+tags:
+- Frame Parsing
+- Event Extraction
 ---
 # EventNet-ITA
+The model is a full-text frame parser for events in Italian, trained on [EventNet-ITA](https://huggingface.co/datasets/mrovera/eventnet-ita).
+It can be used for _full-text_ Frame Parsing and Event Extraction.
 ## Model Details
 ### Model Description
+In its current version, EventNet-ITA recognizes and classifies 205 semantic frames and their (specific) frame elements. The unit of analysis is the sentence.
+### Direct Use
+Provided with an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s).
+```
+La				B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
+cittadina		I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
+,				O
+posta			B-BEING_LOCATED
+a				B-RELATIVE_LOCATION*BEING_LOCATED
+est				I-RELATIVE_LOCATION*BEING_LOCATED
+del				I-RELATIVE_LOCATION*BEING_LOCATED
+corso			I-RELATIVE_LOCATION*BEING_LOCATED
+d'				I-RELATIVE_LOCATION*BEING_LOCATED
+acqua			I-RELATIVE_LOCATION*BEING_LOCATED
+,				O
+venne			O
+conquistata		B-CONQUERING
+,				O
+ma				O
+il				B-EXPLOSIVE*DETONATE_EXPLOSIVE
+ponte			I-EXPLOSIVE*DETONATE_EXPLOSIVE
+sul				I-EXPLOSIVE*DETONATE_EXPLOSIVE
+fiume			I-EXPLOSIVE*DETONATE_EXPLOSIVE
+era				O
+già				O
+stato			O
+fatto			B-DETONATE_EXPLOSIVE
+saltare			I-DETONATE_EXPLOSIVE
+regolarmente	O
+dai				B-AGENT*DETONATE_EXPLOSIVE
+genieri			I-AGENT*DETONATE_EXPLOSIVE
+francesi		I-AGENT*DETONATE_EXPLOSIVE
+.				O
+```
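The label strings in the example above can be split into their parts programmatically. The following is a minimal sketch, assuming (from the example only, not from a documented specification) that `|` separates multiple labels on one token, that `*` separates a frame element from the frame it belongs to, and that a label without `*` marks a frame trigger. The names `Label` and `parse_labels` are hypothetical helpers, not part of MaChAmp or EventNet-ITA.

```python
from typing import List, NamedTuple, Optional


class Label(NamedTuple):
    bio: str              # BIO prefix: "B" or "I"
    role: Optional[str]   # frame element name, or None for a frame trigger
    frame: str            # frame the label refers to


def parse_labels(tag: str) -> List[Label]:
    """Split one token's tag string into its individual labels.

    Assumed format (inferred from the example output):
      "O"                                  -> no label
      "B-CONQUERING"                       -> frame trigger
      "B-THEME*CONQUERING"                 -> frame element of a frame
      "B-ENTITY*BEING_LOCATED|B-THEME*..." -> several labels on one token
    """
    if tag == "O":
        return []
    labels = []
    for part in tag.split("|"):
        # Split off the BIO prefix at the first hyphen.
        bio, _, rest = part.partition("-")
        if "*" in rest:
            role, frame = rest.split("*", 1)
        else:
            role, frame = None, rest
        labels.append(Label(bio, role, frame))
    return labels
```

For instance, `parse_labels("B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING")` yields two labels, one per frame, which makes it straightforward to group frame elements by the frame they belong to.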
+## Training Details
+The model has been trained using [MaChAmp](https://github.com/machamp-nlp/machamp), a Python toolkit supporting a variety of NLP tasks, by fine-tuning [this Italian BERT pretrained model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased).
+Training hyperparameters:
+- Batch size: 64
+- Learning rate: 1.5e-3
+All other hyperparameters have been left unchanged w.r.t. the default MaChAmp configuration for the multi-sequential token classification task.
 ### Training Data
+Please refer to the [dataset repo](https://huggingface.co/datasets/mrovera/eventnet-ita).
+### Model Re-training
+In order to re-train the model, download the [dataset](https://huggingface.co/datasets/mrovera/eventnet-ita) and follow the instructions for training a [multiseq task](https://github.com/machamp-nlp/machamp/blob/master/docs/multiseq.md) in MaChAmp.
+### Inference
+EventNet-ITA's model can be used for Frame Parsing on new texts.
+In order to do so, you have to follow a few simple steps.
+1. Clone the GitHub repo: `git clone https://github.com/machamp-nlp/machamp.git`
+2. Download EventNet-ITA's model from this repo (450 MB) and move it into the `machamp` folder (the exact location is up to you; by default, MaChAmp saves trained models in the `logs` folder)
+3. Save the data you want to use for prediction in a two-column tsv file, one word per line, with the word in the first column and a placeholder in the second, each sentence separated by a blank line (without placeholder), like this:
+```
+This	_
+is	_
+the	_
+first	_
+sentence	_
+.	_
+
+This	_
+is	_
+the	_
+second	_
+one	_
+.	_
+```
+4. Follow the instructions for predicting with [MaChAmp](https://github.com/machamp-nlp/machamp) (see section "Prediction") using a fine-tuned model.
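The input file described in step 3 can be produced with a few lines of Python. This is a minimal sketch, assuming the text has already been split into sentences and tokens (the README does not prescribe a tokenizer). The function name `sentences_to_tsv` and the output file name `input.tsv` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def sentences_to_tsv(sentences: Iterable[List[str]], placeholder: str = "_") -> str:
    """Render pre-tokenized sentences in the two-column format from step 3.

    Each token goes on its own line, followed by a tab and the placeholder.
    Sentences are separated by a single blank line.
    """
    blocks = []
    for tokens in sentences:
        blocks.append("\n".join(f"{token}\t{placeholder}" for token in tokens))
    return "\n\n".join(blocks) + "\n"


def write_prediction_file(sentences: Iterable[List[str]], path: str) -> None:
    """Write the rendered sentences to `path` as UTF-8 text."""
    Path(path).write_text(sentences_to_tsv(sentences), encoding="utf-8")
```

Calling `write_prediction_file([["This", "is", "the", "first", "sentence", "."], ["This", "is", "the", "second", "one", "."]], "input.tsv")` would write a file matching the example in step 3, which can then be passed to MaChAmp as described in step 4.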
 ## Evaluation
+The model has been evaluated on three folds, each time with a stratified split of the dataset, with an 80/10/10 train/dev/test ratio. Please see the paper for further details. Below we report the summary values obtained by averaging the Precision, Recall and F1-score values of the three splits.
+**Token-based** (**_relaxed_**) performance:
+|                            |    P   |    R    |   F1    |
+|----------------------------|--------|---------|---------|
+|Frames                      |  0.904 |  0.914  |  **0.907**  |
+|Frames (weighted)           |  0.909 |  0.919  |  0.913  |
+|Frame Elements              |  0.841 |  0.724  |  **0.761**  |
+|Frame Elements (weighted)   |  0.850 |  0.779  |  0.804  |
+**Span-based** (**_strict_**) performance:
+|                            |    P   |    R    |   F1   |
+|----------------------------|--------|---------|--------|
+|Frames                      |  0.906 |  0.899  |  **0.901** |
+|Frames (weighted)           |  0.909 |  0.903  |  0.905 |
+|Frame Elements              |  0.829 |  0.666  |  **0.724** |
+|Frame Elements (weighted)   |  0.853 |  0.711  |  0.768 |
+### Citation Information
+If you use EventNet-ITA, please cite the following paper:
+```
+@article{rovera2023eventnet,
+  title={EventNet-ITA: Italian Frame Parsing for Events},
+  author={Rovera, Marco},
+  journal={arXiv preprint arXiv:2305.10892},
+  year={2023}
+}
 ```