Update README.md
README.md
@@ -4,9 +4,10 @@ tags:
 - token-classification
 language:
 - es
+- ca
 license: mit
 model-index:
-- name:
+- name: ca_anonimization_core_lg
 results:
 - task:
 name: NER
@@ -21,54 +22,26 @@ model-index:
 - name: NER F Score
 type: f_score
 value: 0.6911764706
-
-
-type: token-classification
-metrics:
-- name: POS (UPOS) Accuracy
-type: accuracy
-value: 0.0
-- task:
-name: MORPH
-type: token-classification
-metrics:
-- name: Morph (UFeats) Accuracy
-type: accuracy
-value: 0.0
-- task:
-name: LEMMA
-type: token-classification
-metrics:
-- name: Lemma Accuracy
-type: accuracy
-value: 0.0
-- task:
-name: UNLABELED_DEPENDENCIES
-type: token-classification
-metrics:
-- name: Unlabeled Attachment Score (UAS)
-type: f_score
-value: 0.0
-- task:
-name: LABELED_DEPENDENCIES
-type: token-classification
-metrics:
-- name: Labeled Attachment Score (LAS)
-type: f_score
-value: 0.0
-- task:
-name: SENTS
-type: token-classification
-metrics:
-- name: Sentences F-Score
-type: f_score
-value: 0.0
+widget:
+- text: "La matrícula del coche es 8560 JXK y el nombre del propietario es Jon Permanyer Ugartemendia, DNI 362-69-58-6n. Tel: 628539864. Calle Pasteur 46 Bajos, 08024 Barcelona"
 ---
-
+
+This is a Spacy multilingual (Catalan & Spanish) anonimization model, for use with BSC's AnonymizationPipeline at:
+
+https://github.com/TeMU-BSC/AnonymizationPipeline.
+
+pip install https://huggingface.co/PlanTL-GOB-ES/es_anonimization_core_lg/resolve/main/es_anonimization_core_lg-any-py3-none-any.whl
+
+The anonymization pipeline is a library for performing sensitive data identification and ultimately anonymization of the detected data in Spanish and Catalan user generated plain text.
+
+This is not a standalone model and is meant to work within the pipeline.
+
+The model can detect the following entities: `EMAIL`, `FINANCIAL`, `ID`, `LOC`, `MISC`, `ORG`, `PER`, `TELEPHONE`, `VEHICLE`, `ZIP`
+

 | Feature | Description |
 | --- | --- |
-| **Name** | `
+| **Name** | `ca_anonimization_core_lg` |
 | **Version** | `1.0.0` |
 | **spaCy** | `>=3.2.3,<4.0.0` |
 | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
@@ -96,18 +69,7 @@ This is a Spacy multilingual anonimization model, for use with BSC's Anonymizati

 | Type | Score |
 | --- | --- |
-| `POS_ACC` | 0.00 |
-| `MORPH_ACC` | 0.00 |
-| `MORPH_PER_FEAT` | 0.00 |
-| `DEP_UAS` | 0.00 |
-| `DEP_LAS` | 0.00 |
-| `DEP_LAS_PER_TYPE` | 0.00 |
-| `SENTS_P` | 0.00 |
-| `SENTS_R` | 0.00 |
-| `SENTS_F` | 0.00 |
-| `LEMMA_ACC` | 0.00 |
 | `ENTS_F` | 69.12 |
 | `ENTS_P` | 74.60 |
 | `ENTS_R` | 64.38 |
-| `
-| `NER_LOSS` | 26573.78 |
+| `NER_LOSS` | 26573.78 |
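The updated README says the model detects `EMAIL`, `FINANCIAL`, `ID`, `LOC`, `MISC`, `ORG`, `PER`, `TELEPHONE`, `VEHICLE` and `ZIP` entities, and that the AnonymizationPipeline then anonymizes the detected data. The sketch below is not the pipeline's own code. It only illustrates a simple masking step, under two assumptions:

* Detected entities arrive as `(start_char, end_char, label)` tuples, which is the shape you would get from spaCy's `ent.start_char`, `ent.end_char` and `ent.label_`.
* Each span is replaced with a bracketed label.

The helper name `mask_entities` is hypothetical, and the example spans are hand-written rather than real model predictions.

```python
from typing import List, Tuple

# Entity labels listed in the README for this model.
LABELS = {"EMAIL", "FINANCIAL", "ID", "LOC", "MISC", "ORG", "PER", "TELEPHONE", "VEHICLE", "ZIP"}

# Hypothetical input shape: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a placeholder such as "[PER]".

    Hypothetical helper for illustration only; the real AnonymizationPipeline
    may substitute or pseudonymize entities differently.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):  # process spans left to right
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label}")
        if start < cursor:
            raise ValueError("overlapping spans")
        out.append(text[cursor:start])  # keep the text before the entity
        out.append(f"[{label}]")        # swap the entity for its label
        cursor = end
    out.append(text[cursor:])            # keep any trailing text
    return "".join(out)


if __name__ == "__main__":
    text = "Tel: 628539864. Calle Pasteur 46 Bajos, 08024 Barcelona"
    # Hand-written spans standing in for model output (not real predictions).
    spans = [(5, 14, "TELEPHONE"), (40, 45, "ZIP"), (46, 55, "LOC")]
    print(mask_entities(text, spans))
    # prints: Tel: [TELEPHONE]. Calle Pasteur 46 Bajos, [ZIP] [LOC]
```

Rejecting overlapping spans keeps the sketch simple. A real system would need its own policy for nested or conflicting entities.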