HCSCRheuma/Occupations

Token Classification

Spanish

HCSCRheuma committed on Aug 1, 2023

Commit 84165a6 (1 parent: 43cad7d)

Update README.md

README.md +101 -3

@@ -58,7 +58,7 @@ Lima-López, S., Farré-Maduell, E., Miranda-Escalada, A., Brivá-Iglesias, V.,
 - **Developed by:** Alfredo Madrid
 - **Language(s) (NLP):** Spanish
-- **License:** CC4.0
 - **Finetuned from model [optional]:** PlanTL-GOB-ES/roberta-base-biomedical-es
 ### Model Sources
@@ -70,7 +70,105 @@ Lima-López, S., Farré-Maduell, E., Miranda-Escalada, A., Brivá-Iglesias, V.,
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 [More Information Needed]

 - **Developed by:** Alfredo Madrid
 - **Language(s) (NLP):** Spanish
+- **License:** CC BY-SA 4.0
 - **Finetuned from model [optional]:** PlanTL-GOB-ES/roberta-base-biomedical-es
 ### Model Sources
 ## Uses
+**Model 1**
+```
+import torch
+import pandas as pd
+import numpy as np
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+model = AutoModelForTokenClassification.from_pretrained("MEDDO_FINAL_ROBERTA_ner_sentencia_510_8_10_2e-05_1e-08")
+tokenizer = AutoTokenizer.from_pretrained("MEDDO_FINAL_ROBERTA_ner_sentencia_510_8_10_2e-05_1e-08")
+```
+```
+note = "El paciente trabaja en una empresa de construccion los jueves"
+tokenized_sentence = tokenizer.encode(note, truncation=True)
+tokenized_words_ids = tokenizer(note, truncation=True)
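+# word_ids is kept as a bound method here; it is called later as word_ids(0) for the first sequence
+word_ids = tokenized_words_ids.word_ids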
+input_ids = torch.tensor([tokenized_sentence])
+with torch.no_grad():
+    output = model(input_ids)
+label_indices = np.argmax(output[0].to('cpu').numpy(), axis=2)
+tokens = tokenizer.convert_ids_to_tokens(input_ids.numpy()[0])
+label_indices
+```
+```
+# Column names follow the original snippet: "labels" holds the token strings, "tokens" the predicted tag ids
+df = pd.DataFrame(zip(tokens, label_indices[0], word_ids(0)), columns=["labels", "tokens", "relation"])
+df['labels'] = df['labels'].str.replace('##', '')
+df['tokens'] = df['tokens'].map({0: 'B-PROFESION', 1: 'B-SITUACION_LABORAL', 2: 'I-SITUACION_LABORAL', 3: 'I-ACTIVIDAD', 4: 'I-PROFESION', 5: 'O', 6: 'B-ACTIVIDAD', 7: 'PAD'})
+# Drop the special tokens at the start and end of the sequence
+df = df[1:-1]
+df['relation'] = df['relation'].astype('int')
+# Join the subword pieces of each word and keep the tag of its first piece
+df['labels'] = df.groupby('relation')['labels'].transform(lambda x: ''.join(x))
+df = df.groupby('relation').first()
+df
+```
+**Output**
+| relation |     labels    |    tokens   |
+|:--------:|:-------------:|:-----------:|
+|     0    |      ĠEl      |      O      |
+|     1    |   Ġpaciente   |      O      |
+|     2    |    Ġtrabaja   | B-PROFESION |
+|     3    |      Ġen      | I-PROFESION |
+|     4    |      Ġuna     | I-PROFESION |
+|     5    |    Ġempresa   | I-PROFESION |
+|     6    |      Ġde      | I-PROFESION |
+|     7    | Ġconstruccion | I-PROFESION |
+|     8    |      Ġlos     |      O      |
+|     9    |    Ġjueves    |      O      |
+**Model 2**
+```
+import torch
+import pandas as pd
+import numpy as np
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+model = AutoModelForTokenClassification.from_pretrained("MEDDO_FINAL_ROBERTA_class_sentencia_510_8_10_2e-05_1e-08")
+tokenizer = AutoTokenizer.from_pretrained("MEDDO_FINAL_ROBERTA_class_sentencia_510_8_10_2e-05_1e-08")
+```
+```
+note = "El paciente trabaja en una empresa de construccion los jueves"
+tokenized_sentence = tokenizer.encode(note, truncation=True)
+tokenized_words_ids = tokenizer(note, truncation=True)
+word_ids = tokenized_words_ids.word_ids
+input_ids = torch.tensor([tokenized_sentence])
+with torch.no_grad():
+    output = model(input_ids)
+label_indices = np.argmax(output[0].to('cpu').numpy(), axis=2)
+tokens = tokenizer.convert_ids_to_tokens(input_ids.to('cpu').numpy()[0])
+label_indices
+```
+```
+# Column names follow the original snippet: "labels" holds the token strings, "tokens" the predicted tag ids
+df = pd.DataFrame(zip(tokens, label_indices[0], word_ids(0)), columns=["labels", "tokens", "relation"])
+df['labels'] = df['labels'].str.replace('##', '')
+df['tokens'] = df['tokens'].map({0: 'B-FAMILIAR', 1: 'I-PACIENTE', 2: 'I-OTROS', 3: 'B-SANITARIO', 4: 'B-PACIENTE', 5: 'I-FAMILIAR', 6: 'O', 7: 'B-OTROS', 8: 'I-SANITARIO', 9: 'PAD'})
+df = df[1:-1]
+df['relation'] = df['relation'].astype('int')
+df['labels'] = df.groupby('relation')['labels'].transform(lambda x: ''.join(x))
+df = df.groupby('relation').first()
+df
+```
+**Output**
+| relation |     labels    |    tokens   |
+|:--------:|:-------------:|:-----------:|
+|     0    |      ĠEl      |      O      |
+|     1    |   Ġpaciente   |      O      |
+|     2    |    Ġtrabaja   | B-PACIENTE  |
+|     3    |      Ġen      | I-PACIENTE  |
+|     4    |      Ġuna     | I-PACIENTE  |
+|     5    |    Ġempresa   | I-PACIENTE  |
+|     6    |      Ġde      | I-PACIENTE  |
+|     7    | Ġconstruccion | I-PACIENTE  |
+|     8    |      Ġlos     |      O      |
+|     9    |    Ġjueves    |      O      |
 [More Information Needed]
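
The Output tables in the added README text list one BIO tag per word. A common next step is to collapse consecutive `B-`/`I-` tags into entity spans. The following is a minimal sketch of that step, not part of the commit: the helper name `bio_to_spans` and the constants `WORDS` and `NER_TAGS` are hypothetical, and the data is copied from the Model 1 output table. It uses only the standard library, so it runs without the model files.

```python
# Hypothetical post-processing helper; not part of the committed README.
# WORDS and NER_TAGS are copied from the Model 1 output table in the diff.
WORDS = ["ĠEl", "Ġpaciente", "Ġtrabaja", "Ġen", "Ġuna",
         "Ġempresa", "Ġde", "Ġconstruccion", "Ġlos", "Ġjueves"]
NER_TAGS = ["O", "O", "B-PROFESION", "I-PROFESION", "I-PROFESION",
            "I-PROFESION", "I-PROFESION", "I-PROFESION", "O", "O"]


def bio_to_spans(words, tags):
    """Group word-level BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        text = word.lstrip("Ġ")  # drop the BPE space marker shown in the table
        prefix, _, entity = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # Start a new span; a stray I- tag is treated as a span start.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = entity, [text]
        elif prefix == "I":
            # Continue the span that is currently open.
            current_words.append(text)
        else:
            # "O", "PAD", or any other tag closes the open span.
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans


spans = bio_to_spans(WORDS, NER_TAGS)
print(spans)
```

The same function should apply to the Model 2 tags (`B-PACIENTE`, `I-PACIENTE`, and so on), since it only depends on the `B-`/`I-` prefixes and not on the entity names.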