Dr. Jorge Abreu Vicente committed
Commit f8f191b · 1 Parent(s): 307014a

update model card README.md

Files changed (1)
  1. README.md +10 -41
README.md CHANGED
@@ -19,53 +19,23 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model was trained from scratch on the source_data_nlp dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0064
- - Accuracy Score: 0.9982
- - Precision: 0.9689
- - Recall: 0.9905
- - F1: 0.9795
+ - Loss: 0.0118
+ - Accuracy Score: 0.9970
+ - Precision: 0.9524
+ - Recall: 0.9865
+ - F1: 0.9691
 
  ## Model description
 
- It separates figure captions into their constituent panels.
+ More information needed
 
  ## Intended uses & limitations
 
- To be used to separate figure captions into their constituent panels.
-
- It will not load with the default Hugging Face pipelines. It must be loaded as follows,
- after installing the [soda-roberta](https://github.com/source-data/soda-roberta) library:
-
- ```python
- from smtag.excell_roberta.modeling_excell_roberta import EXcellRobertaForTokenClassification
- from transformers import AutoTokenizer
- from datasets import load_dataset
-
- ds = load_dataset("EMBO/sd-nlp-non-tokenized","PANELIZATION")
- SENTENCE = """Figure 2A. HEK293T cells were transfected with MYC-FOXP3 and FLAG-USP44 encoding expression constructs using Polyethylenimine. 48hrs post-transfection, cells were harvested, lysed, and anti-FLAG or anti-MYC antibody coated beads were used to immunoprecipitate the given labeled protein along with its bi\nnding partner. Co-IP' ed proteins were subjected to SDS
- PAGE followed by immunoblot analysis. Antibodies recognizing FLAG or MYC tags were used to probe for USP44 and FOXP3, respectively. B. Endogenous co-IP of USP44 and FOXP3 in murine iTregs. iTregs were generated as in Fig. 1 from naïve CD4+T cells FACS isolated from pooled suspensions of the lymph node and\n spleen cells of wild type C57BL/6 mice (n = 2-3 / exp
- eriment). iTregs were lysed and key proteins were immunoprecipitated using either anti-USP44 (right panel) or anti-FOXP3 (left panel) antibody. Proteins pulled-down in this experiment were then resolved and analyzed by immunoblot using anti-FOXP3 or anti-USP44 antibodies. C. Endogenous co-IP of USP44 and FO\nXP3 in murine nTregs. nTregs (CD4+CD25high) isolated
- by FACS were activated by anti-CD3 and anti-CD28 (1 and 4 ug/ml, respectively) overnight in the presence of IL-2 (100 U/ml). The cells were lysed and proteins were immunoprecipitated using either anti-Foxp3 (left panel) or anti-Usp44 (right panel). Proteins pulled down in this experiment were then resolved a\nnd identified with the indicated antibodies. D . N
- aïve murine CD4+T cells were isolated by FACS from lymph node and spleen cell suspension of USP44fl/fl CD4Cre+ mice and that of their wild type littermates (USP44fl/fl CD4Cre-mice; n = 2-3 / group / experiment) . iTreg cells were generated from these mice as described for Fig. 1 before incubation on a microscop\ne slide pre-coated with poly-L lysine for 1h. Ad
- hered cells were then fixed by PFA for 0.5 followed by blocking with 1% BSA for 1h, then incubation with the specified antibodies. Representative confocal microscopy images (40X) were visualized for endogenous USP44 (red) and FOXP3 Baxter et al (). DAPI was used to visualize cell nuclei (blue); scale bar 50μm."""
-
-
- model = EXcellRobertaForTokenClassification.from_pretrained("EMBO/sd-panelization-v2")
- tokenizer = AutoTokenizer.from_pretrained("EMBO/sd-panelization-v2", is_pretokenized=False, add_prefix_space=True)
-
- outputs = model(**tokenizer(SENTENCE, return_tensors="pt"))
-
- logits = outputs[0].cpu() # B x L x H
- proba = logits.softmax(-1) # B x L x H
- labels = logits.argmax(-1) # B x L
-
- for label, token in zip(labels[0], tokenizer(SENTENCE, return_tensors="pt")["input_ids"][0]):
-     print(f"{model.config.id2label.get(label.item())}\t{tokenizer.decode(token)}")
- ```
+ More information needed
 
  ## Training and evaluation data
 
- Trained on the [SourceData](https://huggingface.co/datasets/EMBO/sd-nlp-non-tokenized) dataset.
+ More information needed
 
  ## Training procedure
 
@@ -78,14 +48,13 @@ The following hyperparameters were used during training:
  - seed: 42
  - optimizer: Adafactor
  - lr_scheduler_type: linear
- - num_epochs: 2.0
+ - num_epochs: 1.0
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss | Accuracy Score | Precision | Recall | F1 |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:---------:|:------:|:------:|
- | 0.0074 | 1.0 | 216 | 0.0085 | 0.9977 | 0.9670 | 0.9785 | 0.9727 |
- | 0.0049 | 2.0 | 432 | 0.0064 | 0.9982 | 0.9689 | 0.9905 | 0.9795 |
+ | 0.0078 | 1.0 | 216 | 0.0118 | 0.9970 | 0.9524 | 0.9865 | 0.9691 |
 
 
  ### Framework versions
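The two versions of the card report different evaluation metrics (2 epochs before this commit, 1 epoch after). Assuming the reported F1 is the usual harmonic mean of the reported overall precision and recall, F1 = 2PR / (P + R), each set of numbers can be cross-checked with the small sketch below. The helper name `f1_score` is illustrative only. Because the inputs are already rounded to four decimals, the recomputed values are expected to agree with the reported F1 only up to that rounding (within about 1e-4).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (illustrative helper)."""
    return 2 * precision * recall / (precision + recall)


# Evaluation-set metrics as reported in the removed (-) and added (+) card text.
reported = {
    "removed card (2 epochs)": {"precision": 0.9689, "recall": 0.9905, "f1": 0.9795},
    "added card (1 epoch)": {"precision": 0.9524, "recall": 0.9865, "f1": 0.9691},
}

for version, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    # Small gaps come from precision and recall already being rounded.
    print(f"{version}: reported F1 {m['f1']:.4f}, recomputed {recomputed:.4f}")
```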
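The usage snippet removed in this commit prints one predicted label per token but stops short of producing the panels themselves. A minimal post-processing sketch follows, assuming the label set marks the first token of each panel with a dedicated tag. `PANEL_START = "B-PANEL_START"` and `split_into_panels` are hypothetical names; the real label strings should be read from the model's `config.id2label` mapping, and special tokens such as `<s>` and `</s>` are assumed to have been filtered out beforehand.

```python
from typing import List, Tuple

# Hypothetical panel-start tag: the model's actual labels live in
# model.config.id2label and may be named differently.
PANEL_START = "B-PANEL_START"


def split_into_panels(tagged: List[Tuple[str, str]], start_label: str = PANEL_START) -> List[str]:
    """Group (label, token_text) pairs into one string per panel.

    A new chunk starts at every token tagged with ``start_label``; any tokens
    before the first start tag (e.g. a figure title) form a leading chunk.
    """
    panels: List[List[str]] = []
    for label, token in tagged:
        if label == start_label or not panels:
            panels.append([])
        panels[-1].append(token)
    # RoBERTa-style decoded tokens carry their own leading spaces, so plain
    # concatenation restores the text; strip only the outer whitespace.
    return ["".join(chunk).strip() for chunk in panels]


# Toy (label, decoded token) pairs, not real model output.
tagged = [
    ("B-PANEL_START", " A."),
    ("O", " Cells"),
    ("O", " were"),
    ("O", " lysed."),
    ("B-PANEL_START", " B."),
    ("O", " Blots"),
    ("O", " shown."),
]
print(split_into_panels(tagged))
```

Opening a new chunk only at a start tag keeps the grouping independent of how many tokens each panel spans, so long panels such as panel D in the example caption need no special handling.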