EMBO/sd-panelization-v2

@@ -19,53 +19,23 @@ should probably proofread and complete it, then remove this comment. -->
 This model was trained from scratch on the source_data_nlp dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0064
-- Accuracy Score: 0.9982
-- Precision: 0.9689
-- Recall: 0.9905
-- F1: 0.9795
 ## Model description
-It separates figure captions into their constituent panels.
 ## Intended uses & limitations
-Use this model to separate figure captions into their constituent panels.
-It will not load with the default Hugging Face pipelines. It must be loaded as follows,
-after installing the [soda-roberta](https://github.com/source-data/soda-roberta) library:
-```python
-from smtag.excell_roberta.modeling_excell_roberta import EXcellRobertaForTokenClassification
-from transformers import AutoTokenizer
-from datasets import load_dataset
-ds = load_dataset("EMBO/sd-nlp-non-tokenized","PANELIZATION")
-SENTENCE = """Figure 2A. HEK293T cells were transfected with MYC-FOXP3 and FLAG-USP44 encoding expression constructs using Polyethylenimine. 48hrs post-transfection, cells were harvested, lysed, and anti-FLAG or anti-MYC antibody coated beads were used to immunoprecipitate the given labeled protein along with its binding partner. Co-IP' ed proteins were subjected to SDS
-PAGE followed by immunoblot analysis. Antibodies recognizing FLAG or MYC tags were used to probe for USP44 and FOXP3, respectively. B. Endogenous co-IP of USP44 and FOXP3 in murine iTregs. iTregs were generated as in Fig. 1 from naïve CD4+T cells FACS isolated from pooled suspensions of the lymph node and spleen cells of wild type C57BL/6 mice (n = 2-3 /
-experiment). iTregs were lysed and key proteins were immunoprecipitated using either anti-USP44 (right panel) or anti-FOXP3 (left panel) antibody. Proteins pulled-down in this experiment were then resolved and analyzed by immunoblot using anti-FOXP3 or anti-USP44 antibodies. C. Endogenous co-IP of USP44 and FOXP3 in murine nTregs. nTregs (CD4+CD25high) isolated
- by FACS were activated by anti-CD3 and anti-CD28 (1 and 4 ug/ml, respectively) overnight in the presence of IL-2 (100 U/ml). The cells were lysed and proteins were immunoprecipitated using either anti-Foxp3 (left panel) or anti-Usp44 (right panel). Proteins pulled down in this experiment were then resolved and identified with the indicated antibodies. D .
-Naïve murine CD4+T cells were isolated by FACS from lymph node and spleen cell suspension of USP44fl/fl CD4Cre+ mice and that of their wild type littermates (USP44fl/fl CD4Cre-mice; n = 2-3 / group / experiment) . iTreg cells were generated from these mice as described for Fig. 1 before incubation on a microscope slide pre-coated with poly-L lysine for 1h.
-Adhered cells were then fixed by PFA for 0.5 followed by blocking with 1% BSA for 1h, then incubation with the specified antibodies. Representative confocal microscopy images (40X) were visualized for endogenous USP44 (red) and FOXP3 Baxter et al (). DAPI was used to visualize cell nuclei (blue); scale bar 50μm."""
-model = EXcellRobertaForTokenClassification.from_pretrained("EMBO/sd-panelization-v2")
-tokenizer = AutoTokenizer.from_pretrained("EMBO/sd-panelization-v2", is_pretokenized=False, add_prefix_space=True)
-outputs = model(**tokenizer(SENTENCE, return_tensors="pt"))
-logits = outputs[0].cpu()  # B x L x H
-proba = logits.softmax(-1)  # B x L x H
-labels = logits.argmax(-1)  # B x L
-for label, token in zip(labels[0], tokenizer(SENTENCE, return_tensors="pt")["input_ids"][0]):
-    print(f"{model.id2label.get(label.item())}\t{tokenizer.decode(token)}")
-```
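The snippet above prints one label per token but stops short of producing panels. As a minimal sketch of that last step, the helper below groups tokens into panel strings, opening a new panel wherever a label marks a panel start. The function name `split_into_panels`, the `B-` prefix convention, and the `B-PANEL_START` label string are assumptions for illustration and are not confirmed by this model card; check the label strings the model actually emits before relying on them. Tokens are shown as plain whitespace-separated words rather than real subword tokens.

```python
from typing import List


def split_into_panels(
    tokens: List[str],
    labels: List[str],
    start_prefix: str = "B-",  # assumed convention for "a panel starts here"
) -> List[str]:
    """Group tokens into panels, opening a new panel at each start label."""
    panels: List[List[str]] = []
    for token, label in zip(tokens, labels):
        # Open a new panel on a start label, or for the very first token
        # so that leading text is never dropped.
        if label.startswith(start_prefix) or not panels:
            panels.append([])
        panels[-1].append(token)
    return [" ".join(panel) for panel in panels]


# Toy input; the label names are hypothetical.
tokens = ["A.", "HEK293T", "cells", "B.", "Endogenous", "co-IP"]
labels = ["B-PANEL_START", "O", "O", "B-PANEL_START", "O", "O"]
panels = split_into_panels(tokens, labels)
for panel in panels:
    print(panel)
```

With real model output, the decoded subword tokens would need to be merged back into text (for example via character offsets) instead of being joined with spaces.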
 ## Training and evaluation data
-Trained on the [SourceData](https://huggingface.co/datasets/EMBO/sd-nlp-non-tokenized) dataset.
 ## Training procedure
@@ -78,14 +48,13 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Adafactor
 - lr_scheduler_type: linear
-- num_epochs: 2.0
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Accuracy Score | Precision | Recall | F1     |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:---------:|:------:|:------:|
-| 0.0074        | 1.0   | 216  | 0.0085          | 0.9977         | 0.9670    | 0.9785 | 0.9727 |
-| 0.0049        | 2.0   | 432  | 0.0064          | 0.9982         | 0.9689    | 0.9905 | 0.9795 |
 ### Framework versions

 This model was trained from scratch on the source_data_nlp dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0118
+- Accuracy Score: 0.9970
+- Precision: 0.9524
+- Recall: 0.9865
+- F1: 0.9691
 ## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
 - seed: 42
 - optimizer: Adafactor
 - lr_scheduler_type: linear
+- num_epochs: 1.0
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Accuracy Score | Precision | Recall | F1     |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:---------:|:------:|:------:|
+| 0.0078        | 1.0   | 216  | 0.0118          | 0.9970         | 0.9524    | 0.9865 | 0.9691 |
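As a quick sanity check on the figures in this diff, the sketch below recomputes F1 from the reported precision and recall for both versions of the card. It assumes F1 is the usual harmonic mean of precision and recall; the card does not state how the metric was computed, so small differences from the reported values may come from rounding of the inputs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the two versions in the diff.
reported = {
    "removed card, epoch 2": (0.9689, 0.9905, 0.9795),
    "added card, epoch 1": (0.9524, 0.9865, 0.9691),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.4f}, reported F1 = {f1:.4f}")
```

Both recomputed values agree with the reported ones to within rounding, which suggests the removed and added metric blocks are each internally consistent.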
 ### Framework versions