EMBO/sd-panelization

Token Classification

tlemberger committed on Mar 27, 2022

Commit 6c42a83 · 1 Parent(s): 6bbac4f

so many typos

Files changed (1)

README.md +4 -4

README.md CHANGED

@@ -18,7 +18,7 @@ metrics:
 This model is a [RoBERTa base model](https://huggingface.co/roberta-base) that was further trained using a masked language modeling task on a compendium of english scientific textual examples from the life sciences using the [BioLang dataset](https://huggingface.co/datasets/EMBO/biolang). It was then fine-tuned for token classification on the SourceData [sd-figures](https://huggingface.co/datasets/EMBO/sd-figures) dataset with the `PANELIZATION` task to perform 'parsing' or 'segmentation' of figure legends into fragments corresponding to sub-panels.
-Figures are usually composite representations of results obtained with heterogenous experimental approaches and systems.  Breaking figures into  panels allows to identify more coherent descriptions of individual scientific experiments.
+Figures are usually composite representations of results obtained with heterogeneous experimental approaches and systems. Breaking figures into panels allows identifying more coherent descriptions of individual scientific experiments.
 ## Intended uses & limitations
@@ -44,15 +44,15 @@ The model must be used with the `roberta-base` tokenizer.
 ## Training data
-The model was trained for token classification using the [EMBO/sd-figures `PANELIZATION`](https://huggingface.co/datasets/EMBO/sd-panels) dataset wich includes manually annotated examples.
+The model was trained for token classification using the [`EMBO/sd-figures PANELIZATION`](https://huggingface.co/datasets/EMBO/sd-figures) dataset which includes manually annotated examples.
 ## Training procedure
-The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.
+The training was run on an NVIDIA DGX Station with 4XTesla V100 GPUs.
 Training code is available at https://github.com/source-data/soda-roberta
-- Model fine-tuned: EMMBO/bio-lm
+- Model fine-tuned: EMBO/bio-lm
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-figures
 - Dataset configuration: PANELIZATION
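
The README text in this diff describes using token-level predictions to segment a figure legend into fragments that correspond to sub-panels. The following is a minimal sketch of that grouping step only, not the project's own code. It assumes the per-token labels have already been produced by a token classifier; the label name `B-PANEL_START`, the helper `split_into_panels`, and the example legend are all hypothetical and chosen for illustration.

```python
from typing import List


def split_into_panels(
    tokens: List[str],
    labels: List[str],
    start_label: str = "B-PANEL_START",  # assumed label name, not confirmed by the source
) -> List[str]:
    """Group tokens into fragments, opening a new fragment at each start label."""
    panels: List[List[str]] = []
    for token, label in zip(tokens, labels):
        # Open a new fragment on a start label, or for text before the first start label.
        if label == start_label or not panels:
            panels.append([])
        panels[-1].append(token)
    return [" ".join(fragment) for fragment in panels]


# Hypothetical legend, already split into tokens with one predicted label each.
tokens = ["(A)", "Western", "blot", "of", "lysates.", "(B)", "Quantification", "of", "A."]
labels = ["B-PANEL_START", "O", "O", "O", "O", "B-PANEL_START", "O", "O", "O"]

panels = split_into_panels(tokens, labels)
for panel in panels:
    print(panel)
```

Text that precedes the first start label is kept as its own leading fragment instead of being dropped, which is one possible design choice for legend titles.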